and currently I am a Postdoctoral researcher at the Data Science and Mining (DaSciM) group, Computer Science Laboratory (LIX), Ecole Polytechnique, Paris, France. I received my PhD from the Department of Computer Science & Engineering of University of Ioannina in Greece. I received my MSc and BSc degrees from the same institution, in 2010 and 2007 respectively. My research interests include Reinforcement Learning, Decision Making under Uncertainty, Machine Learning, Artificial Intelligence and Robotics.
PhD in Computer Science & Engineering
PhD Thesis: Machine Learning for Intelligent Agents
Department of Computer Science & Engineering, University of Ioannina
Master in Computer Science
MSc Thesis: Autonomous Mobile Robot Navigation using Reinforcement Learning
Department of Computer Science, University of Ioannina
Bachelor in Computer Science
Department of Computer Science, University of Ioannina
In this article we introduce AngryBER, an intelligent agent architecture on the Angry Birds domain that employs a Bayesian ensemble inference mechanism to promote decision making abilities. It is based on an efficient tree-like structure for encoding and representing game screenshots, where it exploits its enhanced modeling capabilities. This has the advantage to establish an informative feature space and translate the task of game playing into a regression analysis problem. A Bayesian ensemble regression framework is presented by considering that every combination of objects’ material and bird type has its own regression model. We address the problem of action selection as a multi-armed bandit problem, where the Upper Confidence Bound (UCB) strategy has been used. An efficient online learning procedure has been also developed for training the regression models. We have evaluated the proposed methodology on several game levels, and compared its performance with published results of all agents that participated in the 2013 and 2014 Angry Birds AI competitions. The superiority of the new method is readily deduced by inspecting the reported results.
This dissertation studies the problem of developing intelligent agents, which are able to acquire skills in an autonomous way, simulating human behaviour. An autonomous intelligent agent acts e ectively in an unknown environment, directing its activity to- wards achieving a specific goal based on some performance measure. Through this interaction, a rich amount of information is received, which allows the agent to per- ceive the consequences of its actions, identify important behavioural components, and adapt its behaviour through learning. In this direction, the present dissertation con- cerns the development, implementation and evaluation of machine learning techniques for building intelligent agents. Three important and very challenging tasks are consid- ered: i) approximate reinforcement learning, where the agent’s policy is evaluated and improved through the approximation of the value function, ii) Bayesian reinforcement learning, where the reinforcement learning problem is modeled as a decision-theoretic problem, by placing a prior distribution over Markov Decision Processes (MDPs) that encodes the agent’s belief about the true environment, and iii) Development of intel- ligent agents on games, which constitute a really challenging platform for developing machine learning methodologies, involving a number of issues that should be resolved, such as the appropriate choice of state representation, continuous action spaces, etc..
We propose the use of the Least Absolute Shrinkage and Selection Operator (LASSO) regression method in order to predict the Cumulative Mean Squared Error (CMSE), incurred by the loss of individual slices in video transmission. We extract a number of quality-relevant features from the H.264/AVC video sequences, which are given as input to the LASSO. This method has the benefit of not only keeping a subset of the features that have the strongest effects towards video quality, but also produces accurate CMSE predictions. Particularly, we study the LASSO regression through two different architectures; the Global LASSO (G.LASSO) and Local LASSO (L.LASSO). In G.LASSO, a single regression model is trained for all slice types together, while in L.LASSO, motivated by the fact that the values for some features are closely dependent on the considered slice type, each slice type has its own regression model, in an effort to improve LASSO’s prediction capability. Based on the predicted CMSE values, we group the video slices into four priority classes. Additionally, we consider a video transmission scenario over a noisy channel, where Unequal Error Protection (UEP) is applied to all prioritized slices. The provided results demonstrate the efficiency of LASSO in estimating CMSE with high accuracy, using only a few features.
This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.
Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.
The issues with the use of Approximate Bayesian Computation in Reinforcement Learning is the following. Firstly, that the model set may comprise simulators which are purely deterministic. Secondly, that there is a dependence between the policy used and the data collected, which necessitate maintaining a representation of the policy used as well as the data history. Thirdly, there is the question of the statistics used. Finally, there is the problem selecting a policy given the data observed so far. In this paper, we report some progress on using more sophisticated statistics and policy search algorithms and show that they have significant impact.
An ensemble inference mechanism is proposed on the Angry Birds domain. It is based on an efficient tree structure for encoding and representing game screenshots, where it exploits its enhanced modeling capability. This has the advantage to establish an informative feature space and modify the task of game playing to a regression analysis problem. To this direction, we assume that each type of object material and bird pair has its own Bayesian linear regression model. In this way, a multi-model regression framework is designed that simultaneously calculates the conditional expectations of several objects and makes a target decision through an ensemble of regression models. The learning procedure is performed according to an online estimation strategy for the model parameters. We provide comparative experimental results on several game levels that empirically illustrate the efficiency of the proposed methodology.
Reinforcement Learning (RL) algorithms have been promising methods for designing intelligent agents in games. Although their capability of learning in real time has been already proved, the high dimensionality of state spaces in most game domains can be seen as a significant barrier. This paper studies the popular arcade video game Ms. Pac-Man and outlines an approach to deal with its large dynamical environment. Our motivation is to demonstrate that an abstract but informative state space description plays a key role in the design of efficient RL agents. Thus, we can speed up the learning process without the necessity of Q-function approximation. Several experiments were made using the multiagent MASON platform where we measured the ability of the approach to reach optimum generic policies which enhances its generalization abilities.
We introduce a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The advantage is that we only require a prior distribution on a class of simulators. This is useful when a probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique in this case. It can be seen as an extension of simulation methods to both planning and inference. We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is sound.
This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic programming on the sampled model. This form of approximate Thompson sampling results in good exploration in unknown environments. The approach can also be seen as a Bayesian generalisation of least-squares policy iteration, where the empirical transition matrix is replaced with a sample from the posterior.
In recent years, video delivery over wireless visual sensor networks (VSNs) has gained increasing attention. The lossy compression and channel errors that occur during wireless multimedia transmissions can degrade the quality of the transmitted video sequences. This paper addresses the problem of cross-layer resource allocation among the nodes of a wireless direct-sequence code division multiple access (DS-CDMA) VSN. The optimal group of pictures (GoP) length during the encoding process is also considered, based on the motion level of each video sequence. Three optimization criteria that optimize a different objective function of the video qualities of the nodes are used. The nodes' transmission parameters, i.e., the source coding rates, channel coding rates and power levels can only take discrete values. In order to tackle the resulting optimization problem, a reinforcement learning (RL) strategy that promises efficient exploration and exploitation of the parameters' space is employed. This makes the proposed methodology usable in large or continuous state spaces as well as in an online mode. Experimental results highlight the efficiency of the proposed method.
A significant issue in representing reinforcement learning agents in Markov decision processes is how to design efficient feature spaces in order to estimate optimal policy. This particular study addresses this challenge by proposing a compact framework that employs an on-line clustering approach for constructing appropriate basis functions. Also, it performs a state-action trajectory analysis to gain valuable affinity information among clusters and estimate their transition dynamics. Value function approximation is used for policy evaluation in a least-squares temporal difference framework. The proposed method is evaluated in several simulated and real environments, where we took promising results.
Value function approximation is a critical task in solving Markov decision processes and accurately modeling reinforcement learning agents. A significant issue is how to construct efficient feature spaces from samples collected by the environment in order to obtain an optimal policy. The particular study addresses this challenge by proposing an on-line kernel-based clustering approach for building appropriate basis functions during the learning process. The method uses a kernel function capable of handling pairs of state-action as sequentially generated by the agent. At each time step, the procedure either adds a new cluster, or adjusts the winning cluster’s parameters. By considering the value function as a linear combination of the constructed basis functions, the weights are optimized in a temporal-difference framework in order to minimize the Bellman approximation error. The proposed method is evaluated in numerous known simulated environments.
In this study we present a sparse Bayesian framework for value function approximation. The proposed method is based on the on-line construction of a dictionary of states which are collected during the exploration of the environment by the agent. A linear regression model is established for the observed partial discounted return of such dictionary states, where we employ the Relevance Vector Machine (RVM) and exploit its enhanced modeling capability due to the embedded sparsity properties. In order to speed-up the optimization procedure and allow dealing with large-scale problems, an incremental strategy is adopted. A number of experiments have been conducted on both simulated and real environments, where we took promising results in comparison with another Bayesian approach that uses Gaussian processes.
In this work we present an advanced Bayesian formulation to the task of control learning that employs the Relevance Vector Machines (RVM) generative model for value function evaluation. The key aspect of the proposed method is the design of the discount return as a generative linear model that constitutes a well-known probabilistic approach. This allows to augment the model with advantegeous sparse priors provided by the RVM's regression framework. We have also taken into account the significant issue of selecting the proper parameters of the kernel design matrix. Experiments have shown that our method produces improved performance in both simulated and real test environments.