Reinforcement learning with biased models

Reinforcement learning is the problem of learning how to act within an unknown environment solely through interaction. The learner does not know a priori which states of the environment are desirable. Instead, it observes a reinforcement signal, which can be used to evaluate the actions it has previously taken. However, the learner can never be certain what the true environment model is, or whether any given plan is optimal. It must therefore keep exploring the environment to gain the knowledge needed to formulate better plans.

The reinforcement learning problem can be formally solved within a Bayesian statistical framework, where a probability distribution expresses our belief about which environment model is the true one. Unfortunately, this requires planning in the space of all possible beliefs, which is usually intractable. In addition, there are many possible models to choose from, none of which may be true in reality, and which may differ in how costly they are to simulate.
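To make the idea of a belief over environment models concrete, here is a minimal sketch of a Bayesian update over a finite set of candidate models. It is an illustration only, not the project's method: the function name `update_belief`, the two toy models, and the nested-list layout `model[s][a][s_next]` are all assumptions introduced for this example.

```python
def update_belief(belief, models, s, a, s_next):
    """Apply Bayes' rule over a finite set of candidate models.

    belief[i] is the current probability that model i is the true one.
    models[i][s][a][s_next] is the probability model i assigns to moving
    from state s to state s_next under action a (hypothetical layout).
    """
    weighted = [b * m[s][a][s_next] for b, m in zip(belief, models)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("every candidate model rules out this transition")
    return [w / total for w in weighted]


# Two hypothetical models of an environment with 2 states and 1 action.
model_a = [[[0.9, 0.1]], [[0.1, 0.9]]]  # states tend to persist
model_b = [[[0.5, 0.5]], [[0.5, 0.5]]]  # next state is uniformly random

belief = [0.5, 0.5]  # uniform prior over the two models
transitions = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)]
for s, a, s_next in transitions:
    belief = update_belief(belief, [model_a, model_b], s, a, s_next)
```

The update itself is cheap. The intractable part described above is planning: an optimal Bayesian plan has to account for every belief that future observations could produce, not just the current one.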

In this project, the student will develop methodologies for using available data to plan in uncertain environments, combining information from models at multiple levels of detail and accuracy. The main problem is deciding how much computation time to devote to each model as more data arrives. We expect the more complex models to become more useful as the amount of data grows.
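One way to see why more complex models may become more useful with more data is to compare marginal likelihoods. The sketch below is a toy illustration under stated assumptions, not the project's methodology. A "pooled" model ignores the state and fits a single success probability, while a "per-state" model fits one probability per state, both with uniform Beta(1, 1) priors. The function names and the counts are hypothetical, chosen so that the success rates differ mildly between states (0.6 versus 0.4).

```python
from math import lgamma


def log_evidence(successes, trials):
    """Log marginal likelihood of a binary sequence under a Beta(1, 1) prior.

    Equals log( successes! * (trials - successes)! / (trials + 1)! ).
    """
    return (lgamma(successes + 1)
            + lgamma(trials - successes + 1)
            - lgamma(trials + 2))


def log_bayes_factor(counts):
    """Log evidence of the per-state model minus that of the pooled model.

    counts maps each state to (successes, trials) observed in that state.
    A positive value favours the more complex per-state model.
    """
    per_state = sum(log_evidence(k, n) for k, n in counts.values())
    pooled = log_evidence(sum(k for k, _ in counts.values()),
                          sum(n for _, n in counts.values()))
    return per_state - pooled


# Same empirical rates (0.6 and 0.4), at two hypothetical data sizes.
small = {0: (3, 5), 1: (2, 5)}
large = {0: (300, 500), 1: (200, 500)}

small_lbf = log_bayes_factor(small)  # negative: the pooled model is preferred
large_lbf = log_bayes_factor(large)  # positive: the per-state model is preferred
```

With ten observations the simpler, biased model has the higher evidence. With a thousand observations at the same rates, the per-state model wins. How such a comparison should translate into a computation budget per model is the open question the project poses.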

Background: probability, Bayesian inference, good programming skills

Keywords: reinforcement learning, Bayesian inference

Published Sep. 25, 2019 16:50 - Last modified Sep. 25, 2019 16:50

Supervisor(s)

Christos Dimitrakakis Universitetet i Oslo
