Hierarchical Reinforcement Learning for AUVs

Different types of mobile sensor platforms, such as Autonomous Underwater Vehicles (AUVs), are used for a range of monitoring activities. The hardware is still costly to acquire and deploy, so finding ways to make these platforms more cost-effective is an active area of research. One way to increase the usefulness of autonomous robots is to enable them to perform more complex tasks, or multiple tasks. In this project, you will explore Hierarchical Reinforcement Learning (HRL) with the aim of training a simulated AUV to detect and map plumes of dissolved gas.

The need for autonomous robotic solutions is growing alongside a stronger focus on sustainable ocean management and the emergence of marine industries such as offshore CO2 storage and deep-sea mining. AUVs have excellent marine monitoring capabilities, but their potential to use sensor input in real time for autonomous, optimized mapping of emissions remains largely unrealized. This master's project will explore HRL to train an AUV for environmental surveys.

HRL extends traditional reinforcement learning (RL) with the concept of options: temporally extended actions, where different options can use different realizations of the action space and/or the state space. Switching between options during training can be seen as attacking a problem in a new way, with different tools or from a different perspective, and HRL has the potential to help agents function in complex or time-varying environments. For environmental monitoring of gas seepage, examples of goals include locating the plume, finding the location inside the plume with the highest gas concentration, or finding the extent of the plume in the four cardinal directions.
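
To make the options idea concrete, here is a minimal Python sketch, not a prescribed design. It assumes the common formulation in which an option has an initiation set, an intra-option policy and a termination condition, and a higher-level policy chooses among options with SMDP Q-learning. The 1-D corridor, the plume source at cell 9, the options `east` and `west`, and all rewards and hyperparameters are hypothetical choices made only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


@dataclass
class Option:
    """An option: initiation set, intra-option policy, termination condition."""
    name: str
    can_start: Callable[[State], bool]    # initiation set I: may the option start in s?
    policy: Callable[[State], Action]     # intra-option policy pi(s) -> primitive action
    stop_prob: Callable[[State], float]   # termination probability beta(s)


def run_option(env_step, state, option, gamma, max_steps=100, rng=random):
    """Run one option to termination; return (s', discounted return, k steps, done)."""
    ret, discount, k, done = 0.0, 1.0, 0, False
    while k < max_steps:
        state, reward, done = env_step(state, option.policy(state))
        ret += discount * reward
        discount *= gamma
        k += 1
        if done or rng.random() < option.stop_prob(state):
            break
    return state, ret, k, done


def smdp_q_update(Q, s, o, ret, k, s_next, options, alpha, gamma, done):
    """SMDP Q-learning: Q(s,o) += alpha * (R + gamma**k * max_o' Q(s',o') - Q(s,o))."""
    avail = [op.name for op in options if op.can_start(s_next)]
    future = 0.0 if done or not avail else max(Q.get((s_next, n), 0.0) for n in avail)
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (ret + gamma ** k * future - old)


# Hypothetical toy problem: 1-D corridor, cells 0..9, plume source at cell 9.
def corridor_step(pos, action):
    nxt = min(max(pos + action, 0), 9)
    return nxt, (1.0 if nxt == 9 else -0.01), nxt == 9


# Both hypothetical options terminate at every third cell (or when the episode ends).
east = Option("east", lambda s: s < 9, lambda s: +1, lambda s: 1.0 if s % 3 == 0 else 0.0)
west = Option("west", lambda s: s > 0, lambda s: -1, lambda s: 1.0 if s % 3 == 0 else 0.0)


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    options: List[Option] = [east, west]
    Q: Dict[Tuple[State, str], float] = {}
    for _ in range(episodes):
        s, done, decisions = 0, False, 0
        while not done and decisions < 50:
            avail = [o for o in options if o.can_start(s)]
            if rng.random() < epsilon:          # explore among available options
                opt = rng.choice(avail)
            else:                               # act greedily w.r.t. option values
                opt = max(avail, key=lambda o: Q.get((s, o.name), 0.0))
            s_next, ret, k, done = run_option(corridor_step, s, opt, gamma, rng=rng)
            smdp_q_update(Q, s, opt.name, ret, k, s_next, options, alpha, gamma, done)
            s, decisions = s_next, decisions + 1
    return Q
```

After `train()`, the greedy option at cell 3 should be `east`. In the project itself, options would more plausibly correspond to sub-goals such as locating the plume or tracing its extent, and the corridor would be replaced by a richer environment.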

At the beginning of the project, you should familiarize yourself with HRL in a simple grid environment containing a body of water and a dissolved gas plume. The agent's possible actions will simply be to move around in the water, but you can experiment with different movement patterns, e.g. moving one grid point at a time or several, or defining turns as a single action.
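
A simple grid environment of this kind could look like the following Python sketch. It assumes the whole grid is water, models the plume as a static, Gaussian-shaped concentration field around a hypothetical source cell, and offers two alternative action sets: compass moves of one or more grid points, and heading-based moves where a turn is a single action. The class and method names, grid size and plume parameters are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Unit moves (dx, dy) ordered clockwise: N, E, S, W.
HEADINGS: List[Tuple[int, int]] = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def gaussian_plume(width: int, height: int, source: Tuple[int, int], spread: float):
    """Hypothetical static plume: concentration 1.0 at the source, decaying with distance."""
    sx, sy = source
    return [[math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2 * spread ** 2))
             for x in range(width)]
            for y in range(height)]


@dataclass
class GridPlumeEnv:
    width: int = 20
    height: int = 20
    source: Tuple[int, int] = (14, 12)   # hypothetical plume source cell
    spread: float = 3.0                  # hypothetical plume width, in grid cells
    pos: Tuple[int, int] = (0, 0)        # AUV position (x, y)
    heading: int = 1                     # index into HEADINGS; starts facing east

    def __post_init__(self):
        # conc[y][x]; the whole grid is assumed to be water.
        self.conc = gaussian_plume(self.width, self.height, self.source, self.spread)

    def sense(self) -> float:
        """Simulated concentration reading at the current cell."""
        x, y = self.pos
        return self.conc[y][x]

    def _move(self, dx: int, dy: int, cells: int) -> None:
        x, y = self.pos
        # Clamp to the grid so the AUV cannot leave the body of water.
        self.pos = (min(max(x + dx * cells, 0), self.width - 1),
                    min(max(y + dy * cells, 0), self.height - 1))

    def step_grid(self, action: str, cells: int = 1):
        """Action set A: compass moves 'N', 'E', 'S', 'W' of `cells` grid points."""
        self._move(*HEADINGS["NESW".index(action)], cells)
        return self.pos, self.sense()

    def step_heading(self, action: str, cells: int = 1):
        """Action set B: 'forward', 'left' or 'right'; each turn is a single action."""
        if action == "left":
            self.heading = (self.heading - 1) % 4
        elif action == "right":
            self.heading = (self.heading + 1) % 4
        elif action == "forward":
            self._move(*HEADINGS[self.heading], cells)
        else:
            raise ValueError(f"unknown action {action!r}")
        return self.pos, self.sense()
```

Which action set to use, and whether the sensed concentration is given to the agent as an observation, a reward, or both, are among the design choices worth experimenting with.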

The next step is to continue training and testing in more realistic environments. The Norwegian Geotechnical Institute (NGI) and partners are currently running the project ‘Smart AUVs for detection and quantification of greenhouse gas seepage in the oceans’ (SmartAUVs), which will deliver several oceanographic simulations of gas seepage. Each simulation will constitute a database that can be queried for the concentration of dissolved gas, or the presence of gas bubbles, at each point in space and time; the query results can act as simulated sensor input to an HRL algorithm.
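
The SmartAUVs data format is not described here, so the following Python sketch only assumes that a simulation provides values on a regular grid with axes (t, x, y, z). A hypothetical `SimulatedSensor` wraps such a grid and returns a reading at an arbitrary query point by multilinear interpolation between the surrounding grid nodes. For the binary presence of gas bubbles, a nearest-neighbour lookup may be more appropriate than interpolation.

```python
import bisect
import itertools
from typing import List, Sequence


def _bracket(axis: Sequence[float], q: float):
    """Find i with axis[i] <= q <= axis[i+1] and the weight w of axis[i+1]."""
    if not axis[0] <= q <= axis[-1]:
        raise ValueError(f"query {q} outside [{axis[0]}, {axis[-1]}]")
    i = min(bisect.bisect_right(axis, q) - 1, len(axis) - 2)
    return i, (q - axis[i]) / (axis[i + 1] - axis[i])


class SimulatedSensor:
    """Hypothetical wrapper that lets gridded simulation output act as a point sensor."""

    def __init__(self, axes: List[Sequence[float]], values):
        # axes: e.g. [t, x, y, z], each strictly increasing with at least 2 points.
        # values: nested lists indexed as values[it][ix][iy][iz].
        if any(len(ax) < 2 for ax in axes):
            raise ValueError("each axis needs at least two grid points")
        self.axes, self.values = axes, values

    def _node(self, idx):
        v = self.values
        for i in idx:
            v = v[i]
        return v

    def read(self, *query: float) -> float:
        """Multilinear interpolation of the stored field at one (t, x, y, z) point."""
        if len(query) != len(self.axes):
            raise ValueError("query must give one coordinate per axis")
        brackets = [_bracket(ax, q) for ax, q in zip(self.axes, query)]
        reading = 0.0
        # Sum over the 2**n corners of the enclosing grid cell, weighted by proximity.
        for corner in itertools.product((0, 1), repeat=len(brackets)):
            weight, idx = 1.0, []
            for (i, w), c in zip(brackets, corner):
                weight *= w if c else 1.0 - w
                idx.append(i + c)
            reading += weight * self._node(idx)
        return reading
```

A call such as `sensor.read(t, x, y, z)` would then stand in for a real concentration measurement at the AUV's current position and time.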

Published 6 Oct. 2023 07:57 - Last modified 6 Oct. 2023 07:57

Supervisor(s)

Ivar-Kristian Waarum, Universitetet i Oslo
Kai Olav Ellefsen, Universitetet i Oslo
