The reinforcement learning team works on both fundamental and applied AI research. Reinforcement learning is the study of sequential decision making under uncertainty, and hence encompasses many problems, from robotic manipulation to coordinating multiple goal-based agents. Our aim is to improve our understanding of biological and artificial intelligence, and to use these insights to develop AI systems that can be applied to complex tasks in the real world.
Many advances in reinforcement learning have been tied to board or video games, where success is easy to measure. Unfortunately, success is hard to quantify in many real-world tasks, such as driving a car safely and efficiently to a given location. One solution is to perform imitation learning on data collected from other agents, including humans. The MineRL Diamond and BASALT competitions provide datasets of humans performing various tasks in the Minecraft video game, from mining diamonds to building houses, and challenge participants to create AI agents that can solve these tasks too. Minecraft is a third-person, partially-observed environment with a complex action space, so many algorithms that work in simpler environments may fail in this domain. We are investigating robust imitation learning methods that can solve these tasks, and Araya has currently qualified for Round 2 of the Diamond competition.
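At its simplest, imitation learning can be cast as supervised learning on the demonstrator's observation-action pairs, an approach known as behavioural cloning. The sketch below illustrates the idea with synthetic data and a linear policy; the data, the policy class, and the "expert" weights are illustrative assumptions, not the methods used in the competition:

```python
import numpy as np

# Hypothetical demonstration data: observation -> action pairs from an expert.
rng = np.random.default_rng(0)
observations = rng.normal(size=(500, 4))           # 4-dimensional observations
true_weights = np.array([[1.0], [-2.0], [0.5], [3.0]])
actions = observations @ true_weights              # the expert's (noiseless) actions

# Behavioural cloning with a linear policy: fit weights by least squares,
# i.e., supervised regression from observations to expert actions.
weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# The cloned policy simply maps new observations to predicted actions.
def policy(obs):
    return obs @ weights

new_obs = rng.normal(size=(1, 4))
print(np.allclose(policy(new_obs), new_obs @ true_weights))  # prints True
```

In practice the policy would be a deep network trained on pixels, and behavioural cloning alone often suffers from compounding errors once the agent drifts away from states seen in the demonstrations, which is one reason more robust imitation methods are needed.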
Benchmarking Imitation Learning
Researchers have developed many imitation learning methods over the last few years, each claiming state-of-the-art performance on various tasks. However, results can easily be influenced by factors outside of a method's claimed contributions, such as data preprocessing, extensive hyperparameter tuning, or simply more computation. In A Pragmatic Look at Deep Imitation Learning, we broke down the contributions of many methods and developed a unified framework for imitation learning, allowing us to perform a fair comparison between algorithms. We have also released an open-source library that other researchers can use to study or apply imitation learning methods.
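One ingredient of a fair comparison is evaluating every algorithm under the same protocol, across multiple random seeds, rather than reporting a single run. The toy sketch below illustrates this protocol; the algorithm names, their mean returns, and the `evaluate` function are hypothetical placeholders, not results or code from the paper:

```python
import numpy as np

def evaluate(algorithm, seed):
    """Stand-in for training and evaluating one algorithm with one seed."""
    rng = np.random.default_rng(seed)
    base = {"BC": 0.70, "GAIL": 0.75}[algorithm]  # placeholder mean returns
    return base + rng.normal(scale=0.05)          # seed-dependent variation

# Fair comparison: identical seeds and protocol for every algorithm,
# reporting mean and spread instead of a single (possibly lucky) run.
seeds = range(5)
for algorithm in ("BC", "GAIL"):
    returns = [evaluate(algorithm, s) for s in seeds]
    print(f"{algorithm}: {np.mean(returns):.2f} +/- {np.std(returns):.2f}")
```

Holding seeds, data preprocessing, and compute budgets fixed across algorithms isolates the contribution of the method itself, which is the point of the unified framework.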
Diversity-based Trajectory and Goal Selection
Many tasks have a sparse reward function, i.e., the agent is rewarded only upon completing the task. This makes exploration and credit assignment difficult, as the agent receives little learning signal. For certain tasks, a method known as Hindsight Experience Replay (HER) ameliorates this issue by pretending that the goals the agent actually achieved were the desired goals, and training on this relabelled data. In Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay, accepted at PRICAI, we achieved state-of-the-art performance on sparse reward benchmarks by using determinantal point processes to improve the diversity of the data selected for training in HER. Our code has also been open sourced.
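The core relabelling step in HER can be sketched as follows, using the common "final" strategy: treat the goal achieved at the end of the episode as if it had been the desired goal all along. The episode data, goal representation, and tolerance here are illustrative assumptions, not the benchmarks from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical episode: each step stores (state, action, achieved_goal),
# collected while trying (and failing) to reach `desired_goal`.
episode = [(rng.normal(size=2), int(rng.integers(4)), rng.normal(size=2))
           for _ in range(10)]
desired_goal = np.array([5.0, 5.0])

def sparse_reward(achieved, goal, tol=0.05):
    # Reward is 0 on success and -1 otherwise (a common HER convention).
    return 0.0 if np.linalg.norm(achieved - goal) < tol else -1.0

# Hindsight relabelling ("final" strategy): pretend the goal reached at the
# end of the episode was the desired goal, and recompute rewards accordingly.
hindsight_goal = episode[-1][2]
relabelled = [(s, a, hindsight_goal, sparse_reward(ag, hindsight_goal))
              for s, a, ag in episode]

# The final transition now receives a success reward, giving the agent a
# learning signal even though the original goal was never reached.
print(relabelled[-1][3])  # 0.0
```

Our PRICAI work builds on this mechanism by selecting which trajectories and hindsight goals to train on via determinantal point processes, so that the replayed data is diverse rather than redundant.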