The reinforcement learning team works on both fundamental and applied AI research, with a particular focus on reinforcement learning: the study of sequential decision making under uncertainty, which encompasses many problems, from robotic manipulation to coordinating multiple goal-based agents. Our aim is to improve our understanding of biological and artificial intelligence, and to use these insights to develop AI systems that can be applied to complex tasks in the real world.


Kai Arulkumaran
Research Team Leader

Kai is a Research Team Leader at Araya and Visiting Researcher at Imperial College London. He received his B.A. in Computer Science at the University of Cambridge in 2012 and his Ph.D. in Bioengineering at Imperial College London in 2020. He has previously worked at DeepMind, Microsoft Research, Facebook AI Research, Twitter and NNAISENSE. His research interests are deep learning, reinforcement learning, evolutionary computation and theoretical neuroscience.

Google Scholar

Manuel Baltieri
Researcher

Manuel is a Researcher at Araya and a Visiting Researcher at the University of Sussex. After graduating with a B.Eng. in Computer Engineering and Business Administration at the University of Trento, he received an M.Sc. in Evolutionary and Adaptive Systems and a Ph.D. in Computer Science and AI, both from the University of Sussex. Following that, he was awarded a JSPS/Royal Society postdoctoral fellowship, and worked in the Lab for Neural Computation and Adaptation at RIKEN CBS with Taro Toyoizumi, until he joined Araya at the end of 2021. His research interests include artificial intelligence and artificial life, theories of agency and individuality, origins of life, embodied cognition and decision making.

Google Scholar

Dan Ogawa Lillrank
Junior Researcher

Dan is a Junior Researcher at Araya. He received his B.A. in Engineering Physics at the Royal Institute of Technology in Sweden in 2015, and his M.S. in Robotics in 2018. He has previously worked at Telexistence Inc., and as a Researcher at AIST. His research interests are mainly in robot learning, at the intersection of machine learning and robotics.

Roberto Gallotta
Junior Researcher

Roberto is a Junior Researcher at Araya. He obtained his B.S. in Computer and Electronic Engineering at the Università di Pavia and his M.S. in Artificial Intelligence and Robotics at La Sapienza University of Rome. His topics of interest are artificial life, evolution strategies and open-endedness.

Shogo Akiyama
Engineer

Shogo is an Engineer at Araya. He received his B.S. in Computer Science in 2019. He has previously worked as an AI and web application engineer. His research interests are reinforcement learning and natural language processing.



MineRL Competitions

Many advances in reinforcement learning have been tied to board or video games, where success can easily be measured. Unfortunately, success is hard to quantify in many real-world tasks, such as driving a car safely and efficiently to a given location. One solution is to perform imitation learning on data collected from other agents, including humans. The Diamond and BASALT MineRL competitions provide a dataset of humans performing various tasks in the Minecraft video game, from mining diamonds to building houses, and challenge participants to create AI agents that can solve these tasks too. Minecraft is a first-person, partially-observed environment with a complex action space, and so many algorithms that work on simpler environments may not work in this domain. We are investigating robust imitation learning methods that can solve these tasks, and Araya has currently qualified for Round 2 of the Diamond competition.
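The simplest form of imitation learning is behavioural cloning: treat the demonstrations as a supervised dataset of (observation, action) pairs and train a policy to predict the demonstrator's actions. A minimal sketch with a linear policy on synthetic stand-in data follows; all dimensions and the synthetic "expert" here are illustrative, not the MineRL setup, where observations are image frames and actions are compound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for demonstrations: (observation, action) pairs,
# with actions labelled by a hidden "expert" linear scorer.
n_obs, n_act, n_demos = 8, 4, 512
W_expert = rng.normal(size=(n_obs, n_act))
obs = rng.normal(size=(n_demos, n_obs))
actions = np.argmax(obs @ W_expert, axis=1)  # expert action labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Behavioural cloning: fit a policy to the demonstrations by minimising
# cross-entropy between predicted and demonstrated actions (full-batch GD).
W = np.zeros((n_obs, n_act))
for _ in range(200):
    grad_logits = softmax(obs @ W)
    grad_logits[np.arange(n_demos), actions] -= 1.0  # dCE/dlogits per sample
    W -= 0.5 * (obs.T @ grad_logits) / n_demos

# Fraction of demonstration actions the cloned policy reproduces.
accuracy = np.mean(np.argmax(obs @ W, axis=1) == actions)
```

The same recipe scales to MineRL by swapping the linear policy for a convolutional network and the synthetic pairs for the human dataset.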

Benchmarking Imitation Learning

Researchers have developed many imitation learning methods over the last few years, each claiming state-of-the-art performance on various tasks. However, results can easily be influenced by factors outside of the claimed contributions of a method, such as data preprocessing, extensive hyperparameter tuning, or simply more computation. In A Pragmatic Look at Deep Imitation Learning, we broke down the contributions of many methods and developed a unified framework for imitation learning, allowing us to perform a fair comparison between algorithms. We have also released an open-source library for other researchers to research or apply imitation learning methods.
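As a rough illustration of the kind of controlled comparison this requires, the sketch below evaluates every method over the same fixed seeds and reports mean and standard deviation; `evaluate` is a hypothetical stand-in for a full train-and-evaluate run (its scores are fabricated placeholders), not our actual framework.

```python
import numpy as np

def evaluate(algorithm, seed):
    # Hypothetical stand-in for training and evaluating one run.
    # In a real harness this would train the algorithm with identical
    # data preprocessing and an equal tuning budget, then return the
    # mean evaluation return.
    rng = np.random.default_rng(seed)
    base = {"BC": 0.70, "GAIL": 0.75}[algorithm]  # placeholder scores
    return base + 0.05 * rng.standard_normal()

seeds = range(5)  # the SAME seeds for every method, so runs are paired
results = {}
for algo in ["BC", "GAIL"]:
    scores = [evaluate(algo, s) for s in seeds]
    results[algo] = (np.mean(scores), np.std(scores))
    print(f"{algo}: {results[algo][0]:.3f} +/- {results[algo][1]:.3f}")
```

Holding seeds, preprocessing and compute budget constant means that score differences reflect the algorithms themselves rather than the experimental setup.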

Diversity-based Trajectory and Goal Selection

Many tasks have a sparse reward function, i.e., the agent is rewarded only upon completing the task. This makes exploration and credit assignment difficult, as the agent receives little learning signal. For certain tasks, a method known as Hindsight Experience Replay (HER) ameliorates this issue by pretending that the goals the agent actually achieved were the desired goals, and training on this relabelled data. In Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay, accepted at PRICAI, we achieved state-of-the-art performance on sparse reward benchmarks by using determinantal point processes to improve the diversity of the data selected for training in HER. Our code has also been open-sourced.
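A minimal sketch of the core HER idea (goal relabelling with the "final" strategy), assuming a simple goal-conditioned setting with the common 0/-1 sparse reward convention; the trajectory below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse goal-conditioned reward: 0 if the achieved goal matches the
# desired goal (within tolerance), -1 otherwise -- a common HER convention.
def reward(achieved, desired):
    return 0.0 if np.allclose(achieved, desired, atol=0.05) else -1.0

# Synthetic episode: the per-step achieved goals wander near the origin,
# so the distant desired goal is never reached.
T = 20
achieved_goals = np.cumsum(rng.normal(scale=0.1, size=(T, 2)), axis=0)
desired_goal = np.array([5.0, 5.0])

original = [reward(g, desired_goal) for g in achieved_goals]  # all -1

# HER "final" strategy: pretend the goal achieved at the END of the episode
# was the desired goal all along, then recompute the rewards.
relabelled_goal = achieved_goals[-1]
relabelled = [reward(g, relabelled_goal) for g in achieved_goals]
```

After relabelling, the episode contains at least one success (the final step), so even a "failed" rollout yields useful learning signal for an off-policy algorithm.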