Learning navigation policies with deep reinforcement learning
Abstract: Humans learn that achieving long-term goals requires first making an effort, taking risks, and putting ourselves in difficult positions, because every decision we make influences not only our immediate state but may also have future implications. In this thesis, we study methods for control problems that involve sequential decision making, in which the actions of intelligent agents affect the environment they operate in. In particular, we focus on solutions that require the least amount of human intervention, seeking general algorithms that help automate the development of intelligent decision-making agents.
We therefore build on the general framework of deep reinforcement learning to learn control policies through interaction with the environment. As navigation is an essential skill for autonomous intelligent systems, this thesis takes learning to navigate as its main running task, setting out to address several challenges that arise when learning optimal policies directly from sensory inputs.
This thesis begins by asking whether it is feasible to replace the traditional navigation pipeline with an end-to-end deep reinforcement learning system, and then proposes algorithms that facilitate transferring learned navigation policies to related task instances. The focus then turns to learning navigation in environments where exploration is challenging: we interface a canonical agent with an external memory inside a fully differentiable neural network, and by learning to write to and read from this memory, the agent can make informed decisions in hard navigation tasks. Next, we target transferring deep reinforcement learning policies learned in simulation to the real world. Questioning the canonical sim-to-real approaches, we propose a real-to-sim algorithm as a lightweight and flexible alternative. In addition, we propose a novel shift loss, agnostic to the downstream task, that imposes consistency constraints and successfully adapts single-frame domain adaptation approaches to sequential problems. Finally, this thesis examines learning control policies in terminal-reward settings, as this scenario requires the fewest human priors and would thus largely automate the training of artificial decision-making agents. Since structured and guided exploration becomes vital in this case, we again question the mainstream approach of using intrinsic motivation as reward bonuses, taking a hierarchical view on accelerating exploration instead. We argue that our proposed approach is a more suitable treatment for intrinsically motivated exploration, as the behavior policy space is implicitly increased exponentially. Moreover, we propose a novel intrinsic reward that takes a temporally extended view on states, which facilitates exploration even further.
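As an illustration of the kind of mechanism described above (not the thesis's actual architecture), a differentiable external memory can be read and written through soft content-based attention, so that gradients flow through both operations. A minimal PyTorch sketch, with all names, shapes, and the addressing scheme assumed:

```python
import torch
import torch.nn.functional as F

def read(memory: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    # memory: (num_slots, slot_dim); key: (slot_dim,).
    # Cosine similarity to every slot, softmaxed into soft read weights.
    weights = F.softmax(
        F.cosine_similarity(memory, key.unsqueeze(0), dim=-1), dim=0)
    return weights @ memory  # weighted sum over slots, fully differentiable

def write(memory: torch.Tensor, key: torch.Tensor,
          value: torch.Tensor) -> torch.Tensor:
    # Blend the new value into the slots addressed by the same soft
    # weights, keeping the memory update differentiable end to end.
    weights = F.softmax(
        F.cosine_similarity(memory, key.unsqueeze(0), dim=-1), dim=0)
    w = weights.unsqueeze(-1)  # (num_slots, 1)
    return (1 - w) * memory + w * value
```

Similarly, one plausible reading of a task-agnostic shift-consistency constraint is that translating the input image by a few pixels should translate the network's output by the same amount; whether this matches the thesis's shift loss exactly is an assumption:

```python
def shift_consistency_loss(net, x: torch.Tensor,
                           dx: int = 4, dy: int = 4) -> torch.Tensor:
    # x: (B, C, H, W) image batch. torch.roll stands in for a pixel shift
    # (it wraps at the borders, which a real implementation would crop away).
    out_of_shifted = net(torch.roll(x, shifts=(dy, dx), dims=(2, 3)))
    shifted_out = torch.roll(net(x), shifts=(dy, dx), dims=(2, 3))
    return F.mse_loss(out_of_shifted, shifted_out)
```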
In summary, this thesis investigates several key aspects of learning control policies through deep reinforcement learning, with a focus on navigation tasks. We hope that our proposed methods offer insights to the learning control community.
- Location: Deutsche Nationalbibliothek Frankfurt am Main
- Extent: Online resource
- Language: English
- Notes: Universität Freiburg, Dissertation, 2021
- Subject headings: Reinforcement learning; Navigation; Learning; Deep Learning
- Event: Publication
- (where): Freiburg
- (who): Universität
- (when): 2021
- DOI: 10.6094/UNIFR/218235
- URN: urn:nbn:de:bsz:25-freidok-2182353
- Rights information: Not open access; access to the object is unrestricted.
- Last updated: 14.08.2025, 10:58 CEST
- Data partner: Deutsche Nationalbibliothek
- Created: 2021