Efficient and practical neural architecture search
Abstract: The success of deep learning in tasks such as image recognition, speech recognition, machine translation, or reasoning in games is largely due to the automation of the feature engineering process: hierarchical feature extractors are learned in an end-to-end fashion from data rather than designed manually. This success has been accompanied, however, by a rising demand for neural network architecture engineering, in which increasingly complex neural architectures are designed by hand. This is a time-consuming and error-prone process that requires human experts. Neural architecture search (NAS), the process of automating architecture engineering, is thus a logical next step in automated machine learning. In this thesis, we address several current challenges in the area of NAS. The rising demand for architecture engineering has made NAS increasingly popular, leading to substantial growth in publications and making it hard to keep track of, compare, and assess existing work. For this reason, we provide the first modern survey on NAS, aiming to make this field of research more easily accessible to other researchers. We discuss current problems and limitations to guide future research. We then study NAS with evolutionary algorithms. Prior work on NAS often required vast computational resources, on the order of hundreds or even thousands of GPU days, because thousands of neural architectures needed to be trained from scratch. These extreme costs made applying NAS impractical for most use cases. To overcome this problem, we propose to employ network morphisms, which are function-preserving operations, as mutation operations in the evolutionary search. By doing so, we avoid costly training from scratch. We extend this approach by considering multiple objectives in the architecture search process, allowing us not only to search for high-performing architectures but also to take into account other objectives of high practical relevance, such as small model size, low memory consumption, or error resilience. We furthermore study gradient-based architecture search methods, which have gained popularity due to their simplicity and low search costs. However, many researchers have reported that such methods do not work robustly on new problems. We study their failure modes and find that degenerate performance is related to sharp minima in the space of neural architectures. Based on these findings, we propose more robust alternatives. Lastly, we consider NAS in combination with meta-learning, or learning to learn. We show that gradient-based meta-learning and gradient-based NAS can naturally be combined, making it possible to meta-learn not only the weights of a given, fixed architecture but also the architecture itself. The meta-learned architecture can then be adapted to a novel task with just a few iterations of NAS.
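To make the idea of function-preserving mutations concrete, the following is a minimal sketch of a "deepening" network morphism, assuming PyTorch: a new convolution is inserted and initialised to the identity, so the mutated child computes exactly the same function as its parent and can be fine-tuned briefly instead of trained from scratch. All layer sizes and helper names are illustrative assumptions, not code from the thesis.

```python
# Illustrative sketch (not the thesis code): a function-preserving
# "deepening" mutation for evolutionary NAS, assuming PyTorch.
import torch
import torch.nn as nn

def identity_conv(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """A convolution initialised to compute the identity mapping."""
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2)
    nn.init.dirac_(conv.weight)  # Dirac delta kernel: output == input
    nn.init.zeros_(conv.bias)
    return conv

def deepen(net: nn.Sequential, position: int, channels: int) -> nn.Sequential:
    """Mutation operator: insert an identity-initialised conv layer."""
    layers = list(net.children())
    layers.insert(position, identity_conv(channels))
    return nn.Sequential(*layers)

# The child starts from exactly the parent's function, so the search can
# fine-tune it for a few epochs rather than retraining from scratch.
parent = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
child = deepen(parent, position=2, channels=16)
x = torch.randn(1, 3, 32, 32)
assert torch.allclose(parent(x), child(x))  # function preserved
```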
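The gradient-based methods mentioned in the abstract relax the discrete choice between candidate operations into a softmax-weighted mixture whose weights are learned by gradient descent alongside the network weights. The sketch below is again a hedged illustration of this general DARTS-style mechanism, not the thesis implementation; the candidate operation set is an arbitrary assumption.

```python
# Illustrative sketch of a DARTS-style mixed operation, assuming PyTorch.
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Continuous relaxation of a discrete operation choice."""
    def __init__(self, channels: int):
        super().__init__()
        # Candidate operations (an arbitrary, illustrative set).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        # Architecture parameters, optimised by gradient descent.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.alpha, dim=0)
        # Softmax-weighted sum over all candidates; after the search, only
        # the operation with the largest weight is kept (discretisation).
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(channels=16)
out = mixed(torch.randn(1, 16, 32, 32))  # same spatial shape as input
```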
- Location: Deutsche Nationalbibliothek Frankfurt am Main
- Extent: Online resource
- Language: English
- Notes: Universität Freiburg, dissertation, 2021
- Subject headings: Architecture; Deep Learning; Machine learning; Neural network
- Event: Publication
  - (where): Freiburg
  - (who): University
  - (when): 2021
- Creator:
- Contributing persons and organizations:
- DOI: 10.6094/UNIFR/218389
- URN: urn:nbn:de:bsz:25-freidok-2183897
- Rights information: No Open Access; access to the object is possible without restriction.
- Last updated: 14.08.2025, 10:57 CEST