Pruning and quantization for efficient deep neural networks

Abstract: In recent years, deep neural networks have been shown to outperform classical methods on several machine learning tasks. Such deep networks make predictions by pattern matching and are trained from experience. By leveraging large amounts of data, they are capable of learning hierarchical representations of raw input data and thus combine feature learning and classification. However, a high initial model capacity as well as floating-point operations are required to successfully train a deep neural network from scratch. As a result, trained models are usually over-parameterized and require powerful processing units.

In contrast, both mobile and embedded devices have limited memory, energy, and computational capacity, which severely restricts the complexity of the neural networks they can run. To make trained models usable on such devices nonetheless, reduction methods are employed to decrease their complexity. On the one hand, quantization methods reduce the bit widths of operands and operations, which immediately decreases the memory requirements; fixed-point quantization methods additionally reduce the computational and energy requirements on dedicated hardware. On the other hand, pruning reduces the number of operands and operations by removing redundant network connections; pruning entire filters and neurons from the network architecture directly reduces the memory, energy, and computational complexity without the need for specialized hardware. A common approach is therefore to first train a large, over-parameterized network and then reduce it using appropriate reduction methods.

However, previous approaches suffer from several problems. First, many of them are either complicated to implement or must be applied outside the standard optimization procedure of deep neural networks. Moreover, quantization approaches often neglect fixed-point constraints. Finally, it is usually not possible to directly specify the critical limitations of the target hardware, which makes iterative reduction procedures necessary.

In this thesis, we address these problems with several contributions to both pruning and quantization. Each of our approaches consists of a reduction loss that can be integrated into the standard training procedure of deep neural networks with little implementation effort. Minimizing the reduction loss during training reduces the model complexity through fixed-point quantization, filter pruning, or a combination of both.
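As an illustration of how such a reduction loss could be attached to the standard optimization loop, the following PyTorch-style sketch adds a weighted reduction term to the task loss; the function and parameter names (e.g. `reduction_loss_fn`, `reduction_weight`) are hypothetical and not taken from the thesis.

```python
def training_step(model, batch, targets, task_criterion, reduction_loss_fn,
                  optimizer, reduction_weight=0.1):
    """One optimization step with an additional reduction loss term.

    reduction_loss_fn stands for any of the losses described below
    (fixed-point quantization, filter pruning, or their combination).
    """
    optimizer.zero_grad()
    outputs = model(batch)
    task_loss = task_criterion(outputs, targets)          # e.g. cross-entropy
    reduction_loss = reduction_loss_fn(model)             # complexity penalty
    total_loss = task_loss + reduction_weight * reduction_loss
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```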

First, we propose a simple and efficient reduction loss to train deep neural networks with multi-modal weight distributions and minimal quantization error. Consequently, the weights can be quantized to fixed-point representations after training with no significant loss in accuracy. The approach is very easy to implement and yields excellent performance even for small bit widths. Furthermore, we extend our approach by taking into account both the batch-normalization layers and the activation functions. In this way, it is possible to train deep neural networks that can be evaluated without any floating-point operations after training.
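One way to read this idea is as a regularizer that pulls every weight toward the nearest value representable on a fixed-point grid, so that rounding after training introduces little error. The sketch below is a minimal illustration under assumed bit widths; the grid definition and the function name `fixed_point_grid_loss` are our own and not the exact formulation of the thesis.

```python
import torch

def fixed_point_grid_loss(model, bits=4, frac_bits=3):
    """Penalize the squared distance of each weight to its nearest value
    in a signed fixed-point grid with `bits` total and `frac_bits`
    fractional bits. Minimizing this term during training drives the
    weights toward a multi-modal distribution centered on the grid."""
    step = 2.0 ** (-frac_bits)              # quantization step size
    min_val = -(2 ** (bits - 1)) * step     # smallest representable value
    max_val = (2 ** (bits - 1) - 1) * step  # largest representable value
    loss = torch.zeros(())
    for param in model.parameters():
        nearest = torch.clamp(torch.round(param / step) * step, min_val, max_val)
        loss = loss + torch.sum((param - nearest) ** 2)
    return loss
```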

Next, we propose a novel filter pruning method that is capable of reducing the number of parameters and multiplications of a deep neural network to a given target size. The user can thus define maximum values for both the number of parameters and the number of multiplications according to the memory and computational resources of the target device. During training, the reduction loss measures the difference between the actual model size and the target size in terms of parameters and required multiplications. This loss is minimized by pruning whole filters and neurons via the channel-wise affine transformation of the batch-normalization layers. In this way, a global selection of filters and neurons can be found that, on the one hand, solves the learning task as well as possible and, on the other hand, fulfills the constraints of the target device.
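A possible reading of this mechanism, sketched below with hypothetical helper names and simplifying assumptions (3x3 convolutions, an RGB input, and a soft gate on the batch-norm scale factors), is to estimate the surviving parameter and multiplication counts from the channel-wise scales and penalize only the amount by which they exceed the user-defined budget:

```python
import torch
import torch.nn as nn

def pruning_budget_loss(model, max_params, max_mults, bn_output_sizes):
    """Penalize the gap between the (soft) model size implied by the
    batch-normalization scale factors and a user-defined target size.

    bn_output_sizes maps each BatchNorm2d module to the spatial size
    (H * W) of the feature map it normalizes; both the gating scheme
    and the 3x3-kernel assumption are illustrative simplifications.
    """
    est_params = torch.zeros(())
    est_mults = torch.zeros(())
    prev_active = 3.0                                   # assume an RGB input
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            # Soft indicator of "this channel survives pruning".
            gates = torch.sigmoid(10.0 * (module.weight.abs() - 1e-2))
            active = gates.sum()
            est_params = est_params + prev_active * active * 9
            est_mults = est_mults + prev_active * active * 9 * bn_output_sizes[module]
            prev_active = active
    # Penalize only exceeding the budget, not staying below it.
    return torch.relu(est_params - max_params) + torch.relu(est_mults - max_mults)
```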

Finally, we propose a novel and highly efficient combination of filter pruning and fixed-point quantization. Here, we define complexity as an aggregation of four essential metrics: the memory requirement, the computational complexity resulting from the number of bit operations, the bandwidth resulting from the communication between the processing unit and the memory, and the maximum storage cost of the activations. Based on these four metrics, the reduction loss measures the difference between the actual model complexity and the resources available on the target device. The reduction loss can be minimized during training using pruning and quantization layers developed specifically for this purpose. The trained model is thus highly efficient: it runs without batch-normalization layers, has all parameters and activations in fixed-point representation, and satisfies the complexity constraints of the target device.
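The aggregation of the four metrics could look roughly like the sketch below; the layer description, bit widths, and the simple sum-of-excesses combination are our own illustrative assumptions rather than the exact definition used in the thesis.

```python
def complexity_gap(layers, budget):
    """Aggregate memory, bit operations, bandwidth, and peak activation
    storage for a list of layer descriptions and return the total excess
    over a per-metric budget (all data structures here are hypothetical).

    Each layer is a dict with: params, mults, weight_bits, act_bits, act_size.
    """
    memory = sum(l["params"] * l["weight_bits"] for l in layers)        # weight storage (bits)
    bit_ops = sum(l["mults"] * l["weight_bits"] * l["act_bits"] for l in layers)
    bandwidth = sum(l["params"] * l["weight_bits"]                      # weights read once
                    + l["act_size"] * l["act_bits"] for l in layers)    # activations moved
    peak_acts = max(l["act_size"] * l["act_bits"] for l in layers)      # largest activation map
    actual = {"memory": memory, "bit_ops": bit_ops,
              "bandwidth": bandwidth, "peak_acts": peak_acts}
    # Sum of how far each metric exceeds its budget (zero if within budget).
    return sum(max(0, actual[key] - budget[key]) for key in budget)
```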

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English
Notes
Universität Freiburg, Dissertation, 2022

Keyword
Pruning
Quantization
Fixed-point arithmetic
Deep Learning

Event
Publication
(where)
Freiburg
(who)
Universität
(when)
2022

DOI
10.6094/UNIFR/228535
URN
urn:nbn:de:bsz:25-freidok-2285359
Rights
Open Access; access to the object is unrestricted.

Data provider
Deutsche Nationalbibliothek

Time of origin

  • 2022
