The DeepChip Project - Deep Learning for Resource-Constrained Systems


DeepChip focuses on deep learning techniques for resource-constrained systems. Many processes require the evaluation of complex numerical functions close to the machine or structure of interest, either to avoid the cost of data transfer or to enable short reaction times. Although the computing performance of embedded platforms is increasing, it is often significantly lower than what state-of-the-art algorithms require. With the advent of Deep Neural Networks (DNNs), the achievable classification performance has been pushed to new levels. Their high cost of execution, however, renders them unusable for many real-world applications. A possible approach is the use of hybrid processors (ARM+FPGA or similar), but this raises the question of how to auto-generate optimized DNN classifier implementations. In the DeepChip project, we tackle this problem by optimizing deep models in terms of sparsity, asynchrony and reduced precision, and by extending machine learning languages with a hybrid back-end that is responsible for code generation, automated partitioning and integration.

The DeepChip Project is an FWF/DFG co-funded D-A-CH project, run by Graz University of Technology and Ruprecht-Karls University of Heidelberg. While the first partner contributes expertise and experience in machine learning and its applications, the second partner has a strong background in application-specific computing systems of various scales. Within the DeepChip project, the partners jointly pursue the objective of designing a productive and easy-to-use tool chain for designing custom hardware for deep learning, thereby helping to bring advanced machine learning techniques and principles to tiny embedded devices such as mobile chips, Internet of Things devices and more.

This site is still at a preliminary stage - stay tuned for updates!

For questions or comments, please contact: Holger Fröning, holger.froening (at)

Workshop mini-series on Embedded Machine Learning (WEML)

We frequently host workshops that bring together experts and other interested people in machine learning, particularly deep learning, for embedded or other resource-constrained systems. More information about recent editions can be found here:

Software architecture

DeepChip essentially relies on removing the redundancy that is typically found in DNNs, under the main constraint that test accuracy does not degrade. In contrast to related work, we combine multiple techniques in a single concept: DeepChip extends TensorFlow with a quantizer that makes it possible to use reduced-precision operations instead of floating-point computations and to increase the sparsity of weight matrices. Additionally, Huffman coding is used to reduce the memory footprint, which in turn reduces the requirements on memory bandwidth. A special operator library (OP) addresses the specifics of different hardware targets, so that DNNs can be deployed efficiently on various hardware platforms.
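
To illustrate how these three steps interact, the following is a minimal Python sketch and not DeepChip's actual quantizer or TensorFlow extension. It assumes magnitude pruning and symmetric uniform quantization, neither of which is specified above, and all function names and weight values are hypothetical. It shows that pruning and quantization skew the distribution of weight values, which is what lets a Huffman code use fewer bits than a fixed-width encoding.

```python
import heapq
from collections import Counter

def prune(weights, threshold):
    """Magnitude pruning (assumed scheme): zero out weights with |w| < threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, bits):
    """Symmetric uniform quantization (assumed scheme) to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    return [round(w / scale) for w in weights], scale

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} of a Huffman code for `symbols`."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, unique tie-breaker, {symbol: depth in tree}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical weights of one small layer.
weights = [0.02, -0.71, 0.01, 0.35, -0.03, 0.0, 0.69, -0.36, 0.04, 0.0, -0.02, 0.34]

pruned = prune(weights, threshold=0.05)
q, scale = quantize(pruned, bits=4)
lengths = huffman_code_lengths(q)

coded_bits = sum(lengths[s] for s in q)  # size with Huffman coding
fixed_bits = 4 * len(q)                  # size with plain 4-bit storage
print("quantized:", q)
print("Huffman bits:", coded_bits, "fixed-width bits:", fixed_bits)
```

Because most pruned weights collapse onto the value zero, that symbol receives the shortest code. The sketch ignores the cost of storing the code table and the scale factor, which matters for small layers.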

Quantization and sparsity can currently be controlled manually, which allows for various trade-offs among competing objectives such as accuracy and computational requirements, and hard constraints such as real-time deadlines.
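
A manual exploration of this trade-off space could look like the sketch below. It is only an illustration under stated assumptions: the bit widths, pruning thresholds and random weights are hypothetical, and the mean squared weight error is used as a cheap stand-in for the test accuracy that a real evaluation would measure.

```python
import random

def quantize_dequantize(weights, bits):
    """Round weights to a signed `bits`-bit grid and map them back to floats."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) or 1.0) / qmax
    return [round(w / scale) * scale for w in weights]

def sweep(weights, bit_widths, thresholds):
    """Evaluate every (bit width, pruning threshold) combination."""
    rows = []
    for bits in bit_widths:
        for t in thresholds:
            pruned = [0.0 if abs(w) < t else w for w in weights]
            approx = quantize_dequantize(pruned, bits)
            # Proxy for accuracy loss: error against the original weights.
            mse = sum((a - w) ** 2 for a, w in zip(approx, weights)) / len(weights)
            # Proxy for compute cost: fraction of weights that remain nonzero.
            nonzero = sum(1 for a in approx if a != 0.0) / len(weights)
            rows.append((bits, t, mse, nonzero))
    return rows

random.seed(0)
weights = [random.gauss(0.0, 0.3) for _ in range(1000)]  # hypothetical layer

rows = sweep(weights, bit_widths=(2, 4, 8), thresholds=(0.0, 0.1, 0.3))
for bits, t, mse, nonzero in rows:
    print(f"bits={bits} threshold={t:.1f} mse={mse:.5f} nonzero={nonzero:.2f}")
```

Lower bit widths and higher thresholds reduce the amount of work per inference but increase the approximation error, so the acceptable operating point depends on the accuracy and latency budget of the application.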


The DeepChip project is co-run by the

Current people:

  • Franz Pernkopf, Co-PI, Graz University of Technology, Austria
  • Holger Fröning, Co-PI, Ruprecht-Karls University of Heidelberg, Germany
  • Günther Schindler, Ruprecht-Karls University of Heidelberg, Germany
  • Wolfgang Roth, Graz University of Technology, Austria

Associated partners

  • Manfred Mücke, Materials Center Leoben, Austria

Former people:

  • Matthias Zöhrer, Graz University of Technology, Austria


[ECML2019] Wolfgang Roth, Günther Schindler, Holger Fröning, Franz Pernkopf, Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2019), Sept. 16-20, Würzburg, Germany. (acceptance rate: 17.7%, 130/734) (accepted for publication)

[CoRR.cs] Günther Schindler, Wolfgang Roth, Franz Pernkopf, Holger Fröning, Parameterized Structured Pruning for Deep Neural Networks, arXiv:1906.05180 [CoRR.cs], June 2019.

[HiPEAC2019EDLA] Christoph Gratl, Manfred Mücke, Günther Schindler and Holger Fröning, Towards efficient mapping of BNNs onto embedded targets using Tensorflow/XLA, 1st Workshop on Emerging Deep Learning Accelerators (EDLA), co-located with the HiPEAC 2019 Conference, January 21-23, 2019, Valencia, Spain.

[CoRR.cs] Franz Pernkopf, Wolfgang Roth, Matthias Zöhrer, Lukas Pfeifenberger, Günther Schindler, Holger Fröning, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani, Efficient and Robust Machine Learning for Real-World Systems, arXiv:1812.02240 [CoRR.cs], December 2018.

[ECML2018] Günther Schindler, Matthias Zöhrer, Franz Pernkopf, and Holger Fröning, Towards Efficient Forward Propagation on Resource-Constrained Systems, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2018), Sept 10-14, Dublin, Ireland. (acceptance rate: 26%, 92/354)

[ICASSP2018] Matthias Zöhrer, Lukas Pfeifenberger, Günther Schindler, Holger Fröning, and Franz Pernkopf, Resource Efficient Deep Eigenvector Beamforming, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 15–20 April 2018, Calgary, Alberta, Canada.

[UCHPC2017] Günther Schindler, Manfred Mücke, Holger Fröning, Linking Application Description with Efficient SIMD Code Generation for Low-Precision Signed-Integer GEMM, 10th Workshop on UnConventional High Performance Computing 2017 (UCHPC 2017), in conjunction with EuroPAR 2017, August 28/29, 2017, Santiago de Compostela, Spain.


Quantization for efficient DL inference

While most related work on quantization accepts a reduced test accuracy, in this work we show that quantization on general-purpose processors is possible without any loss in accuracy. Please refer to the [ECML2018] paper for detailed coverage and discussion of the concept. The reproducibility repository can be found at:

Custom precision operators for ARM processors

In this work we explore the capabilities of ARM embedded processors for linear algebra operations at custom precision. We extend the Eigen BLAS library to support different quantizations, ranging from one bit to 32 bits, and demonstrate how performance scales as precision decreases. For more information, see [UCHPC2017].
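
To give an intuition for why very low precision can be fast, the sketch below shows a widely used formulation of the one-bit case, in which values are restricted to +1 and -1, packed into machine words, and multiplied using XNOR and a population count. This is a generic illustration in Python and an assumption on our part; it is not the Eigen extension or the SIMD code described in [UCHPC2017], and all function names are hypothetical.

```python
def pack_signs(values):
    """Pack a +1/-1 vector into an integer; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    # XNOR marks positions where both signs agree (product +1).
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")
    # agree positions contribute +1, the remaining n - agree contribute -1.
    return 2 * agree - n

def binary_gemm(a_rows, b_cols):
    """Matrix product of A (given as rows) and B (given as columns)."""
    n = len(a_rows[0])
    packed_a = [pack_signs(r) for r in a_rows]
    packed_b = [pack_signs(c) for c in b_cols]
    return [[binary_dot(ra, cb, n) for cb in packed_b] for ra in packed_a]

def reference_gemm(a_rows, b_cols):
    """Plain multiply-accumulate version, used only to check the result."""
    return [[sum(x * y for x, y in zip(r, c)) for c in b_cols] for r in a_rows]

A = [[1, -1, 1, 1], [-1, -1, 1, -1]]
B_cols = [[1, 1, -1, 1], [-1, 1, 1, 1], [1, -1, 1, -1]]

print(binary_gemm(A, B_cols))
print(reference_gemm(A, B_cols))
```

On real hardware the packed words map onto vector registers, so a single instruction processes many one-bit elements at once. Precisions between one and 32 bits need different packing and accumulation schemes that this sketch does not cover.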

DeepChip Online

DeepChip Online is a web-based experimentation platform for exploring the time and energy consumption of deep learning applications on embedded processors. We provide a front-end to Theano and monitor the execution on different embedded platforms, including Jetson TK1, Jetson TX1, and Red Pitaya. We extend Theano with monitoring functions that make it possible to distinguish different phases of the execution, for instance initialization, storage accesses, and the actual training and verification of neural networks. DeepChip Online is currently in its alpha phase, so contact us if you are interested.
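
The idea of attributing cost to execution phases can be sketched with a small timing helper. The code below is a hypothetical illustration in plain Python, not the Theano extension used by DeepChip Online: the class name, the phase names and the dummy workload are assumptions, and it records wall-clock time only, since energy measurement depends on platform-specific sensors.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseMonitor:
    """Accumulates wall-clock time per named execution phase."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase raises an exception.
            self.seconds[name] += time.perf_counter() - start

monitor = PhaseMonitor()

with monitor.phase("initialization"):
    weights = [0.0] * 10_000          # stand-in for model setup

with monitor.phase("training"):
    for _ in range(5):                # stand-in for training epochs
        weights = [w + 0.1 for w in weights]

with monitor.phase("verification"):
    total = sum(weights)              # stand-in for evaluation

for name, secs in monitor.seconds.items():
    print(f"{name}: {secs * 1e3:.3f} ms")
```

Entering the same phase several times adds to its running total, so interleaved phases such as repeated storage accesses can be accounted for separately from the computation.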


We gratefully acknowledge the funding we receive from the Austrian FWF and the German DFG.