3rd Workshop on Embedded Machine Learning - WEML2019/20

Heidelberg University, Feb 13, 2020


Holger Fröning, ZITI, Heidelberg University, Germany (holger.froening (at) ziti.uni-heidelberg.de)

Franz Pernkopf, Graz University of Technology, Austria (pernkopf (at) tugraz.at)

Manfred Mücke, Materials Center Leoben GmbH, Leoben, Austria (Manfred.Muecke (at) mcl.at)


The workshop series on embedded machine learning (WEML) is jointly organized by Heidelberg University, Graz University of Technology, and Materials Center Leoben, and embraces our joint interest in bringing complex machine learning models and methods to resource-constrained devices like edge devices, embedded devices, and IoT. The workshop is rather informal, without proceedings, and is organized around a set of invited talks on topics associated with this interest.

Topics of interest include in general:

    • Compression of neural networks for inference deployment, including quantization, pruning, knowledge distillation and neural architecture search
    • Trade-offs among prediction quality (accuracy), efficiency of representation (model parameters, data types for arithmetic operations, and memory footprint in general), and computational efficiency (complexity of computations)
    • Automatic code generation from high-level descriptions, including linear algebra and stencil codes, targeting existing and future instruction set extensions
    • New ML models and methods
    • Future emerging processors and technologies
    • Applications that demand deployment of ML models on resource-constrained hardware

In this regard, the workshop aims to gather experts from the various domains involved, from both academia and industry, and to stimulate discussions on recent advances in this area.


09:15-ish, Workshop opens

09:30 - 09:45, Holger Fröning/Heidelberg University: Workshop Introduction

09:45 - 10:20, Wolfgang Roth/TU Graz: An Overview of Resource-Efficiency in Deep Learning

While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this talk, I will provide an overview of the current state of the art of deep learning facilitating these real-world requirements. I will survey the vast literature, which can be mainly split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce the computational demands in terms of memory footprint, inference speed, and energy efficiency. More information can be found at https://arxiv.org/abs/2001.03048.
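As a minimal sketch of one of the surveyed categories, the snippet below shows post-training uniform quantization of a weight vector with a single scale factor; all function names and values are illustrative, not taken from the talk.

```python
# Illustrative sketch: uniform post-training quantization of weights.

def quantize_uniform(weights, num_bits=8):
    """Map float weights to signed integers sharing one scale factor."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [qi * scale for qi in q]

weights = [0.50, -0.25, 0.125, -1.0]
q, scale = quantize_uniform(weights, num_bits=8)
approx = dequantize(q, scale)
# Each recovered weight differs from the original by at most scale / 2.
```

The memory saving follows directly: 8-bit codes plus one shared scale replace 32-bit floats, a 4x reduction for large layers.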

10:20 - 10:55, Günther Schindler/Heidelberg University: Pruning Neural Networks for Specific Hardware

Domain-specific accelerators perform the computations of neural networks orders of magnitude better in terms of performance and energy efficiency than general-purpose CPUs and GPUs. Domain-specific acceleration for neural networks is achieved by mapping tensors of high-dimensional data arrays onto dedicated matrix-computation units. These dedicated units require highly structured computations, which is in contrast to popular neural-network compression techniques such as weight pruning, as they produce highly unstructured computations. In this talk, I will give some insights into the importance of domain-specific accelerators as well as their capabilities and requirements. I will talk about a structured pruning technique that allows an efficient mapping onto these domain-specific units, including some use cases and experimental validation.
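To illustrate the contrast with unstructured weight pruning, the sketch below removes whole filters by smallest L2 norm, so the surviving computation stays a dense block that maps onto matrix units; the toy layer and all names are made up for the example and are not the technique presented in the talk.

```python
# Illustrative sketch: structured (filter-level) magnitude pruning.

def filter_norms(filters):
    """L2 norm per filter; each filter is a flat list of weights."""
    return [sum(w * w for w in f) ** 0.5 for f in filters]

def prune_filters(filters, keep_ratio=0.5):
    """Keep the keep_ratio fraction of filters with the largest norm."""
    norms = filter_norms(filters)
    n_keep = max(1, int(len(filters) * keep_ratio))
    order = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    kept = sorted(order[:n_keep])               # preserve original filter order
    return [filters[i] for i in kept], kept

layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.0, 0.01]]
pruned, kept_idx = prune_filters(layer, keep_ratio=0.5)
# kept_idx → [0, 2]: the two high-norm filters survive as a dense block.
```

Because entire filters disappear, the remaining tensor keeps a regular shape, unlike element-wise pruning, which leaves a sparse pattern the matrix units cannot exploit.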

10:55 - 11:30, Coffee break (provided to registered attendees)

11:30 - 12:05, Christoph Gratl & Manfred Mücke/Materials Center Leoben: Comparing three TensorFlow Options to Execute SSD MobileNet Object Detection on ARM v8 / Kunbus Rev Pi PLC

TensorFlow models and graphs are by default evaluated by the TensorFlow execution engine, but the framework offers two other options to execute a graph: the TensorFlow Lite interpreter is aimed at inference on mobile and embedded devices, while XLA AOT is a compiler that transforms the graph/model into standalone executable code, using LLVM as a backend. In my talk, I will compare the three options with regard to the execution of a popular object detection model (SSD-MobileNet) on an industrial PLC with an ARMv8 CPU. Apart from execution time and memory footprint, I will also detail the setup and workflow of the three variants.
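A comparison like this needs the backends behind a common interface and a fair timing loop. The sketch below is a minimal harness of that shape; the three "backends" are stand-in callables, since real code would wrap the TF engine, the TF Lite interpreter, and an XLA AOT binary.

```python
# Illustrative sketch: benchmarking interchangeable inference callables.
import time

def benchmark(run_fn, warmup=2, repeats=10):
    """Median wall-clock time of run_fn() after warm-up iterations."""
    for _ in range(warmup):
        run_fn()                                # amortize one-time setup costs
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return times[len(times) // 2]               # median is robust to outliers

# Stand-in "backends" with different fixed workloads.
backends = {
    "tf_engine": lambda: sum(i * i for i in range(20000)),
    "tflite":    lambda: sum(i * i for i in range(10000)),
    "xla_aot":   lambda: sum(i * i for i in range(5000)),
}
results = {name: benchmark(fn) for name, fn in backends.items()}
```

Warm-up iterations matter particularly on embedded targets, where first-run costs (graph loading, memory allocation) would otherwise dominate the measurement.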

12:05 - 12:40, Dennis Rieber/Robert Bosch GmbH Renningen: An Overview of ML Optimization Tools and their Underlying Concepts

To fully utilize the capabilities of a hardware accelerator, careful tuning of every application is necessary. This task is tedious and it can take a long time, even for human experts, to come close to the maximum throughput of a given hardware target. A popular alternative is the use of so-called "auto tuners", which explore a space of possible schedules by themselves in order to find the best solution for an individual workload. The target metric is defined by the needs of the application, but it usually comes down to total execution time or throughput. One of the main challenges is to reduce the number of samples one has to actually execute on hardware in order to find the best schedule. Different strategies are employed for this purpose. In this talk we present an overview of concepts used in state-of-the-art research. We classify the concepts by (1) how a schedule is evaluated (data- vs. model-driven) and (2) the scope of the evaluation mechanism (hardware- vs. application-oriented). Data-driven approaches use measurements on the target hardware to evaluate a schedule. Model-driven approaches can evaluate schedules without execution; they are either developed by an expert or learnt from measurements. Hardware-oriented approaches employ a single model of the hardware that can be utilized for evaluation of multiple workloads, whereas application-oriented approaches are based on a specific model for every workload on a specific hardware.
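The core loop of such a tuner can be sketched in a few lines: enumerate (or sample) a schedule space and keep the configuration with the lowest measured cost. The cost function below is a toy stand-in for an on-hardware measurement, and all names and values are illustrative.

```python
# Illustrative sketch: the evaluate-and-keep-best loop of an auto tuner.
import itertools

def toy_cost(tile, unroll):
    """Toy stand-in for a hardware measurement: penalizes configs
    that miss an assumed sweet spot at tile=32, unroll=4."""
    return abs(tile - 32) + abs(unroll - 4) * 2

def exhaustive_tune(space, cost_fn):
    """Evaluate every schedule in the space; return the cheapest one."""
    best_cfg, best_cost = None, float("inf")
    for cfg in space:
        c = cost_fn(*cfg)
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost

space = list(itertools.product([8, 16, 32, 64], [1, 2, 4, 8]))
best_cfg, best_cost = exhaustive_tune(space, toy_cost)
# best_cfg → (32, 4) with cost 0: the sweet spot of the toy model.
```

Real spaces are far too large to enumerate, which is exactly where the strategies classified in the talk come in: a learned or expert-built cost model replaces most hardware measurements, so only promising candidates are ever run on the device.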

12:40 - 13:50, Lunch (provided to registered attendees)

13:50 - 14:25, Johannes Schemmel & Yannick Stradmann/Heidelberg University: Deep Learning with Analog Neuromorphic Hardware

Despite their rising success and wide usage throughout industry, Deep Neural Networks are not yet widely applied in embedded edge computing devices. To bridge the gap between powerful computing centers and energy-efficient embedded devices, a common strategy is the utilization of modern process nodes to implement efficient digital edge accelerators. In this talk, BrainScaleS-2 will be presented as an alternative approach: an analog neural network accelerator manufactured in an affordable 65nm CMOS process. This hybrid neuromorphic system embeds a powerful acceleration engine for brain-inspired spiking neural networks, which can additionally be used as an analog multiply-accumulate unit for the computation of classical machine learning tasks. It therefore allows co-designed artificial and spiking neural network implementations to run on a single microchip. This talk will introduce BrainScaleS-2 as a highly scalable, power-efficient neural network accelerator and present experimental results from an initial prototype system.

14:25 - 15:00, Nicholas Fraser/XILINX Research: Brevitas: Quantization-aware Training in PyTorch

In this talk we introduce Brevitas, a PyTorch-based quantization-aware training library for deep neural networks. Brevitas is part of the FINN ecosystem and will be the frontend for the upcoming FINN release. Brevitas allows machine learning engineers to accurately model their inference datapath during neural network training, allowing architecture exploration to occur during the training process while ensuring high accuracy is attainable. Furthermore, Brevitas defines gradients for all quantization parameters (scale factors, bitwidths), allowing the quantization format itself to be learned using standard backpropagation. Finally, we share recent quantization results showing what accuracies can be achieved, including details of Xilinx Research Labs' recent submission to the NeurIPS MicroNet challenge.
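The central building block of quantization-aware training is the fake-quantization step: the forward pass rounds values to an integer grid defined by a scale factor, while the backward pass treats the rounding as identity (the straight-through estimator), so the scale can itself receive gradients. The sketch below shows only the forward computation in plain Python; names and numbers are illustrative and not the Brevitas API.

```python
# Illustrative sketch: forward pass of fake quantization as used in QAT.

def fake_quantize(x, scale, num_bits=4):
    """Quantize-dequantize x so the forward pass sees grid values.
    In training, gradients would pass straight through the rounding."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 for 4 bits
    q = max(-qmax, min(qmax, round(x / scale))) # round, then clamp to range
    return q * scale

activations = [0.37, -0.92, 0.05, 1.40]
scale = 0.2                     # in QAT this scale would itself be learned
quantized = [fake_quantize(a, scale, num_bits=4) for a in activations]
# The grid is {-7..7} * 0.2; 1.40 sits at the top of the 4-bit range.
```

Because the network trains against exactly the values the quantized inference datapath will produce, accuracy loss at deployment is largely avoided.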

15:00 - 15:35, Mathias Niepert/NEC Laboratories Europe: An Intro To Neural Networks for Graph-Structured Data

Graph-structured data is ubiquitous and occurs in several application domains. The talk will provide a brief introduction to recent developments in representation learning for graphs. Moreover, the talk will touch on some of the lab’s research at the intersection of ML and systems.

15:35 - 16:10, Coffee break (provided to registered attendees)

16:10 - 16:45, Martin Trapp/TU Graz: Learning Sum-Product Networks

In several real-world scenarios, decision making involves advanced reasoning under uncertainty, i.e., the ability to answer complex probabilistic queries. However, answering probabilistic queries exactly often becomes intractable in complex probabilistic models. Tractable probabilistic models, such as Sum-Product Networks (SPNs), are a recent and quickly growing field, promising a remedy for this problem. In contrast to traditional complex probabilistic models, SPNs are able to represent highly complex variable dependencies, while at the same time guaranteeing that many inference scenarios have costs linear in their representation size. In this talk, I’ll give an introduction to SPNs, discuss parameter learning of such models and introduce principled structure learning of SPNs. Lastly, I’ll highlight a few recent success stories of SPNs ranging from computer vision to time-series forecasting.
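The linear-time inference property can be seen in a toy example: in a complete and decomposable SPN, marginalizing a variable amounts to setting its leaves to 1 and doing a single bottom-up pass. The tiny two-variable network below, with made-up structure and weights, is a sketch of that idea, not a model from the talk.

```python
# Illustrative sketch: a tiny SPN over two binary variables A, B.

def leaf(var, p_true):
    """Bernoulli leaf; returns 1.0 when its variable is marginalized."""
    def f(assign):
        x = assign.get(var)
        if x is None:
            return 1.0                          # variable summed out
        return p_true if x == 1 else 1.0 - p_true
    return f

def product(*children):
    def f(assign):
        r = 1.0
        for c in children:
            r *= c(assign)
        return r
    return f

def weighted_sum(weights, children):
    def f(assign):
        return sum(w * c(assign) for w, c in zip(weights, children))
    return f

# P(A, B) as a mixture of two fully factorized components.
spn = weighted_sum(
    [0.6, 0.4],
    [product(leaf("A", 0.9), leaf("B", 0.2)),
     product(leaf("A", 0.1), leaf("B", 0.7))],
)

joint = spn({"A": 1, "B": 0})      # full evidence → 0.444
marginal = spn({"A": 1})           # B marginalized in the same single pass
```

Both queries cost one traversal of the network, which is the tractability guarantee that distinguishes SPNs from general probabilistic models, where marginalization is typically exponential.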

16:45 - 17:20, Alex Fuchs/TU Graz: Opportunities and Obstacles in Capsule Networks

Nature has inspired human ingenuity for many centuries. Capsule Networks (CapsNets) are a good example of this and show how a biological concept can be mimicked and implemented for machine learning applications. CapsNets take the concept of cortical microcolumns, which are part of the human visual cortex, and try to leverage this concept to create a more robust and interpretable approach for computer vision. Cortical microcolumns are small collections of neurons specialized to detect specific visual features. Similarly, CapsNets use specialized vector representations, called capsules, to represent objects present in the input. The most prominent feature in the input image is then dynamically routed to the output and used for predicting the class. CapsNets have been shown to be less sensitive to changes in pose and orientation of objects than other deep neural network approaches. This makes them a good alternative for tasks with superimposed sources of information. However, CapsNets also face difficulties with respect to their applicability to large and complex datasets. These issues are rooted in the need for an appropriate routing algorithm and the kind of vector representations used. We propose solutions to these issues by using a routing algorithm with a Wasserstein objective and redesigning the vector representations, solving the most pressing issues in CapsNets.

17:20 - 17:35, Co-Organizers: Concluding remarks and future directions

Post-Workshop Summary

The 3rd Workshop on Embedded Machine Learning (WEML) took place at Heidelberg University on 13 February 2020, attracting about 60 attendees. This workshop series is jointly organized by Heidelberg University (Holger Fröning), Graz University of Technology (Franz Pernkopf) and Materials Center Leoben (Manfred Mücke), and embraces our joint interest in bringing complex machine learning models and methods to resource-constrained devices like edge devices, embedded devices, and the Internet of Things (IoT). The workshop focuses on invited presentations, with ample time for discussions and other interactions.

This time, the program included speakers from Robert Bosch GmbH, XILINX Research, NEC Laboratories Europe, Graz University of Technology, Materials Center Leoben, and Heidelberg University:

  • The workshop started with a focus on tooling, including an overview of resource-efficiency in deep learning, with methods such as pruning, quantization and others, followed by a deep dive into network pruning for specific hardware.
  • Further contributions included an update on code generation for embedded targets, and an overview of machine learning optimization tools for specialized architectures. The following part focused on hardware, with presentations on using neuromorphic hardware for deep learning, and quantization-aware training for field-programmable gate arrays (FPGAs).
  • The last part was dedicated to “Beyond-CNN” models, including Graph-based Neural Networks, Sum-Product Networks, and Capsule Networks.

The attendees embraced the workshop's emphasis on interaction, and across the various discussions a couple of trends were observed. In particular, the community agrees on an increasing gap between ML applications and hardware capability, with convolutional neural networks as a best-case scenario, as "Beyond-CNN" models will substantially push requirements in terms of structure and computational intensity. In this regard, it is also no surprise that ML and its infrastructure are trending, even though the mileage with existing tooling might vary dramatically.