title: Deployment of Deep Neural Networks on Dedicated Hardware Accelerators
creator: Rieber, Dennis Sebastian
subject: ddc-004
subject: 004 Data processing Computer science
description: Deep Neural Networks (DNNs) have established themselves as powerful tools for a wide range of complex tasks, for example computer vision or natural language processing. DNNs are notoriously demanding on compute resources, and as a result, dedicated hardware accelerators are developed for all use cases. Different accelerators provide solutions ranging from hyperscale cloud environments for the training of DNNs to inference devices in embedded systems. They implement intrinsics for complex operations directly in hardware. Common examples are intrinsics for matrix multiplication. However, there exists a gap between the ecosystems of applications for deep learning practitioners and hardware accelerators. How DNNs can efficiently utilize the specialized hardware intrinsics is still mainly defined by human hardware and software experts. Methods to automatically utilize hardware intrinsics in DNN operators are a subject of active research. Existing literature often works with transformation-driven approaches, which aim to establish a sequence of program rewrites and data-layout transformations such that the hardware intrinsic can be used to compute the operator. However, the complexity of this task has not yet been explored, especially for less frequently used operators like Capsule Routing. Not only is the implementation of DNN operators with intrinsics challenging, their optimization on the target device is also difficult. Hardware-in-the-loop tools are often used for this problem. They use latency measurements of implementation candidates to find the fastest one. However, specialized accelerators can have memory and programming limitations, so that not every arithmetically correct implementation is a valid program for the accelerator. These invalid implementations can lead to unnecessarily long optimization times.
This work investigates the complexity of transformation-driven processes to automatically embed hardware intrinsics into DNN operators. This complexity is explored with a custom, graph-based intermediate representation (IR). While operators like Fully Connected Layers can be handled with reasonable effort, increasing operator complexity or advanced data-layout transformations can lead to scaling issues. Building on these insights, this work proposes a novel method to embed hardware intrinsics into DNN operators. It is based on a dataflow analysis. The dataflow embedding method allows the exploration of how intrinsics and operators match without explicit transformations. From the results, it can derive the data layout and program structure necessary to compute the operator with the intrinsic. A prototype implementation for a dedicated hardware accelerator demonstrates state-of-the-art performance for a wide range of convolutions, while being agnostic to the data layout. For some operators in the benchmark, the presented method can also generate alternative implementation strategies to improve hardware utilization, resulting in a geometric-mean speed-up of 2.813× while reducing the memory footprint. Lastly, by curating the initial set of possible implementations for the hardware-in-the-loop optimization, the median time-to-solution is reduced by a factor of 2.40. At the same time, the possibility of prolonged searches due to a bad initial set of implementations is reduced, improving the optimization’s robustness by a factor of 2.35.
date: 2023
type: Dissertation
type: info:eu-repo/semantics/doctoralThesis
type: NonPeerReviewed
format: application/pdf
identifier: https://archiv.ub.uni-heidelberg.de/volltextserver/32994/1/dissertationPDFA.pdf
identifier: DOI:10.11588/heidok.00032994
identifier: urn:nbn:de:bsz:16-heidok-329948
identifier: Rieber, Dennis Sebastian (2023) Deployment of Deep Neural Networks on Dedicated Hardware Accelerators. [Dissertation]
relation: https://archiv.ub.uni-heidelberg.de/volltextserver/32994/
rights: info:eu-repo/semantics/openAccess
rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html
language: eng