title: Deployment of Deep Neural Networks on Dedicated Hardware Accelerators creator: Rieber, Dennis Sebastian subject: ddc-004 subject: 004 Data processing Computer science description: Deep Neural Networks (DNNs) have established themselves as powerful tools for a wide range of complex tasks, for example computer vision or natural language processing. DNNs are notoriously demanding on compute resources and as a result, dedicated hardware accelerators for all use cases are developed. Different accelerators provide solutions from hyper scaling cloud environments for the training of DNNs to inference devices in embedded systems. They implement intrinsics for complex operations directly in hardware. A common example are intrinsics for matrix multiplication. However, there exists a gap between the ecosystems of applications for deep learning practitioners and hardware accelerators. HowDNNs can efficiently utilize the specialized hardware intrinsics is still mainly defined by human hardware and software experts. Methods to automatically utilize hardware intrinsics in DNN operators are a subject of active research. Existing literature often works with transformationdriven approaches, which aim to establish a sequence of program rewrites and data-layout transformations such that the hardware intrinsic can be used to compute the operator. However, the complexity this of task has not yet been explored, especially for less frequently used operators like Capsule Routing. And not only the implementation of DNN operators with intrinsics is challenging, also their optimization on the target device is difficult. Hardware-in-the-loop tools are often used for this problem. They use latency measurements of implementations candidates to find the fastest one. However, specialized accelerators can have memory and programming limitations, so that not every arithmetically correct implementation is a valid program for the accelerator. These invalid implementations can lead to unnecessary long the optimization time. This work investigates the complexity of transformation-driven processes to automatically embed hardware intrinsics into DNN operators. It is explored with a custom, graph-based intermediate representation (IR). While operators like Fully Connected Layers can be handled with reasonable effort, increasing operator complexity or advanced data-layout transformation can lead to scaling issues. Building on these insights, this work proposes a novel method to embed hardware intrinsics into DNN operators. It is based on a dataflow analysis. The dataflow embedding method allows the exploration of how intrinsics and operators match without explicit transformations. From the results it can derive the data layout and program structure necessary to compute the operator with the intrinsic. A prototype implementation for a dedicated hardware accelerator demonstrates state-of-the art performance for a wide range of convolutions, while being agnostic to the data layout. For some operators in the benchmark, the presented method can also generate alternative implementation strategies to improve hardware utilization, resulting in a geo-mean speed-up of ×2.813 while reducing the memory footprint. Lastly, by curating the initial set of possible implementations for the hardware-in-the-loop optimization, the median timeto- solution is reduced by a factor of ×2.40. At the same time, the possibility to have prolonged searches due a bad initial set of implementations is reduced, improving the optimization’s robustness by ×2.35. date: 2023 type: Dissertation type: info:eu-repo/semantics/doctoralThesis type: NonPeerReviewed format: application/pdf identifier: https://archiv.ub.uni-heidelberg.de/volltextserverhttps://archiv.ub.uni-heidelberg.de/volltextserver/32994/1/dissertationPDFA.pdf identifier: DOI:10.11588/heidok.00032994 identifier: urn:nbn:de:bsz:16-heidok-329948 identifier: Rieber, Dennis Sebastian (2023) Deployment of Deep Neural Networks on Dedicated Hardware Accelerators. [Dissertation] relation: https://archiv.ub.uni-heidelberg.de/volltextserver/32994/ rights: info:eu-repo/semantics/openAccess rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html language: eng