<> "The repository administrator has not yet configured an RDF license."^^ . <> . . "Deployment of Deep Neural Networks on Dedicated Hardware Accelerators"^^ . "Deep Neural Networks (DNNs) have established themselves as powerful tools for\r\na wide range of complex tasks, for example computer vision or natural language\r\nprocessing. DNNs are notoriously demanding on compute resources and as a\r\nresult, dedicated hardware accelerators for all use cases are developed. Different\r\naccelerators provide solutions from hyper scaling cloud environments for the\r\ntraining of DNNs to inference devices in embedded systems. They implement\r\nintrinsics for complex operations directly in hardware. A common example\r\nare intrinsics for matrix multiplication. However, there exists a gap between\r\nthe ecosystems of applications for deep learning practitioners and hardware\r\naccelerators. HowDNNs can efficiently utilize the specialized hardware intrinsics\r\nis still mainly defined by human hardware and software experts.\r\nMethods to automatically utilize hardware intrinsics in DNN operators are a\r\nsubject of active research. Existing literature often works with transformationdriven\r\napproaches, which aim to establish a sequence of program rewrites and\r\ndata-layout transformations such that the hardware intrinsic can be used to\r\ncompute the operator. However, the complexity this of task has not yet been\r\nexplored, especially for less frequently used operators like Capsule Routing. And\r\nnot only the implementation of DNN operators with intrinsics is challenging,\r\nalso their optimization on the target device is difficult. Hardware-in-the-loop\r\ntools are often used for this problem. They use latency measurements of implementations\r\ncandidates to find the fastest one. However, specialized accelerators\r\ncan have memory and programming limitations, so that not every arithmetically\r\ncorrect implementation is a valid program for the accelerator. These invalid\r\nimplementations can lead to unnecessary long the optimization time.\r\nThis work investigates the complexity of transformation-driven processes to\r\nautomatically embed hardware intrinsics into DNN operators. It is explored\r\nwith a custom, graph-based intermediate representation (IR). While operators\r\nlike Fully Connected Layers can be handled with reasonable effort, increasing\r\noperator complexity or advanced data-layout transformation can lead to scaling issues. \r\nBuilding on these insights, this work proposes a novel method to embed\r\nhardware intrinsics into DNN operators. It is based on a dataflow analysis.\r\nThe dataflow embedding method allows the exploration of how intrinsics and\r\noperators match without explicit transformations. From the results it can derive\r\nthe data layout and program structure necessary to compute the operator with\r\nthe intrinsic. A prototype implementation for a dedicated hardware accelerator\r\ndemonstrates state-of-the art performance for a wide range of convolutions, while\r\nbeing agnostic to the data layout. For some operators in the benchmark, the\r\npresented method can also generate alternative implementation strategies to\r\nimprove hardware utilization, resulting in a geo-mean speed-up of ×2.813 while\r\nreducing the memory footprint. Lastly, by curating the initial set of possible\r\nimplementations for the hardware-in-the-loop optimization, the median timeto-\r\nsolution is reduced by a factor of ×2.40. 
At the same time, the possibility of prolonged searches due to a bad initial set of implementations is reduced, improving the optimization’s robustness by ×2.35."^^ . "2023" . "Dennis Sebastian"^^ . "Rieber"^^ . "Dennis Sebastian Rieber"^^ . "Deployment of Deep Neural Networks on Dedicated Hardware Accelerators (PDF)"^^ . "dissertationPDFA.pdf"^^ . "004 Informatik"@de . "004 Data processing Computer science"@en . .
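To make the central idea of embedding a matrix-multiplication intrinsic into a DNN operator concrete, the following sketch shows the classic im2col lowering of a 2D convolution to a single matrix multiplication. This is an illustrative example only, not the dissertation's dataflow embedding method; the function names, shapes, and the use of NumPy are assumptions made for the sketch, and on a real accelerator the final matrix product would be carried out by the hardware intrinsic rather than a library call.

import numpy as np

def conv2d_via_matmul(inp, weights):
    """Illustrative sketch (hypothetical helper, not the author's method).

    inp: (H, W, C_in), weights: (KH, KW, C_in, C_out), stride 1, no padding.
    """
    H, W, C_in = inp.shape
    KH, KW, _, C_out = weights.shape
    OH, OW = H - KH + 1, W - KW + 1

    # im2col: gather every receptive field into one row of a 2D matrix.
    # This is the kind of data-layout transformation that makes the
    # convolution fit a matrix-multiplication intrinsic.
    cols = np.empty((OH * OW, KH * KW * C_in), dtype=inp.dtype)
    for oh in range(OH):
        for ow in range(OW):
            patch = inp[oh:oh + KH, ow:ow + KW, :]
            cols[oh * OW + ow] = patch.reshape(-1)

    # The reshaped weights and the im2col matrix now form a plain GEMM;
    # a dedicated accelerator would replace this line with its intrinsic.
    out = cols @ weights.reshape(KH * KW * C_in, C_out)
    return out.reshape(OH, OW, C_out)

# Usage example with random data (shapes chosen arbitrarily for the sketch).
x = np.random.rand(8, 8, 3).astype(np.float32)
w = np.random.rand(3, 3, 3, 16).astype(np.float32)
y = conv2d_via_matmul(x, w)   # result has shape (6, 6, 16)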