<> "The repository administrator has not yet configured an RDF license."^^ . <> . . "Parallel Asynchronous Matrix Multiplication for a Distributed Pipelined Neural Network"^^ . "Machine learning is an approach to devise algorithms that compute an output without a given rule set but based on a self-learning concept. This approach is of great importance for several fields of applications in science and industry where traditional programming methods are not sufficient. In neural networks, a popular subclass of machine learning algorithms, commonly previous experience is used to train the network and produce good outputs for newly introduced inputs. By increasing the size of the network more complex problems can be solved which again rely on a huge amount of training data. Increasing the complexity also leads to higher computational demand and storage requirements and to the need for parallelization.\r\nSeveral parallelization approaches of neural networks have already been considered. Most approaches use special purpose hardware whilst other work focuses on using standard hardware. Often these approaches target the problem by parallelizing the training data. In this work a new parallelization method named poadSGD is proposed for the parallelization of fully-connected, largescale feedforward networks on a compute cluster with standard hardware. poadSGD is based on the stochastic gradient descent algorithm. A block-wise distribution of the network's layers to groups of processes and a pipelining scheme for batches of the training samples are used. The network is updated asynchronously without interrupting ongoing computations of subsequent batches. For this task a one-sided communication scheme is used. A main algorithmic part of the batch-wise pipelined version consists of matrix multiplications which occur for a special distributed setup, where each matrix is held by a different process group.\r\nGASPI, a parallel programming model from the field of \"Partitioned Global Address Spaces\" (PGAS) models is introduced and compared to other models from this class. As it mainly relies on one-sided and asynchronous communication it is a perfect candidate for the asynchronous update task in the poadSGD algorithm. Therefore, the matrix multiplication is also implemented based GASPI. In order to efficiently handle upcoming synchronizations within the process groups and achieve a good workload distribution, a two-dimensional block-cyclic data distribution is applied for the matrices. Based on this distribution, the multiplication algorithm is computed by diagonally iterating over the sub blocks of the resulting matrix and computing the sub blocks in subgroups of the processes. The sub blocks are computed by sharing the workload between the process groups and communicating mostly in pairs or in subgroups. The communication in pairs is set up to be overlapped by other ongoing computations. The implementations provide a special challenge, since the asynchronous communication routines must be handled with care as to which processor is working at what point in time with which data in order to prevent an unintentional dual use of data.\r\nThe theoretical analysis shows the matrix multiplication to be superior to a naive implementation when the dimension of the sub blocks of the matrices exceeds 382. The performance achieved in the test runs did not withstand the expectations the theoretical analysis predicted. 
The theoretical analysis shows the matrix multiplication to be superior to a naive implementation when the dimension of the sub-blocks of the matrices exceeds 382. The performance achieved in the test runs did not meet the expectations raised by the theoretical analysis. The algorithm was executed on up to 512 cores and for matrices up to a size of 131,072 x 131,072.

The implementation using the GASPI API was found not to be straightforward, but to provide good potential for overlapping communication with computation whenever the data dependencies of an application allow for it. The matrix multiplication was successfully implemented and can be used within a future implementation of the poadSGD method. The poadSGD method seems very promising, especially as nowadays, with larger amounts of data and the increased complexity of applications, approaches to the parallelization of neural networks are of growing interest.
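To make the overlap potential mentioned above concrete, here is a minimal C sketch of the notification-based, one-sided communication pattern, assuming the GPI-2 implementation of the GASPI API (header GASPI.h): one rank issues a gaspi_write_notify and can keep computing while the transfer is in flight, and the target rank only blocks in gaspi_notify_waitsome when it actually needs the data. Segment ids, sizes and ranks are example values, error checking is omitted, and this is not the implementation developed in the thesis.

/* Sketch of one-sided, notification-based communication with GASPI (GPI-2).
 * Rank 0 writes MSG_SIZE bytes into rank 1's segment and continues with
 * local work; rank 1 waits for the notification only when the data is needed. */
#include <GASPI.h>
#include <stdio.h>

#define SEG_ID   0
#define SEG_SIZE (1 << 20)                    /* 1 MiB example segment */
#define MSG_SIZE (1024 * sizeof(double))      /* example message size  */
#define NOTIF_ID 0

int main(void)
{
    gaspi_proc_init(GASPI_BLOCK);

    gaspi_rank_t rank, nprocs;
    gaspi_proc_rank(&rank);
    gaspi_proc_num(&nprocs);

    /* One registered segment per process; its memory is remotely writable. */
    gaspi_segment_create(SEG_ID, SEG_SIZE, GASPI_GROUP_ALL,
                         GASPI_BLOCK, GASPI_MEM_INITIALIZED);

    if (rank == 0 && nprocs > 1) {
        /* Queue a one-sided write from our segment (offset 0) into rank 1's
         * segment (offset 0), together with a notification for the target. */
        gaspi_write_notify(SEG_ID, 0, 1, SEG_ID, 0, MSG_SIZE,
                           NOTIF_ID, 1, 0 /* queue */, GASPI_BLOCK);

        /* ... independent local computation can overlap the transfer here ... */

        gaspi_wait(0 /* queue */, GASPI_BLOCK);   /* local completion of the queue */
    } else if (rank == 1) {
        /* ... independent local computation can run here as well ... */
        gaspi_notification_id_t got;
        gaspi_notification_t    val;
        gaspi_notify_waitsome(SEG_ID, NOTIF_ID, 1, &got, GASPI_BLOCK);
        gaspi_notify_reset(SEG_ID, got, &val);
        (void)val;   /* old notification value, unused in this sketch */
        printf("rank 1: remote data arrived (notification %u)\n", (unsigned)got);
    }

    gaspi_barrier(GASPI_GROUP_ALL, GASPI_BLOCK);
    gaspi_proc_term(GASPI_BLOCK);
    return 0;
}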