eprintid: 23954
rev_number: 15
eprint_status: archive
userid: 3535
dir: disk0/00/02/39/54
datestamp: 2018-01-12 08:50:43
lastmod: 2018-02-27 11:09:48
status_changed: 2018-01-12 08:50:43
type: doctoralThesis
metadata_visibility: show
creators_name: Klenk, Benjamin
title: Communication Architectures for Scalable GPU-centric Computing Systems
subjects: ddc-004
subjects: ddc-600
divisions: i-110001
divisions: i-720000
adv_faculty: af-11
cterms_swd: Distributed System
cterms_swd: Graphics Processing Unit
abstract: In recent years, power consumption has become the main concern in High Performance Computing (HPC). This has led to heterogeneous computing systems in which Central Processing Units (CPUs) are supported by accelerators, such as Graphics Processing Units (GPUs). While GPUs used to be seen as slave devices to which the main processor offloads computation, today’s systems tend to deploy more GPUs than CPUs. Eventually, the GPU will become a first-class processor, bearing increasing responsibilities. Promoting the GPU to a first-class processor comes with many challenges, such as progress guarantees, dynamic memory management, and scheduling. However, one of the main challenges is the GPU’s inability to orchestrate communication, which is currently handled entirely by the CPU. This work addresses that issue and presents solutions that allow GPUs to source and sink network traffic independently. Many important aspects are addressed, ranging from the application level to how networking hardware is accessed.
First, important large-scale exascale applications are studied to better understand their communication behavior and requirements. Several metrics are presented, including time spent on communication, message sizes, and the length of the queues required to match messages with receive requests. One finding of this analysis is that messages become smaller at scale, which renders the matching of messages and receive requests an important problem to address.
The next part analyzes how the GPU can directly access the network; various communication models are presented and benchmarked. It is shown that a flat address space spanning distributed GPU memories achieves higher bandwidth than put/get communication or CPU-controlled message passing, but allows less communication to be overlapped with computation. Overall, GPU-controlled communication is always superior, both in terms of time-to-solution and energy consumption.
The final part addresses communication management on GPUs, which is required to provide high-level communication abstractions. Among other fundamental building blocks, an algorithm for message matching is presented that yields performance similar to that of CPUs. However, it is also shown that the messaging protocol can be relaxed to improve performance significantly by leveraging the massive parallelism of the GPU architecture.
date: 2018
id_scheme: DOI
id_number: 10.11588/heidok.00023954
ppn_swb: 1659311454
own_urn: urn:nbn:de:bsz:16-heidok-239547
date_accepted: 2018-01-09
language: eng
bibsort: KLENKBENJACOMMUNICAT2018
full_text_status: public
citation: Klenk, Benjamin (2018) Communication Architectures for Scalable GPU-centric Computing Systems. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/23954/1/bklenk_dissertation_final.pdf