eprintid: 23954
rev_number: 15
eprint_status: archive
userid: 3535
dir: disk0/00/02/39/54
datestamp: 2018-01-12 08:50:43
lastmod: 2018-02-27 11:09:48
status_changed: 2018-01-12 08:50:43
type: doctoralThesis
metadata_visibility: show
creators_name: Klenk, Benjamin
title: Communication Architectures for Scalable GPU-centric Computing Systems
subjects: ddc-004
subjects: ddc-600
divisions: i-110001
divisions: i-720000
adv_faculty: af-11
cterms_swd: Distributed System
cterms_swd: Graphics Processing Unit
abstract: In recent years, power consumption has become the main concern in High Performance Computing (HPC). This has led to heterogeneous computing systems in which Central Processing Units (CPUs) are supported by accelerators, such as Graphics Processing Units (GPUs). While GPUs used to be seen as slave devices to which the main processor offloads computation, today’s systems tend to deploy more GPUs than CPUs. Eventually, the GPU will become a first-class processor, bearing increasing responsibilities. Promoting the GPU to a first-class processor comes with many challenges, such as progress guarantees, dynamic memory management, and scheduling. However, one of the main challenges is the GPU’s inability to orchestrate communication, which is currently handled entirely by the CPU. This work addresses that issue and presents solutions that allow GPUs to source and sink network traffic independently. Many important aspects are addressed, ranging from the application level to how networking hardware is accessed.
First, important large-scale exascale applications are studied to better understand their communication behavior and requirements. Several metrics are presented, including time spent on communication, message sizes, and the length of the queues required to match messages with receive requests. One finding of this analysis is that messages become smaller at scale, which renders the matching of messages and receive requests an important problem to address.
The next part analyzes how the GPU can directly access the network; various communication models are presented and benchmarked. It is shown that a flat address space spanning distributed GPU memories achieves higher bandwidth than put/get communication or CPU-controlled message passing, but allows less communication to be overlapped with computation. Overall, GPU-controlled communication is always superior, both in terms of time-to-solution and energy consumption.
The final part addresses communication management on GPUs, which is required to provide high-level communication abstractions. Among other fundamental building blocks, an algorithm for message matching is presented that yields performance similar to that of CPUs. However, it is also shown that the messaging protocol can be relaxed to improve performance significantly by leveraging the massive parallelism of the GPU architecture.
date: 2018
id_scheme: DOI
id_number: 10.11588/heidok.00023954
ppn_swb: 1659311454
own_urn: urn:nbn:de:bsz:16-heidok-239547
date_accepted: 2018-01-09
language: eng
bibsort: KLENKBENJACOMMUNICAT2018
full_text_status: public
citation: Klenk, Benjamin (2018) Communication Architectures for Scalable GPU-centric Computing Systems. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/23954/1/bklenk_dissertation_final.pdf