Communication Architectures for Scalable GPU-centric Computing Systems

Klenk, Benjamin

[thumbnail of bklenk_dissertation_final.pdf]

Vorschau

PDF, Englisch
Download (5MB) | Nutzungsbedingungen

Zitieren von Dokumenten: Bitte verwenden Sie für Zitate nicht die URL in der Adresszeile Ihres Webbrowsers, sondern entweder die angegebene DOI, URN oder die persistente URL, deren langfristige Verfügbarkeit wir garantieren. [mehr ...]

DOI: 10.11588/heidok.00023954
URN: urn:nbn:de:bsz:16-heidok-239547

Abstract

In recent years, power consumption has become the main concern in High Performance Computing (HPC). This has lead to heterogeneous computing systems in which Central Processing Units (CPUs) are supported by accelerators, such as Graphics Processing Units (GPUs). While GPUs used to be seen as slave devices to which the main processor offloads computation, today’s systems tend to deploy more GPUs than CPUs. Eventually, the GPU will become a first-class processor, bearing increasing responsibilities. Promoting the GPU to a first-class processor comes with many challenges, such as progress guarantees, dynamic memory management, and scheduling. However, one of the main challenges is the GPU’s inability to orchestrate communication, which is currently entirely handled by the CPU. This work addresses that issue and presents solutions to allow GPUs to source and sink network traffic independently. Many important aspects are addressed, ranging from the application level to how networking hardware is accessed. First, important and large scale exascale applications are studied to further understand their communication behavior and applications’ requirements. Several metrics are presented, including time spent for communication, message sizes, and the length of queues that are required to match messages with receive requests. One aspect the analysis revealed is that messages are becoming smaller at scale, which renders the matching of messages and receive requests an important problem to address. The next part analyzes how the GPU can directly access the network with various communication models being presented and benchmarked. It is shown that a flat address space of distributed GPU memories shows superior bandwidth than put/get communication or CPU-controlled message passing, but less communication can be overlapped with computation. Overall, GPU-controlled communication is always superior, both in terms of time-to-solution and energy spending. The final part addresses communication management on GPUs, which is required to provide high-level communication abstractions. Besides other fundamental building blocks, an algorithm for the message matching is presented that yields similar performance as CPUs. However, it is also shown that the messaging protocol can be relaxed to improve performance significantly, leveraging the massive amount of parallelism provided by the GPU’s architecture.

Dokumententyp:	Dissertation
Erstgutachter:	Fröning, Prof. Dr. Holger
Tag der Prüfung:	9 Januar 2018
Erstellungsdatum:	12 Jan. 2018 08:50
Erscheinungsjahr:	2018
Institute/Einrichtungen:	Fakultät für Mathematik und Informatik > Dekanat der Fakultät für Mathematik und Informatik ?? i-720000 ??
DDC-Sachgruppe:	004 Informatik 600 Technik, Medizin, angewandte Wissenschaften
Normierte Schlagwörter:	Distributed System, Graphics Processing Unit