High-Performance Graph Processing

Hauck, Matthias

[thumbnail of diss-matthias_hauck-publication-print.pdf]

Preview

PDF, English - main document
Download (1MB) | Lizenz:

Creative Commons Attribution - ShareAlike 4.0

Citation of documents: Please do not cite the URL that is displayed in your browser location input, instead use the DOI, URN or the persistent URL below, as we can guarantee their long-time accessibility.

DOI: 10.11588/heidok.00038128
URN: urn:nbn:de:bsz:16-heidok-381283

Abstract

With the increasing importance of graphs and algorithms that use them in many new use cases and the increasing availability of data, the number of applications relying on a graph data model is rising. Graphs are not only used in many sciences like biology, but also in the public and commercial sectors to represent, for example, social, business, or traffic networks. Applications use these graphs using queries that apply graph algorithms on them, while they run on systems shared by multiple users. For the performance, this means that the overall system throughput is more important than the elapsed time of a single isolated graph query. The issue is that parallelizing a graph algorithm efficiently is already complex due to the data-dependent nature of many graph algorithms and the broad range of graph sizes and properties. Likewise, there has been a trend to more complex systems with an ever-increasing number of cores, with more complex cache and memory structures and technologies. In order to deal with this complexity, graph processing engines have been proposed that provide primitives for efficient parallel graph processing. Unfortunately, these systems typically focus on the execution of a single query at a time, so they do not try to improve overall throughput during concurrent usage. In this work, we investigate high-performance throughput-oriented graph query processing on scale-up systems. Therefore, we first investigate the behavior of a state-of-the-art graph processing engine using different graph sizes and algorithms when multiple instances of it are used concurrently on the same system. These experiments provide insight into the potential, but also the issues of concurrent execution. Afterwards, we investigate parallel execution inside a single query and, in particular, contention on atomic update operations as they are common in many graph algorithms. We propose a simple buffering schema for atomic updates to reduce contention and analyze it in different scenarios that mimic common update patterns. Finally, based on the previous investigations, we propose a throughput-oriented runtime scheduler for graph queries that automatically controls the degree of parallelization to avoid inefficiencies. The underlying concept consists of multiple parts: (1) sampling is used to determine graph properties, (2) cost estimations and parallelization boundaries are derived from graph, algorithm, and system properties, (3) suitable work packages are generated based on the cost, and (4) executed controlled by a runtime component that controls the parallelism. We evaluate the proposed concept using different algorithms on synthetic and real-world datasets, with up to 16 concurrent sessions (queries). The results demonstrate robust performance despite these various configurations, which is always close to or even slightly ahead of manually optimized implementations.

Translation of abstract (German)

Mit der zunehmenden Bedeutung von Graphen und Algorithmen, die diese in verschiedenen Anwendungsszenarien nutzen und der zunehmenden Verfügbarkeit von Daten, steigt die Menge an Anwendungen, die auf Graphen als Datenmodel aufbauen. Graphen werden nicht nur in vielen Wissenschaften wie der Biologie verwendet, sondern auch im öffentlichen und privaten Sektor um z.B. Soziale-, Geschäfts- oder Verkehrsnetzwerke darzustellen. Die Anwendungen, die diese nutzen, laufen dabei auf Systemen, die von mehreren Nutzern geteilt werden. Das bedeutet, dass der Durchsatz des Systems wichtiger ist als die Ausführungszeit einer isolierten Abfrage. Das Problem hierbei ist, dass effiziente Parallelisierung von Graph Algorithmen komplex ist aufgrund von Datenabhängigkeiten und der großen Vielfalt von Größen und Eigenschaften von Graphen. Gleichzeitig werden Systeme immer komplexer durch eine steigende Menge an CPU-Kernen und einer wachsenden Vielfalt an Cache und Speicher Strukturen und Technologien. Um diese Komplexität zu bewältigen, wurden Graph Processing Engines entwickelt, die Konstrukte für effiziente parallele Graph Verarbeitung bereitstellen. Der Fokus dieser Engines ist die performante Ausführung einzelner Anfragen und nicht die Verbesserung des Durchsatzes mehrere gleichzeitig. In dieser Arbeit untersuchen wir die Durchsatz-orientierte Graph Anfragen Verarbeitung auf scale-up Systemen. Wir untersuchen dabei zuerst das Verhalten einer aktuellen Graph Processing Engine unter Verwendung verschiedener Graph Größen und Algorithmen, wenn mehrere Instanzen von dieser gleichzeitig ausgeführt werden. Diese Experimente geben uns einen Einblick in das Potenzial, aber auch die Probleme, die durch gleichzeitige Ausführung entstehen. Darauf untersuchen wir die Auswirkung von konkurrierenden Updates mit atomaren Update-Operationen auf die Performanz innerhalb einer Anfrage, wie sie häufig in verschieden Graph Algorithmen vorkommen. Wir schlagen dazu ein einfaches Puffer-Schema vor, um konkurrierenden Zugriff zu reduzieren und untersuchen dieses in verschiedenen Szenarien, die gängigen Update Muster entsprechen. Zum Schluss schlagen wir einen Durchsatz orientierten Scheduler für Graph Anfragen vor, der automatischen den Grad der Parallelität steuert, um ineffizienzen zu vermeiden. Das Konzept besteht aus mehreren Teilen: 1) Wir verwenden Sampling, um Eigenschaften des zu verarbeitenden Graphen zu bestimmen; 2) kosten Abschätzungen und Parallelität Grenzen werden von Graph-, Algorithmen- und Systemeigenschaften abgeleitet; 3) passende Arbeitspakete werden basierend auf den Kosten erzeugt und 4) von einer Laufzeit Komponente, die den Grad der Parallelität kontrolliert, ausgeführt. Das System wird anschließend unter Verwendung verschiedener Algorithmen auf synthetischen und realen Datensätzen evaluiert unter Verwendung von bis zu 16 Instanzen gleichzeitig. Die Resultate zeigen eine robuste Leistung trotz der verschiedenen Konfigurationen, welche immer nah bzw. leicht vor der Performanz einer manuell optimierten Lösung ist.

Document type:	Dissertation
Supervisor:	Fröning, Prof. Dr. Holger
Place of Publication:	Heidelberg
Date of thesis defense:	24 April 2024
Date Deposited:	25 Feb 2026 12:47
Date:	2026
Faculties / Institutes:	The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification:	004 Data processing Computer science
Controlled Keywords:	Database, Graph, Parallele Datenverarbeitung
Uncontrolled Keywords:	graph database, contention mitigation, concurrent query processing