eprintid: 5624
rev_number: 8
eprint_status: archive
userid: 1
dir: disk0/00/00/56/24
datestamp: 2005-07-07 12:15:14
lastmod: 2014-04-03 19:06:25
status_changed: 2012-08-14 15:15:22
type: doctoralThesis
metadata_visibility: show
creators_name: Wiebalck, Arne
title: ClusterRAID: Architecture and Prototype of a Distributed Fault-Tolerant Mass Storage System for Clusters
title_de: ClusterRAID: Architektur und Prototyp eines verteilten fehlertoleranten Massenspeicher-Systems für Cluster
ispublished: pub
subjects: ddc-510
divisions: i-130700
adv_faculty: af-11
cterms_swd: Cluster <Rechnernetz>
cterms_swd: Fehlertoleranz
cterms_swd: Massenspeicher
cterms_swd: Verfügbarkeit
cterms_swd: Betriebssystem
abstract: During the past few years, clusters built from commodity off-the-shelf (COTS) components have emerged as the predominant supercomputer architecture. Typically comprising a collection of standard PCs or workstations and an interconnection network, they have replaced the traditionally used integrated systems due to their better price/performance ratio. As paradigms shift from purely compute-intensive to I/O-intensive applications, mass storage solutions for cluster installations are becoming an increasingly crucial aspect of these systems. The inherent unreliability of the underlying components is one of the reasons why no system has yet been established as a standard storage solution for clusters. This thesis sets out the architecture and prototype implementation of a novel distributed mass storage system for commodity off-the-shelf clusters and addresses the issue of the unreliable constituent components. The key concept of the presented system is the conversion of the local hard disk drive of a cluster node into a reliable device while preserving the block device interface. By deploying sophisticated erasure-correcting codes, the system allows the number of tolerable failures, and thus the overall reliability, to be adjusted. In addition, the applied data layout considers the access behaviour of a broad range of applications and minimizes the number of required network transactions. Extensive measurements and functionality tests of the prototype, both stand-alone and in conjunction with local or distributed file systems, show the validity of the concept.
abstract_translated_text: In den letzten Jahren haben sich Cluster aus Standard-Komponenten in vielen Bereichen als dominante Architektur für Hochleistungsrechner durchgesetzt. Wegen ihres besseren Preis-Leistungsverhältnisses haben diese Systeme, die typischerweise aus Standard-PCs oder Workstations und einem Verbindungsnetzwerk aufgebaut sind, die traditionell verwendeten, integrierten Supercomputer-Architekturen verdrängt. Aufgrund des zu beobachtenden Paradigmen-Wechsels von rein rechen-intensiven hin zu Eingabe/Ausgabe-intensiven Anwendungen werden die in Clustern verwendeten Massenspeichersysteme zu einer immer wichtigeren Komponente. Daß sich bisher kein Standard für die Nutzung des verteilten Massenspeichers in Clustern durchsetzen konnte, ist vor allem der inhärenten Unzuverlässigkeit der zugrundeliegenden Komponenten zuzuschreiben. Die vorliegende Arbeit beschreibt die Architektur und eine Prototypen-Implementierung eines verteilten, fehlertoleranten Massenspeichersystems für Cluster. Die grundlegende Idee der Architektur ist es, die lokale Festplatte eines Clusterknotens zuverlässig zu machen, ohne dabei die Schnittstelle für das Betriebssystem oder die Anwendung zu verändern. Hierbei werden fehler-korrigierende Codes eingesetzt, die es ermöglichen, die Anzahl der zu tolerierenden Fehler und somit die Zuverlässigkeit des Gesamtsystems einzustellen. Das Anordnungsschema für die Datenblöcke innerhalb des Systems berücksichtigt das Zugriffsverhalten einer ganzen Klasse von Applikationen und kann so die erforderlichen Netzwerkzugriffe auf ein Minimum reduzieren. Gründliche Messungen und Funktionstests des Prototypen, sowohl allein als auch im Zusammenwirken mit lokalen und verteilten Dateisystemen, belegen die Validität des Konzeptes.
abstract_translated_lang: ger
date: 2005
date_type: published
id_scheme: DOI
id_number: 10.11588/heidok.00005624
ppn_swb: 1644085364
own_urn: urn:nbn:de:bsz:16-opus-56248
date_accepted: 2005-06-29
language: eng
bibsort: WIEBALCKARCLUSTERRAI2005
full_text_status: public
citation: Wiebalck, Arne (2005) ClusterRAID: Architecture and Prototype of a Distributed Fault-Tolerant Mass Storage System for Clusters. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/5624/1/ClusterRAID_PhD_Wiebalck.pdf