PDF, English
Abstract
Multi-core multi-socket distributed shared-memory computers (DSM computers, for short) have become an important node architecture in scientific computing as they provide substantial computational capacity with relatively low space and power requirements. Compared to conventional computer networks, inter-chip networks used in DSM computers feature higher bandwidth, lower latency and tighter integration with the CPU. The inter-chip network is a shared resource among the user application and many other services, which can lead to considerable variation of execution times of identical communication tasks. In this work, we explore traffic patterns resulting from MPI collective communication primitives and investigate the question of whether inter-chip link load is a reliable indicator and predictor for the execution time of collective communication primitives on a DSM computer. Our experiments on a Sun Fire X4600 M2 DSM computer with 32 cores (eight quad-core CPUs) indicate that specific single link loads are positively correlated with the execution time of MPI_ALLREDUCE. Observing patterns over multiple links allows refinement of the single-link observation.
Document type: | Article |
---|---|
Date created: | 03 Feb. 2011 15:51 |
Year of publication: | 2011 |
Institutes/Facilities: | ?? i-720000 ?? |
DDC subject group: | 004 Computer science |
Controlled keywords: | Computer architecture |