Download (696Kb) | Lizenz: Print on Demand
Multi-core multi-socket distributed shared-memory com- puters (DSM computers, for short) have become an impor- tant node architecture in scientific computing as they provide substantial computational capacity with relatively low space and power requirements. Compared to conventional computer networks, inter-chip networks used in DSM computers feature higher bandwidth, lower latency and tighter integration with the CPU. The inter-chip network is a shared resource among the user application and many other services, which can lead to consid- erable variation of execution times of identical communication tasks. In this work, we explore traffic patterns resulting from MPI collective communication primitives and investigate the ques- tion whether inter-chip link load is a reliable indicator and predictor for the execution time of collective communication primitives on a DSM computer. Our experiments on a Sun Fire X4600 M2 DSM computer with 32 cores (eight quad-core CPUs) indicate that specific single link loads are positively correlated with the execution time of MPI ALLREDUCE. Ob- serving patterns over multiple links allows refinement of the single-link observation.
|Faculties / Institutes:||Service facilities > Institut f. Technische Informatik (ZITI)|
|Subjects:||004 Data processing Computer science|