Impact of Process Mapping on MPI Allgather Performance in Multi-core Cluster

Abstract:

With the rapid development of multi-core architectures, cluster computing has entered the multi-core era. MPI [1] is one of the most important parallel programming environments in cluster computing. When MPI applications run on multi-core clusters, processes are mapped onto intra-node and inter-node cores. The MPI library provides many collective communication functions, among which MPI_Allgather is one of the most frequently used. It is therefore worthwhile to study in depth how multi-core architecture affects the behavior of MPI collective functions, and to give guidance on obtaining optimal performance on multi-core clusters by mapping processes onto intra-node or inter-node cores according to application behavior. In this paper, a new concept called Hierarchical Communication Latency and Bandwidth (HCLB) is introduced. HCLB indicates that the processes that communicate most frequently should be mapped onto intra-node cores in a multi-core cluster. We take the MPI_Allgather function as an example to illustrate how performance is affected by different process mapping methods. Four algorithms are considered: three classic algorithms and a new algorithm called the Neighbor Exchange Algorithm, proposed in our previous work [2]. We analyze the communication behavior of the four algorithms in detail. The experimental results show that, to obtain optimal performance on multi-core clusters, it is important to choose a process mapping strategy according to the communication behavior of the message-passing algorithm.
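The mapping effect described in the abstract can be illustrated with a small counting sketch. It is not taken from the paper: it assumes a ring-style allgather (each rank sends to rank + 1 in each of p - 1 steps), which may or may not be one of the four algorithms studied, and it assumes two hypothetical mapping policies named "block" (consecutive ranks fill a node) and "cyclic" (ranks are dealt round-robin across nodes). The function names and parameters are illustrative only.

```python
def node_of(rank, num_nodes, cores_per_node, mapping):
    """Return the node index hosting `rank` under a hypothetical mapping policy."""
    if mapping == "block":
        # Consecutive ranks share a node: ranks 0..c-1 on node 0, and so on.
        return rank // cores_per_node
    if mapping == "cyclic":
        # Ranks are dealt round-robin across nodes.
        return rank % num_nodes
    raise ValueError(f"unknown mapping: {mapping}")


def ring_allgather_traffic(num_nodes, cores_per_node, mapping):
    """Count (intra_node, inter_node) messages for an assumed ring allgather."""
    p = num_nodes * cores_per_node
    intra = inter = 0
    for _step in range(p - 1):          # a ring allgather takes p - 1 steps
        for rank in range(p):
            dest = (rank + 1) % p       # each rank sends to its right neighbor
            same_node = (
                node_of(rank, num_nodes, cores_per_node, mapping)
                == node_of(dest, num_nodes, cores_per_node, mapping)
            )
            if same_node:
                intra += 1
            else:
                inter += 1
    return intra, inter


if __name__ == "__main__":
    for mapping in ("block", "cyclic"):
        intra, inter = ring_allgather_traffic(2, 4, mapping)
        print(f"{mapping}: intra-node={intra}, inter-node={inter}")
```

Under these assumptions, with 2 nodes of 4 cores each, the block mapping keeps most neighbor messages inside a node while the cyclic mapping sends every message across nodes, which is the kind of difference the HCLB idea is meant to capture. Message sizes and actual latencies are not modeled.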

Authors: Yuxin Tang, Yunquan Zhang

Affiliation: Laboratory of Parallel Computing, Institute of Software, Chinese Academy of Sciences, 100190 Beijing, Ch

Conference type: International conference

Conference name: The Inaugural Symposium on Parallel Algorithms, Architectures and Programming

Conference location: Guangzhou

Conference language: English

Pages: 66-79

Online publication date: 2008-09-16 (date first posted on the Wanfang platform; not the paper's publication date)
