Glenn R. Luecke, Ying Li and Martin Cuma
Abstract
Purpose
The purpose of this paper is to evaluate how to use nodes in a cluster efficiently by studying the NAS Parallel Benchmarks (NASPB) on Intel Xeon and AMD Opteron dual CPU Linux clusters.
Design/methodology/approach
Performance results for the NASPB are presented both with one MPI process per node (1 ppn) and with two MPI processes per node (2 ppn). These results are analyzed in terms of cache effects, code scalability, memory bandwidth within nodes, and the impact of MPI and the MPI communication network. Memory bandwidth is benchmarked using MPI versions of the Streams benchmarks. The impact of MPI and the MPI communication network is evaluated by benchmarking the performance of MPI sends and receives, MPI broadcast, and the MPI all‐to‐all routines.
Findings
Results from the NASPB and from the memory bandwidth benchmarks show that 1 ppn sometimes performs better than 2 ppn. The results also show that the AMD Opteron/Myrinet cluster achieves significantly better utilization of the second processor than the Intel Xeon/Myrinet cluster.
Practical implications
Most Linux clusters are purchased with two processors per node. One would like to run all applications on a cluster with two processors per node using 2 ppn instead of 1 ppn in order to utilize the second processor on each node. However, our results show that this is not always the best choice. Users should always assess their program performance with both 1 ppn and 2 ppn before running production calculations. This issue becomes even more important with the emergence of multi‐core processors.
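The recommendation to time a program with both 1 ppn and 2 ppn before production runs can be sketched as a small comparison helper. This is only an illustration under stated assumptions: the function names, the `min_gain` threshold, and the timings in the usage lines are hypothetical and are not taken from the paper, and both timings are assumed to be wall-clock seconds measured on the same set of nodes.

```python
def second_processor_gain(t_1ppn: float, t_2ppn: float) -> float:
    """Speedup from switching 1 ppn -> 2 ppn on the same nodes.

    Assumption: both arguments are wall-clock times in seconds for the
    same problem on the same number of nodes. A value of 2.0 would mean
    the second processor is fully utilized; a value below 1.0 means
    that 2 ppn is slower than 1 ppn.
    """
    if t_1ppn <= 0 or t_2ppn <= 0:
        raise ValueError("timings must be positive")
    return t_1ppn / t_2ppn


def recommend_ppn(t_1ppn: float, t_2ppn: float, min_gain: float = 1.0) -> int:
    """Return 2 if 2 ppn beats 1 ppn by more than min_gain, else 1.

    min_gain is a hypothetical threshold: 1.0 means any improvement
    is enough to prefer 2 ppn.
    """
    gain = second_processor_gain(t_1ppn, t_2ppn)
    return 2 if gain > min_gain else 1


if __name__ == "__main__":
    # Hypothetical timings, for illustration only (not measured data).
    print(recommend_ppn(t_1ppn=100.0, t_2ppn=125.0))
    print(recommend_ppn(t_1ppn=100.0, t_2ppn=60.0))
```

The ratio is computed at a fixed node count rather than a fixed process count, because the practical question is whether the second processor on nodes already allocated is worth using.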
Originality/value
To the best of the authors' knowledge, this is the only detailed comparison of AMD Opteron and Intel Xeon dual processor node parallel performance on large Myrinet clusters. The paper should be of value to anyone considering running on or purchasing an AMD‐ or Intel‐based Linux cluster.