Performance effects of node mapping on the IBM BlueGene/L machine
Abstract
The IBM BlueGene/L (BG/L) supercomputer is a new machine consisting of up to 65,536 relatively modest compute nodes connected by three application-level networks: a high-performance point-to-point 3D torus network, a global combining/broadcast tree network for collective operations, and a global interrupt/barrier network for extremely fast global barriers. The BG/L control system allows the user to assign MPI logical ranks to physical torus coordinates at run time in an arbitrary manner, as long as every node appears exactly once in the mapping. This presents the possibility of increasing application performance with very little effort. This thesis investigates the performance effects of node mapping on several benchmarks and scientific codes using a variety of existing and new mapping strategies. The test codes are the NAS Parallel Benchmarks, the Ames Laboratory Classical Molecular Dynamics code (ALCMD), and the General Atomic and Molecular Electronic Structure System (GAMESS) application. The NAS benchmarks are short, easy to understand, and fairly well known. ALCMD has an interesting communication pattern that should benefit from a good mapping strategy. GAMESS is not necessarily well suited to BlueGene because it requires a large amount of compute power and memory per node. However, it provides an interesting data point for the performance of applications that were not designed for a particular system, and for the possible benefits of mapping for such applications. The mappings investigated were the stock permutations (XYZ, XZY, etc.), Gray-code-based mesh mappings, random maps, variations on Gray-code maps for embedding 2D meshes in the 3D torus, and three maps designed for GAMESS. Performance results are presented for node mappings on several BG/L partition sizes.
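To make the idea of a node mapping concrete, the following Python sketch builds a stock permutation map and a random map for a small 3D torus, checks the uniqueness constraint described above, and measures how many torus hops separate consecutive MPI ranks. It is an illustration only, not the thesis's implementation: the function names (`permutation_map`, `random_map`, `is_valid_mapping`, `torus_hops`, `mean_neighbor_hops`) are hypothetical, and the sketch assumes that the first letter of a permutation name such as XYZ is the coordinate that varies fastest as the rank increases.

```python
import random
from itertools import product

AXIS_INDEX = {"X": 0, "Y": 1, "Z": 2}

def permutation_map(dims, order="XYZ"):
    """Return a list of (x, y, z) tuples; list index i is MPI rank i.

    Assumption: the first letter of `order` varies fastest with rank.
    """
    # Axes listed from slowest-varying to fastest-varying.
    slow_to_fast = [AXIS_INDEX[c] for c in reversed(order)]
    coords = []
    # itertools.product varies its last argument fastest.
    for values in product(*(range(dims[a]) for a in slow_to_fast)):
        point = [0, 0, 0]
        for axis, value in zip(slow_to_fast, values):
            point[axis] = value
        coords.append(tuple(point))
    return coords

def random_map(dims, seed=0):
    """Return a random assignment of ranks to torus coordinates."""
    coords = permutation_map(dims, "XYZ")
    random.Random(seed).shuffle(coords)
    return coords

def is_valid_mapping(coords, dims):
    """Check that every torus node appears exactly once in the mapping."""
    all_nodes = set(product(*(range(d) for d in dims)))
    return len(coords) == len(all_nodes) and set(coords) == all_nodes

def torus_hops(a, b, dims):
    """Minimum hop count between two nodes, using wraparound links."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))

def mean_neighbor_hops(coords, dims):
    """Average hop distance between rank i and rank i + 1."""
    pairs = zip(coords, coords[1:])
    return sum(torus_hops(a, b, dims) for a, b in pairs) / (len(coords) - 1)

if __name__ == "__main__":
    dims = (4, 4, 4)  # a hypothetical 64-node partition
    for name, mapping in [("XYZ", permutation_map(dims, "XYZ")),
                          ("ZYX", permutation_map(dims, "ZYX")),
                          ("random", random_map(dims, seed=1))]:
        assert is_valid_mapping(mapping, dims)
        print(name, mean_neighbor_hops(mapping, dims))
```

The hop-count metric here is only a rough proxy for communication cost between neighboring ranks; actual performance depends on the application's communication pattern and on network contention, which is what the measurements in the thesis address.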