Optimizing message-passing performance within symmetric multiprocessor systems
The Message Passing Interface (MPI) has been widely used in the area of parallel computing due to its portability, scalability, and ease of use. Message passing within Symmetric Multiprocessor (SMP) systems is an import part of any MPI library since it enables parallel programs to run efficiently on SMP systems, or clusters of SMP systems when combined with other ways of communication such as TCP/IP. Most message-passing implementations use a shared memory pool as an intermediate buffer to hold messages, some lock mechanisms to protect the pool, and some synchronization mechanism for coordinating the processes. However, the performance varies significantly depending on how these are implemented. The work here implements two SMP message-passing modules using lock-based and lock-free approaches for MPLi̲te, a compact library that implements a subset of the most commonly used MPI functions. Various optimization techniques have been used to optimize the performance. These two modules are evaluated using a communication performance analysis tool called NetPIPE, and compared with the implementations of other MPI libraries such as MPICH, MPICH2, LAM/MPI and MPI/PRO. Performance tools such as PAPI and VTune are used to gather some runtime information at the hardware level. This information together with some cache theory and the hardware configuration is used to explain various performance phenomena. Tests using a real application have shown the performance of the different implementations in real practice. These results all show that the improvements of the new techniques over existing implementations.