Performance analysis of pure MPI versus MPI+OpenMP for Jacobi iteration and a 3D FFT on the Cray XT5
Today, many high-performance computers are collections of shared-memory compute nodes, each with one or more multi-core processors. Parallel programs for these machines can be written in pure MPI or with a hybrid approach that combines MPI and OpenMP. Since OpenMP threads are lighter weight than MPI processes, one would expect hybrid approaches to achieve better performance and scalability than pure MPI. In practice, this is not always the case. This paper investigates the performance and scalability of pure MPI versus hybrid MPI+OpenMP for Jacobi iteration and for a 3D FFT on the Cray XT5.