[mvapich-discuss] Cores are oversubscribed when running more than one mpirun instance

Wischert Raphael wischert at inorg.chem.ethz.ch
Thu Apr 12 18:23:07 EDT 2012


I have a 12-core Intel Xeon X5690 machine (2 sockets with 6 cores each) set up with NUMA and Hyperthreading. It thus appears as 24-core under Linux (SL 6.1, 2.6.32-220.4.1.el6.x86_64).
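For reference, the topology can be double-checked with lscpu or hwloc's lstopo, which should report 2 sockets x 6 cores x 2 hardware threads = 24 logical CPUs on this box:

xxx$ lscpu     # look for the "Thread(s) per core" and "Core(s) per socket" lines
xxx$ lstopo    # hwloc's view of the sockets, cores and PUs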

xxx$ mpiname -a
MVAPICH2 1.7 unreleased development copy ch3:mrail

Compilation
CC: icc    -DNDEBUG -DNVALGRIND -O2
CXX: c++   -DNDEBUG -DNVALGRIND -O2
F77: gfortran   -O2 
FC: ifort   -O2

Configuration
--prefix=/opt/mvapich2/1.8rc1/intel/11.1/075/ CC=icc FC=ifort

(The official release failed to compile, so I had to use today's svn version, which built fine.)

I set MV2_CPU_BINDING_POLICY=scatter because it gives better performance for me, but the problems described below are the same with "bunch".
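For completeness, the policy is simply exported in the shell before mpirun is invoked:

xxx$ export MV2_CPU_BINDING_POLICY=scatter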

Running one instance of mpirun works fine:

mpirun -np 12 executable > outfile &

The "top" command shows that CPU 0 to 11 work at 100% ni.

Executing the same command again (while the first mpirun is still running) in a different directory oversubscribes the cores: "top" shows CPUs 0 to 11 at 100% ni while CPUs 12 to 23 stay idle. Each process therefore gets only 50% of a CPU, which of course leads to catastrophic performance. The problem also occurs when running two mpirun instances with a single process each.

It more or less disappears when setting MV2_ENABLE_AFFINITY=0, so that the Linux scheduler is used instead of hwloc-based binding. However, in that case performance is significantly worse (a lot of migration between cores), even when just one instance of mpirun is running, and with two instances some cores still run at only 60-70%.
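In the meantime, the only workaround I can think of is to confine each instance to a disjoint CPU mask by hand with taskset (or perhaps with MV2_CPU_MAPPING, if that is the intended way). The CPU ranges and output file names below are only illustrative, and I am assuming logical CPUs 12-23 are the second half of the hyperthreads here:

xxx$ export MV2_ENABLE_AFFINITY=0             # let the kernel schedule within each taskset mask
xxx$ taskset -c 0-11  mpirun -np 12 executable > outfile1 &   # instance 1 on logical CPUs 0-11
xxx$ taskset -c 12-23 mpirun -np 12 executable > outfile2 &   # instance 2 on logical CPUs 12-23

But this is only a manual workaround, so a proper fix on the MVAPICH2 side would be much appreciated.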

Thanks a lot for your help,
Raphael



