[mvapich-discuss] Cores are oversubscribed when running more than one mpirun instance

Hung-Sheng Tsao (LaoTsao) Ph.D laotsao at gmail.com
Sat Apr 14 09:25:23 EDT 2012


hi
one way to avoid the oversubscription is to use a workload manager,
e.g. Slurm, SGE, etc.
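
for example, a minimal Slurm batch script along these lines (a sketch; the binary name is a placeholder) keeps two such jobs from sharing cores, because the second job simply waits in the queue:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=12      # one task per physical core
#SBATCH --exclusive      # do not share the node with other jobs
mpirun -np 12 ./executable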
regards


Sent from my iPad

On Apr 14, 2012, at 6:26, "Wischert Raphael" <wischert at inorg.chem.ethz.ch> wrote:

> Hi Raphael
> 
> This is expected behavior in your case as you are running two different MPI jobs simultaneously. MVAPICH2 CPU binding policies are defined per MPI job.
> 
> The latest MVAPICH2 release (1.8rc1) supports a new run-time parameter called MV2_CPU_BINDING_LEVEL. The combination of MV2_CPU_BINDING_LEVEL=socket and MV2_CPU_BINDING_POLICY=scatter should resolve your issue to some extent by evenly distributing the MPI ranks from both MPI jobs across both sockets.
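> 
> For example (a sketch for the single-node case; the binary name is a placeholder):
> 
> MV2_CPU_BINDING_LEVEL=socket MV2_CPU_BINDING_POLICY=scatter mpirun -np 12 ./executable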
> 
> Thanks for your quick reply. Unfortunately this has no effect on my currently installed version. The cores are still oversubscribed.
> 
> 
> You can also resolve the affinity issue by specifying CPU mapping manually with MV2_CPU_MAPPING for each MPI job.  For example, you can run one MPI job with MV2_CPU_MAPPING=0:1:2:3:4:5:6:7:8:9:10:11 and another job with MV2_CPU_MAPPING=12:13:14:15:16:17:18:19:20:21:22:23.
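> 
> Concretely, on your 24-CPU node the two jobs could be launched along these lines (binary and output names are placeholders):
> 
> MV2_CPU_MAPPING=0:1:2:3:4:5:6:7:8:9:10:11 mpirun -np 12 ./job1 > out1 &
> MV2_CPU_MAPPING=12:13:14:15:16:17:18:19:20:21:22:23 mpirun -np 12 ./job2 > out2 &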
> 
> I am sure this would solve the problem, but several people will be using the machine, and it would be very tedious to make them adjust the CPU mapping manually for every job after hunting for "free" CPUs.
> 
> By the way, sometimes pure OpenMP jobs run on the machine in addition to MPI. It seems to me that this makes the situation even worse, because the Linux scheduler interferes with the MPI jobs; that is at least what I found with openmpi-1.5.5.
> For example, when OpenMP uses 1600% CPU, running an additional mpirun process on 8 CPUs would totally mess up the MPI calculation (each process would use 10-30% CPU, with a lot of swapping).
> 
> I don't have the same problem with MVAPICH2 (although some processes still use less than 100%), but unfortunately I do see the core oversubscription described here.
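> 
> (One way to keep the OpenMP threads off the cores used by MPI would be to pin them explicitly; a sketch, assuming an Intel-compiled binary, with an illustrative core list and binary name:
> 
> export OMP_NUM_THREADS=12
> export KMP_AFFINITY="granularity=thread,proclist=[12-23],explicit"
> ./openmp_app
> )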
> 
> 
> You can find more details about CPU affinity settings in the user guide section at: http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.8rc1.html#x1-520006.3
> 
> You indicated that you are not able to build the official release. Is it mvapich2-1.8rc1? Can you give more details about this build issue?
> 
> I have the following problem when attempting to build the mvapich2-1.8rc1 release with
> --prefix=/opt/mvapich2/1.8rc1/intel/11.1/075/ CC=icc FC=ifort --with-hwloc
> 
> mv -f .deps/libnodelist_a-nodelist_parser.Tpo .deps/libnodelist_a-nodelist_parser.Po
> /bin/sh ../../../../../confdb/ylwrap nodelist_scanner.l .c nodelist_scanner.c -- :
> make[7]: *** [nodelist_scanner.c] Error 1
> make[7]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm/mpirun/src/slurm'
> make[6]: *** [all] Error 2
> make[6]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm/mpirun/src/slurm'
> make[5]: *** [all-recursive] Error 1
> make[5]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm/mpirun/src'
> make[4]: *** [all-recursive] Error 1
> make[4]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm/mpirun'
> make[3]: *** [all] Error 2
> make[3]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm/mpirun'
> make[2]: *** [all-redirect] Error 1
> make[2]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src/pm'
> make[1]: *** [all-redirect] Error 2
> make[1]: Leaving directory `/home/rwischert/Downloads/mvapich2-1.8rc1/src'
> make: *** [all-redirect] Error 2
> 
> This is similar to what is described in this post:
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2012-March/003804.html
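> 
> (A guess, not confirmed: the lone ':' after '--' in the ylwrap line above suggests that configure did not find a lex/flex program, so the SLURM nodelist scanner cannot be regenerated. If so, installing flex and re-running configure might help:
> 
> yum install flex    # package name on SL 6.1; assumed to be available
> ./configure --prefix=/opt/mvapich2/1.8rc1/intel/11.1/075/ CC=icc FC=ifort --with-hwloc && make
> )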
> 
> 
> 
> -Devendar
> 
> On Thu, Apr 12, 2012 at 6:23 PM, Wischert Raphael <wischert at inorg.chem.ethz.ch> wrote:
> I have a 12-core Intel Xeon X5690 machine (2 sockets with 6 cores each) set up with NUMA and Hyperthreading. It thus appears as 24-core under Linux (SL 6.1, 2.6.32-220.4.1.el6.x86_64).
> 
> xxx$ mpiname -a
> MVAPICH2 1.7 unreleased development copy ch3:mrail
> 
> Compilation
> CC: icc    -DNDEBUG -DNVALGRIND -O2
> CXX: c++   -DNDEBUG -DNVALGRIND -O2
> F77: gfortran   -O2
> FC: ifort   -O2
> 
> Configuration
> --prefix=/opt/mvapich2/1.8rc1/intel/11.1/075/ CC=icc FC=ifort
> 
> (the official release failed to compile, so I had to use today's svn version, which worked fine)
> 
> I set MV2_CPU_BINDING_POLICY=scatter because it gives better performance for me, but the problems described below are the same with "bunch".
> 
> Running one instance of mpirun works fine:
> 
> mpirun -np 12 executable > outfile &
> 
> The "top" command shows that CPU 0 to 11 work at 100% ni.
> 
> Executing the command again (the 1st mpirun is still running) in a different directory oversubscribes the cores: "top" shows CPU 0 to 11 working at 100% ni while CPU 12 to 23 are idle. Each process therefore uses only 50% CPU, which of course leads to catastrophic performance. The problem is also there when running 2 mpirun processes launched on 1 core each.
> 
> It disappears more or less when setting MV2_ENABLE_AFFINITY=0, so that the Linux scheduler is used instead of hwloc. However, in this case the performance is significantly worse (a lot of swapping between cores), even when just one instance of mpirun is running. With 2 instances, some cores still reach only 60-70%.
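> 
> (For reference, the actual placement of a running process can be checked with taskset; <pid> is a placeholder:
> 
> taskset -cp <pid>    # prints the list of cores the process is allowed to run on
> )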
> 
> Thanks a lot for your help,
> Raphael
> 
> 
> 
> --
> Devendar
> 
> 


