[mvapich-discuss] Jobs run slowly with >1 job on the same nodes

Dhabaleswar Panda panda at cse.ohio-state.edu
Thu Apr 30 11:25:47 EDT 2009


It looks like CPU affinity is enabled here. Thus, when you submit two
32-process jobs, both get mapped to exactly the same set of cores, and
both jobs run slower.
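
To make the overlap concrete, here is a small Python sketch of the situation.
It is only a model, not the library's actual mapping code: it assumes that
with affinity enabled each job binds its local rank i on a node to core i,
without knowing about any other job on that node. The node size (8 cores) and
the 4 ranks per node come from the original report; the function names are
made up for illustration.

```python
from collections import Counter

CORES_PER_NODE = 8   # two quad-core Xeons per node, per the original report
RANKS_PER_NODE = 4   # a 32-process job spread over 8 nodes


def pinned_cores(ranks_per_node):
    """Assumed policy: local rank i is bound to core i, independently per job."""
    return list(range(ranks_per_node))


def core_load(jobs):
    """Count how many ranks end up bound to each core of one node.

    `jobs` is a list with the number of local ranks of each job on the node.
    """
    load = Counter({core: 0 for core in range(CORES_PER_NODE)})
    for ranks in jobs:
        load.update(pinned_cores(ranks))
    return dict(load)


# Two 32-process jobs sharing the node: 4 + 4 ranks.
two_small_jobs = core_load([RANKS_PER_NODE, RANKS_PER_NODE])
# One 64-process job on the same 8 nodes: 8 ranks on the node.
one_big_job = core_load([CORES_PER_NODE])

print(two_small_jobs)
print(one_big_job)
```

Under this assumed policy the two small jobs put two ranks on each of cores
0-3 and leave cores 4-7 idle, while the single 64-process job places one rank
per core, which would be consistent with that run behaving normally.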

Try running your applications with affinity disabled
(MV2_ENABLE_AFFINITY=0). More details on this parameter are available in
the MVAPICH2 user guide at the following location:

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-10000011.16
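
One way to check whether the setting took effect is to look at the CPU set
each process is allowed to run on. The Python sketch below is a generic Linux
check and is not part of MVAPICH2; the function name affinity_report is made
up for illustration. It relies on os.sched_getaffinity, which is only
available on Linux, and falls back to None elsewhere. How it is launched
alongside the real job depends on the local setup and is not shown.

```python
import os
import socket


def affinity_report():
    """Return the host, PID and allowed CPU set of the calling process."""
    if hasattr(os, "sched_getaffinity"):
        # 0 means "the calling process"; the result is a set of core ids.
        cores = sorted(os.sched_getaffinity(0))
    else:
        # Not available on this platform (for example non-Linux systems).
        cores = None
    return {"host": socket.gethostname(), "pid": os.getpid(), "cores": cores}


report = affinity_report()
print(report)
```

If the processes of two jobs on the same node report the same small core
lists, they are competing for those cores. A process that has not been bound
would normally list every core on the node.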

Hope this helps.

DK

On Thu, 30 Apr 2009, Nick Holway wrote:

> Dear all.
>
> I'm running a 64-bit Rocks 5.1 cluster (i.e. CentOS 5.2) with Voltaire
> OFED 1.4 and SGE 6.1u5. I compiled MVAPICH 1.2 with ifort 10 and I
> configured it with F77 & F90 bindings. The nodes all have two quad-core
> Xeon CPUs.
>
> We've compiled PMEMD and sander.MPI and see the same problem with
> both. When one job is run at a time (32 CPUs on 8 nodes) the job runs
> well with good performance. If two jobs (eg 32 on the same 8 nodes)
> are launched at the same time then both jobs run an order of magnitude
> slower. A single 64 CPU run on the same nodes runs normally.
>
> We're also seeing problems with jobs disappearing from SGE and qdel not
> deleting the jobs properly.
>
> Does anyone know what might be causing the above issues? FWIW I've run
> the osu benchmarks and subounce on the cluster without issue.
>
> I originally raised this on the Amber mailing list who suggested that
> it's more likely to be a system problem rather than with their
> software (http://structbio.vanderbilt.edu/archives/amber-archive/2009/1410.php).
>
> Regards
>
> Nick


