[mvapich-discuss] (no subject)

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Dec 16 09:43:39 EST 2015


Hello Ryan:

The CPU affinity feature of MVAPICH2 was designed on the assumption that
only a single job runs on each node.  This is a more common case in HPC
than running multiple jobs per node.  If you're using SLURM to manage
multiple jobs on each node, it may be useful to explore cgroups, as you
mentioned in your 4th question.
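
To confirm that two jobs really are being bound to the same cores, it can
help to look at the CPU set each process is allowed to run on.  The Python
sketch below is only an illustration and is not part of MVAPICH2 or SLURM:
the helper names (current_cpus, overlapping_bindings) and the job labels
are hypothetical, and it assumes a Linux node where os.sched_getaffinity
is available.  Affinity itself is controlled by the MV2_ENABLE_AFFINITY
runtime parameter described in the MVAPICH2 user guide.

```python
import os
from itertools import combinations


def current_cpus(pid=0):
    """Return the set of CPU ids the process may run on (pid 0 = this process).

    Uses os.sched_getaffinity on Linux; on platforms without it, falls back
    to assuming every CPU reported by os.cpu_count() is usable.
    """
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(pid))
    return set(range(os.cpu_count() or 1))


def overlapping_bindings(bindings):
    """Given {label: set_of_cpu_ids}, return {(a, b): shared_cpus}.

    Only pairs of labels that share at least one CPU are included.
    """
    overlaps = {}
    items = sorted(bindings.items(), key=lambda kv: kv[0])
    for (a, cpus_a), (b, cpus_b) in combinations(items, 2):
        shared = set(cpus_a) & set(cpus_b)
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


if __name__ == "__main__":
    # Hypothetical CPU sets, e.g. collected by calling current_cpus()
    # inside one rank of each job running on the node.
    observed = {"jobA": {0, 1}, "jobB": {0, 1}, "jobC": {2, 3}}
    print("this process may run on:", sorted(current_cpus()))
    print("overlapping bindings:", overlapping_bindings(observed))
```

If two small jobs report the same CPU ids while other cores sit idle, that
matches the behaviour described in the original message.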

Please note that for jobs using GPUs we recommend the MVAPICH2-GDR
library, as it uses many new advanced features for better performance and
scalability.

You can find out more about it via:
http://mvapich.cse.ohio-state.edu/overview/#mv2gdr

You can download via:
http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr

On Tue, Dec 15, 2015 at 1:27 PM Novosielski, Ryan <novosirj at ca.rutgers.edu>
wrote:

> Hi all,
>
> I'm using MVAPICH2 with SLURM's PMI2 interface. I'm therefore not using
> mpirun/mpiexec at all. A user of mine is running some GPU jobs, which
> require very small numbers of CPUs. So he's frequently not using the whole
> node, and frequently running more than one job. MVAPICH2's affinity
> stubbornly forces the jobs to bind to the same processors. The solution is
> to turn affinity off.
>
> I have some questions about this:
>
> 1) Is there an imaginable scenario where, running with SLURM, I could ever
> want this feature enabled? Should I somehow look at disabling it
> system-wide or in the MVAPICH2 compile?
> 2) If MVAPICH2 can't tell that a processor is already being used at 100%,
> how can this feature ever work correctly? I'm just curious about the use
> case under a different setting. Is it not meant to handle two jobs
> co-existing on the same node?
> 3) I'd like this to be easy for the users. Should I just turn it off in
> the module that is loaded for MVAPICH2 to prevent this from being an issue?
> 4) Any thoughts on whether integrating cgroups with SLURM might solve the
> problem (e.g. SLURM won't even let MVAPICH2 see the other CPUs, so affinity
> is a non-issue)?
>
> I'd welcome any advice other sites have about this.
>
> --
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>  || \\UTGERS      |---------------------*O*---------------------
>  ||_// Biomedical | Ryan Novosielski - Senior Technologist
>  || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
>  ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>       `'