[mvapich-discuss] (no subject)

Novosielski, Ryan novosirj at ca.rutgers.edu
Wed Dec 16 09:47:36 EST 2015


Thanks, Jonathan. This is mostly the case with us too, but I think affinity is also managed by SLURM even in those cases, unless there is a reason MVAPICH2 would do a better job of it?

Thanks for the information on MVAPICH2-GDR. Do I need a second copy of MVAPICH2 for that, or is it a superset of the regular MVAPICH2's features?

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu<mailto:novosirj at rutgers.edu>- 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Dec 16, 2015, at 09:43, Jonathan Perkins <perkinjo at cse.ohio-state.edu> wrote:

Hello Ryan:

The CPU affinity feature of MVAPICH2 was designed with the assumption that only a single job runs on each node.  This is the more common case in HPC than allowing multiple jobs to run on each node.  If you're trying to use SLURM to manage multiple jobs on each node, it may be useful to explore cgroups, as you mentioned in your 4th question.
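
As a rough sketch (assuming a fairly standard SLURM setup; exact settings vary by site, so treat this as illustrative rather than a tested recipe), confining each job to its allocated cores via cgroups usually comes down to:

    # slurm.conf: use the cgroup task plugin so tasks only see their own allocation
    TaskPlugin=task/cgroup

    # cgroup.conf: restrict tasks to the cores SLURM assigned to the job
    ConstrainCores=yes

With that in place, a job can only be bound within the cores SLURM gave it, so one job's affinity settings cannot land on another job's processors.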

Please note that for jobs using GPUs we recommend the MVAPICH2-GDR library, as it includes many new advanced features for better performance and scalability.

You can find out more about it via:
http://mvapich.cse.ohio-state.edu/overview/#mv2gdr

You can download via:
http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr

On Tue, Dec 15, 2015 at 1:27 PM Novosielski, Ryan <novosirj at ca.rutgers.edu> wrote:
Hi all,

I'm using MVAPICH2 with SLURM's PMI2 interface, so I'm not using mpirun/mpiexec at all. A user of mine is running some GPU jobs, which require very small numbers of CPUs, so he frequently isn't using the whole node and frequently runs more than one job per node. MVAPICH2's affinity stubbornly binds the jobs to the same processors. The solution is to turn affinity off.
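
For reference, here is a minimal sketch of what one of these job scripts looks like with affinity turned off (the resource counts and the ./gpu_app binary name are just placeholders):

    #!/bin/bash
    #SBATCH --ntasks=2            # small GPU job that only uses part of a node
    #SBATCH --cpus-per-task=1
    #SBATCH --gres=gpu:1

    export MV2_ENABLE_AFFINITY=0  # tell MVAPICH2 not to pin ranks itself
    srun --mpi=pmi2 ./gpu_app     # launched via SLURM's PMI2, no mpirun/mpiexec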

I have some questions about this:

1) Is there an imaginable scenario where, running with SLURM, I could ever want this feature enabled? Should I somehow look at disabling it system-wide, or at MVAPICH2 compile time?
2) If MVAPICH2 can't tell that a processor is already being used at 100%, how can this feature ever work correctly? I'm just curious about the use case in a different setting. Is it not meant to allow two jobs to co-exist on the same node?
3) I'd like this to be easy for the users. Should I just turn it off in the module that is loaded for MVAPICH2 to prevent this from being an issue (see the modulefile sketch after this list)?
4) Any thoughts on whether integrating cgroups with SLURM might solve the problem (e.g., SLURM won't even let MVAPICH2 see the other CPUs, so affinity becomes a non-issue)?
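
(For question 3, what I have in mind is roughly the following in the MVAPICH2 modulefile; this assumes Tcl environment-modules syntax and is just a sketch:)

    # in the mvapich2 modulefile: disable MVAPICH2's own CPU binding by default
    setenv MV2_ENABLE_AFFINITY 0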

I'd welcome any other advice other sites have about this.

--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
 || \\UTGERS      |---------------------*O*---------------------
 ||_// Biomedical | Ryan Novosielski - Senior Technologist
 || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
 ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
      `'

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss