[mvapich-discuss] (no subject)

John Donners john.donners at surfsara.nl
Wed Dec 16 09:49:38 EST 2015


Hello Ryan,

Have you tried using srun with the --exclusive option?
The man page reads:

'This  option  can also be used when initiating more than one job step 
within an existing resource allocation, where you want separate 
processors to be dedicated to each job step.'
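
For example, within one allocation (the sizes and program names here
are only illustrative, not from your setup):

    salloc -N 1 -n 8                        # an 8-task allocation
    srun -n 4 --exclusive ./gpu_job_a &     # step gets 4 dedicated CPUs
    srun -n 4 --exclusive ./gpu_job_b &     # step gets the other 4
    wait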

Cheers,
John

On 16-12-15 15:43, Jonathan Perkins wrote:
> Hello Ryan:
>
> The CPU affinity feature of MVAPICH2 was designed with only a single 
> job running on each node, which is the more common case in HPC than 
> running multiple jobs per node.  If you're trying to use SLURM to 
> manage multiple jobs on each node, it may be useful to explore 
> cgroups, as you mentioned in your 4th question.
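>
> As a rough sketch of that cgroup-based setup (check the exact option 
> names against your SLURM version's documentation):
>
>     # slurm.conf: delegate task containment to cgroups
>     TaskPlugin=task/cgroup
>
>     # cgroup.conf: confine each job to the cores it was allocated
>     ConstrainCores=yes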
>
> Please note that for jobs using GPUs we recommend the MVAPICH2-GDR 
> library, as it provides many advanced features for better performance 
> and scalability.
>
> You can find out more about it via:
> http://mvapich.cse.ohio-state.edu/overview/#mv2gdr
>
> You can download via:
> http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr
>
> On Tue, Dec 15, 2015 at 1:27 PM Novosielski, Ryan 
> <novosirj at ca.rutgers.edu> wrote:
>
>     Hi all,
>
>     I'm using MVAPICH2 with SLURM's PMI2 interface, so I'm not using
>     mpirun/mpiexec at all. A user of mine is running GPU jobs that
>     require only a small number of CPUs, so he's frequently not using
>     the whole node, and frequently running more than one job per node.
>     MVAPICH2's CPU affinity stubbornly binds these jobs to the same
>     processors. The solution is to turn affinity off.
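>
>     For reference, the knob is MVAPICH2's MV2_ENABLE_AFFINITY
>     environment variable; a minimal sketch (the application name is
>     a placeholder):
>
>         # disable MVAPICH2's CPU binding for this job
>         export MV2_ENABLE_AFFINITY=0
>         srun ./gpu_app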
>
>     I have some questions about this:
>
>     1) Is there an imaginable scenario where, running with SLURM, I
>     could ever want this feature enabled? Should I look at disabling
>     it system-wide, or when compiling MVAPICH2?
>     2) If MVAPICH2 can't tell that a processor is already in use at
>     100%, how can this feature ever work correctly? I'm just curious
>     about the intended use case in a different setting. Is it not
>     meant for two jobs to co-exist on the same node?
>     3) I'd like this to be easy for the users. Should I just turn it
>     off in the module that is loaded for MVAPICH2, to keep this from
>     being an issue? (See the modulefile sketch after these questions.)
>     4) Any thoughts on whether integrating cgroups into SLURM might
>     solve the problem (e.g., SLURM wouldn't even let MVAPICH2 see the
>     other CPUs, so affinity becomes a non-issue)?
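>
>     For question 3, a one-line addition to the modulefile would do
>     it; a sketch (assuming a Tcl-based module system):
>
>         # in the mvapich2 modulefile: turn binding off by default;
>         # users can still re-enable it per job
>         setenv MV2_ENABLE_AFFINITY 0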
>
>     I'd welcome any other advice other sites have about this.
>
>     --
>     ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>      || \\UTGERS |---------------------*O*---------------------
>      ||_// Biomedical | Ryan Novosielski - Senior Technologist
>      || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
>      ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>           `'
>


-- 
SURFdrive: the personal cloud storage service for Dutch higher education and research.

| John Donners | Senior advisor | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam | The Netherlands |
T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |

Present on | Mon | Tue | Wed | Thu | Fri
