[mvapich-discuss] (no subject)

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Dec 16 10:23:21 EST 2015


You can set up a configuration file.  The settings placed there can be
overridden by the users.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2b-userguide.html#x1-500006.3

The default system config file is located at /etc/mvapich2.conf.  This
location can be overridden when building MVAPICH2, but there is no configure
option for this at this time, so you'll have to define it through CFLAGS.

Example:
./configure --prefix=/some/install/prefix \
    CFLAGS='-DMV2_SYSTEM_CONFIG="/some/nfs/share/mvapich2.conf"' ...
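
The configuration file itself is plain text.  As a sketch (assuming the
one-parameter-per-line NAME=VALUE format described in the user guide
section linked above; the path and the choice of parameter are just
illustrations), a site-wide file that disables MVAPICH2's CPU affinity by
default could look like:

# /some/nfs/share/mvapich2.conf -- site-wide defaults
MV2_ENABLE_AFFINITY=0

As noted above, users who want the feature back can still override this,
for example by exporting MV2_ENABLE_AFFINITY=1 in their own environment.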

On Wed, Dec 16, 2015 at 10:13 AM Novosielski, Ryan <novosirj at ca.rutgers.edu>
wrote:

> Is there a recommended way to do this by default, allowing users to turn
> it back on if necessary?
>
> Thanks again.
>
> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS      |---------------------*O*---------------------
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>     `'
>
> On Dec 16, 2015, at 10:02, Jonathan Perkins <perkinjo at cse.ohio-state.edu>
> wrote:
>
> I believe that with a default installation of SLURM, affinity is not
> handled by SLURM.  If additional plugins are enabled, then it may be
> better to just go with those and disable MVAPICH2's affinity
> (MV2_ENABLE_AFFINITY=0).
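>
> As a sketch of what that looks like from the user's side (assuming
> SLURM's srun with PMI2 as the launcher, as discussed below; the task
> count and binary name are just placeholders):
>
>     export MV2_ENABLE_AFFINITY=0    # hand CPU binding over to SLURM
>     srun --mpi=pmi2 -n 4 ./gpu_app  # hypothetical job launch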
>
> Although the GDR build will work with host-only applications, I would
> suggest using two installations, where the GDR version is used by
> applications that take advantage of GPU transfers.
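>
> A common way to expose both side by side is through separate modules (a
> sketch only; the module names are hypothetical and depend on your site's
> modulefile layout):
>
>     module load mvapich2/2.2b      # host-only MPI applications
>     module load mvapich2-gdr/2.2   # applications using GPU transfers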
>
> On Wed, Dec 16, 2015 at 9:48 AM Novosielski, Ryan <novosirj at ca.rutgers.edu>
> wrote:
>
>> Thanks Jonathan. This is mostly the case with us too, but I think
>> affinity is also managed by SLURM even in those cases. Unless there is a
>> reason MVAPICH2 would do a better job?
>>
>> Thanks for the information on MVAPICH2-GDR. Do I need a second copy of
>> MVAPICH2 for that, or is it a superset of the regular MVAPICH2's features?
>>
>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>> || \\UTGERS      |---------------------*O*---------------------
>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>> || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>     `'
>>
>> On Dec 16, 2015, at 09:43, Jonathan Perkins <perkinjo at cse.ohio-state.edu>
>> wrote:
>>
>> Hello Ryan:
>>
>> The CPU affinity feature of MVAPICH2 was designed with only a single job
>> running on each node in mind.  This is the more common case in HPC than
>> allowing multiple jobs to run on each node.  If you're trying to use
>> SLURM to manage multiple jobs on each node, it may be useful to explore
>> cgroups, as you've mentioned in your 4th question.
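>>
>> If you do explore that route, a minimal sketch (assuming a standard
>> SLURM setup; the exact settings live in your site's slurm.conf and
>> cgroup.conf) is to enable the task/cgroup plugin and constrain cores:
>>
>>     # slurm.conf
>>     TaskPlugin=task/cgroup
>>
>>     # cgroup.conf
>>     ConstrainCores=yes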
>>
>> Please note, for jobs using GPUs we recommend using the MVAPICH2-GDR
>> library as it uses many new advanced features for better performance and
>> scalability.
>>
>> You can find out more about it via:
>> http://mvapich.cse.ohio-state.edu/overview/#mv2gdr
>>
>> You can download via:
>> http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr
>>
>> On Tue, Dec 15, 2015 at 1:27 PM Novosielski, Ryan <
>> novosirj at ca.rutgers.edu> wrote:
>>
>>> Hi all,
>>>
>>> I'm using MVAPICH2 with SLURM's PMI2 interface, so I'm not using
>>> mpirun/mpiexec at all. A user of mine is running some GPU jobs, which
>>> require very small numbers of CPUs. He's therefore frequently not using
>>> the whole node, and frequently running more than one job per node.
>>> MVAPICH2's affinity stubbornly forces the jobs to bind to the same
>>> processors. The solution is to turn affinity off.
>>>
>>> I have some questions about this:
>>>
>>> 1) Is there an imaginable scenario where, running with SLURM, I could
>>> ever want this feature enabled? Should I somehow look at disabling it
>>> system-wide or in the MVAPICH2 compile?
>>> 2) If MVAPICH2 can't tell that a processor is already being used at
>>> 100%, how can this feature ever work correctly? I'm just curious about
>>> the use case under a different setting. Is it not meant to co-exist
>>> with another job on the same node?
>>> 3) I'd like this to be easy for the users. Should I just turn it off in
>>> the module that is loaded for MVAPICH2 to prevent this from being an issue?
>>> 4) Any thoughts on whether integrating cgroups into SLURM might solve
>>> the problem (e.g., SLURM won't even let MVAPICH2 see the other CPUs, so
>>> affinity is a non-issue)?
>>>
>>> I'd welcome any other advice other sites have about this.
>>>
>>> --
>>> ____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>>>  || \\UTGERS      |---------------------*O*---------------------
>>>  ||_// Biomedical | Ryan Novosielski - Senior Technologist
>>>  || \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
>>>  ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>>       `'
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>