[mvapich-discuss] (no subject)
Novosielski, Ryan
novosirj at ca.rutgers.edu
Wed Dec 16 10:11:44 EST 2015
Indeed, and this is what users do when they want that behavior. But some of them don't. Again, the most common example is GPU use: with some software you can't use more than one or two GPUs effectively, so it's pretty common to run more than one job per node. With affinity working properly, this doesn't have a negative impact on performance.
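For reference, one generic way to check what a process is actually bound to from inside a job step (a minimal sketch using Python's standard library, not anything MVAPICH2-specific):

```python
import os

# Print the set of CPUs this process is allowed to run on.
# With affinity handled correctly, two co-located jobs should
# report disjoint CPU sets rather than the same core(s).
print(sorted(os.sched_getaffinity(0)))
```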
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novosirj at rutgers.edu - 973/972.0922 (2x0922)
|| \\ Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
`'
On Dec 16, 2015, at 09:50, John Donners <john.donners at surfsara.nl> wrote:
Hello Ryan,
have you tried to use srun with the --exclusive option?
The man page reads:
'This option can also be used when initiating more than one job step
within an existing resource allocation, where you want separate
processors to be dedicated to each job step.'
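As a sketch, launching two concurrent job steps inside an existing allocation might look like this (the binaries and task counts below are illustrative only):

```shell
#!/bin/bash
# Inside an existing allocation (e.g. from salloc or sbatch),
# launch two job steps in the background; --exclusive asks SLURM
# to dedicate separate processors to each step.
srun --exclusive -n 2 ./gpu_app_a &
srun --exclusive -n 2 ./gpu_app_b &
wait
```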
Cheers,
John
On 16-12-15 15:43, Jonathan Perkins wrote:
Hello Ryan:
The CPU affinity feature of MVAPICH2 was designed assuming only a single
job running on each node. This is the more common case in HPC, compared
to allowing multiple jobs to run on each node. If you're trying to use
SLURM to manage multiple jobs on each node, it may be useful to explore
cgroups as you've mentioned in your 4th question.
Please note, for jobs using GPUs we recommend using the MVAPICH2-GDR
library, as it includes many advanced features for better performance
and scalability.
You can find out more about it via:
http://mvapich.cse.ohio-state.edu/overview/#mv2gdr
You can download via:
http://mvapich.cse.ohio-state.edu/downloads/#mv2gdr
On Tue, Dec 15, 2015 at 1:27 PM Novosielski, Ryan
<novosirj at ca.rutgers.edu> wrote:
Hi all,
I'm using MVAPICH2 with SLURM's PMI2 interface. I'm therefore not
using mpirun/mpiexec at all. A user of mine is running some GPU
jobs, which require very small numbers of CPUs. So he's
frequently not using the whole node, and frequently running more
than one job. MVAPICH2's affinity stubbornly forces the jobs to
bind to the same processors. The solution is to turn affinity off.
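For concreteness, the switch involved is MVAPICH2's MV2_ENABLE_AFFINITY environment variable; a minimal job-script fragment (the application name is made up):

```shell
# Disable MVAPICH2's own CPU binding so it no longer pins
# co-located jobs to the same cores. The application name
# is illustrative only.
export MV2_ENABLE_AFFINITY=0
srun --mpi=pmi2 ./gpu_app
```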
I have some questions about this:
1) Is there an imaginable scenario where, running with SLURM, I
could ever want this feature enabled? Should I somehow look at
disabling it system-wide or in the MVAPICH2 compile?
2) If MVAPICH2 can't tell that a processor is already being used
at 100%, how can this feature ever work correctly? I'm just curious
about the use case in a different setting. Is it not meant to
handle two jobs co-existing on the same node?
3) I'd like this to be easy for the users. Should I just turn it
off in the module that is loaded for MVAPICH2 to prevent this from
being an issue?
4) Any thought to whether integrating cgroups into SLURM might solve
the problem (e.g., SLURM won't even let MVAPICH2 see the other CPUs,
so affinity is a non-issue)?
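For reference, my understanding is the cgroup route would look something like the following in the SLURM configuration (a sketch of the relevant documented settings, not a tested config):

```shell
# slurm.conf: confine tasks with SLURM's cgroup task plugin
TaskPlugin=task/cgroup

# cgroup.conf: constrain jobs to only the cores they were allocated
ConstrainCores=yes
```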
I'd welcome any other advice other sites have about this.
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
SURFdrive: the personal cloud storage service for Dutch higher education and research.
| John Donners | Senior advisor | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam | The Netherlands |
T (31)6 19039023 | john.donners at surfsara.nl | www.surfsara.nl |
Present on | Mon | Tue | Wed | Thu | Fri