<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:PMingLiU;
panose-1:2 2 5 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:"\@PMingLiU";
panose-1:2 1 6 1 0 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:466360672;
mso-list-type:hybrid;
mso-list-template-ids:-1924241466 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:1107508519;
mso-list-type:hybrid;
mso-list-template-ids:1738838076 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Nat,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thank you for the suggestion. I have a few questions:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo2"><span style="font-size:11.0pt">Does this message indicate that oversubscription is occurring, or is it simply a warning that appears every time a full-node job is run? In one user’s
case, I did not observe any oversubscription, although the warning was present.<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l1 level1 lfo2"><span style="font-size:11.0pt">UCX is also the default for OpenMPI, but I did not see a similar warning when running a full-node job with OpenMPI. Why does this only happen with MVAPICH?<o:p></o:p></span></li></ol>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thank you,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">ZQ<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">Shineman, Nat <shineman.5@osu.edu><br>
<b>Date: </b>Wednesday, October 16, 2024 at 2:16</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">PM<br>
<b>To: </b>mvapich-discuss@lists.osu.edu <mvapich-discuss@lists.osu.edu>, You, Zhi-Qiang <zyou@osc.edu><br>
<b>Subject: </b>Re: Handling MVAPICH 3.0 full subscription warning<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">Hi ZQ,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">You are probably seeing degraded performance because you are still running the application at full subscription and requesting that MVAPICH reserve 2 cores per process. The warning should probably more accurately
state that you should cap your runs at 1/2 subscription and set the listed environment variable. This would prevent you from oversubscribing cores. <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">However, if you are seeing satisfactory performance with oversubscribed cores in full subscription, please feel free to ignore the warning.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">Thanks,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">Nat<o:p></o:p></span></p>
</div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="2" width="98%" align="center">
</div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> Mvapich-discuss <mvapich-discuss-bounces@lists.osu.edu> on behalf
of You, Zhi-Qiang via Mvapich-discuss <mvapich-discuss@lists.osu.edu><br>
<b>Sent:</b> Wednesday, October 16, 2024 11:50<br>
<b>To:</b> mvapich-discuss@lists.osu.edu <mvapich-discuss@lists.osu.edu><br>
<b>Subject:</b> [Mvapich-discuss] Handling MVAPICH 3.0 full subscription warning</span>
<o:p></o:p></p>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt">Hi,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt">We encountered the following warning message while running a full-node MPI job with MVAPICH 3.0:</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"><br>
[][mvp_generate_implicit_cpu_mapping] WARNING: You appear to be running at full subscription for this job. UCX spawns an additional thread for each process which may result in oversubscribed cores and poor performance. Please consider reserving at least 2 cores
per node for the additional threads, enabling SMT, or setting MVP_THREADS_PER_PROCESS=2 to ensure that sufficient resources are available.<br>
<br>
The suggestion to set MVP_THREADS_PER_PROCESS=2 not only fails to improve performance but actually degrades it. Can this warning message be safely ignored, or is there any action I need to take to address it?</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Best,</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt">ZQ</span><o:p></o:p></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>