<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:10.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#467886;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:10.0pt;
font-family:"Aptos",sans-serif;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:504051707;
mso-list-template-ids:-1572945784;}
@list l0:level1
{mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level3
{mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level4
{mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level7
{mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1
{mso-list-id:1213693878;
mso-list-template-ids:2095213454;}
@list l1:level1
{mso-level-start-at:3;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l2
{mso-list-id:1354112561;
mso-list-template-ids:-270910488;}
@list l3
{mso-list-id:1813407129;
mso-list-type:hybrid;
mso-list-template-ids:791025670 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l3:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l3:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l3:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l3:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l4
{mso-list-id:2131391829;
mso-list-template-ids:414604318;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi ZQ, <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">By default, we build against the Slurm version shipped by the OS package manager (Slurm 22 for RHEL 9). It looks like Cardinal uses Slurm 24, so the version mismatch may be causing incompatibilities. Can you try
the RPM below to see whether it resolves the error? We’re also looking into this on our end.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><a href="https://mvapich.cse.ohio-state.edu/download/mvapich/plus/4.0/cuda/UCX/mofed24.10/mvapich-plus-4.0-cuda12.4.rhel9.ofed24.10.ucx.gcc13.2.0.slurm24-4.0-1.x86_64.rpm">https://mvapich.cse.ohio-state.edu/download/mvapich/plus/4.0/cuda/UCX/mofed24.10/mvapich-plus-4.0-cuda12.4.rhel9.ofed24.10.ucx.gcc13.2.0.slurm24-4.0-1.x86_64.rpm</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">It looks like my OSC account has been disabled. So that I can help with this troubleshooting, who can I reach out to for reactivation (I assume this is all on Cardinal)? My username is rmotlagh.<br>
<br>
Regarding your questions:<o:p></o:p></span></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l3 level1 lfo3"><span style="font-size:11.0pt">Yes, we are hoping to have MVAPICH 4.0 released within the month.
<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l3 level1 lfo3"><span style="font-size:11.0pt">We have unified redundant environment variables (for example, the separate ones for HIP and CUDA) and made the naming conventions for our CVARs more consistent. So yes, replace
MV2_USE_CUDA/MVP_USE_CUDA with MVP_ENABLE_GPU.<o:p></o:p></span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l3 level1 lfo3"><span style="font-size:11.0pt">Some of these are now handled in the netmod layer. You can set IB devices with “UCX_NET_DEVICES=mlx5_0:1” and “UCX_SOCKADDR_TLS_PRIORITY=rdmacm” (rdmacm
may require a new RPM built with the --with-rdmacm UCX configure flag; I will update the website RPMs to allow for this if it passes our testing). MVP_HOMOGENEOUS_CLUSTER’s equivalent is irrelevant now; performance is good regardless of this flag.
<o:p></o:p></span></li></ol>
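<p class="MsoNormal"><span style="font-size:11.0pt">To make the variable changes above concrete, here is a minimal Python sketch that prepares a job environment before launch. The helper name build_mvapich_env is hypothetical, and the value “1” for MVP_ENABLE_GPU is an assumption. The UCX settings are the ones quoted in item 3, and the device name may need adjusting for your nodes.<o:p></o:p></span></p>
<pre>
```python
import os

# Legacy variables named in this thread. Per the answers above, their roles
# have moved to MVP_ENABLE_GPU or to the UCX netmod layer.
LEGACY_VARS = (
    "MV2_USE_CUDA",
    "MVP_USE_CUDA",
    "MVP_USE_RDMA_CM",
    "MVP_IBA_HCA",
    "MVP_HOMOGENEOUS_CLUSTER",
)


def build_mvapich_env(base_env, ib_device="mlx5_0:1", use_rdmacm=True):
    """Return a copy of base_env with the settings suggested in this email.

    Hypothetical helper for illustration; it is not part of MVAPICH.
    """
    env = dict(base_env)
    # Drop the old names so they cannot be mistaken for active settings.
    for name in LEGACY_VARS:
        env.pop(name, None)
    # Assumption: "1" is the value that enables the GPU path.
    env["MVP_ENABLE_GPU"] = "1"
    # Select the IB device and port through UCX.
    env["UCX_NET_DEVICES"] = ib_device
    if use_rdmacm:
        # May require an RPM whose UCX was configured with --with-rdmacm.
        env["UCX_SOCKADDR_TLS_PRIORITY"] = "rdmacm"
    return env


if __name__ == "__main__":
    env = build_mvapich_env(os.environ)
    for name in ("MVP_ENABLE_GPU", "UCX_NET_DEVICES", "UCX_SOCKADDR_TLS_PRIORITY"):
        print(name + "=" + env[name])
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">The returned dictionary can be passed as the env argument of subprocess.run when the launcher is started from a script.<o:p></o:p></span></p>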
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Reyhan<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Mvapich-discuss &lt;mvapich-discuss-bounces@lists.osu.edu&gt; on behalf of You, Zhi-Qiang via Mvapich-discuss &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Date: </b>Saturday, January 11, 2025 at 9:32</span><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> </span><span style="font-size:12.0pt;color:black">PM<br>
<b>To: </b>Panda, Dhabaleswar &lt;panda@cse.ohio-state.edu&gt;, Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Subject: </b>Re: [Mvapich-discuss] Failed to unpack MVAPICH-Plus RPM<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Hi DK,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thank you for the prompt fix. The RPM is now functioning correctly. However, I encountered the following error while running a simple ping-pong MPI test over two nodes:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
slurmstepd: error: pmijobid missing in fullinit command</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
I suspected this might be due to PMI incompatibility. I referred to </span><a href="https://urldefense.com/v3/__https:/mvapich-docs.readthedocs.io/en/latest/cvar.html*mvapich-environment-variables__;Iw!!KGKeukY!3fo-CIZdjSLr3Qr4T-N801LdCwjo-3DZiuA5KjZOvLaCn4id5M3xni5dWZHrZEnZrHIvm_FdrzIPC23DUe4941agsuMkFyC1$"><span style="font-size:11.0pt">this
documentation</span></a><span style="font-size:11.0pt"> and learned about setting MVP_PMI_VERSION to 2 to align with our SLURM configuration. However, the issue persists. I also checked the output of mpichversion -a and confirmed that the --with-pmi=pmi2 option
is enabled, leading me to conclude that this is not a PMI compatibility issue.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Additionally, I have a few related questions:</span><o:p></o:p></p>
<ol style="margin-top:0in" start="1" type="1">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo6"><span style="font-size:11.0pt">Will there be an MVAPICH 4.0 release, or will it be replaced by the MVAPICH-Plus CPU-only version?</span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo6"><span style="font-size:11.0pt">The documentation linked above lists many environment variables that I haven’t encountered before when using MVAPICH2-GDR. Are these new variables specific
to MVAPICH 4.0? Are variables like MV2_USE_CUDA/MVP_USE_CUDA still available, or should they be replaced with MVP_ENABLE_GPU?</span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level1 lfo6"><span style="font-size:11.0pt">Could you help confirm if the following variables are still supported in MVAPICH?</span></li></ol>
<ol style="margin-top:0in" start="3" type="1">
<ul style="margin-top:0in" type="disc">
<li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level2 lfo6"><span style="font-size:11.0pt">MVP_USE_RDMA_CM</span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level2 lfo6"><span style="font-size:11.0pt">MVP_HOMOGENEOUS_CLUSTER</span></li><li class="MsoListParagraph" style="margin-left:0in;mso-list:l0 level2 lfo6"><span style="font-size:11.0pt">MVP_IBA_HCA</span></li></ul>
</ol>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thank you for your time and assistance!</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Best regards,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">ZQ</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Panda, Dhabaleswar &lt;panda@cse.ohio-state.edu&gt;<br>
<b>Date: </b>Saturday, January 11, 2025 at 3:14</span><span style="font-size:12.0pt;font-family:"Arial",sans-serif;color:black"> </span><span style="font-size:12.0pt;color:black">AM<br>
<b>To: </b>You, Zhi-Qiang &lt;zyou@osc.edu&gt;, Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Subject: </b>RE: Failed to unpack MVAPICH-Plus RPM</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Hi ZQ,
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">As we have communicated with you separately, a new RPM has been uploaded. Please try this version and let us know whether you see any additional issues.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">DK</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Mvapich-discuss &lt;mvapich-discuss-bounces@lists.osu.edu&gt;
<b>On Behalf Of </b>You, Zhi-Qiang via Mvapich-discuss<br>
<b>Sent:</b> Thursday, January 2, 2025 1:54 PM<br>
<b>To:</b> mvapich-discuss@lists.osu.edu<br>
<b>Subject:</b> [Mvapich-discuss] Failed to unpack MVAPICH-Plus RPM</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><span style="font-size:12.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Hello,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I downloaded the MVAPICH-Plus 4.0 RPM from the following link:</span><o:p></o:p></p>
<p class="MsoNormal"><a href="https://mvapich.cse.ohio-state.edu/download/mvapich/plus/4.0/cuda/UCX/mofed5.0/mvapich-plus-4.0-cuda12.4.rhel9.ofed24.10.ucx.gcc13.2.0.slurm-4.0-1.x86_64.rpm"><span style="font-size:11.0pt">https://mvapich.cse.ohio-state.edu/download/mvapich/plus/4.0/cuda/UCX/mofed5.0/mvapich-plus-4.0-cuda12.4.rhel9.ofed24.10.ucx.gcc13.2.0.slurm-4.0-1.x86_64.rpm</span></a><span style="font-size:11.0pt">,
but I encountered an issue when trying to unpack it using cpio. The process failed with the error:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><br>
cpio: premature end of file</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I have no issues unpacking other RPMs, so it seems this file might be corrupted. Could you please check and confirm?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thank you,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">ZQ</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>