<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:PMingLiU;
panose-1:2 2 5 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:"\@PMingLiU";
panose-1:2 1 6 1 0 1 1 1 1 1;}
@font-face
{font-family:"Helvetica Neue";
panose-1:2 0 5 3 0 0 0 2 0 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
p.p1, li.p1, div.p1
{mso-style-name:p1;
margin:0in;
font-size:10.0pt;
font-family:"Helvetica Neue";}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Hi Botao,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Thank you for bringing this issue to our attention. The current Allgather algorithm may exhaust the device memory.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">To address this, please try an alternative algorithm by setting
<b>MV2_INTER_ALLGATHER_TUNING=3</b>. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I hope this resolves your concern, and don't hesitate to reach out if you have any further questions.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Chen-Chun<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">Mvapich-discuss <mvapich-discuss-bounces+chen.10252=osu.edu@lists.osu.edu> on behalf of Wu, Botao via Mvapich-discuss <mvapich-discuss@lists.osu.edu><br>
<b>Date: </b>Wednesday, January 24, 2024 at 2:41</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">PM<br>
<b>To: </b>mvapich-discuss@lists.osu.edu <mvapich-discuss@lists.osu.edu><br>
<b>Subject: </b>[Mvapich-discuss] Memory Leak in CUDA-aware allgather MVAPICH-GDR (2.3.7)<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt">Hi,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I'm working on a project which needs to use CUDA-aware MPI. I ran into a problem in that the allgather function in MVAPICH-GDR (2.3.7) used too much Vram.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I'm using the Pitzer cluster at the Ohio Supercomputer Center. I had 1 node with 2 GPUs (without NVlink). I had 2 ranks in total, 1 rank for each GPU. The program runs out of GPU memory very fast and crashes.
</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I've attached a small program (with code and script) that can reproduce the problem. The attached screenshot highlights the main body of the code.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I would be grateful for any information you can provide.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Botao</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Software version:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="p1">module load intel/2021.3.0<o:p></o:p></p>
<p class="p1">module load cmake<o:p></o:p></p>
<p class="p1">module load python<o:p></o:p></p>
<p class="p1">module load mkl<o:p></o:p></p>
<p class="p1">module load mvapich2-gdr/2.3.7<o:p></o:p></p>
<p class="p1">module load cuda/11.6.1<o:p></o:p></p>
<p class="p1">export LD_PRELOAD=/apps/mvapich2-gdr/intel/2021.3/2.3.7/lib64/libmpi.so<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
</div>
</div>
</div>
</div>
</body>
</html>