<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
font-size:11.0pt;
font-family:Consolas;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoPlainText">Hello,<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">I am running OSU 5.9 with data validation and have noticed two issues:<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">1) Running "osu_multi_lat" with a high number of ranks per node results in 'Out of Memory' failures:<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Configuration: <o:p></o:p></p>
<p class="MsoPlainText"> 48 ranks/node * 4 nodes (192 ranks total)<o:p></o:p></p>
<p class="MsoPlainText"> Running over OMPI with OFI (psm3 provider).<o:p></o:p></p>
<p class="MsoPlainText"> Args: "-c"<o:p></o:p></p>
<p class="MsoPlainText"> Mem Size: 64GB/node<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">ERROR (Dmesg): <o:p></o:p></p>
<p class="MsoPlainText"> [107540.289787] Out of memory: Killed process 114599 (osu_multi_lat) total-vm:2278092kB, anon-rss:1636984kB, file-rss:0kB, shmem-rss:1644kB, UID:0 pgtables:4236kB oom_score_adj:0<o:p></o:p></p>
<p class="MsoPlainText"> [107540.456582] oom_reaper: reaped process 114599 (osu_multi_lat), now anon-rss:0kB, file-rss:0kB, shmem-rss:1644kB<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">This is easily repeatable. However, when I start at message size 524288 ("-m 524288:"), the run gets a bit further (2 more message sizes).<o:p></o:p></p>
<p class="MsoPlainText">I think there might be a memory leak with data validation.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Without data validation, the run does not use even half of the total memory.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">2) Running the pt2pt benchmarks on CUDA with args "H D" or "D H" does not work
<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Configuration: <o:p></o:p></p>
<p class="MsoPlainText"> 1 rank/node * 2 nodes (2 ranks total)<o:p></o:p></p>
<p class="MsoPlainText"> Running over CUDA enabled OMPI with OFI (psm3 provider).<o:p></o:p></p>
<p class="MsoPlainText"> Args: "&lt;OSU&gt; -c [SRC] [DST]"<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">ERROR: (osu_bibw -c D H)<o:p></o:p></p>
<p class="MsoPlainText"> # OSU MPI-CUDA Bi-Directional Bandwidth Test v5.9<o:p></o:p></p>
<p class="MsoPlainText"> # Send Buffer on DEVICE (D) and Receive Buffer on HOST (H)<o:p></o:p></p>
<p class="MsoPlainText"> # Size Bandwidth (MB/s) Validation<o:p></o:p></p>
<p class="MsoPlainText"> [../../util/osu_util_mpi.c:940] CUDA call 'cudaMemcpy((void *)s_buf, (void *)temp_s_buffer, size, cudaMemcpyHostToDevice)' failed with 1: invalid argument<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">This looks to be repeatable on all pt2pt benchmarks.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">A quick look at the code shows that it does not check what the src and dst buffers are before calling memcpy/cudaMemcpy.<o:p></o:p></p>
<p class="MsoPlainText">“Managed” buffers (MH and MD) are also not handled correctly and seem to report false errors on validation.
<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Regards,<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Adam Goldman<o:p></o:p></p>
<p class="MsoPlainText">Intel Corporation<o:p></o:p></p>
<p class="MsoPlainText"><a href="mailto:adam.goldman@intel.com">adam.goldman@intel.com</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>