<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ks_c_5601-1987">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="caret-color: rgb(32, 31, 30); color: rgb(32, 31, 30); font-family: Arial, Helvetica, sans-serif; font-size: 11pt; background-color: rgb(255, 255, 255); display: inline !important;">Hi </span><span style="caret-color:rgb(32, 31, 30);color:rgb(32, 31, 30);font-family:'Malgun Gothic', sans-serif;font-size:13.333333015441895px;background-color:rgb(255, 255, 255);display:inline !important"><span style="caret-color: rgb(32, 31, 30); background-color: rgb(255, 255, 255); display: inline !important; font-size: 11pt; font-family: Arial, Helvetica, sans-serif;">Byungkwon,</span></span><br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="caret-color:rgb(32, 31, 30);color:rgb(32, 31, 30);font-family:'Malgun Gothic', sans-serif;font-size:13.333333015441895px;background-color:rgb(255, 255, 255);display:inline !important"><span style="caret-color:rgb(32, 31, 30);background-color:rgb(255, 255, 255);display:inline !important"><br>
</span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="caret-color:rgb(32, 31, 30);color:rgb(32, 31, 30);font-family:'Malgun Gothic', sans-serif;font-size:13.333333015441895px;background-color:rgb(255, 255, 255);display:inline !important"><span style="caret-color: rgb(32, 31, 30); background-color: rgb(255, 255, 255); display: inline !important; font-size: 11pt; font-family: Arial, Helvetica, sans-serif;">For
this question, it may help to contact NVIDIA (HPC-X) or the developers of the MPI library you are using. The performance and behavior you observe can also depend on the underlying protocols that the MPI library uses for the data transfer. </span></span></div>
<div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="Signature">
<div>
<div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Helvetica,sans-serif">
<span style="font-size: 11pt; font-family: Arial, Helvetica, sans-serif;"></span>
<p style="margin-top:0; margin-bottom:0"><span style="font-size: 11pt; font-family: Arial, Helvetica, sans-serif;">Thank you,</span></p>
<p style="margin-top:0; margin-bottom:0"><br>
<span style="font-size: 11pt; font-family: Arial, Helvetica, sans-serif;"></span></p>
<p style="margin-top:0; margin-bottom:0"><span style="font-size: 11pt; font-family: Arial, Helvetica, sans-serif;">Kawthar Shafie Khorassani</span></p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0"></p>
</div>
</div>
</div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Mvapich-discuss &lt;mvapich-discuss-bounces+shafiekhorassani.1=buckeyemail.osu.edu@lists.osu.edu&gt; on behalf of 최병권 via Mvapich-discuss &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Sent:</b> Thursday, April 21, 2022 2:44 AM<br>
<b>To:</b> mvapich-discuss@lists.osu.edu &lt;mvapich-discuss@lists.osu.edu&gt;<br>
<b>Cc:</b> 유준상 &lt;js.louis.you@samsung.com&gt;; 조상욱 &lt;swkhan.cho@samsung.com&gt;<br>
<b>Subject:</b> [Mvapich-discuss] Question about osu_bw Benchmark Results</font>
<div> </div>
</div>
<style class="x_cui-content-default">
<!--
div
{display:block;
margin:10px}
div ol, div ul
{margin:0;
padding-left:40px}
div p, div li
{line-height:1.9;
margin:0 auto}
table.x_cui-div
{width:100%;
display:block}
table.x_cui-div > tbody
{display:block}
table.x_cui-div > tbody > tr
{display:block}
table.x_cui-div > tbody > tr > td, table.x_cui-div > tbody > tr > th
{display:block}
table.x_cui-pasted-table th, table.x_cui-pasted-table td, table.x_cui-pasted-table p, table.x_cui-pasted-table h1, table.x_cui-pasted-table h2, table.x_cui-pasted-table h3, table.x_cui-pasted-table h4, table.x_cui-pasted-table h5, table.x_cui-pasted-table h6, table.x_cui-pasted-table li
{line-height:normal}
img[data-cui-alt-image], div[data-cui-alt-image]
{background:url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABYAAAAUCAYAAACJfM0wAAABH0lEQVQ4jbXU26qEIBQGYF+wtxLMgg4QdGdBREUhFEFBBx/tn6sGpcPezTTCAi/0Q11rSQghhDGGJ4P8An3j20QIgSzLvookSfawUgrrukIp9VVcwkIIcM7h+z66rnsGbprGeCvf95+B8zw3YM75M/A4jnAcB3pS9Y3zPKPv+0vYKDf9jcdxRJ7naJrG2DRNE8IwBGMMZVkewmQbOnwVy7IgiqL3TWzbRtu2x+gVnKYpXNdF27ZY1xVxHO+awHVdDMMAKaWJnsFZlhkn8zzvtMOCIACl1No8Sqm1S55SCkVR3GrdI3QH13V9/7PRxmG5SSlh2/bHKGMMVVXtYc75bZRSaunX1w92+9vUT6mju3WfJuqvv/zf8FmZXq7/BfoCA1VRsvK4AfgAAAAASUVORK5CYII=") no-repeat center #c1c1c1}
-->
</style><style class="x_cui-content-default">
<!--
div
{margin:10px;
font-size:10pt;
font-family:'맑은 고딕';
line-height:1.9}
div, div p, div li, div h1, div h2, div h3, div h4, div h5, div h6
{font-family:'맑은 고딕';
line-height:1.9}
-->
</style>
<div>
<p>Dear all,</p>
<p> </p>
<p>We are running performance benchmarks with osu_bw.</p>
<p>We want to see how much performance can be delivered when leveraging <span style="color:#000000; font-family:'맑은 고딕'; orphans:2">RDMA &amp; GDR (NVIDIA GPUDirect RDMA).</span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">&nbsp;</span></p>
<p>We could not explain some of the benchmark results ourselves and hope someone on this mailing list can help us.</p>
<p>Any advice is welcome and would be very helpful.</p>
<p> </p>
<p><span style="font-weight:bold">Our environment:</span></p>
<p> - Two NVIDIA DGX A100 machines are used.</p>
<p> - They are connected over a 400 Gbps InfiniBand fabric (each machine has two 200 Gbps HDR InfiniBand HCAs).</p>
<p> - We use the osu_bw included in the NVIDIA HPC-X package, which bundles precompiled Open MPI and UCX with CUDA support. (ref: <a href="https://urldefense.com/v3/__https://docs.nvidia.com/networking/display/GPUDirectRDMAv17/Benchmark*Tests*BenchmarkTests-RunningGPUDirectRDMAwithOpenMPI__;KyM!!KGKeukY!lQo53wtCqpVhhPtL0pp_CukE4C8dUiwuOzv9XllIUqKPSTo9CYmmeM4F4yw-BbOrvhu6tnTdZQ$" target="_blank" title="">Link</a>)</p>
<p> - We run four osu_bw instances in total using the <span style="font-style:italic">
mpirun</span> command.</p>
<p> </p>
<p> </p>
<p><span style="font-weight:bold">Result:</span></p>
<p><img style="border-width:0px; zoom:1; display:inline-block; margin:0px; top:0px; left:0px; height:294px; width:493px; visibility:visible" data-outlook-trace="F:1|T:1" src="cid:cafe_image_0@s-core.co.kr"></p>
<p> - 'Device-to-Device' is the case where both the sender and the receiver of osu_bw use GPU memory. </p>
<p> - 'Device-to-Host' is the case where the sender uses GPU memory and the receiver uses host memory.</p>
<p> - 'Host-to-Device' is the case where the sender uses host memory and the receiver uses GPU memory.</p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2"> - 'Host-to-Host' is the case where both use host memory. </span></p>
<p> - 'w/ Device Affinity' refers to affinity between the GPU and the IB HCA. When the GPU and the IB HCA we use are connected to the same PCIe root complex, </p>
<p>    we call it 'w/ Device Affinity'. When the GPU and the IB HCA are not under the same root complex, we call it 'w/o Device Affinity'. </p>
<p>    When they are under the same root complex, the GDR feature can be used in the communication and delivers better performance because </p>
<p>    the host CPU is not involved in the transmission.</p>
<p> - 'w/ CPU Affinity' refers to NUMA affinity between the IB HCA and the CPU cores. When we run the osu_bw benchmark on CPU cores that have affinity </p>
<p>    with the IB HCA, we call it 'w/ CPU Affinity'. When they do not have affinity, we call it 'w/o CPU Affinity'.</p>
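<p>The 'Device Affinity' definition above could be checked roughly as follows. This is only a sketch under stated assumptions: it treats two devices as sharing a root complex when their resolved Linux sysfs paths contain the same PCI host-bridge component (such as <i>pci0000:40</i>), and the device paths in it are hypothetical, not taken from the machines described here. A topology tool such as <i>nvidia-smi topo -m</i> is probably the more usual way to inspect this.</p>
<pre>
```python
from pathlib import PurePosixPath


def pci_root(resolved_sysfs_path: str) -> str:
    """Return the host-bridge component (e.g. 'pci0000:40') of a resolved sysfs path."""
    for part in PurePosixPath(resolved_sysfs_path).parts:
        if part.startswith("pci"):
            return part
    raise ValueError(f"no PCI host bridge in {resolved_sysfs_path!r}")


def has_device_affinity(gpu_path: str, hca_path: str) -> bool:
    """True when both devices sit below the same PCI host bridge."""
    return pci_root(gpu_path) == pci_root(hca_path)


# Hypothetical paths, in the form os.path.realpath("/sys/bus/pci/devices/BDF")
# would return on Linux; the real addresses on a DGX A100 will differ.
gpu = "/sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0/0000:47:00.0"
hca_near = "/sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0/0000:46:00.0"
hca_far = "/sys/devices/pci0000:b0/0000:b0:01.1/0000:b1:00.0"

print("near HCA:", has_device_affinity(gpu, hca_near))
print("far HCA:", has_device_affinity(gpu, hca_far))
```
</pre>
<p>Comparing only the host bridge is a simplification: it does not distinguish devices behind the same PCIe switch from devices that merely share a host bridge.</p>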
<p> </p>
<p><span style="font-weight:bold">Question:</span></p>
<p><span style="font-weight:bold; color:rgb(0,0,255)"> </span><span style="color:rgb(0,0,255)">We could not understand the results of the cases </span><span style="font-weight:bold"><span style="font-family:'맑은 고딕'; orphans:2; color:rgb(0,0,255)">'Device-to-Host'
and </span><span style="font-family:'맑은 고딕'; orphans:2; color:rgb(0,0,255)">'Host-to-Device' w/o Device Affinity.</span></span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2"> We initially thought that in both cases one side could not benefit from RDMA and GDR at all, so the performance would be much lower than in the other cases.</span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2"> However, as you can see in the figure above, the results of </span><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">'Host-to-Device' w/o Device Affinity are 318 Gbps and 325 Gbps respectively, </span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2"> which are much higher than the result of </span><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">'Device-to-Host' (76 Gbps).</span></p>
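<p>For scale, the quoted figures can be set against the nominal 400 Gbps fabric. The sketch below assumes the quoted numbers are aggregate throughput across both HCAs, which this message does not state explicitly; the labels "first figure" and "second figure" are placeholders for the two Host-to-Device results.</p>
<pre>
```python
LINK_GBPS = 400  # two 200 Gbps HDR HCAs per machine, as described above

# Figures quoted in this message for the cases w/o Device Affinity.
results_gbps = {
    "Host-to-Device (first figure)": 318,
    "Host-to-Device (second figure)": 325,
    "Device-to-Host": 76,
}


def utilization(gbps: float, link_gbps: float = LINK_GBPS) -> float:
    """Fraction of the nominal link bandwidth that was achieved."""
    return gbps / link_gbps


for name, gbps in results_gbps.items():
    print(f"{name}: {gbps} Gbps is {utilization(gbps):.1%} of {LINK_GBPS} Gbps")

# Ratio between the faster and the slower direction.
ratio = results_gbps["Host-to-Device (first figure)"] / results_gbps["Device-to-Host"]
print(f"Host-to-Device / Device-to-Host ratio: {ratio:.2f}")
```
</pre>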
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">&nbsp;</span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2"> Our hypothesis is that the difference comes from the operation type: read or write. </span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">   - The write operation cannot leverage GDR and RDMA w/o Device Affinity. In the 'Device-to-Host' case, the CPU is involved in the sender-side communication </span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">     and the performance drops to 76 Gbps.</span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">   - The read operation can benefit from GDR and RDMA even w/o Device Affinity. In the 'Host-to-Device' case, the receiver is able to retrieve data </span></p>
<p><span style="color:#000000; font-family:'맑은 고딕'; orphans:2">     into the GPU memory without the help of the CPU, so the performance drop is negligible.</span></p>
<p> </p>
<p>Could you give us any comments on our hypothesis?</p>
<p>Thank you so much for reading this long email.</p>
<p> </p>
<p>Best regards,</p>
<p>Yours,</p>
<p> </p>
<p>Byungkwon Choi</p>
<table id="x_bannersignimg">
<tbody>
<tr>
<td>
<p> </p>
</td>
</tr>
</tbody>
</table>
<table id="x_confidentialsignimg">
<tbody>
<tr>
<td>
<p><img style="border:0px solid currentColor; width:520px; height:144px; display:inline-block" data-outlook-trace="F:1|T:1" src="cid:20220421064422_0@epcms1p"> </p>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>