[mvapich-discuss] (no subject)

Hari Subramoni subramoni.1 at osu.edu
Wed Oct 28 15:05:36 EDT 2015


Hello John,

If you're looking for extreme scalability, one other suggestion I have is
to look at the UD transport protocol instead of the XRC transport protocol.
Please refer to slides #28 - #30 of the MUG'15 tutorial I mentioned earlier
for a comparison of the performance and memory scalability of different IB
transport protocols done with MVAPICH2.
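
To make the scalability argument a little more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a fully connected job in which intra-node traffic goes through shared memory, RC needs one QP per remote process, XRC needs roughly one QP per remote node, and UD needs a single QP per process. These are simplified assumptions for illustration, not measurements and not the exact accounting done by MVAPICH2 or OpenMPI; the function name and parameters are hypothetical.

```python
def qps_per_process(nodes: int, ppn: int) -> dict:
    """Estimate how many QPs one process creates under three simplified models."""
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be positive")
    remote_nodes = nodes - 1
    return {
        # RC, fully connected: one QP for every process on every remote node.
        "RC": remote_nodes * ppn,
        # XRC (simplified): one QP per remote node instead of per remote process.
        "XRC": remote_nodes,
        # UD: connectionless, so a single QP can reach every peer.
        "UD": 1,
    }


if __name__ == "__main__":
    # Hypothetical job shape: 128 nodes with 16 processes per node.
    for transport, count in qps_per_process(128, 16).items():
        print(f"{transport}: {count} QPs per process")
```

Under these assumptions the RC count grows with the total number of processes, the XRC count grows only with the number of nodes, and the UD count stays constant, which is the trend the slides compare with real data.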

Another option would be to use the new DC transport protocol from Mellanox (
http://link.springer.com/chapter/10.1007%2F978-3-319-07518-1_18). Please
refer to slide #58 of the MUG'15 tutorial for more details of how to use DC
with MVAPICH2.

Regards,
Hari.

On Wed, Oct 28, 2015 at 2:48 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Hello John,
>
> MVAPICH pins down some memory statically and some dynamically. MVAPICH
> provides debug options to show the amount of memory that has been pinned
> statically.
>
> We have discussed this and other optimizations / debugging options that
> MVAPICH provides at the annual MVAPICH user group meeting that took place
> earlier this year (http://mug.mvapich.cse.ohio-state.edu/). Please refer
> to slide #150 of the following tutorial, which was given at MUG'15, for
> more details on how to identify the amount of memory that was statically
> pinned by MVAPICH2.
>
>
> http://mug.mvapich.cse.ohio-state.edu/static/media/mug/presentations/2015/mug15-tutorial_all_you_want_to_know_about_mvapich2_libraries_and_much_more...-the_mvapich_team.pdf
>
> If you are using MVAPICH and are interested, we could work with you
> to identify the amount of memory being pinned dynamically as well.
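
From the operating-system side, a rough per-process view is available on Linux through /proc/<pid>/status, which reports VmLck (locked memory) and, on kernels 3.2 and later, VmPin (pinned memory). Whether InfiniBand memory registrations are counted under one field or the other depends on the kernel version, and older kernels (possibly including the RedHat 6.x mentioned in this thread) may not report VmPin at all. The sketch below is therefore only an assumption-laden starting point and is not the MVAPICH debug option referred to above; the function name is hypothetical.

```python
import os
import re
from pathlib import Path


def locked_and_pinned_kb(status_text: str) -> dict:
    """Extract VmLck and VmPin (in kB) from the text of /proc/<pid>/status."""
    sizes = {}
    for field in ("VmLck", "VmPin"):
        match = re.search(rf"^{field}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        # A field that is absent (e.g. VmPin on older kernels) maps to None.
        sizes[field] = int(match.group(1)) if match else None
    return sizes


if __name__ == "__main__":
    # Inspect the current process; substitute the PID of an MPI rank as needed.
    status = Path(f"/proc/{os.getpid()}/status")
    if status.exists():
        print(locked_and_pinned_kb(status.read_text()))
```

Summing these values over all ranks on a node gives an approximate per-node figure, subject to the kernel-accounting caveat above.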
>
> I don't fully understand question #2. Do you mean where in physical
> memory the pinned regions are, or which portion of the MVAPICH code holds
> these pinned regions?
>
> I know you did not ask for it, but if you are using MVAPICH, you could
> use the on-demand connection establishment method to reduce the number of
> QPs created to what is actually required (note: OpenMPI may have a similar
> feature; I'm not sure). Please refer to the following link for more
> details.
>
>
> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2a-userguide.html#x1-20000011.42
>
> Regards,
> Hari.
>
> On Wed, Oct 28, 2015 at 2:20 PM, Sasso, John (GE Power & Water, Non-GE) <
> John1.Sasso at ge.com> wrote:
>
>> But my questions are:
>>
>>
>> 1.       How do we determine HOW MUCH memory is being pinned by an MPI
>> job on a node?  (If pmap, what exactly are we looking for?)
>>
>> 2.       How do we determine WHERE these pinned memory regions are?
>>
>>
>> Does MVAPICH do pinning of memory regions as well?  If so, my question
>> would still hold even for MVAPICH.  Thanks
>>
>>
>>
>> --john
>>
>>
>>
>>
>>
>> *From:* hari.subramoni at gmail.com [mailto:hari.subramoni at gmail.com] *On
>> Behalf Of *Hari Subramoni
>> *Sent:* Wednesday, October 28, 2015 2:09 PM
>> *To:* Sasso, John (GE Power & Water, Non-GE)
>> *Cc:* mvapich-discuss at cse.ohio-state.edu
>> *Subject:* Re:
>>
>>
>>
>> Hello,
>>
>>
>>
>> The error is OpenMPI-specific, so we will not be able to give you exact
>> guidance. However, can you please check whether following the steps in
>> the link below solves the issue of being unable to create QPs?
>>
>>
>>
>>
>> http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2a-userguide.html#x1-1150009.1.4
>>
>>
>>
>> Regards,
>>
>> Hari.
>>
>>
>>
>> On Wed, Oct 28, 2015 at 1:56 PM, Sasso, John (GE Power & Water, Non-GE) <
>> John1.Sasso at ge.com> wrote:
>>
>> Pardon if this has been addressed already, but I could not find the
>> answer after doing Google searches.  I tried posing this question on the
>> OpenMPI and OpenFabrics mailing lists, but it was recommended I post to
>> the MVAPICH list given their focus on IB.
>>
>> We are in the process of analyzing and troubleshooting MPI jobs of
>> increasingly large scale (OpenMPI 1.6.5) which communicate over a
>> Mellanox-based IB fabric.  At a sufficiently large scale (# cores) a job
>> will end up failing with errors similar to:
>>
>> [yyyyy][[56933,1],1904][connect/btl_openib_connect_oob.c:867:rml_recv_cb] error in endpoint reply start connect
>> [xxxxx:29318] 853 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / ibv_create_qp failed
>>
>> So I know we are running into some memory limitation (educated guess)
>> when queue pairs are being created to support such a huge mesh.  We are
>> now investigating using the XRC transport to decrease memory consumption.
>>
>> Anyways, my questions are:
>>
>> 1. How do we determine HOW MUCH memory is being pinned by an MPI job on
>> a node?  (If pmap, what exactly are we looking for?)
>>
>> 2. How do we determine WHERE these pinned memory regions are?
>>
>> We are running RedHat 6.x
>>
>> --john
>
>