[Mvapich-discuss] Transitioning to MVAPICH 4: Environment Variables

Panda, Dhabaleswar panda at cse.ohio-state.edu
Tue Mar 25 21:46:58 EDT 2025


Hi Matt,

Thanks for your note. Good to know that your model is working with MVAPICH 4.0. Sorry to hear that you are seeing variable performance.

I am sending you a follow-up note. It would be good to have a short discussion with you and some of the MVAPICH team members to understand your environment and the issues in detail, and to investigate them further.

Thanks,

DK

________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, March 25, 2025 2:52:22 PM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Transitioning to MVAPICH 4: Environment Variables

All,

First, congrats to the MVAPICH Team for v4.0!



Now, in the past MVAPICH2 + our model never seemed to work on a cluster (well, 10 years ago it did, but then it...stopped working). MVAPICH 3 also had issues, but MVAPICH 4.0 seems to be working...albeit a bit more "wobbly" than Open MPI and Intel MPI.[1]



In the past, we found these:

  MV2_ENABLE_AFFINITY=0
  MV2_MPIRUN_TIMEOUT=100
  MV2_GATHERV_SSEND_THRESHOLD=256
  MV2_HOMOGENEOUS_CLUSTER=1
  MV2_ON_DEMAND_THRESHOLD=1

were useful, but I also see from https://mvapich-docs.readthedocs.io/en/mvapich-open/cvar.html that the MVAPICH environment variables might have changed.
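
For anyone auditing a job environment for the old settings, here is a minimal sketch that lists every variable still set with the legacy MV2_ prefix, so each one can be checked by hand against the CVAR page above. It does not translate old names to new ones. The function name find_legacy_vars and the assumption that all legacy settings start with MV2_ are illustrative, not taken from the MVAPICH documentation.

```python
import os

# Assumption: legacy MVAPICH2 settings all use this prefix, as in the list above.
LEGACY_PREFIX = "MV2_"


def find_legacy_vars(environ=None):
    """Return a sorted list of (name, value) pairs whose name starts with MV2_."""
    env = os.environ if environ is None else environ
    return sorted((k, v) for k, v in env.items() if k.startswith(LEGACY_PREFIX))


if __name__ == "__main__":
    # Print each legacy setting found in the current environment for manual review.
    for name, value in find_legacy_vars():
        print(f"{name}={value}  # check against the current CVAR list")
```

Running this inside the job script, just before the MPI launch, shows which of the old settings the job actually inherits.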



So, is there a page/pages like https://mvapich.cse.ohio-state.edu/performance/job-startup/ that might have useful tips? (Though I don't think our startup was all that slow, but similar.) Or a page translating old flags into new?



Thanks,
Matt



[1]: "Wobbly" in that at "high-ish" core count (3456 processes on 28 nodes) Intel MPI seems to be around 110 d/d (model days per wall clock day) per step, Open MPI around 120 d/d per step (until it seems to have crashed weirdly). But MVAPICH 4 is varying from 50 to 110 d/d. No rhyme or reason, just different throughput randomly at each time step.
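
For readers unfamiliar with the d/d metric, a small sketch of the arithmetic: throughput is model time advanced divided by wall-clock time spent, and the "wobble" can be summarized as the spread between the slowest and fastest steps. The function names and the relative-spread measure are illustrative choices, not part of the model's own tooling.

```python
def model_days_per_wall_day(model_seconds, wall_seconds):
    """Throughput in d/d: simulated time advanced divided by wall-clock time spent."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    # Both arguments are in the same unit, so the ratio is dimensionless (d/d).
    return model_seconds / wall_seconds


def relative_spread(throughputs):
    """Return (max - min) / max over a sequence of per-step throughputs."""
    values = list(throughputs)
    if not values:
        raise ValueError("need at least one throughput value")
    top = max(values)
    if top <= 0:
        raise ValueError("throughputs must be positive")
    return (top - min(values)) / top
```

With the 50 and 110 d/d endpoints reported above, relative_spread([50, 110]) comes to about 0.55, meaning the slowest step runs at a bit under half the speed of the fastest.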










Matt Thompson

Lead Scientific Software Engineer/Supervisor

Global Modeling and Assimilation Office

Science Systems and Applications, Inc.

Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771

o: 301-614-6712

matthew.thompson at nasa.gov



