[Mvapich-discuss] Azure HBv4 mpi failure

Panda, Dhabaleswar panda at cse.ohio-state.edu
Fri Jul 12 09:21:47 EDT 2024


Sorry to hear that the issue still persists with the patch. Can you please provide some more details on the failure you are seeing?

Thanks,

DK

________________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+panda.2=osu.edu at lists.osu.edu> on behalf of Sandhu, Prabhjot(Nicky)@DWR via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Thursday, July 11, 2024 8:41 PM
To: Paniraja Guptha, Akshay; Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU
Subject: Re: [Mvapich-discuss] Azure HBv4 mpi failure

Hi Akshay,
I applied the patch, and the warning message goes away, but the application still fails its internal checks. In other words, the issue remains.


From: Paniraja Guptha, Akshay <panirajaguptha.1 at osu.edu>
Sent: Thursday, July 11, 2024 12:36 PM
To: Paniraja Guptha, Akshay <panirajaguptha.1 at osu.edu>; Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>; Sandhu, Prabhjot(Nicky)@DWR <Prabhjot.Sandhu at water.ca.gov>
Subject: RE: Azure HBv4 mpi failure

Hi Nicky,
    Can you please try the attached patch?

-Akshay

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> On Behalf Of Paniraja Guptha, Akshay via Mvapich-discuss
Sent: Monday, July 1, 2024 11:38 AM
To: Sandhu, Prabhjot(Nicky)@DWR <Prabhjot.Sandhu at water.ca.gov>; Announcement about MVAPICH (MPI over InfiniBand, RoCE, Omni-Path, Slingshot, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] Azure HBv4 mpi failure

Hi Nicky,
    Thanks for bringing this to our attention. We will take a look at the issue and get back to you.

-Akshay Paniraja Guptha

From: Mvapich-discuss <mvapich-discuss-bounces+panirajaguptha.1=osu.edu at lists.osu.edu> On Behalf Of Sandhu, Prabhjot(Nicky)@DWR via Mvapich-discuss
Sent: Monday, July 1, 2024 11:09 AM
To: mvapich-discuss at lists.osu.edu
Subject: [Mvapich-discuss] Azure HBv4 mpi failure

I compiled my code against the latest AlmaLinux 8.7 and mvapich2-2.3.7-1 on Azure. The code performs very well on HBv2-series<https://urldefense.com/v3/__https:/learn.microsoft.com/en-us/azure/virtual-machines/sizes/high-performance-compute/hb-family*hbv2-series__;Iw!!KGKeukY!wHkEMZ0eG8-_lzRbW3pQoiNeTm2zvI6k4mCGcQ5RhL_zSzxaLb28swQvFn_sXZm35ID-u19N9dXDw0rWbGB0sUpj2J05VChdBNmn6MzFmg$> or HBv3-series<https://urldefense.com/v3/__https:/learn.microsoft.com/en-us/azure/virtual-machines/sizes/high-performance-compute/hb-family*hbv3-series__;Iw!!KGKeukY!wHkEMZ0eG8-_lzRbW3pQoiNeTm2zvI6k4mCGcQ5RhL_zSzxaLb28swQvFn_sXZm35ID-u19N9dXDw0rWbGB0sUpj2J05VChdBNl1qZThrw$> VMs; however, on HBv4-series<https://urldefense.com/v3/__https:/learn.microsoft.com/en-us/azure/virtual-machines/sizes/high-performance-compute/hb-family*hbv4-series__;Iw!!KGKeukY!wHkEMZ0eG8-_lzRbW3pQoiNeTm2zvI6k4mCGcQ5RhL_zSzxaLb28swQvFn_sXZm35ID-u19N9dXDw0rWbGB0sUpj2J05VChdBNmJDg7Qiw$> it prints the following warning at the start of mpirun, after which the application code also fails.

[get_link_speed] Invalid link speed 128
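One plausible reading of this warning, not confirmed anywhere in this thread, is that `128` is the raw `active_speed` code that InfiniBand verbs reports for a port. The `ibv_query_port` documentation in rdma-core lists 128 as 100 Gb/s per lane (NDR). HBv4 VMs use 400 Gb/s NDR InfiniBand, while HBv2/HBv3 use 200 Gb/s HDR (code 64), so a speed table written before NDR existed would reject the value. The sketch below is illustrative only: `IB_SPEED_CODES`, `PRE_NDR_CODES`, and `describe_link_speed` are hypothetical names, and it does not reproduce MVAPICH2's actual `get_link_speed` code.

```python
# Hypothetical sketch: decode an InfiniBand active_speed code into a label.
# Codes and nominal per-lane data rates follow the ibv_query_port(3) listing.
IB_SPEED_CODES = {
    1: ("SDR", 2.5),
    2: ("DDR", 5.0),
    4: ("QDR", 10.0),
    8: ("FDR10", 10.0),
    16: ("FDR", 14.0),
    32: ("EDR", 25.0),
    64: ("HDR", 50.0),
    128: ("NDR", 100.0),
}


def describe_link_speed(code, known=IB_SPEED_CODES):
    """Return a label for an active_speed code; raise if the table lacks it."""
    if code not in known:
        # Mirrors the wording of the reported warning, for illustration only.
        raise ValueError(f"Invalid link speed {code}")
    name, gbps_per_lane = known[code]
    return f"{name} ({gbps_per_lane:g} Gb/s per lane)"


# Hypothetical table that stops at HDR, as a pre-NDR library might.
PRE_NDR_CODES = {k: v for k, v in IB_SPEED_CODES.items() if k <= 64}

if __name__ == "__main__":
    print(describe_link_speed(128))  # full table knows NDR
    try:
        describe_link_speed(128, PRE_NDR_CODES)
    except ValueError as err:
        print(err)  # truncated table rejects code 128
```

Running it prints `NDR (100 Gb/s per lane)` followed by `Invalid link speed 128`, which only illustrates why an unrecognized code could trigger such a message; it says nothing about the later application failure.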


Has anyone seen this message? Are there any env vars or config vars to be set?


Nicky


