[Mvapich-discuss] libpmi2 could not be found while building for Slurm

Shineman, Nat shineman.5 at osu.edu
Fri Jan 19 12:38:48 EST 2024


Hi Aditya,

Can you let me know the output of ldd on your libmpi.so? I am guessing you are not correctly linked against the Cray libfabric library, in which case the cxi provider will not be available. We typically find the correct libfabric installation somewhere like /opt/cray/libfabric, but the exact location varies from system to system. It can be selected at configure time with --with-libfabric=<path/to/libfabric>. Alternatively, you can use LD_PRELOAD to point at the right libfabric at runtime. This is a critical requirement on all Cray systems, because the only way to access the Slingshot 11 network is through their proprietary OFI installation.
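As a rough sketch (the install path, lib directory, and soname below are placeholders; the libfabric version is the one from your earlier configure line, so adjust as needed for your system):

ldd /path/to/mvapich/install/lib/libmpi.so | grep libfabric
# If that resolves to a non-Cray libfabric, either reconfigure with
#   --with-libfabric=/opt/cray/libfabric/1.15.2.0
# or override the library at runtime before launching, e.g.
export LD_PRELOAD=/opt/cray/libfabric/1.15.2.0/lib64/libfabric.so.1
srun -n 16 ./your_app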

Thanks,
Nat
________________________________
From: Kashi, Aditya <kashia at ornl.gov>
Sent: Friday, January 19, 2024 12:31
To: Shineman, Nat <shineman.5 at osu.edu>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: Re: [Mvapich-discuss] libpmi2 could not be found while building for Slurm

Hi Nat,

Thank you for the quick reply. Indeed, with those flags set, now the app fails with

Abort(2665871) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(175).......:
MPID_Init(597)..............:
MPIDI_MVP_mpi_init_hook(289):
MPIDI_OFI_mpi_init_hook(637):
open_fabric(1338)...........:
find_provider(1431).........: OFI fi_getinfo() failed (ofi_init.c:1431:find_provider:No data available)

I guess that means it can't detect the CXI provider. Do you have any guess about where the issue might lie?
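For what it's worth, one quick check I can run (assuming the fi_info utility that ships with libfabric is available on a compute node) would be something like:

# list all providers libfabric can see
fi_info -l
# or ask specifically for the cxi provider
fi_info -p cxi

Please let me know if there is a better way to diagnose this.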

It is a GPU app. I'll take a look at MVAPICH-Plus.

Best,
Aditya
________________________________
From: Shineman, Nat <shineman.5 at osu.edu>
Sent: Friday, January 19, 2024 12:20 PM
To: Kashi, Aditya <kashia at ornl.gov>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: [EXTERNAL] Re: [Mvapich-discuss] libpmi2 could not be found while building for Slurm

Hi Aditya,

To get performance on par with Cray MPICH, you will need to run with MPIR_CVAR_OFI_USE_PROVIDER=cxi to make sure that MVAPICH correctly detects the cxi provider. For good measure, you can also set FI_PROVIDER=cxi to force OFI to allow only this provider; that will cause the application to fail outright if MVAPICH does not correctly identify the Slingshot provider. With these CVARs set you should see the performance you expect for CPU applications. For GPU applications, you will need to use our MVAPICH-Plus library, available on our downloads page.
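For example (the rank count and executable are just placeholders):

export MPIR_CVAR_OFI_USE_PROVIDER=cxi
export FI_PROVIDER=cxi
srun -n 16 -c 7 ./your_app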

Regarding the sub communicator issue, I will take a look at the reproducer and get back to you.

Thanks,
Nat
________________________________
From: Kashi, Aditya <kashia at ornl.gov>
Sent: Friday, January 19, 2024 12:16
To: Shineman, Nat <shineman.5 at osu.edu>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: Re: [Mvapich-discuss] libpmi2 could not be found while building for Slurm

Hi Nat,

Thank you for the suggestion! I managed to get the following build:

FFLAGS=-fallow-argument-mismatch ./configure --prefix=../Programs/mvapich-3.0rc-rocm543-slurm --with-device=ch4:ofi --enable-rocm --with-rocm=$ROCM_PATH --with-ch4-shmmods=gpudirect --with-pm=slurm --with-pmi=cray --with-hwloc-prefix=$HWLOC_DIR

However, the application runs much more slowly than Cray MPICH, and more importantly, reduce and allreduce fail on subcommunicators created by MPI_Comm_split when the subcommunicator spans more than one node. The code I wrote to test this is here: https://bitbucket.org/Slaedr/mpi-hip-test-suite/src/main/ It's a simple CMake build with MPI_HOME pointing to the MPI install directory. I ran the comm_reduce test using
srun -n 16 -c 7 build/test/comm_reduce gpu
on two nodes. The code essentially separates ranks 3 through 15 into a separate communicator and calls MPI_Allreduce on that communicator. However, there is a segfault in the MPI_Allreduce. When the communicator is MPI_COMM_WORLD, this works fine.

Should I try some other build setting for MVAPICH? Please let me know if I should provide any more details.
Best,
Aditya
________________________________
From: Shineman, Nat <shineman.5 at osu.edu>
Sent: Wednesday, January 17, 2024 1:02 PM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>; Kashi, Aditya <kashia at ornl.gov>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: [EXTERNAL] Re: [Mvapich-discuss] libpmi2 could not be found while building for Slurm

Hi Aditya,

We have found that Cray systems typically use a different version of PMI/PMI2 than is found on other Slurm installations. Can you please try building with --with-pmi=cray instead of giving the pmi2 path? This has been more successful in most of the tests we have tried. If you are still having issues, you may also need to add --with-craypmi=<path/to/craypmi/dir> to ensure the right version is picked up.
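As a rough sketch (the Cray PMI path below is only a guess; use whatever location the cray-pmi module points to on your system):

./configure --with-pm=slurm --with-pmi=cray \
    --with-craypmi=/opt/cray/pe/pmi/default \
    <your other configure options>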

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of Kashi, Aditya via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, January 9, 2024 18:18
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: [Mvapich-discuss] libpmi2 could not be found while building for Slurm

Hi everyone,

I'm trying to build MVAPICH 3.0rc on a Cray Shasta system with Slurm. This is my current build line:

FFLAGS=-fallow-argument-mismatch ./configure --with-pmi=pmi2 --enable-slurm --with-pm=slurm --enable-rocm --with-rocm=$ROCM_PATH --with-libfabric=/opt/cray/libfabric/1.15.2.0  --with-pmi2-libdir=/usr/lib64/slurmpmi

It's able to find pmi2.h, but not libpmi2.so:

...
configure: RUNNING CONFIGURE FOR src/pm/slurm
checking for srun... /usr/bin/srun
checking slurm/pmi2.h usability... yes
checking slurm/pmi2.h presence... yes
checking for slurm/pmi2.h... yes
checking for /usr/include/slurm/pmi2.h... yes
./configure: line 60908: found: command not found
checking for PMI2_Init in -lpmi2... no
configure: error: could not find the slurm libpmi2.  Configure aborted

However, I can see the file /usr/lib64/slurmpmi/libpmi2.so, which is symlinked to /usr/lib64/slurmpmi/libpmi2.so.0.0.0. I've tried variations of the last flag like "--with-pmi-libdir", "--with-pmi2-lib=.../libpmi2.so" etc.
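For reference, one check I can run (assuming nm is available on the login node) to confirm the library actually exports the symbol configure is probing for:

nm -D /usr/lib64/slurmpmi/libpmi2.so | grep PMI2_Init

I could also try pointing the linker at that directory explicitly, e.g. prefixing the configure line with LDFLAGS=-L/usr/lib64/slurmpmi, though I'm not sure whether that is the intended way to do it.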

Is there a known way to build MVAPICH with a scalable backend on this kind of system? Getting the best possible performance at scale is absolutely necessary for what I'm trying to do.

Thanks,
Aditya Kashi
Analytics and AI Methods at Scale
Oak Ridge National Laboratory