[Mvapich-discuss] libpmi2 could not be found while building for Slurm
Shineman, Nat
shineman.5 at osu.edu
Wed Jan 17 13:02:32 EST 2024
Hi Aditya,
We have found that Cray systems typically use a different version of PMI/PMI2 than other Slurm installations. Could you please try building with --with-pmi=cray instead of giving the pmi2 path? This has been more successful in most of our tests. If you still have issues, you may also need to add --with-craypmi=<path/to/craypmi/dir> to ensure the right version is picked up.
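As a rough sketch of that change, the snippet below only prints a configure command for review and does not run it. It swaps --with-pmi=pmi2 and --with-pmi2-libdir for --with-pmi=cray and keeps the Slurm flags from the original build line. The function name cray_configure_cmd and the CRAYPMI_DIR variable are made up for illustration, and whether the remaining flags combine cleanly with --with-pmi=cray on a given system is an assumption.

```shell
#!/bin/sh
# Print a configure command that uses --with-pmi=cray instead of the pmi2 flags.
# Argument 1 (optional): Cray PMI directory. Hypothetical and site-specific;
# when empty, --with-craypmi is left out.
cray_configure_cmd() {
    craypmi_dir="$1"
    cmd="./configure --with-pmi=cray --enable-slurm --with-pm=slurm"
    if [ -n "$craypmi_dir" ]; then
        cmd="$cmd --with-craypmi=$craypmi_dir"
    fi
    printf '%s\n' "$cmd"
}

# CRAYPMI_DIR is an illustrative environment variable, not an MVAPICH setting.
cray_configure_cmd "${CRAYPMI_DIR:-}"
```

The ROCm, libfabric, and FFLAGS settings from the original build line can be appended to the printed command unchanged.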
Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of Kashi, Aditya via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, January 9, 2024 18:18
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Cc: Matheson, Michael <mathesonma at ornl.gov>
Subject: [Mvapich-discuss] libpmi2 could not be found while building for Slurm
Hi everyone,
I'm trying to build MVAPICH 3.0rc on a Cray Shasta system with Slurm. This is my current build line:
FFLAGS=-fallow-argument-mismatch ./configure --with-pmi=pmi2 --enable-slurm --with-pm=slurm --enable-rocm --with-rocm=$ROCM_PATH --with-libfabric=/opt/cray/libfabric/1.15.2.0 --with-pmi2-libdir=/usr/lib64/slurmpmi
It's able to find pmi2.h, but not libpmi2.so:
...
configure: RUNNING CONFIGURE FOR src/pm/slurm
checking for srun... /usr/bin/srun
checking slurm/pmi2.h usability... yes
checking slurm/pmi2.h presence... yes
checking for slurm/pmi2.h... yes
checking for /usr/include/slurm/pmi2.h... yes
./configure: line 60908: found: command not found
checking for PMI2_Init in -lpmi2... no
configure: error: could not find the slurm libpmi2. Configure aborted
However, I can see the file /usr/lib64/slurmpmi/libpmi2.so, which is symlinked to /usr/lib64/slurmpmi/libpmi2.so.0.0.0. I've also tried variations of the last flag, such as "--with-pmi-libdir", "--with-pmi2-lib=.../libpmi2.so", etc.
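Since the file exists, one thing that may be worth ruling out is whether the library actually exports PMI2_Init: the configure check tries to link against -lpmi2, so it can fail even when the .so is present. The sketch below is only a diagnostic idea, not something from the MVAPICH documentation. It assumes binutils nm and awk are available, and exports_symbol is a made-up helper name.

```shell
#!/bin/sh
# Return 0 if the shared library at $1 exports the dynamic symbol $2.
exports_symbol() {
    lib="$1"
    sym="$2"
    [ -r "$lib" ] || return 1
    # nm -D lists dynamic symbols; strip any @VERSION suffix before comparing.
    nm -D --defined-only "$lib" 2>/dev/null |
        awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 }
                         END { exit !found }'
}

# Library path taken from the message above.
if exports_symbol /usr/lib64/slurmpmi/libpmi2.so PMI2_Init; then
    echo "PMI2_Init is exported"
else
    echo "PMI2_Init not found (or library unreadable)"
fi
```

If the symbol is present, the failing link command recorded in config.log would be the next place to look.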
Is there a known way to build MVAPICH with a scalable backend on this kind of system? Getting the best possible performance at scale is absolutely necessary for what I'm trying to do.
Thanks,
Aditya Kashi
Analytics and AI Methods at Scale
Oak Ridge National Laboratory