[Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h

Shineman, Nat shineman.5 at osu.edu
Thu May 11 09:30:05 EDT 2023


Hi Christof,

Regarding the quickstart guide, please try clearing your cache and retrying; it should be linked, and I am able to view it without issue. Glad to hear that the pmi1 build is working. This is our suggested build for Slurm, since the pmi1 interface is much more reliable (on rare occasions, several pmi2 implementations can cause race conditions during startup/finalize).

For libfabric, we bundled that version because we were able to reliably test all of the providers with it. You are correct that newer versions are now available. You are welcome to install one and use the configure flag --with-libfabric to link against it instead, and you should be able to substitute other versions afterwards via LD_LIBRARY_PATH as needed. Please be aware that you may experience issues with versions newer than 1.18.0. We were able to validate 1.18.0 against most providers, but future changes may require us to patch our internal capability sets.
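
As a rough sketch of the two steps described above: the first function links the build against an external libfabric via --with-libfabric, and the second puts a different libfabric first on the loader path at run time. The prefix /opt/libfabric-1.18.0, the lib subdirectory, and both function names are assumptions for illustration only; the remaining configure flags are the ones already used in this thread.

```shell
# Hypothetical install prefix of a newer libfabric; adjust to your site.
LIBFABRIC_PREFIX="${LIBFABRIC_PREFIX:-/opt/libfabric-1.18.0}"

# Build time: link against the external libfabric.
# Only defined here; call it from inside the MVAPICH source tree.
configure_with_external_libfabric() {
    FFLAGS=-fallow-argument-mismatch ./configure \
        --with-pm=slurm --with-pmi=pmi1 --with-device=ch4:ofi \
        --with-libfabric="$LIBFABRIC_PREFIX"
}

# Run time: prepend another libfabric's library directory to the loader path.
# Assumes the libraries live in "$1/lib" (some systems use lib64 instead).
use_libfabric() {
    export LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

use_libfabric "$LIBFABRIC_PREFIX"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Whether the loader honours this depends on how the MPI library was linked: an embedded RPATH takes precedence over LD_LIBRARY_PATH, so it is worth confirming with ldd which libfabric.so is actually resolved.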

Regarding the shmmod, this is expected. We deliberately disable MPICH's shmmod to use our own designs instead. These do not appear as an MPICH shmmod because they are implemented side by side with the netmod. If you would like to verify the performance gains, please compare running osu_latency with no additional parameters against running osu_latency with MVP_USE_SHARED_MEM=0. The latter disables our shared memory designs and should show significantly reduced performance, since it uses only libfabric.
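
The comparison could be scripted roughly as follows. This is only a sketch: the srun launcher with its -N 1 -n 2 options (two ranks on one node, so that shared memory is actually exercised), the ./osu_latency path, and the function name compare_shm are assumptions; only MVP_USE_SHARED_MEM=0 comes from the explanation above.

```shell
# Launcher and benchmark location are assumptions; adjust for your site.
LAUNCH="${LAUNCH:-srun -N 1 -n 2}"
OSU_LATENCY="${OSU_LATENCY:-./osu_latency}"

# Run the benchmark twice: once with the defaults, once with the
# MVAPICH shared-memory designs disabled.
compare_shm() {
    echo "== shared-memory designs enabled (default) =="
    # $LAUNCH is intentionally unquoted so that its options are split into words.
    $LAUNCH "$OSU_LATENCY"
    echo "== shared-memory designs disabled (MVP_USE_SHARED_MEM=0) =="
    MVP_USE_SHARED_MEM=0 $LAUNCH "$OSU_LATENCY"
}
```

Calling compare_shm inside an allocation prints both latency tables one after the other, so the small-message latencies can be compared directly.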

Please let me know if you have any additional questions.

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of christof.koehler--- via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Thursday, May 11, 2023 08:36
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h

Hello again.

One additional question. When using
FFLAGS=-fallow-argument-mismatch ./configure --with-pm=slurm
--with-pmi=pmi1 --with-device=ch4:ofi

the configure output shows

config.status: executing libtool commands
***
*** device: ch4
*** netmods: ofi
*** shm: none
***
Configuration completed.

What does "shm: none" mean ? Will the MPI then only use libfabric shared
memory providers and not contain any of its own ?

Best Regards

Christof

On Thu, May 11, 2023 at 02:13:35PM +0200, christof.koehler at bccms.uni-bremen.de wrote:
> Hello everybody,
>
> I can confirm that 3.0b build succeeds without any bandaids for
> slurm/pmi2.h up to the error first reported by Zhi-Qiang You.
>
> The pmi1 build appears to succeed completely with 3.0b, no error
> messages. I did not check if libnl-3 dependency detection is working
> in configure in 3.0b now, but I assume it does.
>
> Thank you !
>
> Is the internal libfabric in 3.0b still the ancient 1.15.1
> version ? We would like to use newer versions 1.18.0 and up via the
> --with-libfabric option. Would it be possible to dynamically switch
> libfabric versions after build by changing LD_LIBRARY_PATH or similar to
> point to the desired version ?
>
> Finally, as far as I can see the 3.0a (or 3.0b if there was an update)
> Quick Start Guide pdf is not linked from the download page ? Would it be
> possible to link to it there ? I only found it using google by accident.
>
>
> Best Regards
>
> Christof
>
> On Wed, May 10, 2023 at 05:57:25PM +0200, christof.koehler--- via Mvapich-discuss wrote:
> > Hello,
> >
> > the slurm version here is 22.05.8.
> >
> > I can confirm that prefixing with "MPICHLIB_CFLAGS=-I/usr/include/slurm"
> > during configure changes the error in make to
> >
> >   CC       src/util/lib_libmpi_la-mpir_pmi.lo
> > src/util/mpir_pmi.c:781:53: error: unknown type name ‘PMI_keyval_t’
> >   781 | static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t
> > ** kv_ptr, int *nkeys_ptr);
> >       |                                                     ^~~~~~~~~~~~
> > src/util/mpir_pmi.c:782:30: error: unknown type name ‘PMI_keyval_t’
> >   782 | static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int
> > *counts);
> >       |                              ^~~~~~~~~~~~
> > src/util/mpir_pmi.c:1130:53: error: unknown type name ‘PMI_keyval_t’
> >  1130 | static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t
> > ** kv_ptr, int *nkeys_ptr)
> >       |                                                     ^~~~~~~~~~~~
> > src/util/mpir_pmi.c:1168:30: error: unknown type name ‘PMI_keyval_t’
> >  1168 | static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int
> > *counts)
> >
> > I am not sure why the mvapich2 2.3.7-1 build against pmi2, same
> > configure arguments, apparently worked. Do you want the logs?
> >
> > When I change from --with-pmi=pmi2 to --with-pmi=pmi1 when building 3.0a
> > the build fails with
> >
> >   MOD      src/binding/fortran/use_mpi/mpi_base.mod-stamp
> >   MOD      src/binding/fortran/use_mpi/mpi.mod-stamp
> >   GEN      lib/libmpi.la
> > /usr/bin/ld: cannot find -lnl-3
> > /usr/bin/ld: cannot find -lnl-route-3
> > collect2: error: ld returned 1 exit status
> > make[2]: *** [Makefile:20852: lib/libmpi.la] Error 1
> > make[2]: Leaving directory '/backup1/build_temp/mvapich2-3.0a'
> > make[1]: *** [Makefile:50962: all-recursive] Error 1
> > make[1]: Leaving directory '/backup1/build_temp/mvapich2-3.0a'
> > make: *** [Makefile:13232: all] Error 2
> >
> > although configure was happy. Apparently it does not test if the
> > library headers are installed.
> >
> > After installing libnl3-devel the build of 3.0a succeeds for pmi1.
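
Since configure did not catch the missing libnl development files here, a small pre-flight check before running make could look like the sketch below. It assumes the development package ships the usual pkg-config files named libnl-3.0 and libnl-route-3.0; the function name check_libnl_dev is made up for illustration.

```shell
# Report whether the libnl-3 development files are visible to pkg-config.
# PKG_CONFIG may be set to use a specific pkg-config binary.
check_libnl_dev() {
    for mod in libnl-3.0 libnl-route-3.0; do
        if ! "${PKG_CONFIG:-pkg-config}" --exists "$mod"; then
            echo "missing development files for $mod" >&2
            return 1
        fi
    done
    echo "libnl-3 development files found"
}
```

Running check_libnl_dev before make returns non-zero when either module is missing, which on this system would have pointed at libnl3-devel before the link step failed.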
> >
> >
> > Best Regards
> >
> > Christof
> >
> >
> > On Wed, May 10, 2023 at 03:26:07PM +0000, Shineman, Nat via Mvapich-discuss wrote:
> > > Hi ZQ,
> > >
> > > > This looks to be a newer version of pmi2.h. Older versions of Slurm's PMI2 header had that typedef uncommented, which allowed us to use PMI_keyval_t in both pmi1 and pmi2 builds. We have seen a similar issue with Cray PMI support recently. I will take a look to see how this can best be resolved to work with both versions. Can you tell me what version of Slurm you are using?
> > >
> > > Thanks,
> > > Nat
> > > ________________________________
> > > From: You, Zhi-Qiang <zyou at osc.edu>
> > > Sent: Wednesday, May 10, 2023 11:19
> > > To: Shineman, Nat <shineman.5 at osu.edu>; Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
> > > Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > >
> > >
> > > Hello,
> > >
> > >
> > >
> > > I was able to fix the "pmi2.h" error by adding `--with-slurm-include=/usr/include/slurm` to the configure command. However, I encountered another error:
> > >
> > >
> > >   >> 9796    src/util/mpir_pmi.c(781): error: identifier "PMI_keyval_t" is undefined
> > >      9797      static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t ** kv_ptr, int *nkeys_ptr);
> > >      9798                                                          ^
> > >      9799
> > >   >> 9800    src/util/mpir_pmi.c(782): error: identifier "PMI_keyval_t" is undefined
> > >      9801      static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int *counts);
> > >      9802                                   ^
> > >
> > >
> > >
> > > Upon checking the pmi2.h file, I found the following:
> > >
> > >
> > > /* This is here to allow spawn multiple functions to compile.  This
> > >    needs to be removed once those functions are fixed for pmi2 */
> > > /*
> > > typedef struct PMI_keyval_t
> > > {
> > >     char * key;
> > >     char * val;
> > > } PMI_keyval_t;
> > > */
> > >
> > >
> > >
> > > The "PMI_keyval_t" struct is commented out. We are using the same SLURM version for other mvapich2 installations. Does this mean we need to upgrade SLURM or PMI2?
> > >
> > >
> > > Thank you,
> > >
> > > -ZQ
> > >
> > >
> > >
> > > From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Shineman, Nat via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> > > Date: Wednesday, May 10, 2023 at 9:20 AM
> > > To: christof.koehler at bccms.uni-bremen.de <christof.koehler at bccms.uni-bremen.de>, c.koehler at uni-bremen.de <c.koehler at uni-bremen.de>, Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
> > > Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > >
> > > Hi Christof,
> > >
> > >
> > >
> > > Thanks for the bug report. It looks like our configure script checks for the file slurm/pmi2.h, while the source actually includes pmi2.h without the slurm/ prefix. This include was changed in 3.0 to support non-Slurm installations of pmi2. The easiest way to solve this is to set the environment variable CPATH=/usr/include/slurm:$CPATH (using whatever path your Slurm include directory resides on). Alternatively, you can add MPICHLIB_CFLAGS=-I/usr/include/slurm as an argument to configure.
> > >
> > >
> > >
> > > Either of the above options should get you past the build error for now. We will also look into having our configure script automatically detect Slurm-based pmi installations and adapt the paths appropriately. As a final note, we do recommend using pmi1 with Slurm when possible (as do the upstream developers of MPICH), as it is a more stable and standardized provider than pmi2, which was never completed. Slurm's --with-mpi=pmi2 actually supports both the pmi1 and pmi2 interfaces.
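
The two workarounds above might be written out as in the following sketch. The SLURM_INC variable and the function name are assumptions for illustration; /usr/include/slurm is the path from this thread and may differ on other systems.

```shell
# Directory that contains Slurm's pmi2.h; adjust if your site differs.
SLURM_INC="${SLURM_INC:-/usr/include/slurm}"

# Option 1: make <pmi2.h> visible to the compiler for configure and make.
# The ${CPATH:+...} form avoids a trailing ':' when CPATH was empty
# (an empty CPATH entry means the current directory).
export CPATH="$SLURM_INC${CPATH:+:$CPATH}"

# Option 2 (alternative): pass the include path as a configure argument.
# Only defined here; call it from inside the MVAPICH source tree.
configure_with_slurm_include() {
    FFLAGS=-fallow-argument-mismatch ./configure \
        --with-pm=slurm --with-pmi=pmi2 \
        MPICHLIB_CFLAGS="-I$SLURM_INC"
}

echo "CPATH=$CPATH"
```

Only one of the two options should be needed; Option 1 has to be in effect in the shell that runs make as well as configure.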
> > >
> > >
> > >
> > > Please let me know if you have any further issues building MVAPICH 3.0a.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Nat
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > >
> > > From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of christof.koehler--- via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> > > Sent: Wednesday, May 10, 2023 04:06
> > > To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
> > > Subject: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > >
> > >
> > >
> > > Hello everybody,
> > >
> > > we have a Cornelis Omni-Path system and therefore would like to start
> > > testing mvapich2 3.0a. However, I am currently unable to build it.
> > > I would like to add that building mvapich2 2.3.7-1 with slurm/pmi2 on
> > > the same system works fine.
> > >
> > > For mvapich2 3.0a the configure step using
> > > FFLAGS=-fallow-argument-mismatch ./configure --with-pm=slurm --with-pmi=pmi2
> > > succeeds.
> > >
> > > But in the make step the build stops with
> > >
> > > make[2]: Entering directory '/backup1/build_temp/mvapich2-3.0a'
> > >   CC       src/mpi/attr/lib_libmpi_la-attr_delete.lo
> > > In file included from ./src/include/mpiimpl.h:130,
> > >                  from src/mpi/attr/attr_delete.c:6:
> > > ./src/include/mpir_pmi.h:18:10: fatal error: pmi2.h: No such file or
> > > directory
> > >    18 | #include <pmi2.h>
> > >       |          ^~~~~~~~
> > > compilation terminated.
> > >
> > > In fact we have  /usr/include/slurm/pmi2.h and this is apparently picked
> > > up correctly by configure:
> > >
> > > > grep pmi2.h configure.out
> > > checking slurm/pmi2.h usability... yes
> > > checking slurm/pmi2.h presence... yes
> > > checking for slurm/pmi2.h... yes
> > >
> > > So this is a bit confusing. Also, because according to the mailing list
> > > some people already built it and did not report such an error.
> > >
> > > I attach the gzipped config.log, the configure stdout (configure.out) and
> > > the stdout of the make step (make.out) to this email.
> > >
> > > Best Regards
> > >
> > > Christof
> > >
> > > --
> > > Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
> > > Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
> > > Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
> > > 28359 Bremen
> >
> > > _______________________________________________
> > > Mvapich-discuss mailing list
> > > Mvapich-discuss at lists.osu.edu
> > > https://lists.osu.edu/mailman/listinfo/mvapich-discuss



--
Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
28359 Bremen