[Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h

christof.koehler at bccms.uni-bremen.de
Thu May 11 08:13:35 EDT 2023


Hello everybody,

I can confirm that the 3.0b build no longer needs any workarounds for
slurm/pmi2.h; it now proceeds until the PMI_keyval_t error first
reported by Zhi-Qiang You.

The pmi1 build appears to succeed completely with 3.0b, with no error
messages. I did not check whether libnl-3 dependency detection in
configure works in 3.0b now, but I assume it does.

Thank you !

Is the internal libfabric in 3.0b still the ancient 1.15.1
version? We would like to use newer versions (1.18.0 and up) via the
--with-libfabric option. Would it be possible to switch libfabric
versions dynamically after the build, by changing LD_LIBRARY_PATH or
similar to point to the desired version?

Finally, as far as I can see, the 3.0a (or 3.0b, if it was updated)
Quick Start Guide PDF is not linked from the download page. Would it be
possible to link it there? I only found it by accident using Google.


Best Regards

Christof

On Wed, May 10, 2023 at 05:57:25PM +0200, christof.koehler--- via Mvapich-discuss wrote:
> Hello,
> 
> the slurm version here is 22.05.8.
> 
> I can confirm that prefixing with "MPICHLIB_CFLAGS=-I/usr/include/slurm"
> during configure changes the error in make to 
> 
>   CC       src/util/lib_libmpi_la-mpir_pmi.lo
> src/util/mpir_pmi.c:781:53: error: unknown type name ‘PMI_keyval_t’
>   781 | static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t ** kv_ptr, int *nkeys_ptr);
>       |                                                     ^~~~~~~~~~~~
> src/util/mpir_pmi.c:782:30: error: unknown type name ‘PMI_keyval_t’
>   782 | static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int *counts);
>       |                              ^~~~~~~~~~~~
> src/util/mpir_pmi.c:1130:53: error: unknown type name ‘PMI_keyval_t’
>  1130 | static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t ** kv_ptr, int *nkeys_ptr)
>       |                                                     ^~~~~~~~~~~~
> src/util/mpir_pmi.c:1168:30: error: unknown type name ‘PMI_keyval_t’
>  1168 | static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int *counts)
> 
> I am not sure why the mvapich2 2.3.7-1 build against pmi2, with the same
> configure arguments, apparently worked. Do you want the logs?
> 
> When I change from --with-pmi=pmi2 to --with-pmi=pmi1 when building 3.0a
> the build fails with
> 
>   MOD      src/binding/fortran/use_mpi/mpi_base.mod-stamp
>   MOD      src/binding/fortran/use_mpi/mpi.mod-stamp
>   GEN      lib/libmpi.la
> /usr/bin/ld: cannot find -lnl-3
> /usr/bin/ld: cannot find -lnl-route-3
> collect2: error: ld returned 1 exit status
> make[2]: *** [Makefile:20852: lib/libmpi.la] Error 1
> make[2]: Leaving directory '/backup1/build_temp/mvapich2-3.0a'
> make[1]: *** [Makefile:50962: all-recursive] Error 1
> make[1]: Leaving directory '/backup1/build_temp/mvapich2-3.0a'
> make: *** [Makefile:13232: all] Error 2
> 
> although configure was happy. Apparently it does not test if the
> library headers are installed.
> 
> After installing libnl3-devel the build of 3.0a succeeds for pmi1.
> 
> 
> Best Regards
> 
> Christof
> 
> 
> On Wed, May 10, 2023 at 03:26:07PM +0000, Shineman, Nat via Mvapich-discuss wrote:
> > Hi ZQ,
> > 
> > This looks to be a newer version of pmi2.h. Older versions of Slurm's pmi2.h had that definition uncommented, which allowed us to use PMI_keyval_t in both pmi1 and pmi2 builds. We have seen similar issues with Cray PMI support recently. I will take a look to see how this can best be resolved to work with both versions. Can you tell me what version of Slurm you are using?
> > 
> > Thanks,
> > Nat
> > ________________________________
> > From: You, Zhi-Qiang <zyou at osc.edu>
> > Sent: Wednesday, May 10, 2023 11:19
> > To: Shineman, Nat <shineman.5 at osu.edu>; Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
> > Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > 
> > Hello,
> > 
> > I was able to fix the "pmi2.h" error by adding `--with-slurm-include=/usr/include/slurm` to the configure command. However, I encountered another error:
> > 
> >   >> 9796    src/util/mpir_pmi.c(781): error: identifier "PMI_keyval_t" is undefined
> >      9797      static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t ** kv_ptr, int *nkeys_ptr);
> >      9798                                                          ^
> >      9799
> >   >> 9800    src/util/mpir_pmi.c(782): error: identifier "PMI_keyval_t" is undefined
> >      9801      static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int *counts);
> >      9802                                   ^
> > 
> > Upon checking the pmi2.h file, I found the following:
> > 
> > /* This is here to allow spawn multiple functions to compile.  This
> >    needs to be removed once those functions are fixed for pmi2 */
> > /*
> > typedef struct PMI_keyval_t
> > {
> >     char * key;
> >     char * val;
> > } PMI_keyval_t;
> > */
> > 
> > The "PMI_keyval_t" struct is commented out. We are using the same SLURM version for our other mvapich2 installations. Does this mean we need to upgrade SLURM or PMI2?
> > 
> > Thank you,
> > 
> > -ZQ
> > 
> > From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Shineman, Nat via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> > Date: Wednesday, May 10, 2023 at 9:20 AM
> > To: christof.koehler at bccms.uni-bremen.de <christof.koehler at bccms.uni-bremen.de>, c.koehler at uni-bremen.de <c.koehler at uni-bremen.de>, Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
> > Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > 
> > Hi Christof,
> > 
> > Thanks for the bug report. It looks like our configure script checks for the file slurm/pmi2.h, while the source (src/include/mpir_pmi.h) includes pmi2.h without the slurm/ prefix. This include was changed in 3.0 to support non-Slurm installations of PMI2. The easiest way to solve this is to set the environment variable CPATH=/usr/include/slurm:$CPATH (or whichever path your Slurm include directory resides in). Alternatively, you can pass MPICHLIB_CFLAGS=-I/usr/include/slurm as an argument to configure.
> > 
> > 
> > 
> > Either of the above options should get you past the build error for now. We will also look into having our configure script automatically detect Slurm-based PMI installations and adapt the paths appropriately. As a final note, we do recommend using pmi1 with Slurm when possible (as do the upstream MPICH developers), as it is a more stable and standardized interface than pmi2, which was never completed. Slurm's pmi2 MPI plugin (selected with srun --mpi=pmi2) actually supports both the pmi1 and pmi2 interfaces.
> > 
> > 
> > 
> > Please let me know if you have any further issues building MVAPICH 3.0a.
> > 
> > 
> > 
> > Thanks,
> > 
> > Nat
> > 
> > 
> > 
> > 
> > 
> > ________________________________
> > 
> > From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of christof.koehler--- via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
> > Sent: Wednesday, May 10, 2023 04:06
> > To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
> > Subject: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h
> > 
> > 
> > 
> > Hello everybody,
> > 
> > we have a Cornelis Omni-Path system and therefore would like to start
> > testing mvapich2 3.0a. However, I am currently unable to build it.
> > I would like to add that building mvapich2 2.3.7-1 with slurm/pmi2 on
> > the same system works fine.
> > 
> > For mvapich2 3.0a the configure step using
> > FFLAGS=-fallow-argument-mismatch ./configure --with-pm=slurm --with-pmi=pmi2
> > succeeds.
> > 
> > But in the make step the build stops with
> > 
> > make[2]: Entering directory '/backup1/build_temp/mvapich2-3.0a'
> >   CC       src/mpi/attr/lib_libmpi_la-attr_delete.lo
> > In file included from ./src/include/mpiimpl.h:130,
> >                  from src/mpi/attr/attr_delete.c:6:
> > ./src/include/mpir_pmi.h:18:10: fatal error: pmi2.h: No such file or directory
> >    18 | #include <pmi2.h>
> >       |          ^~~~~~~~
> > compilation terminated.
> > 
> > In fact we do have /usr/include/slurm/pmi2.h, and configure apparently
> > picks it up correctly:
> > 
> > > grep pmi2.h configure.out
> > checking slurm/pmi2.h usability... yes
> > checking slurm/pmi2.h presence... yes
> > checking for slurm/pmi2.h... yes
> > 
> > So this is a bit confusing, especially because, according to the mailing
> > list, some people have already built it and did not report such an error.
> > 
> > I attach the gzipped config.log, the configure stdout (configure.out) and
> > the stdout of the make step (make.out) to this email.
> > 
> > Best Regards
> > 
> > Christof
> > 
> > --
> > Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
> > Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
> > Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
> > 28359 Bremen
> 
> > _______________________________________________
> > Mvapich-discuss mailing list
> > Mvapich-discuss at lists.osu.edu
> > https://lists.osu.edu/mailman/listinfo/mvapich-discuss
> 


