[Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h

Shineman, Nat shineman.5 at osu.edu
Wed May 10 11:26:07 EDT 2023


Hi ZQ,

This looks to be a newer version of pmi2.h. Older versions of Slurm's PMI2 had that typedef uncommented, which allowed us to use PMI_keyval_t in both pmi1 and pmi2 builds. We have seen similar issues with Cray PMI support recently. I will take a look to see how this can best be resolved to work with both versions. Can you tell me what version of Slurm you are using?

Thanks,
Nat
________________________________
From: You, Zhi-Qiang <zyou at osc.edu>
Sent: Wednesday, May 10, 2023 11:19
To: Shineman, Nat <shineman.5 at osu.edu>; Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h


Hello,



I was able to fix the "pmi2.h" error by adding `--with-slurm-include=/usr/include/slurm` to the configure command. However, I encountered another error:


  >> 9796    src/util/mpir_pmi.c(781): error: identifier "PMI_keyval_t" is undefined
     9797      static int mpi_to_pmi_keyvals(MPIR_Info * info_ptr, PMI_keyval_t ** kv_ptr, int *nkeys_ptr);
     9798                                                          ^
     9799
  >> 9800    src/util/mpir_pmi.c(782): error: identifier "PMI_keyval_t" is undefined
     9801      static void free_pmi_keyvals(PMI_keyval_t ** kv, int size, int *counts);
     9802                                   ^



Upon checking the pmi2.h file, I found the following:


/* This is here to allow spawn multiple functions to compile.  This
   needs to be removed once those functions are fixed for pmi2 */
/*
typedef struct PMI_keyval_t
{
    char * key;
    char * val;
} PMI_keyval_t;
*/



The "PMI_keyval_t" struct is commented out. We are using the same SLURM version for other mvapich2 installations. Does this mean we need to upgrade SLURM or PMI2?
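
One way to check whether a given pmi2.h still exposes the typedef is to look for PMI_keyval_t outside of block comments. The following is a minimal sketch, not an official tool: has_active_keyval and PMI2_HEADER are names made up for illustration, the default path is the one from this thread, and the awk script only understands /* ... */ comments (it ignores // comments and string literals).

```shell
# Hypothetical helper: succeed if FILE mentions PMI_keyval_t outside /* ... */.
has_active_keyval() {
    awk '
        {
            line = $0
            out = ""
            # Copy the parts of this line that lie outside block comments.
            while (length(line) > 0) {
                if (in_comment) {
                    end = index(line, "*/")
                    if (end == 0) { line = ""; break }
                    line = substr(line, end + 2)
                    in_comment = 0
                } else {
                    start = index(line, "/*")
                    if (start == 0) {
                        out = out line
                        line = ""
                    } else {
                        out = out substr(line, 1, start - 1)
                        line = substr(line, start + 2)
                        in_comment = 1
                    }
                }
            }
            if (out ~ /PMI_keyval_t/) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Assumption: header location taken from this thread; override as needed.
PMI2_HEADER=${PMI2_HEADER:-/usr/include/slurm/pmi2.h}
if [ -f "$PMI2_HEADER" ]; then
    if has_active_keyval "$PMI2_HEADER"; then
        echo "PMI_keyval_t is visible to the compiler"
    else
        echo "PMI_keyval_t appears only inside comments (or not at all)"
    fi
fi
```

Run against the excerpt above, the helper would report the typedef as inactive, since every mention of PMI_keyval_t sits between /* and */.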


Thank you,

-ZQ



From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Shineman, Nat via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Date: Wednesday, May 10, 2023 at 9:20 AM
To: christof.koehler at bccms.uni-bremen.de <christof.koehler at bccms.uni-bremen.de>, c.koehler at uni-bremen.de <c.koehler at uni-bremen.de>, Announcement about MVAPICH2 (MPI over InfiniBand, RoCE, Omni-Path, iWARP and EFA) Libraries developed at NBCL/OSU <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h

Hi Christof,



Thanks for the bug report. It looks like our configure script checks for the file slurm/pmi2.h, while the source itself includes pmi2.h without the slurm prefix. This include was changed in 3.0 to support non-Slurm installations of pmi2. The easiest way to solve this is to set the environment variable CPATH=/usr/include/slurm:$CPATH (substituting whichever directory holds your Slurm headers). Alternatively, you can add MPICHLIB_CFLAGS=-I/usr/include/slurm as an argument to configure.
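
In a Bourne-style shell, the two workarounds might look roughly like this. It is only a sketch: SLURM_INC is a variable name introduced here for illustration, /usr/include/slurm is the path from this thread and may differ elsewhere, and the configure line is the one from the original report with MPICHLIB_CFLAGS appended.

```shell
# Assumption: Slurm's headers (including pmi2.h) live here; adjust for your site.
SLURM_INC=${SLURM_INC:-/usr/include/slurm}

# Option 1: put the directory on the compiler's include search path.
# ${CPATH:+:$CPATH} appends the old value only if it was non-empty,
# which avoids leaving a trailing ':' in CPATH.
export CPATH="$SLURM_INC${CPATH:+:$CPATH}"

# Option 2: hand the include path to configure instead.
# Printed rather than executed so the sketch is safe to run anywhere.
printf '%s\n' "FFLAGS=-fallow-argument-mismatch ./configure --with-pm=slurm --with-pmi=pmi2 MPICHLIB_CFLAGS=-I$SLURM_INC"
```

Only one of the two is needed. CPATH affects every compile in that shell session, whereas MPICHLIB_CFLAGS is scoped to the MVAPICH build.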



Either of the above options should get you past the build error for now. We will also look into having our configure script automatically detect Slurm-based pmi installations and adapt the paths appropriately. As a final note, we do recommend using pmi1 with Slurm when possible (as do the upstream developers of MPICH), as it is a more stable and standardized provider than pmi2, which was never completed. Slurm's --with-mpi=pmi2 actually supports both the pmi1 and pmi2 interfaces.



Please let me know if you have any further issues building MVAPICH 3.0a.



Thanks,

Nat





________________________________

From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of christof.koehler--- via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Wednesday, May 10, 2023 04:06
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] mvapich2 3.0a build failure finding pmi2.h



Hello everybody,

we have a Cornelis Omni-Path system and therefore would like to start
testing mvapich2 3.0a. However, I am currently unable to build it.
I would like to add that building mvapich2 2.3.7-1 with slurm/pmi2 on
the same system works fine.

For mvapich2 3.0a the configure step using
FFLAGS=-fallow-argument-mismatch ./configure --with-pm=slurm --with-pmi=pmi2
succeeds.

But in the make step the build stops with

make[2]: Entering directory '/backup1/build_temp/mvapich2-3.0a'
  CC       src/mpi/attr/lib_libmpi_la-attr_delete.lo
In file included from ./src/include/mpiimpl.h:130,
                 from src/mpi/attr/attr_delete.c:6:
./src/include/mpir_pmi.h:18:10: fatal error: pmi2.h: No such file or directory
   18 | #include <pmi2.h>
      |          ^~~~~~~~
compilation terminated.

In fact we have /usr/include/slurm/pmi2.h, and this is apparently picked
up correctly by configure:

> grep pmi2.h configure.out
checking slurm/pmi2.h usability... yes
checking slurm/pmi2.h presence... yes
checking for slurm/pmi2.h... yes

So this is a bit confusing, also because according to the mailing list
some people have already built it and did not report such an error.
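
One plausible reading of the mismatch, sketched in shell: configure probes for the prefixed name slurm/pmi2.h, which resolves against /usr/include, whereas the source line in the error includes the bare name pmi2.h, which only resolves if /usr/include/slurm itself is on the search path. resolve_include is a hypothetical helper written for this illustration. It mimics a search-path lookup and does not invoke the compiler.

```shell
# resolve_include NAME DIR...: print the first DIR/NAME that exists, roughly
# the way a compiler walks its include search path.
resolve_include() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# With only /usr/include on the path, the prefixed name can resolve
# while the bare name cannot:
resolve_include slurm/pmi2.h /usr/include || echo "slurm/pmi2.h: not found"
resolve_include pmi2.h /usr/include || echo "pmi2.h: not found"

# Adding the Slurm directory makes the bare name resolvable as well:
resolve_include pmi2.h /usr/include /usr/include/slurm || echo "pmi2.h: not found"
```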

I attach the gzipped config.log, the configure stdout (configure.out) and
the stdout of the make step (make.out) to this email.

Best Regards

Christof

--
Dr. rer. nat. Christof Köhler       email: c.koehler at uni-bremen.de
Universitaet Bremen/FB1/BCCMS       phone:  +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.06       fax: +49-(0)421-218-62770
28359 Bremen