[mvapich-discuss] Problems installing mvapich2/2.3 with Slurm

Raghu Reddy raghu.reddy at noaa.gov
Mon Mar 11 17:16:48 EDT 2019


Hi all,

 

Here is the info on the hardware:

 

-        Intel Haswell processors, two 12-core sockets per node (24
cores/node in total)

-        Intel TruScale IB network

 

I am using the following configure line for building with the Intel compiler
(intel/18.0.3.222):

 

./configure --prefix=$INSTALLDIR --with-device=ch3:psm \
    --with-ib-libpath=/usr/lib64 --with-rdma=gen2 --enable-romio=yes \
    --enable-shared -enable-fortran=yes --with-pm=slurm --with-pmi=pmi2 \
    --with-slurm=/apps/slurm/default CC=icc CXX=icpc F77=ifort FC=ifort \
    |& tee configure-ch3.out-rr

 

I get the following error at make:

 

----------------

  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_config.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-error_handling.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-debug_utils.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_clock.lo
  CC       src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo
src/mpid/ch3/channels/common/src/ft/cr.c(19): catastrophic error: cannot open source file "ibv_param.h"
  #include "ibv_param.h"
                        ^

compilation aborted for src/mpid/ch3/channels/common/src/ft/cr.c (code 4)
make[2]: *** [src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make: *** [all] Error 2
sfe01%

----------------

 

If I leave out "--with-device=ch3:psm", the build completes, but when I run
a test code I get the following messages:

 

sfe01% srun --ntasks=4 --ntasks-per-node=2 ./a.out
[s0014:mpi_rank_0][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_0][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0014:mpi_rank_1][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_1][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_2][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_2][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_3][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_3][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
Hello from rank 00 out of 4; procname = s0014, cpuid = 0
Hello from rank 02 out of 4; procname = s0015, cpuid = 0
Hello from rank 01 out of 4; procname = s0014, cpuid = 1
Hello from rank 03 out of 4; procname = s0015, cpuid = 1
sfe01%

 

I believe "--with-device=ch3:psm" is the right choice for this
architecture, but I am not able to get past the build error shown earlier.

 

I do see that the file exists in the distribution, so I am not sure why the
compiler is not finding it:

 

sfe01% find . -name ibv_param.h

./src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h

sfe01%
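
Since the header exists in the tree, the next thing to check is probably whether its directory is on the include path of the failing compile. The following is only a rough sketch of that check, not something taken from the MVAPICH2 documentation. It assumes the build uses automake's silent rules, so that "make V=1" prints the full compiler command. The helper name has_include_dir is hypothetical and written just for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2): succeed if the given compile
# command line has an include option whose directory ends with the given
# path. Both "-Idir" and "-I dir" forms are accepted.
has_include_dir() {
    cmd=$1
    dir=$2
    prev=""
    # Deliberately unquoted so the command line is split into words.
    for tok in $cmd; do
        case "$tok" in
            -I*"$dir") return 0 ;;
        esac
        if [ "$prev" = "-I" ]; then
            case "$tok" in
                *"$dir") return 0 ;;
            esac
        fi
        prev=$tok
    done
    return 1
}

# Possible usage from the top of the source tree (assumption: V=1 makes the
# build print the full compiler command for the failing object):
#   cmd=$(make V=1 src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo 2>&1 | grep -- '-I' | head -n 1)
#   if has_include_dir "$cmd" src/mpid/ch3/channels/mrail/src/gen2; then
#       echo "gen2 directory is on the include path"
#   else
#       echo "gen2 directory is NOT on the include path"
#   fi
```

If the gen2 directory turns out to be missing from the include flags, that would suggest the ch3:psm configuration never adds it, rather than a problem with the Intel compiler itself.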

 

Any suggestions on what I may be doing wrong?

 

Thanks,

Raghu

 
