[mvapich-discuss] Problems installing mvapich2/2.3 with Slurm

Carlson, Timothy S Timothy.Carlson at pnnl.gov
Mon Mar 11 17:20:48 EDT 2019


That include file should be in the tarball. At least, it is there in 2.3.1:

# tar ztf mvapich2-2.3.1.tar.gz | grep ibv_param.h
mvapich2-2.3.1/src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h

From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Raghu Reddy
Sent: Monday, March 11, 2019 2:17 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Cc: Brian Osmond <brian.osmond at noaa.gov>; 'Kyle Stern' <kstern at redlineperf.com>
Subject: [mvapich-discuss] Problems installing mvapich2/2.3 with Slurm

Hi all,

Here is the info on the hardware:


  *   Intel Haswell processors, two 12-core sockets per node (24 cores/node in total)
  *   Intel TruScale IB network

I am using the following configure line for building with the Intel compiler (intel/18.0.3.222):

./configure --prefix=$INSTALLDIR --with-device=ch3:psm --with-ib-libpath=/usr/lib64 --with-rdma=gen2 --enable-romio=yes --enable-shared -enable-fortran=yes --with-pm=slurm --with-pmi=pmi2 --with-slurm=/apps/slurm/default CC=icc CXX=icpc F77=ifort FC=ifort |& tee configure-ch3.out-rr

I get the following error at make:

----------------
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_config.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-error_handling.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-debug_utils.lo
  CC       src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_clock.lo
  CC       src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo
src/mpid/ch3/channels/common/src/ft/cr.c(19): catastrophic error: cannot open source file "ibv_param.h"
  #include "ibv_param.h"
                        ^

compilation aborted for src/mpid/ch3/channels/common/src/ft/cr.c (code 4)
make[2]: *** [src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make: *** [all] Error 2
sfe01%
----------------

If I leave out "--with-device=ch3:psm" it completes the build process, but when I run a test code I get the following error:

sfe01% srun --ntasks=4 --ntasks-per-node=2 ./a.out
[s0014:mpi_rank_0][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_0][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0014:mpi_rank_1][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_1][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_2][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_2][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_3][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_3][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support. Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
Hello from rank 00 out of 4; procname = s0014, cpuid = 0
Hello from rank 02 out of 4; procname = s0015, cpuid = 0
Hello from rank 01 out of 4; procname = s0014, cpuid = 1
Hello from rank 03 out of 4; procname = s0015, cpuid = 1
sfe01%

I believe "--with-device=ch3:psm" is the right choice for this hardware, but I am not able to get past the compile error above.

I do see that the file exists in the distribution, so I am not sure why the compiler is not finding it:

sfe01% find . -name ibv_param.h
./src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h
sfe01%
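
Since the header exists but the compiler cannot open it, one way to narrow this down is to work out which -I flag the failing compile would have needed. The following is only a sketch in POSIX sh (not the csh-style shell shown in the prompts above); include_flags_for is a hypothetical helper name, not part of MVAPICH2 or its build system.

```shell
# Hypothetical helper (not part of MVAPICH2): print one -I flag for every
# directory under a source tree that contains the named header file.
include_flags_for() {
    header=$1          # header file name, e.g. ibv_param.h
    root=${2:-.}       # top of the source tree; defaults to the current directory
    find "$root" -type f -name "$header" | while IFS= read -r path; do
        # Emit the directory holding the header as a compiler include flag.
        printf '%s%s\n' '-I' "$(dirname "$path")"
    done
}

# Example, run from the top of the unpacked source tree:
#   include_flags_for ibv_param.h .
```

The printed flag can then be compared with the full compile command for cr.c. Assuming the build uses automake's silent rules (which the abbreviated "CC" lines in the log suggest), running "make V=1" should print that full command. If the gen2 directory is missing from it, the ch3:psm configuration is not adding that include path for this source file.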

Any suggestions on what I may be doing wrong?

Thanks,
Raghu
