[mvapich-discuss] Problems installing mvapich2/2.3 with Slurm
Raghu Reddy
raghu.reddy at noaa.gov
Mon Mar 11 17:16:48 EDT 2019
Hi all,
Here is the info on the hardware:
- Intel Haswell processors, two 12-core sockets per node (24 cores/node
in total)
- Intel TruScale IB network
I am using the following configure line for building with the Intel compiler
(intel/18.0.3.222):
./configure --prefix=$INSTALLDIR --with-device=ch3:psm \
    --with-ib-libpath=/usr/lib64 --with-rdma=gen2 --enable-romio=yes \
    --enable-shared -enable-fortran=yes --with-pm=slurm --with-pmi=pmi2 \
    --with-slurm=/apps/slurm/default CC=icc CXX=icpc F77=ifort FC=ifort \
    |& tee configure-ch3.out-rr
I get the following error at make:
----------------
CC src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_config.lo
CC src/mpid/ch3/channels/common/src/util/lib_libmpi_la-error_handling.lo
CC src/mpid/ch3/channels/common/src/util/lib_libmpi_la-debug_utils.lo
CC src/mpid/ch3/channels/common/src/util/lib_libmpi_la-mv2_clock.lo
CC src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo
src/mpid/ch3/channels/common/src/ft/cr.c(19): catastrophic error: cannot open source file "ibv_param.h"
#include "ibv_param.h"
^
compilation aborted for src/mpid/ch3/channels/common/src/ft/cr.c (code 4)
make[2]: *** [src/mpid/ch3/channels/common/src/ft/lib_libmpi_la-cr.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/apps/mvapich2-2.3'
make: *** [all] Error 2
sfe01%
----------------
If I leave out "--with-device=ch3:psm", the build completes, but when I
run a test program I get the following messages:
sfe01% srun --ntasks=4 --ntasks-per-node=2 ./a.out
[s0014:mpi_rank_0][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_0][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0014:mpi_rank_1][rdma_find_network_type] QLogic IB card detected in system
[s0014:mpi_rank_1][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_2][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_2][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
[s0015:mpi_rank_3][rdma_find_network_type] QLogic IB card detected in system
[s0015:mpi_rank_3][rdma_find_network_type] Please re-configure the library with the '--with-device=ch3:psm' configure option for best performance
Warning: RDMA CM Initialization failed. Continuing without RDMA CM support.
Please set MV2_USE_RDMA_CM=0 to disable RDMA CM.
Hello from rank 00 out of 4; procname = s0014, cpuid = 0
Hello from rank 02 out of 4; procname = s0015, cpuid = 0
Hello from rank 01 out of 4; procname = s0014, cpuid = 1
Hello from rank 03 out of 4; procname = s0015, cpuid = 1
sfe01%
I believe "--with-device=ch3:psm" is the right thing to do for this
architecture, but I am not able to get past the step above.
I do see that the file exists in the distribution, so I am not sure why
the compiler is not finding it:
sfe01% find . -name ibv_param.h
./src/mpid/ch3/channels/mrail/src/gen2/ibv_param.h
sfe01%
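One way to narrow this down is to check whether the directory reported by
find is among the -I directories of the failing compile command. Below is a
minimal sketch, assuming a POSIX shell (the prompts above are csh-style) and
assuming the full compiler command can be obtained with "make V=1", which
is the usual way to turn off automake's abbreviated "CC ..." output. The
function name header_on_include_path is hypothetical and is not part of
MVAPICH2.

```shell
#!/bin/sh
# Hypothetical helper: report whether any -I directory on a compile
# command line contains a given header.
# Usage: header_on_include_path HEADER ARG...
# Only the joined form "-Idir" is handled; "-I dir" (two words) is not.
header_on_include_path() {
    header=$1
    shift
    for arg in "$@"; do
        case $arg in
            -I?*)
                dir=${arg#-I}              # strip the leading -I
                if [ -f "$dir/$header" ]; then
                    printf '%s\n' "$dir"   # first directory that has it
                    return 0
                fi
                ;;
        esac
    done
    return 1                               # no -I directory has the header
}
```

For example, passing the arguments of the icc command for cr.c (as printed
by "make V=1") after the header name ibv_param.h prints the first matching
directory, or returns non-zero if none of the -I directories contains the
header.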
Any suggestions on what I may be doing wrong?
Thanks,
Raghu