[mvapich-discuss] Problems installing mvapich2/2.3 with Slurm

Peter Kjellström cap at nsc.liu.se
Wed Mar 13 09:21:57 EDT 2019


On Tue, 12 Mar 2019 16:52:14 -0400
Raghu Reddy <raghu.reddy at noaa.gov> wrote:
...
> mpicc -o hello_c
> /tds_scratch3/SYSADMIN/nesccmgmt/Raghu.Reddy/Testsuite3/hello/hello_mpi_c.c
> 
> mpiexec -np 24 ./hello_c
> 
> s0014.110678hfi_wait_for_device: The /dev/hfi1_0 device failed to
> appear after 15.0 seconds: Connection timed out

The message above means the library is looking for an OPA (Omni-Path)
device: /dev/hfi1_0 belongs to the OPA hfi1 driver.
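
A quick way to see whether a node actually has that device is to test for
the path from the error message. This is only a minimal sketch; the helper
name check_dev is made up for illustration and is not part of any tool.

```shell
# check_dev (hypothetical helper): print "present" if the given path
# exists, "missing" otherwise.
check_dev() {
    if [ -e "$1" ]; then
        echo present
    else
        echo missing
    fi
}

# /dev/hfi1_0 is the device named in the error message quoted above.
check_dev /dev/hfi1_0
```

If this prints "missing" on the compute nodes, the MPI build is probably
trying to use OPA on hardware that does not have it.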

On a system with truescale (PSM) and the following relevant psm
packages installed:

 $ rpm -qa | grep psm
 infinipath-psm-devel-3.0.1-115.1015_open.2_nsc1.el6.x86_64
 psmisc-22.6-24.el6.x86_64
 infinipath-psm-3.0.1-115.1015_open.2_nsc1.el6.x86_64

 NOTE: not psm2
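
The same check can be scripted, for example to compare nodes. The sketch
below reads package names on stdin and reports which PSM flavour it sees.
The function name classify_psm is hypothetical, and the name patterns
(infinipath-psm* for PSM, anything containing psm2 for PSM2) are
assumptions based on the package names above, so adjust them for your
distribution.

```shell
# classify_psm (hypothetical helper): read package names on stdin and
# print one of: psm, psm2, both, none.
# Assumed patterns: "infinipath-psm*" means PSM (truescale),
# "*psm2*" means PSM2 (OPA). Note that psmisc matches neither.
classify_psm() {
    has_psm=0
    has_psm2=0
    while IFS= read -r pkg; do
        case "$pkg" in
            *psm2*)          has_psm2=1 ;;
            infinipath-psm*) has_psm=1 ;;
        esac
    done
    if [ "$has_psm" -eq 1 ] && [ "$has_psm2" -eq 1 ]; then
        echo both
    elif [ "$has_psm2" -eq 1 ]; then
        echo psm2
    elif [ "$has_psm" -eq 1 ]; then
        echo psm
    else
        echo none
    fi
}

# Usage on an RPM-based system:
#   rpm -qa | classify_psm
```

Matching on the name prefix rather than a plain grep for "psm" keeps
unrelated packages such as psmisc from being counted.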


I did:

 module load buildenv-intel/2018u1
 wget http://.../mvapich2-2.3.1.tar.gz
 tar xf mvapich2-2.3.1.tar.gz
 cd mvapich2-2.3.1/
 ./configure --prefix=/home/cap/mpiinst/mvapich2-2.3.1_psm \
     --with-device=ch3:psm CC=icc CXX=icpc FC=ifort
 make -j 8
 make install

It works fine with both mpiexec and mpirun in a slurm job, using my
choice of hello world:

 $ export PATH=/home/cap/mpiinst/mvapich2-2.3.1_psm/bin:$PATH
 $ mpicc -o mdrbench_mvp.x mdrbench.c

 # in slurm -N2 -n32 job shell
 $ unset PSM_RANKS_PER_CONTEXT
 $ mpirun ./mdrbench_mvp.x 
 CPU timing results: iter/us (rank0/mean): 161/161
 Setting load to: 0%
 1D dim geometry is: 32
 ...

/Peter K
