[mvapich-discuss] MPI communication problem with mvapich2-1.8a1p1

Nirmal Seenu nirmal at fnal.gov
Fri Jan 27 15:57:57 EST 2012


I am having trouble running the Intel MPI Benchmarks (IMB_3.2.3; I run 
IMB-MPI1 without any options) on the latest version, MVAPICH2-1.8a1p1.

The MPI processes get launched properly on the worker nodes, but the 
benchmark hangs within a few seconds of launch and makes no further 
progress. I checked the InfiniBand fabric and everything is healthy. 
We mount Lustre over native IB on all the worker nodes, and the Lustre 
mounts are healthy as well.

This is reproducible with MVAPICH2 compiled with GCC and with the PGI 
11.7 compiler as well.

Details about the installation:

The worker nodes run RHEL 5.3 with the latest kernel, 
2.6.18-274.17.1.el5, and we use the InfiniBand drivers that are 
distributed as part of the kernel.

The GCC build of MVAPICH2 was compiled with the following compiler:
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)

The following options were used to compile MVAPICH2 and MPIEXEC:

export CC=gcc
export CXX=g++
export F77=gfortran
export FC=gfortran

export CFLAGS=-mcmodel=medium
export CXXFLAGS=-mcmodel=medium
export FFLAGS=-mcmodel=medium
export FCFLAGS=-mcmodel=medium
export LDFLAGS=-mcmodel=medium

MVAPICH2:
./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc --enable-fast \
    --enable-f77 --enable-fc --enable-cxx --enable-romio --enable-pmiport \
    --enable-mpe --with-pm=mpd --with-pmi=simple --with-thread-package \
    --with-hwloc

MPIEXEC:
./configure --prefix=/usr/local/mvapich2-1.8a1p1-gcc \
    --with-pbs=/usr/local/pbs \
    --with-mpicc=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicc \
    --with-mpicxx=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpicxx \
    --with-mpif77=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif77 \
    --with-mpif90=/usr/local/mvapich2-1.8a1p1-gcc/bin/mpif90 \
    --disable-mpich-gm --disable-mpich-p4 --disable-mpich-rai \
    --with-default-comm=pmi

I was able to run the Intel MPI Benchmarks successfully with the 
following versions of MVAPICH2, which were compiled with the same 
version of gcc:
mvapich2-1.2p1
mvapich2-1.5
mvapich2-1.6rc2
mvapich2-1.6-r4751
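
A comparison like this (works on the older builds, hangs on the new one) 
can be made repeatable by running the benchmark under a time limit. The 
following is only a sketch: it assumes GNU coreutils `timeout` is 
installed (it may be missing on a stock RHEL 5 system), and the helper 
name classify_run, the time limit, and the install prefixes in the 
commented usage are hypothetical, not taken from the actual setup.

```shell
#!/bin/sh
# Sketch: classify one run as "ok", "hang", or "fail(<status>)".
# Assumption: GNU coreutils `timeout` is on PATH.

classify_run() {
    limit="$1"; shift
    # Run the remaining arguments as a command, discarding its output.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case "$status" in
        0)   echo ok ;;
        124) echo hang ;;            # GNU timeout exits 124 when the limit expires
        *)   echo "fail($status)" ;;
    esac
}

# Hypothetical usage inside a batch job; the prefixes for the older
# builds are guesses modelled on the 1.8a1p1 prefix:
#
# for v in 1.6-r4751 1.8a1p1; do
#     printf '%s: ' "$v"
#     classify_run 600 /usr/local/mvapich2-$v-gcc/bin/mpiexec ./IMB-MPI1
# done
```

A run reported as "hang" only means the time limit expired, so the limit 
has to be chosen well above the time a healthy IMB-MPI1 run takes at the 
same process count.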

I will be more than happy to provide more details if needed. Thanks in 
advance for looking into this problem.

Nirmal

