[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1

Ting-jen Yen yentj at infowrap.com.tw
Tue Sep 28 02:48:37 EDT 2010


We are setting up a cluster with InfiniBand interconnection.
The OS we are using is CentOS 5.4, along with the OpenIB
driver coming with it.

We managed to compile MVAPICH2 1.5.1 without any problem.  But
when we used this MVAPICH2 to compile a simple "hello world" MPI
program and tried to run it, the program just hung
if we used more than one machine. (It ran OK when using only
one machine.)  When we checked the processes using 'ps', we noticed
that the MPI processes on the first machine were using
almost 100% CPU time, while those on the remaining machines were
using 0% CPU time.  It seems that the program is blocked inside the
"MPI_Init" function.
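For reference, the test program is essentially the textbook MPI
"hello world" (a sketch of what we ran; the actual source differs
only trivially):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* This is where the multi-node runs appear to block. */
    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

It was built with the mpicc wrapper from the MVAPICH2 1.5.1
installation and launched across two nodes (e.g. with mpirun_rsh
-np 2 host1 host2 ./hello).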

We also tried MVAPICH 1.1 as well as an older version of MVAPICH2, 1.2p1.
Neither of these has the same problem; both work fine.

Any idea what may cause such a problem?

(The compiler we used is Intel Compiler V11.1.  I do not have
the details of the InfiniBand HCA right now, though according to the
'lspci' command, it uses a "Mellanox MT25208 InfiniHost III Ex" chip.)

Thanks,

Ting-jen
