[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Sep 28 10:12:55 EDT 2010


Hello, can you provide us with the backtrace of the MPI process(es)?
Also, I'd like to know how these are being launched (which launcher,
number of processes, etc.) and which processes you actually see
running on each machine.  Thanks.
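One common way to capture such a backtrace from a hung rank is to attach gdb in batch mode on each node (a sketch; the program name "hello" and the placeholder PID are hypothetical, and gdb must be installed on the compute nodes):

```shell
# On each node, find the process IDs of the hanging MPI job
# ("hello" stands in for the actual program name):
pgrep -af hello

# Attach gdb non-interactively to one stuck rank and dump
# a backtrace for every thread, then detach:
gdb -batch -p <PID> -ex 'thread apply all bt'
```

A rank stuck in MPI_Init will typically show the initialization path near the top of the trace, which helps distinguish a connection-setup hang from a compute loop.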

On Tue, Sep 28, 2010 at 2:48 AM, Ting-jen Yen <yentj at infowrap.com.tw> wrote:
>
> We are setting up a cluster with InfiniBand interconnection.
> The OS we are using is CentOS 5.4, along with the OpenIB
> driver coming with it.
>
> We managed to compile MVAPICH2 1.5.1 without any problem.  But
> when we used this MVAPICH2 to compile a simple "hello world" MPI
> program and tried to run it, the program just hung there
> if we used more than one machine.  (It ran fine when using only
> one machine.)  When we checked the processes using 'ps', we noticed
> that the processes of the MPI program on the first machine were using
> almost 100% CPU time, while those on the remaining machines were
> using 0% CPU time.  It seems that the program stopped in the
> "MPI_Init" function.
>
> We also tried MVAPICH 1.1 as well as an older version of MVAPICH2,
> 1.2p1.  Neither of these has this problem; both work fine.
>
> Any idea what may cause such a problem?
>
> (The compiler we used is Intel Compiler V11.1.  I do not have
> the details of the InfiniBand HCA right now, though according to
> the 'lspci' command, it has a "Mellanox MT25208 InfiniHost III Ex"
> chip.)
>
> Thanks,
>
> Ting-jen
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
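The "hello world" program described above is presumably along these lines (a sketch, not the poster's actual source; any such program would reproduce the reported hang, since the block occurs inside MPI_Init itself):

```c
/* Minimal MPI "hello world".  On the affected cluster, the hang is
 * reported to occur inside MPI_Init when run across multiple nodes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* reported blocking point */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of ranks */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

Built with the MVAPICH2 mpicc wrapper and launched across nodes with the installation's launcher (e.g. mpirun_rsh or mpiexec).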



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


