[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1

Jonathan Perkins perkinjo at cse.ohio-state.edu
Wed Sep 29 03:20:54 EDT 2010


2010/9/28 Ting-jen Yen <yentj at infowrap.com.tw>:
> Hello,
>
>  Correction to my own previous post.  MVAPICH2 1.5.1p1 does
> work with my simple "hello world" test program.  However, it still
> has a problem with the LINPACK benchmark program that comes with
> Intel MKL, so it probably has nothing to do with the
> "--enable-romio --with-file-system=lustre" parameters.
>
> I ran the program using:
> mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> The program would just hang there.
>
> Related processes on the first node (using "ps auxw"):
> ------------------------------------------------------
> test001 29529  0.0  0.0  21112   760 pts/4    S+   10:56   0:00
> mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> test001 29531  0.0  0.0  63832  1092 pts/4    S+   10:56
> 0:00 /bin/bash -c cd /home/test001/pbs-test/mpi/linpack; /usr/bin
> test001 29532  0.0  0.0  58372  3224 pts/4    S+   10:56
> 0:00 /usr/bin/ssh -q hc87 cd /home/test001/pbs-test/mpi/linpack;
> test001 29533  0.0  0.0  23180   904 pts/4    S+   10:56
> 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> test001 29536 99.6  0.5  81352 41664 pts/4    RLl+ 10:56   0:59 ./xhpl
> test001 29537  100  0.4  80824 37000 pts/4    RLl+ 10:56   1:00 ./xhpl
> -------------------------------------------------------
> Related processes on the second node:
> ----------------------------------------------------
> test001 12578  0.0  0.0  63832  1096 ?        Ss   10:56   0:00 bash -c
> cd /home/test001/pbs-test/mpi/linpack; /usr/bin/env
> test001 12579  0.1  0.0  23180   896 ?        S    10:56
> 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> test001 12580  0.0  0.4  74108 37052 ?        SLl  10:56   0:00 ./xhpl
> test001 12581  0.0  0.4  74108 37052 ?        SLl  10:56   0:00 ./xhpl
> ----------------------------------------------------
>
> If I switch to MVAPICH2 1.2p1, using the same configure parameters,
> the resulting program does work.  The configuration for both MVAPICH2
> builds is:
> --prefix=/opt/mvapich2-version --with-rdma=gen2
> --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64 CC=icc
> CXX=icpc FC=ifort F77=ifort F90=ifort
>
> Any idea what might have caused this?
> How do I produce a backtrace of the MPI processes?

In this case you will need to ssh into both hc86 and hc87.  On each
node you'll want to get the PID of the xhpl processes using ps or some
other method.  You can then run `gdb attach <pid of xhpl>' and, once
inside the gdb shell, use the command `thread apply all bt'.  You'll
want to gather this for each of the xhpl processes.
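If it's more convenient, you can also grab the backtraces
non-interactively.  A rough sketch, assuming gdb is installed on both
nodes and pgrep is available (the exact PIDs will of course differ):

------------------------------------------------------
# run on each of hc86 and hc87, once for every xhpl PID
for pid in $(pgrep -x xhpl); do
    gdb -p $pid -batch -ex 'thread apply all bt' > xhpl-$pid.bt
done
------------------------------------------------------

gdb's -batch mode runs the commands given with -ex, prints the output,
and detaches, so the hung processes are left in the same state they
were in before.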

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
