[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1
Ting-jen Yen
yentj at infowrap.com.tw
Wed Sep 29 21:04:20 EDT 2010
2010-09-29 Jonathan Perkins <perkinjo at cse.ohio-state.edu>:
> 2010/9/28 Ting-jen Yen <yentj at infowrap.com.tw>:
> > Hello,
> >
> > Correction to my own previous post. MVAPICH2 1.5.1p1 does
> > work with my simple "hello world" test program. However, it still
> > has a problem with the LINPACK benchmark program that comes with
> > Intel MKL. So it probably has nothing to do with the
> > "--enable-romio --with-file-system=lustre" parameters.
> >
> > I ran the program using:
> > mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> > The program would just hang there.
> >
> > Related processes on the first node (using "ps auxw" )
> > ------------------------------------------------------
> > test001 29529 0.0 0.0 21112 760 pts/4 S+ 10:56 0:00
> > mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> > test001 29531 0.0 0.0 63832 1092 pts/4 S+ 10:56
> > 0:00 /bin/bash -c cd /home/test001/pbs-test/mpi/linpack; /usr/bin
> > test001 29532 0.0 0.0 58372 3224 pts/4 S+ 10:56
> > 0:00 /usr/bin/ssh -q hc87 cd /home/test001/pbs-test/mpi/linpack;
> > test001 29533 0.0 0.0 23180 904 pts/4 S+ 10:56
> > 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> > test001 29536 99.6 0.5 81352 41664 pts/4 RLl+ 10:56 0:59 ./xhpl
> > test001 29537 100 0.4 80824 37000 pts/4 RLl+ 10:56 1:00 ./xhpl
> > -------------------------------------------------------
> > Related processes on the second node:
> > ----------------------------------------------------
> > test001 12578 0.0 0.0 63832 1096 ? Ss 10:56 0:00 bash -c
> > cd /home/test001/pbs-test/mpi/linpack; /usr/bin/env
> > test001 12579 0.1 0.0 23180 896 ? S 10:56
> > 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> > test001 12580 0.0 0.4 74108 37052 ? SLl 10:56 0:00 ./xhpl
> > test001 12581 0.0 0.4 74108 37052 ? SLl 10:56 0:00 ./xhpl
> > ----------------------------------------------------
> >
> > If I switched to MVAPICH2 1.2p1, using the same configure parameters,
> > the resulting program does work. The configuration for both MVAPICH2
> > versions is:
> > --prefix=/opt/mvapich2-version --with-rdma=gen2
> > --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64 CC=icc
> > CXX=icpc FC=ifort F77=ifort F90=ifort
> >
> > Any idea what might have caused this?
> > How do I produce a backtrace of the MPI processes?
>
> In this case you will need to ssh into both hc86 and hc87. On each
> node you'll want to get the pid of the xhpl process using ps or some
> other method. You can then use `gdb attach <pid of xhpl>' and then
> when inside the gdb shell use the command `thread apply all bt'.
> You'll want to gather this for each of the xhpl processes.
>
Thanks. I did get the backtrace of the MPI processes.
When I ran a simple hello-world MPI program with 2 processes, both
backtraces were almost identical, as follows (only argv in main()
differs, so I copy only one of them):
---------------------------------------------
Thread 1 (Thread 0x2ab6f1524660 (LWP 19810)):
#0 0x0000003326a0d590 in __read_nocancel () from /lib64/libpthread.so.0
#1 0x00000000004a83ec in PMIU_readline ()
#2 0x0000000000439fdc in PMI_KVS_Get ()
#3 0x000000000041c1f6 in MPIDI_Populate_vc_node_ids ()
#4 0x000000000041adbd in MPID_Init ()
#5 0x000000000040c152 in MPIR_Init_thread ()
#6 0x000000000040b2b0 in PMPI_Init ()
#7 0x00000000004048e9 in main (argc=1, argv=0x7fff7a65ad48) at hello.c:15
------------------------------------------
-- Ting-jen