[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1

Ting-jen Yen yentj at infowrap.com.tw
Wed Sep 29 21:04:20 EDT 2010


2010-09-29 Jonathan Perkins <perkinjo at cse.ohio-state.edu>:
> 2010/9/28 Ting-jen Yen <yentj at infowrap.com.tw>:
> > Hello,
> >
> >  Correction to my own previous post.  MVAPICH2 1.5.1p1 does
> > work with my simple "hello world" test program.  However, it still
> > has a problem with the LINPACK benchmark program that comes
> > with Intel MKL.  So it probably has nothing to do with the
> > "--enable-romio --with-file-system=lustre" parameters.
> >
> > I ran the program using:
> > mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> > The program would just hang there.
> >
> > Related processes on the first node (using "ps auxw" )
> > ------------------------------------------------------
> > test001 29529  0.0  0.0  21112   760 pts/4    S+   10:56   0:00
> > mpirun_rsh -np 4 hc86 hc86 hc87 hc87 ./xhpl
> > test001 29531  0.0  0.0  63832  1092 pts/4    S+   10:56
> > 0:00 /bin/bash -c cd /home/test001/pbs-test/mpi/linpack; /usr/bin
> > test001 29532  0.0  0.0  58372  3224 pts/4    S+   10:56
> > 0:00 /usr/bin/ssh -q hc87 cd /home/test001/pbs-test/mpi/linpack;
> > test001 29533  0.0  0.0  23180   904 pts/4    S+   10:56
> > 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> > test001 29536 99.6  0.5  81352 41664 pts/4    RLl+ 10:56   0:59 ./xhpl
> > test001 29537  100  0.4  80824 37000 pts/4    RLl+ 10:56   1:00 ./xhpl
> > -------------------------------------------------------
> > Related processes on the second node:
> > ----------------------------------------------------
> > test001 12578  0.0  0.0  63832  1096 ?        Ss   10:56   0:00 bash -c
> > cd /home/test001/pbs-test/mpi/linpack; /usr/bin/env
> > test001 12579  0.1  0.0  23180   896 ?        S    10:56
> > 0:00 /opt/mvapich2-1.5.1p1/bin/mpispawn 0
> > test001 12580  0.0  0.4  74108 37052 ?        SLl  10:56   0:00 ./xhpl
> > test001 12581  0.0  0.4  74108 37052 ?        SLl  10:56   0:00 ./xhpl
> > ----------------------------------------------------
> >
> > If I switch to MVAPICH2 1.2p1, using the same configure parameters,
> > the resulting program works.  The configuration for both MVAPICH2
> > versions is:
> > --prefix=/opt/mvapich2-version --with-rdma=gen2
> > --with-ib-include=/usr/include --with-ib-libpath=/usr/lib64 CC=icc
> > CXX=icpc FC=ifort F77=ifort F90=ifort
> >
> > Any idea what might have caused this?
> > How do I produce a backtrace of the MPI processes?
> 
> In this case you will need to ssh into both hc86 and hc87.  On each
> node you'll want to get the pid of the xhpl process using ps or some
> other method.  You can then use `gdb attach <pid of xhpl>' and then
> when inside the gdb shell use the command `thread apply all bt'.
> You'll want to gather this for each of the xhpl processes.
> 

  Thanks.  I did get the backtraces of the MPI processes.
When I ran a simple hello-world MPI program with 2 processes, both
backtraces are almost identical, as follows (only the argv value in
main() differs, so I am copying only one of them):

---------------------------------------------
Thread 1 (Thread 0x2ab6f1524660 (LWP 19810)):
#0  0x0000003326a0d590 in __read_nocancel () from /lib64/libpthread.so.0
#1  0x00000000004a83ec in PMIU_readline ()
#2  0x0000000000439fdc in PMI_KVS_Get ()
#3  0x000000000041c1f6 in MPIDI_Populate_vc_node_ids ()
#4  0x000000000041adbd in MPID_Init ()
#5  0x000000000040c152 in MPIR_Init_thread ()
#6  0x000000000040b2b0 in PMPI_Init ()
#7  0x00000000004048e9 in main (argc=1, argv=0x7fff7a65ad48) at hello.c:15
------------------------------------------
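
For reference, the test program is just a textbook MPI hello world;
the hello.c:15 frame in the trace above is the MPI_Init() call.  A
rough sketch of it (this is a reconstruction, and the line numbers in
my actual file may differ slightly):

---------------------------------------------
/* hello.c -- rough sketch of the test program (not the exact file) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* At the time I attached gdb, both ranks were inside this call,
       blocked in PMIU_readline during PMI startup. */
    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
---------------------------------------------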

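For the record, this is roughly how I collected the traces, following
the steps suggested above (run on each of hc86 and hc87; <pid> is the
PID of one of the MPI processes shown by ps):

---------------------------------------------
ssh hc86
ps auxw | grep xhpl        # or grep for the hello-world binary
gdb -p <pid>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit
---------------------------------------------
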
-- Ting-jen


