[mvapich-discuss] Fail to run MPI program using MVAPICH2-1.5.1

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Oct 1 16:56:26 EDT 2010


Hi, I'm still not really sure what is causing this issue for you, but I
would like to suggest using hydra to see if things will work for you
that way.  Please look at the mpiexec.hydra section of our user guide
for more information on its use.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.5.1.html#x1-210005.2.3
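A launch with Hydra would look roughly like the following; the hostfile name and the program binary are placeholders, not taken from this thread:

```shell
# Hypothetical example: launch 2 ranks with the Hydra process manager
# (mpiexec.hydra) instead of mpirun_rsh.  "hosts" is an assumed hostfile
# listing one node name per line, and "./hello" is the test program.
mpiexec.hydra -f hosts -n 2 ./hello
```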

2010/9/30 Ting-jen Yen <yentj at infowrap.com.tw>:
> On Thu, 2010-09-30 at 13:59 -0400, Jonathan Perkins wrote:
>> On Wed, Sep 29, 2010 at 9:04 PM, Ting-jen Yen <yentj at infowrap.com.tw> wrote:
>> >  Thanks.  I did get the backtrace of the mpi processes.
>> > When I ran a simple hello-world MPI program with 2 processes, both
>> > backtraces were almost identical, as follows.  (Only argv in main()
>> > differs, so I am copying just one of them.)
>> >
>> > ---------------------------------------------
>> > Thread 1 (Thread 0x2ab6f1524660 (LWP 19810)):
>> > #0  0x0000003326a0d590 in __read_nocancel () from /lib64/libpthread.so.0
>> > #1  0x00000000004a83ec in PMIU_readline ()
>> > #2  0x0000000000439fdc in PMI_KVS_Get ()
>> > #3  0x000000000041c1f6 in MPIDI_Populate_vc_node_ids ()
>> > #4  0x000000000041adbd in MPID_Init ()
>> > #5  0x000000000040c152 in MPIR_Init_thread ()
>> > #6  0x000000000040b2b0 in PMPI_Init ()
>> > #7  0x00000000004048e9 in main (argc=1, argv=0x7fff7a65ad48) at
>> > hello.c:15
>> > ------------------------------------------
>>
>> I'm not really sure what is happening here, and the backtrace is
>> missing some information.  Can you rebuild the mvapich2 library with
>> the --enable-g=dbg option included as well?  Also, make sure that you
>> rebuild the mpi benchmark against the new library (you're using
>> static libraries).
>>
>
> Hi, Jonathan,
>
>  After enabling all the debug options I could find, I got the
> following backtrace output:
>
> ------------------------------------------------
> Thread 1 (Thread 0x2b16531f8660 (LWP 20015)):
> #0  0x0000003ed8c0d590 in __read_nocancel () from /lib64/libpthread.so.0
> #1  0x00000000004dc0fc in PMIU_readline (fd=5,
>    buf=0x77afa0 "cmd=my_kvsname kvsname=kvs_0\n", maxlen=1023)
>    at simple_pmiutil.c:136
> #2  0x00000000004534bc in PMI_KVS_Get (
>    kvsname=0x5 <Address 0x5 out of bounds>,
>    key=0x77afa0 "cmd=my_kvsname kvsname=kvs_0\n",
>    value=0x3ff <Address 0x3ff out of bounds>, length=-1) at
> simple_pmi.c:597
> #3  0x000000000042edc9 in MPIDI_Populate_vc_node_ids (pg=0x5,
>    our_pg_rank=7843744) at mpid_vc.c:1257
> #4  0x000000000042c28a in MPID_Init (argc=0x5, argv=0x77afa0,
> requested=1023,
>    provided=0xffffffffffffffff, has_args=0x77afbd, has_env=0x63)
>    at mpid_init.c:216
> #5  0x0000000000411942 in MPIR_Init_thread (argc=0x5, argv=0x77afa0,
>    required=1023, provided=0xffffffffffffffff) at initthread.c:405
> #6  0x000000000040f888 in PMPI_Init (argc=0x5, argv=0x77afa0) at
> init.c:188
> #7  0x0000000000404a49 in main (argc=1, argv=0x7fffb62c44f8) at
> hello.c:15
> --------------------------------------------------
>
>  The other backtrace outputs are almost the same, except for the argv
> in main().
>
> -- Ting-jen
>
>
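For reference, backtraces like the ones above can be collected from already-running (hung) MPI processes with gdb in batch mode; the process name below is a placeholder for whatever binary is actually hanging:

```shell
# Hypothetical sketch: attach gdb non-interactively to each process whose
# command line matches "./hello" (placeholder name) and dump a backtrace
# for every thread, without stopping to prompt.
for pid in $(pgrep -f ./hello); do
    echo "=== backtrace for pid $pid ==="
    gdb -p "$pid" -batch -ex "thread apply all bt"
done
```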



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
