[mvapich-discuss] mvapich thread multiple problem

Marcin Zalewski marcin.zalewski at gmail.com
Wed Mar 13 00:38:20 EDT 2013


Devendar,

The first thread is the same. The second thread is just a little different:

(gdb) bt
#0  0x00007fdebe39d303 in __GI___poll (fds=<optimized out>,
nfds=<optimized out>, timeout=<optimized out>) at
../sysdeps/unix/sysv/linux/poll.c:87
#1  0x00007fdebf0c6587 in ips_ptl_pollintr (rcvthreadc=0x2103e78) at
ptl_rcvthread.c:322
#2  0x00007fdebee90e9a in start_thread (arg=0x7fdebc9d2700) at
pthread_create.c:308
#3  0x00007fdebe3a8cbd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4  0x0000000000000000 in ?? ()

The only difference is that the top frame now shows up as __GI___poll
rather than poll. I have also tried it on a single node with a single
thread, and then I get only one thread in gdb, stuck in the following place:

#0  0x00007f8a93316a65 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000045bc0f in MPIDI_CH3_Progress_start ()
#2  0x0000000000443dd7 in MPIR_Wait_impl ()
#3  0x0000000000443ee1 in PMPI_Wait ()
[snip]
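
For anyone who wants to collect the same kind of traces in one step, a
minimal sketch, assuming gdb is installed on the node and PID is set to
the process ID of the hung rank (PID is a placeholder, not a value from
my runs):

```shell
# PID is a placeholder: set it to the process ID of the hung MPI rank.
# -batch makes gdb exit after running the command given with -ex;
# "thread apply all bt" prints a backtrace for every thread in the process.
gdb -p "$PID" -batch -ex "thread apply all bt"
```

Attaching this way may require the same user as the target process or
root, depending on the system's ptrace settings.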

Looking at the implementation of MPIDI_CH3_Progress_start for psm, it
seems that the problem may be in psm_poll. What do you think? In the
multi-threaded version, psm also seems to be a possible cause of the
problem. Is there any known reason why psm would misbehave with
MPI_THREAD_MULTIPLE?
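
For reference, dropping the options you listed (and --enable-hybrid) from
my original command would leave a configure line like the one below. This
is only a sketch derived from the two messages quoted further down; the
psm path is the same placeholder path as in my first email.

```shell
# Original configure line minus --enable-thread-cs=per-object,
# --enable-refcount=lock-free, --enable-handle-allocation=tls,
# --with-atomic-primitives and --enable-hybrid.
./configure --enable-fast=all,O3 --enable-shared \
    --with-ch3-rank-bits=16 --with-device=ch3:psm \
    --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr
```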

Thanks,
Marcin


On Tue, Mar 12, 2013 at 11:22 PM, Devendar Bureddy
<bureddy at cse.ohio-state.edu> wrote:
> Hi Marcin
>
> Can you try a simple configuration without
> "--enable-thread-cs=per-object --enable-refcount=lock-free
> --enable-handle-allocation=tls --with-atomic-primitives" to see if that has
> any effect? The library should support thread multiple by default.
> --enable-hybrid does not have any effect with the psm interface. You can
> remove that one also.
>
> -Devendar
>
> On Tue, Mar 12, 2013 at 5:20 PM, Marcin Zalewski <marcin.zalewski at gmail.com>
> wrote:
>>
>> Hello.
>>
>> I am using mvapich 1.9b with QLogic (Intel) adapters. I configured
>> mvapich like this:
>>
>> ./configure --enable-fast=all,O3 --enable-thread-cs=per-object
>> --enable-refcount=lock-free --enable-handle-allocation=tls
>> --with-atomic-primitives --enable-shared --with-ch3-rank-bits=16
>> --enable-hybrid --with-device=ch3:psm
>> --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr
>>
>> I am trying to run a simple test application with 1 thread on 2 hosts,
>> but I get no progress. Upon investigation, it seems that my
>> application is stuck in mvapich (trace at the end of the email). The
>> same test works OK with mpich. I am wondering what I should do to
>> debug this further. Could it be a problem with my psm installation? I
>> am able to run the same application in thread serialized mode. I would
>> appreciate any pointers you could give me on what to do next.
>>
>> Thank you,
>> Marcin
>>
>>
>>
>> (gdb) info thread
>>   Id   Target Id         Frame
>>   2    Thread 0x7fa2d4c39700 (LWP 15802) "mpi_test_bfs_th"
>> 0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> * 1    Thread 0x7fa2d8191b40 (LWP 15801) "mpi_test_bfs_th"
>> 0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>> (gdb) bt
>> #0  0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x00007fa2d7a11603 in psm_irecv () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #2  0x00007fa2d7a0927d in MPID_Irecv () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #3  0x00007fa2d79cef63 in MPIC_Sendrecv () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #4  0x00007fa2d79cf717 in MPIC_Sendrecv_ft () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #5  0x00007fa2d7a5fca7 in MPIR_Allreduce_pt2pt_rd_MV2 () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #6  0x00007fa2d7a62cda in MPIR_Allreduce_new_MV2 () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #7  0x00007fa2d79dff96 in MPIR_Get_contextid_sparse_group () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #8  0x00007fa2d79e08d0 in MPIR_Comm_copy () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #9  0x00007fa2d7a7d3ce in MPIR_Comm_dup_impl () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> #10 0x00007fa2d7a7d422 in PMPI_Comm_dup () from
>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>> ... [snip]
>> (gdb) thread 2
>> [Switching to thread 2 (Thread 0x7fa2d4c39700 (LWP 15802))]
>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> (gdb) bt
>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00007fa2d7125587 in ips_ptl_pollintr (rcvthreadc=0x1dcbae8) at
>> ptl_rcvthread.c:322
>> #2  0x00007fa2d6eefe9a in start_thread () from
>> /lib/x86_64-linux-gnu/libpthread.so.0
>> #3  0x00007fa2d6407cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>> #4  0x0000000000000000 in ?? ()
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>
>
> --
> Devendar

