[mvapich-discuss] mvapich thread multiple problem

Marcin Zalewski marcin.zalewski at gmail.com
Wed Mar 13 10:06:57 EDT 2013


I added debugging to the psm configuration, and, indeed, psm seems to
be the problem:

#1  0x000000000047193f in MPIDI_CH3_Progress_start
(pstate=pstate at entry=0x7fff806f42e0) at
src/mpid/ch3/channels/psm/src/mpidi_calls.c:180
180       _psm_enter_;

This is where my application is stuck in thread multiple mode.
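
A thread that never returns from a lock-acquire macro such as _psm_enter_, with its innermost frame in libpthread, is consistent with that thread trying to take a non-recursive lock it already holds. This is only one possible explanation, and nothing in the traces confirms it. The sketch below is plain POSIX threads, not MVAPICH or PSM code. The helper name relock_result is made up for illustration, and the kind of lock behind _psm_enter_ is an assumption. It uses an error-checking mutex so that the second acquisition reports EDEADLK instead of hanging:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper (not part of MVAPICH or PSM): lock an
 * error-checking mutex twice from the same thread and return the
 * result of the second attempt.
 */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;
    int second;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);

    /* ERRORCHECK makes a relock by the owner fail instead of blocking. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);

    rc = pthread_mutex_init(&lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    /* First acquisition: the lock is free, so this succeeds. */
    rc = pthread_mutex_lock(&lock);
    assert(rc == 0);

    /* Second acquisition by the same thread: reported as EDEADLK. */
    second = pthread_mutex_lock(&lock);

    /* The thread still holds the lock once, so release it and clean up. */
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);

    (void)rc; /* silence unused-variable warnings when NDEBUG is defined */
    return second;
}
```

Calling relock_result() yields EDEADLK. With a normal (non-error-checking) mutex, or with a spin lock, the same second acquisition may never return, which in gdb would show up as a frame #0 inside libpthread like the ones in this thread.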

-m



On Wed, Mar 13, 2013 at 9:52 AM, Marcin Zalewski
<marcin.zalewski at gmail.com> wrote:
> I forgot to mention that if I compile without psm support, I don't get
> thread multiple at all. So, I had to add the psm flags to the
> configuration you recommended.
>
> Thank you,
> Marcin
>
> On Wed, Mar 13, 2013 at 12:38 AM, Marcin Zalewski
> <marcin.zalewski at gmail.com> wrote:
>> Devendar,
>>
>> The first thread is the same. The second thread is just a little different:
>>
>> (gdb) bt
>> #0  0x00007fdebe39d303 in __GI___poll (fds=<optimized out>,
>> nfds=<optimized out>, timeout=<optimized out>) at
>> ../sysdeps/unix/sysv/linux/poll.c:87
>> #1  0x00007fdebf0c6587 in ips_ptl_pollintr (rcvthreadc=0x2103e78) at
>> ptl_rcvthread.c:322
>> #2  0x00007fdebee90e9a in start_thread (arg=0x7fdebc9d2700) at
>> pthread_create.c:308
>> #3  0x00007fdebe3a8cbd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #4  0x0000000000000000 in ?? ()
>>
>> The difference is that now __GI___poll gets called and not poll. I
>> have also tried it on a single node with a single thread, and then I
>> get only one thread in gdb stuck in the following place:
>>
>> #0  0x00007f8a93316a65 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>> #1  0x000000000045bc0f in MPIDI_CH3_Progress_start ()
>> #2  0x0000000000443dd7 in MPIR_Wait_impl ()
>> #3  0x0000000000443ee1 in PMPI_Wait ()
>> [snip]
>>
>> Looking at the implementation of MPIDI_CH3_Progress_start for psm, it
>> seems that the problem may be psm_poll. What do you think? In the
>> multi-threaded version, psm also seems to be a possible cause of the
>> problem. Is there any known reason why psm would misbehave with
>> thread multiple?
>>
>> Thanks,
>> Marcin
>>
>>
>> On Tue, Mar 12, 2013 at 11:22 PM, Devendar Bureddy
>> <bureddy at cse.ohio-state.edu> wrote:
>>> Hi Marcin
>>>
>>> Can you try a simple configuration without
>>> "--enable-thread-cs=per-object --enable-refcount=lock-free
>>> --enable-handle-allocation=tls --with-atomic-primitives" to see if that has
>>> any effect? The library should support thread multiple by default.
>>> --enable-hybrid does not have any effect with the psm interface, so you can
>>> remove that one as well.
>>>
>>> -Devendar
>>>
>>> On Tue, Mar 12, 2013 at 5:20 PM, Marcin Zalewski <marcin.zalewski at gmail.com>
>>> wrote:
>>>>
>>>> Hello.
>>>>
>>>> I am using mvapich 1.9b with QLogic (Intel) adapters. I configured
>>>> mvapich like this:
>>>>
>>>> ./configure --enable-fast=all,O3 --enable-thread-cs=per-object
>>>> --enable-refcount=lock-free --enable-handle-allocation=tls
>>>> --with-atomic-primitives --enable-shared --with-ch3-rank-bits=16
>>>> --enable-hybrid --with-device=ch3:psm
>>>> --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr
>>>>
>>>> I am trying to run a simple test application with 1 thread on 2 hosts,
>>>> but I get no progress. Upon investigation, it seems that my
>>>> application is stuck in mvapich (trace at the end of the email). The
>>>> same test works OK with mpich. I am wondering what I should do to
>>>> debug this further. Could it be a problem with my psm installation? I
>>>> am able to run the same application in thread serialized mode. I would
>>>> appreciate any pointers you could give me on what to do next.
>>>>
>>>> Thank you,
>>>> Marcin
>>>>
>>>>
>>>>
>>>> (gdb) info thread
>>>>   Id   Target Id         Frame
>>>>   2    Thread 0x7fa2d4c39700 (LWP 15802) "mpi_test_bfs_th"
>>>> 0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>>> * 1    Thread 0x7fa2d8191b40 (LWP 15801) "mpi_test_bfs_th"
>>>> 0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>> (gdb) bt
>>>> #0  0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #1  0x00007fa2d7a11603 in psm_irecv () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #2  0x00007fa2d7a0927d in MPID_Irecv () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #3  0x00007fa2d79cef63 in MPIC_Sendrecv () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #4  0x00007fa2d79cf717 in MPIC_Sendrecv_ft () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #5  0x00007fa2d7a5fca7 in MPIR_Allreduce_pt2pt_rd_MV2 () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #6  0x00007fa2d7a62cda in MPIR_Allreduce_new_MV2 () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #7  0x00007fa2d79dff96 in MPIR_Get_contextid_sparse_group () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #8  0x00007fa2d79e08d0 in MPIR_Comm_copy () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #9  0x00007fa2d7a7d3ce in MPIR_Comm_dup_impl () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> #10 0x00007fa2d7a7d422 in PMPI_Comm_dup () from
>>>> /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>>> ... [snip]
>>>> (gdb) thread 2
>>>> [Switching to thread 2 (Thread 0x7fa2d4c39700 (LWP 15802))]
>>>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>>> (gdb) bt
>>>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #1  0x00007fa2d7125587 in ips_ptl_pollintr (rcvthreadc=0x1dcbae8) at
>>>> ptl_rcvthread.c:322
>>>> #2  0x00007fa2d6eefe9a in start_thread () from
>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>> #3  0x00007fa2d6407cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>>> #4  0x0000000000000000 in ?? ()
>>>
>>>
>>>
>>>
>>> --
>>> Devendar

