[mvapich-discuss] mvapich thread multiple problem

Marcin Zalewski marcin.zalewski at gmail.com
Wed Mar 13 09:52:42 EDT 2013


I forgot to mention that if I compile without psm support, I don't get
thread multiple at all. So, I had to add the psm flags to the
configuration you recommended.
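
For reference, a sketch of what that configure line looks like, assuming
everything else stays as in my original command and only the flags you
listed (plus --enable-hybrid) are dropped:

```shell
# Assumption: options not mentioned in the recommendation are unchanged
# from the original command quoted below; the psm flags are kept.
./configure --enable-fast=all,O3 --enable-shared --with-ch3-rank-bits=16 \
    --with-device=ch3:psm \
    --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr
```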

Thank you,
Marcin

On Wed, Mar 13, 2013 at 12:38 AM, Marcin Zalewski
<marcin.zalewski at gmail.com> wrote:
> Devendar,
>
> The first thread is the same. The second thread is just a little different:
>
> (gdb) bt
> #0  0x00007fdebe39d303 in __GI___poll (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:87
> #1  0x00007fdebf0c6587 in ips_ptl_pollintr (rcvthreadc=0x2103e78) at ptl_rcvthread.c:322
> #2  0x00007fdebee90e9a in start_thread (arg=0x7fdebc9d2700) at pthread_create.c:308
> #3  0x00007fdebe3a8cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #4  0x0000000000000000 in ?? ()
>
> The difference is that now __GI___poll gets called and not poll. I
> have also tried it on a single node with a single thread, and then I
> get only one thread in gdb stuck in the following place:
>
> #0  0x00007f8a93316a65 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x000000000045bc0f in MPIDI_CH3_Progress_start ()
> #2  0x0000000000443dd7 in MPIR_Wait_impl ()
> #3  0x0000000000443ee1 in PMPI_Wait ()
> [snip]
>
> Looking at the implementation of MPIDI_CH3_Progress_start for psm, it
> seems that the problem may be psm_poll. What do you think? In the
> multi-threaded version, psm also seems to be a possible cause of the
> problem. Is there any known reason why psm would misbehave with
> thread multiple?
>
> Thanks,
> Marcin
>
>
> On Tue, Mar 12, 2013 at 11:22 PM, Devendar Bureddy
> <bureddy at cse.ohio-state.edu> wrote:
>> Hi Marcin
>>
>> Can you try a simple configuration without
>> "--enable-thread-cs=per-object --enable-refcount=lock-free
>> --enable-handle-allocation=tls --with-atomic-primitives" to see if that has
>> any effect? The library should support thread multiple by default.
>> --enable-hybrid does not have any effect with the psm interface, so you can
>> remove that one as well.
>>
>> -Devendar
>>
>> On Tue, Mar 12, 2013 at 5:20 PM, Marcin Zalewski <marcin.zalewski at gmail.com>
>> wrote:
>>>
>>> Hello.
>>>
>>> I am using mvapich2 1.9b with QLogic (Intel) adapters. I configured
>>> mvapich2 like this:
>>>
>>> ./configure --enable-fast=all,O3 --enable-thread-cs=per-object
>>> --enable-refcount=lock-free --enable-handle-allocation=tls
>>> --with-atomic-primitives --enable-shared --with-ch3-rank-bits=16
>>> --enable-hybrid --with-device=ch3:psm
>>> --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr
>>>
>>> I am trying to run a simple test application with 1 thread on 2 hosts,
>>> but I get no progress. Upon investigation, it seems that my
>>> application is stuck in mvapich (trace at the end of the email). The
>>> same test works OK with mpich. I am wondering what I should do to
>>> debug this further. Could it be a problem with my psm installation? I
>>> am able to run the same application in thread serialized mode. I would
>>> appreciate any pointers you could give me on what to do next.
>>>
>>> Thank you,
>>> Marcin
>>>
>>>
>>>
>>> (gdb) info thread
>>>   Id   Target Id         Frame
>>>   2    Thread 0x7fa2d4c39700 (LWP 15802) "mpi_test_bfs_th" 0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> * 1    Thread 0x7fa2d8191b40 (LWP 15801) "mpi_test_bfs_th" 0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> (gdb) bt
>>> #0  0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #1  0x00007fa2d7a11603 in psm_irecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #2  0x00007fa2d7a0927d in MPID_Irecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #3  0x00007fa2d79cef63 in MPIC_Sendrecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #4  0x00007fa2d79cf717 in MPIC_Sendrecv_ft () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #5  0x00007fa2d7a5fca7 in MPIR_Allreduce_pt2pt_rd_MV2 () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #6  0x00007fa2d7a62cda in MPIR_Allreduce_new_MV2 () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #7  0x00007fa2d79dff96 in MPIR_Get_contextid_sparse_group () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #8  0x00007fa2d79e08d0 in MPIR_Comm_copy () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #9  0x00007fa2d7a7d3ce in MPIR_Comm_dup_impl () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> #10 0x00007fa2d7a7d422 in PMPI_Comm_dup () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
>>> ... [snip]
>>> (gdb) thread 2
>>> [Switching to thread 2 (Thread 0x7fa2d4c39700 (LWP 15802))]
>>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> (gdb) bt
>>> #0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x00007fa2d7125587 in ips_ptl_pollintr (rcvthreadc=0x1dcbae8) at ptl_rcvthread.c:322
>>> #2  0x00007fa2d6eefe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
>>> #3  0x00007fa2d6407cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
>>> #4  0x0000000000000000 in ?? ()
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>>
>> --
>> Devendar

