[mvapich-discuss] mvapich thread multiple problem

Marcin Zalewski marcin.zalewski at gmail.com
Tue Mar 12 17:20:26 EDT 2013


Hello.

I am using MVAPICH2 1.9b with QLogic (Intel) adapters. I configured
it like this:

./configure --enable-fast=all,O3 --enable-thread-cs=per-object \
    --enable-refcount=lock-free --enable-handle-allocation=tls \
    --with-atomic-primitives --enable-shared --with-ch3-rank-bits=16 \
    --enable-hybrid --with-device=ch3:psm \
    --with-psm=/xyz/infinipath-psm-3.1-364.1140_open/usr

I am trying to run a simple test application with 1 thread on 2 hosts,
but it makes no progress. On investigation, the application appears to
be stuck inside MVAPICH2, in MPI_Comm_dup (trace at the end of this
email). The same test works with MPICH, and it also runs under MVAPICH2
in thread-serialized mode. What should I do to debug this further?
Could it be a problem with my PSM installation? I would appreciate any
pointers on what to try next.

Thank you,
Marcin



(gdb) info thread
  Id   Target Id         Frame
  2    Thread 0x7fa2d4c39700 (LWP 15802) "mpi_test_bfs_th" 0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
* 1    Thread 0x7fa2d8191b40 (LWP 15801) "mpi_test_bfs_th" 0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007fa2d6ef4a62 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fa2d7a11603 in psm_irecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#2  0x00007fa2d7a0927d in MPID_Irecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#3  0x00007fa2d79cef63 in MPIC_Sendrecv () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#4  0x00007fa2d79cf717 in MPIC_Sendrecv_ft () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#5  0x00007fa2d7a5fca7 in MPIR_Allreduce_pt2pt_rd_MV2 () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#6  0x00007fa2d7a62cda in MPIR_Allreduce_new_MV2 () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#7  0x00007fa2d79dff96 in MPIR_Get_contextid_sparse_group () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#8  0x00007fa2d79e08d0 in MPIR_Comm_copy () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#9  0x00007fa2d7a7d3ce in MPIR_Comm_dup_impl () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
#10 0x00007fa2d7a7d422 in PMPI_Comm_dup () from /opt/mvapich/2-1.9b/lib/libmpich.so.10
... [snip]
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fa2d4c39700 (LWP 15802))]
#0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007fa2d63fc303 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fa2d7125587 in ips_ptl_pollintr (rcvthreadc=0x1dcbae8) at ptl_rcvthread.c:322
#2  0x00007fa2d6eefe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007fa2d6407cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()