[mvapich-discuss] Program stuck in MPI Framework Library
Dash, Ambika
Ambika.Dash at kla-tencor.com
Wed Jun 27 14:05:03 EDT 2018
Hi,
We use MPI in our software stack, and on one of the nodes the MPI send, receive, and test calls are getting stuck, as shown in the following back traces.
Could you help us find the root cause of this issue?
MPI_Isend back trace
#0 0x00007f10b4fa5773 in MPIDI_CH3I_MRAILI_Cq_poll_ib () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#1 0x00007f10b4fa11cc in MPIDI_CH3I_MRAILI_Waiting_msg () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#2 0x00007f10b4f77227 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#3 0x00007f10b4f76e3a in MPIDI_CH3I_Progress_test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#4 0x00007f10b4f69bb6 in MPID_Isend () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#5 0x00007f10b4ef466a in PMPI_Isend () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#6 0x0000000000476673 in sync_MPI_Isend (buf=0x7ef071b6e020, count=4719952, datatype=1275068673, dest=2, tag=0, comm=1140850688, request=0xba1728 <s_mpiTxRequestList+104>) at CommQueue.cpp:68
#7 0x000000000047876c in DataQueueTx::send (this=0x4753450) at CommQueue.cpp:968
#8 0x000000000047815d in CommQueueTx::send (this=0x46f92a8) at CommQueue.cpp:788
#9 0x00000000004f9980 in CommandManager::sendBuffer (this=0x434e740, dataBufferIndex=0) at CommandManager.cpp:865
#10 0x00000000004f99ba in CommandManager::sendBuffer (this=0x434e740, dataBuffer=0x7ef071b6e010) at CommandManager.cpp:873
MPI_Test back trace
#0 0x00007f5e06e72773 in MPIDI_CH3I_MRAILI_Cq_poll_ib () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#1 0x00007f5e06e6e1cc in MPIDI_CH3I_MRAILI_Waiting_msg () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#2 0x00007f5e06e44227 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#3 0x00007f5e06e43e3a in MPIDI_CH3I_Progress_test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#4 0x00007f5e06dc9260 in MPIR_Test_impl () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#5 0x00007f5e06dc95ab in PMPI_Test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
#6 0x000000000047656f in sync_MPI_Test (mpiRequest=0x41d1dd8, flag=0x7f5e013ba3ec, mpi_status=0x7f5e013ba3f0) at CommQueue.cpp:49
#7 0x0000000000477836 in DataQueueRx::wait (this=0x41d1dd0) at CommQueue.cpp:528
#8 0x0000000000477259 in CommQueueRx::getBuffer (this=0x41eaba8) at CommQueue.cpp:365
#9 0x00000000004f906f in CommandManager::getNextCommand (this=0x3e3d720) at CommandManager.cpp:766
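For reference, the integer handles in the sync_MPI_Isend frame (#6) can be decoded, assuming MVAPICH2 uses the MPICH-style handle layout in its mpi.h (this is an assumption and should be checked against the header installed on the node). Under that assumption, datatype=1275068673 is 0x4c000101 (MPI_CHAR) and comm=1140850688 is 0x44000000 (MPI_COMM_WORLD), so the stuck call is a 4719952-byte send to rank 2 with tag 0. The helper names decode_handle and message_bytes in this sketch are hypothetical and not part of any MPI library.

```python
# Sketch: decode MPICH-style integer handles as printed by gdb.
# Assumption: MVAPICH2 follows the MPICH handle layout; verify against mpi.h.

# Top two bits: how the handle is stored.
HANDLE_KINDS = {0: "invalid", 1: "builtin", 2: "direct", 3: "indirect"}
# Bits 26-29: object type (only the kinds needed here are listed).
OBJECT_KINDS = {1: "comm", 2: "group", 3: "datatype"}
# Predefined handles that appear in the back trace.
KNOWN_HANDLES = {0x4C000101: "MPI_CHAR", 0x44000000: "MPI_COMM_WORLD"}


def decode_handle(value):
    """Split a 32-bit handle into its encoded fields."""
    handle = value & 0xFFFFFFFF
    info = {
        "hex": hex(handle),
        "handle_kind": HANDLE_KINDS[(handle >> 30) & 0x3],
        "object_kind": OBJECT_KINDS.get((handle >> 26) & 0xF, "other"),
        "name": KNOWN_HANDLES.get(handle, "unknown"),
    }
    # Builtin datatypes carry their size in bytes in bits 8-15.
    if info["handle_kind"] == "builtin" and info["object_kind"] == "datatype":
        info["basic_size"] = (handle >> 8) & 0xFF
    return info


def message_bytes(count, datatype):
    """Payload size in bytes for a builtin datatype handle."""
    info = decode_handle(datatype)
    if "basic_size" not in info:
        raise ValueError("not a builtin datatype handle")
    return count * info["basic_size"]


if __name__ == "__main__":
    # Values copied from frame #6 of the MPI_Isend back trace.
    print(decode_handle(1275068673))
    print(decode_handle(1140850688))
    print(message_bytes(4719952, 1275068673))
```

Knowing the message size matters here because a payload of this size may take a different transfer path than a small message, depending on how the library is configured.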
Thanks,
AP Dash