[mvapich-discuss] Program stuck in MPI Framework Library

Sourav Chakraborty chakraborty.52 at buckeyemail.osu.edu
Wed Jun 27 17:26:26 EDT 2018


Hi AP,

Can you please try the latest MVAPICH2-2.3rc2 release and see if the
issue persists? It has several bug fixes and performance improvements
compared to 2.2a.

Can you also provide some details about the system (interconnect, number of
processes, etc.), how MVAPICH2 was configured, and how the job is being
launched? You can find the build and configuration details by running mpiname -a.
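
If it is more convenient, the same version information can also be printed from
inside the application itself. Below is a minimal sketch (my own illustration,
assuming an MPI-3 level library, which MVAPICH2 2.2 and later provide) that uses
MPI_Get_library_version to confirm which libmpi the application is actually
linked against at run time:

    /* version_check.c: print the MPI library version string.
     * Assumes an MPI-3 level library (MVAPICH2 2.2+ qualifies). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char version[MPI_MAX_LIBRARY_VERSION_STRING];
        int len = 0, rank = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_library_version(version, &len);
        if (rank == 0)
            printf("%s\n", version);   /* library version banner */
        MPI_Finalize();
        return 0;
    }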

Thanks,
Sourav
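
P.S. If it is easy to try, it would also help to know whether a small standalone
program using the same MPI_Isend/MPI_Test pattern hangs on the same node. A rough
sketch of what I mean is below (just an illustration, not your code; the buffer
size is taken loosely from the Isend frame in your trace, and MPI_BYTE is used as
a stand-in datatype). Run it with two processes across the same pair of hosts:

    /* isend_test_repro.c: nonblocking send on rank 0, nonblocking receive on
     * rank 1, both polled to completion with MPI_Test. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define COUNT 4719952   /* roughly the message size seen in the backtrace */

    int main(int argc, char **argv)
    {
        int rank, size, flag = 0;
        MPI_Request req;
        MPI_Status status;
        unsigned char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "run with at least 2 processes\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        buf = malloc(COUNT);

        if (rank == 0)
            MPI_Isend(buf, COUNT, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &req);
        else if (rank == 1)
            MPI_Irecv(buf, COUNT, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &req);

        if (rank < 2) {
            /* Poll for completion, as the application does via MPI_Test */
            while (!flag)
                MPI_Test(&req, &flag, &status);
            printf("rank %d: transfer completed\n", rank);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }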



On Wed, Jun 27, 2018 at 5:19 PM Dash, Ambika <Ambika.Dash at kla-tencor.com>
wrote:

> Hi,
>
> We use the MPI framework in our software stack, and on one of the nodes the
> MPI Send/Receive/Test calls are getting stuck, as shown in the backtraces
> below.
>
> Could you help us find the root cause of this issue?
>
> MPI_Isend backtrace
>
> #0  0x00007f10b4fa5773 in MPIDI_CH3I_MRAILI_Cq_poll_ib () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #1  0x00007f10b4fa11cc in MPIDI_CH3I_MRAILI_Waiting_msg () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #2  0x00007f10b4f77227 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #3  0x00007f10b4f76e3a in MPIDI_CH3I_Progress_test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #4  0x00007f10b4f69bb6 in MPID_Isend () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #5  0x00007f10b4ef466a in PMPI_Isend () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #6  0x0000000000476673 in sync_MPI_Isend (buf=0x7ef071b6e020, count=4719952, datatype=1275068673, dest=2, tag=0, comm=1140850688, request=0xba1728 <s_mpiTxRequestList+104>) at CommQueue.cpp:68
> #7  0x000000000047876c in DataQueueTx::send (this=0x4753450) at CommQueue.cpp:968
> #8  0x000000000047815d in CommQueueTx::send (this=0x46f92a8) at CommQueue.cpp:788
> #9  0x00000000004f9980 in CommandManager::sendBuffer (this=0x434e740, dataBufferIndex=0) at CommandManager.cpp:865
> #10 0x00000000004f99ba in CommandManager::sendBuffer (this=0x434e740, dataBuffer=0x7ef071b6e010) at CommandManager.cpp:873
>
> MPI_Test backtrace
>
> #0  0x00007f5e06e72773 in MPIDI_CH3I_MRAILI_Cq_poll_ib () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #1  0x00007f5e06e6e1cc in MPIDI_CH3I_MRAILI_Waiting_msg () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #2  0x00007f5e06e44227 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #3  0x00007f5e06e43e3a in MPIDI_CH3I_Progress_test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #4  0x00007f5e06dc9260 in MPIR_Test_impl () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #5  0x00007f5e06dc95ab in PMPI_Test () from /usr/mpi/gcc/mvapich2-2.2a/lib/libmpi.so.12
> #6  0x000000000047656f in sync_MPI_Test (mpiRequest=0x41d1dd8, flag=0x7f5e013ba3ec, mpi_status=0x7f5e013ba3f0) at CommQueue.cpp:49
> #7  0x0000000000477836 in DataQueueRx::wait (this=0x41d1dd0) at CommQueue.cpp:528
> #8  0x0000000000477259 in CommQueueRx::getBuffer (this=0x41eaba8) at CommQueue.cpp:365
> #9  0x00000000004f906f in CommandManager::getNextCommand (this=0x3e3d720) at CommandManager.cpp:766
>
> Thanks,
>
> AP Dash
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>