[mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread

Dhabaleswar Panda panda at cse.ohio-state.edu
Sat May 24 13:57:38 EDT 2008


David - Thanks for reporting this problem.

Christian - Thanks for the temporary workaround.

We are taking a look at it and will keep you updated on the
status/fix.

Thanks,

DK

On Sat, 24 May 2008, Christian Guggenberger wrote:

> On Fri, May 23, 2008 at 09:23:37PM -0500, David_Kewley at dell.com wrote:
> >
> > #0  0x00000036b2608b3a in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/tls/libpthread.so.0
> > #1  0x0000002a9595405b in ibv_cmd_destroy_srq (srq=0x82b370) at
> > src/cmd.c:582
> > #2  0x0000002a962b5419 in mthca_destroy_srq (srq=0x82b3bc) at
> > src/verbs.c:475
> > #3  0x0000002a9564878e in MPIDI_CH3I_CM_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #4  0x0000002a955c053b in MPIDI_CH3_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #5  0x0000002a95626202 in MPID_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #6  0x0000002a955f7fee in PMPI_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #7  0x0000002a955f7eae in pmpi_finalize_ () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #8  0x0000000000459ff8 in stoprog_ ()
> > #9  0x000000000047afa6 in MAIN__ ()
> > #10 0x0000000000405d62 in main ()
> >
> > I think the fix is to add some sort of synchronization between
> > async_thread() and the code that calls the pthread_cancel() on it.  To the
> > MVAPICH developers: Do you think you can work up a fix soon, and forward
> > the patch for testing?
>
> a short-term workaround would be to disable SRQ at runtime with the
> appropriate environment settings. We had seen exactly the same backtrace,
> but so far devel has not found the real cause/fix for it. Just curious -
> what distro/arch are you using?
>
> cheers.
>  - Christian
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
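
For readers following the thread, here is a rough sketch of the kind of
synchronization David suggests. It is not the MVAPICH fix and does not use
libibverbs: get_event(), ack_event(), destroy_srq() and run_shutdown_demo()
are hypothetical stand-ins. The libibverbs documentation states that
destroying an SRQ waits until all of its affiliated async events have been
acknowledged, which matches the pthread_cond_wait in frame #0. The sketch
assumes the hang comes from the thread being cancelled after it fetched an
event but before it acknowledged it, so it disables cancellation across that
window and joins the thread before destroying.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the SRQ event counters; NOT the libibverbs API. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t acked_cond = PTHREAD_COND_INITIALIZER;
static unsigned events_delivered; /* events handed to the async thread */
static unsigned events_acked;     /* events the thread has acknowledged */

static void short_sleep(void)
{
    struct timespec ts = { 0, 1000000 }; /* 1 ms */
    nanosleep(&ts, NULL);                /* nanosleep is a cancellation point */
}

/* Stand-in for a blocking "get event" call. */
static void get_event(void)
{
    short_sleep(); /* a pending cancel is acted on here, before delivery */
    pthread_mutex_lock(&lock);
    events_delivered++;
    pthread_mutex_unlock(&lock);
}

/* Stand-in for acknowledging an event; wakes a waiting destroy_srq(). */
static void ack_event(void)
{
    pthread_mutex_lock(&lock);
    events_acked++;
    pthread_cond_broadcast(&acked_cond);
    pthread_mutex_unlock(&lock);
}

static void *async_thread(void *arg)
{
    (void)arg;
    for (;;) {
        int old_state;
        get_event();
        /* From here until the ack, a cancel request stays pending. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        short_sleep(); /* "handle" the event */
        ack_event();
        pthread_setcancelstate(old_state, NULL);
    }
}

/* Stand-in for the destroy call: blocks until every event is acked. */
static void destroy_srq(void)
{
    pthread_mutex_lock(&lock);
    while (events_acked != events_delivered)
        pthread_cond_wait(&acked_cond, &lock);
    pthread_mutex_unlock(&lock);
}

/* Returns 0 once shutdown has completed without hanging. */
int run_shutdown_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, async_thread, NULL) != 0)
        return -1;
    for (int i = 0; i < 20; i++)
        short_sleep();       /* let a few events flow */
    pthread_cancel(tid);
    pthread_join(tid, NULL); /* thread is gone before we destroy */
    destroy_srq();
    assert(events_acked == events_delivered);
    return 0;
}
```

Without the pthread_setcancelstate() calls, a cancel that lands in the
"handle" sleep would leave one event unacknowledged and destroy_srq() would
wait forever, which is the shape of the hang in the backtrace above. Whether
the real async_thread() can be structured this way depends on code not shown
in this thread.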


