[mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread

David_Kewley at Dell.com David_Kewley at Dell.com
Sat May 24 14:38:25 EDT 2008


> -----Original Message-----
> From: Christian Guggenberger [mailto:christian.guggenberger at rzg.mpg.de]
> Sent: Saturday, May 24, 2008 5:11 AM
> 
> On Fri, May 23, 2008 at 09:23:37PM -0500, David_Kewley at dell.com wrote:
> >
> > #0  0x00000036b2608b3a in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/tls/libpthread.so.0
> > #1  0x0000002a9595405b in ibv_cmd_destroy_srq (srq=0x82b370) at
> > src/cmd.c:582
> > #2  0x0000002a962b5419 in mthca_destroy_srq (srq=0x82b3bc) at
> > src/verbs.c:475
> > #3  0x0000002a9564878e in MPIDI_CH3I_CM_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #4  0x0000002a955c053b in MPIDI_CH3_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #5  0x0000002a95626202 in MPID_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #6  0x0000002a955f7fee in PMPI_Finalize () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #7  0x0000002a955f7eae in pmpi_finalize_ () from
> > /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #8  0x0000000000459ff8 in stoprog_ ()
> > #9  0x000000000047afa6 in MAIN__ ()
> > #10 0x0000000000405d62 in main ()
> >
> > I think the fix is to add some sort of synchronization between
> > async_thread() and the code that calls the pthread_cancel() on it. To
> > the MVAPICH developers: Do you think you can work up a fix soon, and
> > forward the patch for testing?
> 
> a short-term workaround would be to disable SRQ at runtime with the
> appropriate environment settings. We had seen exactly the same backtrace,
> but so far devel has not found the real cause/fix for it. Just curious -
> what distro/arch are you using?

Thanks for the suggestion!  Looking at the manual, I see that's done with

  mpiexec ... -env MV2_USE_SRQ 0 <program>

We're using RHEL4 x86_64 with a recent but locally modified RHEL4 kernel.

David