[mvapich-discuss] Bug: deadlock between ibv_destroy_srq and async_thread
David_Kewley at Dell.com
Sat May 24 14:38:25 EDT 2008
> -----Original Message-----
> From: Christian Guggenberger [mailto:christian.guggenberger at rzg.mpg.de]
> Sent: Saturday, May 24, 2008 5:11 AM
>
> On Fri, May 23, 2008 at 09:23:37PM -0500, David_Kewley at dell.com wrote:
> >
> > #0 0x00000036b2608b3a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
> > #1 0x0000002a9595405b in ibv_cmd_destroy_srq (srq=0x82b370) at src/cmd.c:582
> > #2 0x0000002a962b5419 in mthca_destroy_srq (srq=0x82b3bc) at src/verbs.c:475
> > #3 0x0000002a9564878e in MPIDI_CH3I_CM_Finalize () from /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #4 0x0000002a955c053b in MPIDI_CH3_Finalize () from /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #5 0x0000002a95626202 in MPID_Finalize () from /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #6 0x0000002a955f7fee in PMPI_Finalize () from /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #7 0x0000002a955f7eae in pmpi_finalize_ () from /opt/mvapich2/1.0.1/intel/10.1.015/lib/libmpich.so
> > #8 0x0000000000459ff8 in stoprog_ ()
> > #9 0x000000000047afa6 in MAIN__ ()
> > #10 0x0000000000405d62 in main ()
> >
> > I think the fix is to add some sort of synchronization between
> > async_thread() and the code that calls the pthread_cancel() on it. To the
> > MVAPICH developers: Do you think you can work up a fix soon, and forward
> > the patch for testing?
>
> a short-term workaround would be to disable SRQ at runtime with the
> appropriate environment settings. We had seen exactly the same backtrace,
> but so far devel has not found the real cause/fix for it. Just curious -
> what distro/arch are you using ?
Thanks for the suggestion! Looking at the manual, I see that's done with:

    mpiexec ... -env MV2_USE_SRQ 0 <program>

We're using RHEL4 x86_64 with a recent but locally modified RHEL4 kernel.
David