[mvapich-discuss] race in mvapich-0.9.9 cm_create_rc_qp() with viadev.connections==NULL

Matthew Koop koop at cse.ohio-state.edu
Fri May 2 16:30:15 EDT 2008


John,

We've now fixed this issue in the MVAPICH SVN (trunk and 1.0 branches). It
will be in the upcoming MVAPICH minor release as well. For 0.9.9, your
suggested fix should work.

Thanks again for pointing this issue out.

Matt

On Wed, 23 Apr 2008, Matthew Koop wrote:

> John,
>
> Thanks for reporting this problem and looking into a possible solution.
> This does appear to be a race condition in the initialization of
> viadev.connections. We'll add this as a bug report and fix this in the
> very near future.
>
> Thanks again,
>
> Matt
>
> On Tue, 22 Apr 2008, John Hawkes wrote:
>
> > I've encountered a race condition in mvapich-0.9.9 (also exists in
> > mvapich-1.0) in cm_create_rc_qp() (mpid/ch_gen2/cm.c).  On occasion,
> > under conditions of dozens of threads starting up, cm_create_rc_qp()
> > encounters viadev.connections==NULL.
> >
> > I believe the problem stems from the ordering of initialization.  The
> > main viainit.c calls:
> >     if (MPICM_Connect_UD(viadev.ud_qpn_table, viadev.lid_table)) {
> >         error_abort_all(GEN_EXIT_ERR, "MPICM_Connect_UD");
> >     }
> > and soon thereafter it initializes viadev.connections.  Meanwhile,
> > MPICM_Connect_UD() has done a pthread_create() of
> > cm_completion_handler().  That concurrently executing thread handles
> > incoming messages, one of which may get to cm_accept(), which then
> > calls cm_create_rc_qp(), which may dereference viadev.connections
> > before the main thread has initialized it.
> >
> > I seem to be able to avoid this race condition by moving the call to
> > MPICM_Connect_UD() to follow the initialization of viadev.connections.
> > Does this fix create other problems that my current testing has not yet
> > encountered?
> >
> > John Hawkes
> > jhawkes at PenguinComputing.com
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
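The ordering fix discussed in this thread can be illustrated with a small
standalone sketch. It is not MVAPICH code: demo_dev, demo_handler() and
demo_startup() are hypothetical names, and only the "connections" field
mirrors what the thread describes. The sketch allocates the shared table
first and only then calls pthread_create(), so the handler thread can never
observe a NULL table.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the viadev global discussed in the thread. */
struct demo_dev {
    int *connections;   /* per-peer table, NULL until initialized */
    int  np;            /* number of peers */
};

static struct demo_dev demo;

/* Stand-in for the completion-handler thread: it touches the table as
 * soon as it starts running, as cm_create_rc_qp() is described as doing. */
static void *demo_handler(void *arg)
{
    int *saw_table = arg;

    if (demo.connections == NULL) {
        *saw_table = 0;          /* the race from the report would land here */
        return NULL;
    }
    for (int i = 0; i < demo.np; i++)
        demo.connections[i] = 1; /* pretend to set up one connection per peer */
    *saw_table = 1;
    return NULL;
}

/* Initialize the table BEFORE starting the handler thread.
 * Returns 1 if the handler saw an initialized table, 0 if it saw NULL,
 * and -1 on a bad argument, allocation failure, or thread-creation failure. */
int demo_startup(int np)
{
    pthread_t tid;
    int saw_table = -1;

    if (np <= 0)
        return -1;

    demo.np = np;
    demo.connections = calloc((size_t)np, sizeof *demo.connections);
    if (demo.connections == NULL)
        return -1;

    /* Writes made before pthread_create() are visible to the new thread. */
    if (pthread_create(&tid, NULL, demo_handler, &saw_table) != 0) {
        free(demo.connections);
        demo.connections = NULL;
        return -1;
    }
    pthread_join(tid, NULL);

    free(demo.connections);
    demo.connections = NULL;
    return saw_table;
}
```

Calling demo_startup(8) from a main() and linking with -pthread exercises
the safe ordering. Swapping the calloc() and pthread_create() steps would
reintroduce the window in which the handler can see a NULL pointer.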


