[mvapich-discuss] race in mvapich-0.9.9 cm_create_rc_qp() with viadev.connections==NULL

Matthew Koop koop at cse.ohio-state.edu
Wed Apr 23 11:27:12 EDT 2008


John,

Thanks for reporting this problem and looking into a possible solution.
This does appear to be a race condition in the initialization of
viadev.connections. We'll add this as a bug report and fix this in the
very near future.

Thanks again,

Matt

On Tue, 22 Apr 2008, John Hawkes wrote:

> I've encountered a race condition in mvapich-0.9.9 (it also exists in
> mvapich-1.0) in cm_create_rc_qp() (mpid/ch_gen2/cm.c).  On occasion,
> when dozens of threads are starting up, cm_create_rc_qp() encounters
> viadev.connections==NULL.
>
> I believe the problem stems from the ordering of initialization.  The
> main viainit.c calls:
>     if (MPICM_Connect_UD(viadev.ud_qpn_table, viadev.lid_table)) {
>         error_abort_all(GEN_EXIT_ERR, "MPICM_Connect_UD");
>     }
> and soon thereafter it initializes viadev.connections.  Meanwhile,
> MPICM_Connect_UD() has done a pthread_create() of
> cm_completion_handler().  That concurrently executing thread handles
> incoming messages, one of which may get to cm_accept(), which then
> calls cm_create_rc_qp(), which may dereference viadev.connections
> before the main thread has initialized it.
>
> I seem to be able to avoid this race condition by moving the call to
> MPICM_Connect_UD() to follow the initialization of viadev.connections.
> Does this fix create other problems that my current testing has not yet
> encountered?
>
> John Hawkes
> jhawkes at PenguinComputing.com
>
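For illustration, the reordering John describes can be sketched in
standalone C.  This is only a minimal sketch and not the mvapich code:
demo_dev, demo_conn, demo_completion_handler() and
demo_start_fixed_order() are hypothetical stand-ins for viadev, its
connection table, cm_completion_handler() and the startup path in
viainit.c.  It assumes the only requirement is that the table be
allocated before the handler thread exists.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for viadev and its connection table. */
struct demo_conn { int qp_created; };
struct demo_dev {
    struct demo_conn *connections;   /* NULL until initialized */
    int np;                          /* number of entries */
};

static struct demo_dev demo;

/* Plays the role of the completion-handler thread: it may touch
 * demo.connections as soon as it starts running. */
static void *demo_completion_handler(void *arg)
{
    (void)arg;
    if (demo.connections == NULL)
        return (void *)1;            /* the racy ordering would crash here */
    for (int i = 0; i < demo.np; i++)
        demo.connections[i].qp_created = 1;
    return NULL;
}

/* Fixed ordering: allocate the table first, then create the thread.
 * Returns 0 if the handler saw an initialized table, 1 if it saw NULL,
 * and -1 on a setup failure. */
int demo_start_fixed_order(int np)
{
    pthread_t tid;
    void *result = NULL;
    int all_set = 1;

    if (np <= 0)
        return -1;

    demo.np = np;
    demo.connections = calloc((size_t)np, sizeof(*demo.connections));
    if (demo.connections == NULL)
        return -1;

    /* pthread_create() synchronizes memory, so the new thread is
     * guaranteed to see the stores made above. */
    if (pthread_create(&tid, NULL, demo_completion_handler, NULL) != 0) {
        free(demo.connections);
        demo.connections = NULL;
        return -1;
    }
    pthread_join(tid, &result);

    for (int i = 0; i < np; i++)
        if (!demo.connections[i].qp_created)
            all_set = 0;

    free(demo.connections);
    demo.connections = NULL;
    return (result == NULL && all_set) ? 0 : 1;
}
```

Calling demo_start_fixed_order(16) from main() (built with -pthread)
should return 0.  Swapping the calloc() and pthread_create() steps
reproduces the window in which the handler can observe a NULL table.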