[mvapich-discuss] race in mvapich-0.9.9 cm_create_rc_qp() with viadev.connections==NULL

John Hawkes jhawkes at penguincomputing.com
Tue Apr 22 13:01:19 EDT 2008


I've encountered a race condition in mvapich-0.9.9 (it also exists in
mvapich-1.0) in cm_create_rc_qp() (mpid/ch_gen2/cm.c).  Occasionally,
when dozens of threads are starting up, cm_create_rc_qp() finds
viadev.connections==NULL.

I believe the problem stems from the ordering of initialization.  The
main viainit.c calls:
    if (MPICM_Connect_UD(viadev.ud_qpn_table, viadev.lid_table)) {
        error_abort_all(GEN_EXIT_ERR, "MPICM_Connect_UD");
    }
and only soon thereafter initializes viadev.connections.  Meanwhile,
MPICM_Connect_UD() has already called pthread_create() to start
cm_completion_handler().  That concurrently executing thread handles
incoming messages, one of which may reach cm_accept(), which then calls
cm_create_rc_qp(), which may dereference viadev.connections before the
main thread has initialized it.

I seem to be able to avoid this race condition by moving the call to
MPICM_Connect_UD() to follow the initialization of viadev.connections.
Does this fix create other problems that my current testing has not yet
encountered?
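
To make the reordering concrete, here is a minimal standalone sketch of
the pattern; it is not the mvapich code.  The names dev_state, handler()
and init_in_safe_order() are hypothetical stand-ins for viadev,
cm_completion_handler() and the viainit.c startup path.  The only point
is that the shared array is allocated before pthread_create(), so the
new thread can never observe it as NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for viadev and its connections array. */
struct conn { int qp_created; };
struct dev  { struct conn *connections; int np; };

static struct dev dev_state;

/* Stand-in for cm_completion_handler(): it touches connections[]
 * as soon as it runs, much as cm_create_rc_qp() would. */
static void *handler(void *arg)
{
    int peer = *(int *)arg;

    /* Safe only because connections was allocated before this
     * thread was created. */
    dev_state.connections[peer].qp_created = 1;
    return NULL;
}

/* Allocate the shared state first, start the thread second.
 * Returns 0 on success, -1 on failure. */
int init_in_safe_order(int np, int peer)
{
    pthread_t tid;

    assert(np > 0 && peer >= 0 && peer < np);

    /* Step 1: initialize everything the handler may dereference. */
    dev_state.np = np;
    dev_state.connections = calloc((size_t)np, sizeof(*dev_state.connections));
    if (dev_state.connections == NULL)
        return -1;

    /* Step 2: only now start the concurrent handler thread. */
    if (pthread_create(&tid, NULL, handler, &peer) != 0) {
        free(dev_state.connections);
        dev_state.connections = NULL;
        return -1;
    }

    /* Joined here only to keep the sketch self-contained; a real
     * handler would keep running. */
    pthread_join(tid, NULL);
    return dev_state.connections[peer].qp_created ? 0 : -1;
}
```

Calling init_in_safe_order(4, 2) should return 0 with only entry 2
marked.  Swapping steps 1 and 2 reproduces the window I described
above, since the handler could then run while connections is still
NULL.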

John Hawkes
jhawkes at PenguinComputing.com


