[mvapich-discuss] Got FATAL event 0

Adam Moody moody20 at llnl.gov
Fri Sep 19 19:34:37 EDT 2008


Hello MVAPICH team,
I have a user hitting some errors, and I'm hoping you may have some 
insight.  When running with MVAPICH1-0.9.7, the user sees the following 
non-fatal error message on occasion:

    Error getting event!
    [0] Got unknown event 1075841344 (Unknown) ... continuing ...
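
For anyone reading that event number, here is a minimal sketch that formats it in hexadecimal. The helper name `event_code_hex` is hypothetical and is not part of MVAPICH; it is only a formatting aid and assumes nothing about how the library produces the value.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not an MVAPICH function): format a 32-bit
// event code as a zero-padded hexadecimal string.
std::string event_code_hex(std::uint32_t code) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(code));
    return std::string(buf);
}
```

Calling `event_code_hex(1075841344u)` gives `0x40200940`. That is a large value rather than a small enumerated code, which may be a useful hint, though it is only an observation and not a diagnosis.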

With 0.9.9 (and PTMALLOC disabled), the user sees the following fatal 
error with the same frequency as the above message:

    [0] Got FATAL event 0 (CQ Error)

This error is detected by the async_thread function in viacheck.c.  The 
series of MPI calls the user app has made at this point looks like the 
following:

 MPI_Init(&argc, &argv);
 MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
 MPI_Initialized(&initialized);
 MPI_Comm_size(MPI_COMM_WORLD, &d_size);
 MPI_Comm_rank(MPI_COMM_WORLD, &d_rank);
 MPI_Bcast(const_cast<char*>(d_key), SECURE_KEY_SIZE, MPI_CHAR,
               0, MPI_COMM_WORLD);
 MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
 MPI_Bcast(const_cast<char*>(d_parentUrl.c_str()), length, MPI_CHAR, 0,
                 MPI_COMM_WORLD);
 MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
 MPI_Bcast(const_cast<char*>(d_rank0Url.c_str()), length, MPI_CHAR, 0,
                 MPI_COMM_WORLD);

Have others reported this problem before?  Any idea on how to fix it?
Thanks again,
-Adam

