[mvapich-discuss] Got FATAL event 0
Adam Moody
moody20 at llnl.gov
Fri Sep 19 19:34:37 EDT 2008
Hello MVAPICH team,
I have a user hitting some errors, and I'm hoping you may have some
insight. When running with MVAPICH1-0.9.7, the user sees the following
non-fatal error message on occasion:
Error getting event!
[0] Got unknown event 1075841344 (Unknown) ... continuing ...
With 0.9.9 (and PTMALLOC disabled), the user sees the following fatal
error with the same frequency as the above message:
[0] Got FATAL event 0 (CQ Error)
This error is detected by the async_thread function in viacheck.c. The
series of MPI calls the user app has made at this point looks like the
following:
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
MPI_Initialized(&initialized);
MPI_Comm_size(MPI_COMM_WORLD, &d_size);
MPI_Comm_rank(MPI_COMM_WORLD, &d_rank);
MPI_Bcast(const_cast<char*>(d_key), SECURE_KEY_SIZE, MPI_CHAR,
          0, MPI_COMM_WORLD);
MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(const_cast<char*>(d_parentUrl.c_str()), length, MPI_CHAR, 0,
          MPI_COMM_WORLD);
MPI_Bcast(&length, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(const_cast<char*>(d_rank0Url.c_str()), length, MPI_CHAR, 0,
          MPI_COMM_WORLD);
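One detail of the last four calls may be worth checking. On non-root ranks, MPI_Bcast writes `length` bytes into the buffer returned by `const_cast<char*>(...c_str())`. Writing through `c_str()` is undefined behavior in C++, and unless the receiving strings were already sized to at least `length` characters (the report does not say), the write lands outside storage the string owns. Whether this has anything to do with the CQ error is an assumption, not something established here. The following is a minimal sketch of the same length-then-payload pattern that sizes the receiving string first. To keep it self-contained and testable without an MPI installation, each broadcast is hidden behind a hypothetical `BcastBytes` callable, and `bcast_string` is likewise an illustrative name, not part of MPI or MVAPICH.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for one MPI_Bcast call rooted at rank 0: on the root
// it sends `bytes` bytes starting at `buf`; on every other rank it overwrites
// `bytes` bytes starting at `buf` with the root's data.
using BcastBytes = std::function<void(void* buf, std::size_t bytes)>;

// Broadcast `s` from the root using the length-then-payload pattern.
// Receivers resize their string before the payload broadcast, so the
// incoming bytes land inside storage the string actually owns.
void bcast_string(std::string& s, bool is_root, const BcastBytes& bcast) {
    int length = is_root ? static_cast<int>(s.size()) : 0;
    bcast(&length, sizeof length);  // step 1: every rank learns the length
    assert(length >= 0 && "received a negative string length");

    if (!is_root) {
        s.resize(static_cast<std::size_t>(length));  // make room first
    }
    if (length > 0) {
        // step 2: the payload goes into the string's own writable buffer
        // (contiguous since C++11), not through const_cast on c_str().
        bcast(&s[0], static_cast<std::size_t>(length));
    }
}
```

In a real MPI program, `bcast` could be a lambda calling `MPI_Bcast(buf, static_cast<int>(bytes), MPI_BYTE, 0, MPI_COMM_WORLD)` and `is_root` would be `d_rank == 0`. Each length/payload pair for `d_parentUrl` and `d_rank0Url` would then reduce to a single `bcast_string` call.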
Have others reported this problem before? Any ideas on how to fix it?
Thanks again,
-Adam