[mvapich-discuss] Hangs in blocking progress mode
Georg Geiser
Georg.Geiser at dlr.de
Fri Aug 30 10:59:35 EDT 2019
I am facing several hangs when using blocking mode progress via
MV2_USE_BLOCKING=1. Here is a first standalone reproducer:
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    double data = 1;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Iallreduce(MPI_IN_PLACE, &data, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &request);
    MPI_Wait(&request, MPI_STATUS_IGNORE);
    MPI_Finalize();
    return EXIT_SUCCESS;
}
This hangs with:
MV2_USE_BLOCKING=1 MV2_SPIN_COUNT=0 mpirun -np 1
MV2_USE_BLOCKING=1 MV2_SPIN_COUNT=1 mpirun -np 1
and succeeds with:
MV2_USE_BLOCKING=1 MV2_SPIN_COUNT=2 mpirun -np 1
as well as with higher -np values and arbitrary values of MV2_SPIN_COUNT.
Admittedly, this allreduce is a no-op for -np 1, but my real
application also hangs at random in places where actual
communication is performed and MV2_SPIN_COUNT is left at its default of
5000. Choosing higher spin counts helps there as well, but in my opinion
MVAPICH should always make progress, even at MV2_SPIN_COUNT=0.
I have not yet been able to track down the problem myself. However, while
inspecting MPIDI_CH3I_MRAILI_Cq_poll_ib() I may have stumbled upon an
additional issue. This function loops over all available HCAs (using the
i variable), but blocks on only a single HCA once
rdma_blocking_spin_count_threshold is exceeded. I guess it should
instead block on all HCAs to ensure that all events are captured,
although this would require a separate thread for each blocked HCA.
In any case, I only have one HCA, so this cannot be the reason for my hangs.
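One possible alternative to a thread per HCA, sketched here only as an
illustration: libibverbs exposes a file descriptor in each
struct ibv_comp_channel (the fd member), so a single thread could in
principle wait on several channels at once with poll(2). The helper name
wait_any_source and the MAX_SOURCES limit below are hypothetical, the
descriptors can be any readable fds (pipes work for experimenting), and
whether this fits the MVAPICH2 progress engine is an assumption I have
not verified.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical upper bound on the number of event sources (e.g. HCAs). */
#define MAX_SOURCES 16

/*
 * Block until at least one of the n descriptors is readable, or until
 * timeout_ms elapses (a negative timeout waits forever).
 *
 * Returns the index of the first readable descriptor, -1 on timeout,
 * and -2 on error (including interruption by a signal).
 */
int wait_any_source(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_SOURCES];

    assert(n > 0 && n <= MAX_SOURCES);

    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc < 0)
        return -2;   /* poll() failed; errno holds the reason */
    if (rc == 0)
        return -1;   /* nothing became readable in time */

    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN)
            return i;
    }
    return -2;       /* only error/hangup conditions were reported */
}
```

With real verbs objects, the array would hold the fd of each HCA's
completion channel, and the caller would follow up with
ibv_get_cq_event() on the channel whose index is returned.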
It seems like the problem has already been reported here:
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2016-September/006178.html
However, there seems to have been no progress on fixing it.
Here is the output from mpiname -a:
MVAPICH2 2.3.2 Fri August 9 22:00:00 EST 2019 ch3:mrail
Compilation
CC: /export/opt/gcc-8.2.0/bin/gcc -DNDEBUG -DNVALGRIND -O2
CXX: /export/opt/gcc-8.2.0/bin/g++ -DNDEBUG -DNVALGRIND -O2
F77: /export/opt/gcc-8.2.0/bin/gfortran -L/lib -L/lib -O2
FC: /export/opt/gcc-8.2.0/bin/gfortran -O2
Configuration
--enable-threads=funneled --disable-mcast
Georg