[mvapich-discuss] Stuck in waitall
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Thu Sep 22 07:11:24 EDT 2016
Hello,
I have an MPI_Reduce call that I want to replace with a non-blocking
collective. Each rank contributes one double, and I expect the sum to
arrive on rank 0. I call the reduce as follows:
MPI_Reduce(&ts->time, &ts->sum, 1, MPI_DOUBLE, MPI_SUM,
0, scr_comm_node);
scr_comm_node is a communicator that groups the ranks running on the same node.
I replaced this call with a combination of MPI_Ireduce, MPI_Test, and
MPI_Wait, but the code hung every time.
I then simplified the code and ended up with these MPI calls, which
essentially follow one another:
MPI_Ireduce(&ts->time, &ts->sum, 1, MPI_DOUBLE, MPI_SUM,
0, scr_comm_node, &ts->request[ts->num_req++]);
...
MPI_Waitall(ts->num_req, &ts->request[0], &ts->status[0]);
... <Then follow collectives on other communicators>
Looking at the various stack traces, I see that on some nodes there
are ranks that simply cannot leave this MPI_Waitall.
Am I doing something wrong here, or does it look like a bug?
I set the following environment variables:
export MV2_USE_BLOCKING=1
# I set affinity on my own, because I have 2 processes per CPU
export MV2_ENABLE_AFFINITY=0
export MV2_RDMA_NUM_EXTRA_POLLS=1
export MV2_CM_MAX_SPIN_COUNT=1
export MV2_SPIN_COUNT=1
mpiname output:
MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:mrail
Compilation
CC: gcc -g -O0
CXX: g++ -g -O0
F77: gfortran -L/lib -L/lib -g -O0
FC: gfortran -g -O0
Configuration
--enable-fortran=all --enable-cxx --enable-error-checking=all
--enable-error-messages=none --enable-timing=none
--enable-check-compiler-flags --enable-threads=multiple
--enable-weak-symbols --disable-dependency-tracking
--enable-fast-install --disable-rdma-cm --with-pm=mpirun:hydra
--with-rdma=gen2 --with-device=ch3:mrail --enable-alloca --enable-hwloc
--disable-fast --enable-g=dbg --enable-error-messages=all
--enable-error-checking=all
--prefix=/home/s9951545/apps.taurus/mvapich2/2.2-mpirun-dbg/
--
Regards,
Maksym Planeta