[mvapich-discuss] Stuck in waitall

Hari Subramoni subramoni.1 at osu.edu
Thu Sep 22 09:54:54 EDT 2016


Thanks for the report. Sorry to hear that you are facing issues. Do you
have a reproducer that we could use? Does the hang go away if you remove
any of the environment variables?
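
One way to run that check systematically is sketched below. It is only an
illustration and not part of MVAPICH2: the function name try_without, the
LIMIT variable, and the launch command in the usage note are hypothetical,
and the sketch assumes GNU coreutils (timeout, env -u) and a launcher that
forwards the caller's environment to the ranks. It exports the variables
from the original report, then re-runs the given command once per variable
with that one variable unset, treating a timeout as a likely hang.

```shell
#!/bin/sh
# Settings copied from the original report.
export MV2_USE_BLOCKING=1
export MV2_ENABLE_AFFINITY=0
export MV2_RDMA_NUM_EXTRA_POLLS=1
export MV2_CM_MAX_SPIN_COUNT=1
export MV2_SPIN_COUNT=1

VARS="MV2_USE_BLOCKING MV2_ENABLE_AFFINITY MV2_RDMA_NUM_EXTRA_POLLS MV2_CM_MAX_SPIN_COUNT MV2_SPIN_COUNT"

# Seconds to wait before treating a run as hung (hypothetical default).
LIMIT="${LIMIT:-60}"

# try_without VAR CMD...: run CMD with VAR removed from its environment
# and print one line describing how the run ended.
try_without() {
    dropped="$1"
    shift
    if timeout "$LIMIT" env -u "$dropped" "$@"; then
        echo "without $dropped: finished"
    else
        rc=$?
        if [ "$rc" -eq 124 ]; then
            # GNU timeout exits with 124 when the time limit is reached.
            echo "without $dropped: timed out (possible hang)"
        else
            echo "without $dropped: exited with status $rc"
        fi
    fi
}

# Only start trials when a command was given on the command line.
if [ "$#" -gt 0 ]; then
    for v in $VARS; do
        try_without "$v" "$@"
    done
fi
```

If the script is saved as, say, drop_one.sh, then something like
"sh drop_one.sh mpirun -np 4 ./reproducer" would run five trials, one per
variable. The launcher, rank count, and binary name are placeholders.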

Thanks,
Hari.

On Sep 22, 2016 7:11 AM, "Maksym Planeta" <mplaneta at os.inf.tu-dresden.de>
wrote:

> Hello,
>
> I have MPI_Reduce which I want to replace with a non-blocking collective.
>
> Each rank passes a double, and I expect to gather the sum on rank 0. I
> call the reduce as follows:
>
>        MPI_Reduce(&ts->time, &ts->sum, 1, MPI_DOUBLE, MPI_SUM,
>                    0, scr_comm_node);
>
> scr_comm_node is a communicator which unites ranks of the same node.
>
> I replaced this call with a combination of ireduce, test, and wait, but the
> code got stuck every time.
>
> I started to simplify the code and ended up with these MPI calls which
> basically follow each other:
>
>        MPI_Ireduce(&ts->time, &ts->sum, 1, MPI_DOUBLE, MPI_SUM,
>                    0, scr_comm_node, &ts->request[ts->num_req++]);
>        ...
>        MPI_Waitall(ts->num_req, &ts->request[0], &ts->status[0]);
>
>        ... <Then follow collectives on other communicators>
>
> Looking at different stack traces, I see that on some of the nodes there
> are ranks that simply can't leave this MPI_Waitall.
>
> Am I doing something wrong here, or does it look like a bug?
>
> I set the following environment variables:
>
> export MV2_USE_BLOCKING=1
> # I set affinity on my own, because I have 2 processes per CPU
> export MV2_ENABLE_AFFINITY=0
> export MV2_RDMA_NUM_EXTRA_POLLS=1
> export MV2_CM_MAX_SPIN_COUNT=1
> export MV2_SPIN_COUNT=1
>
>
> mpiname output:
>
> MVAPICH2 2.2 Thu Sep 08 22:00:00 EST 2016 ch3:mrail
>
> Compilation
> CC: gcc    -g -O0
> CXX: g++   -g -O0
> F77: gfortran -L/lib -L/lib   -g -O0
> FC: gfortran   -g -O0
>
> Configuration
> --enable-fortran=all --enable-cxx --enable-error-checking=all
> --enable-error-messages=none --enable-timing=none
> --enable-check-compiler-flags --enable-threads=multiple
> --enable-weak-symbols --disable-dependency-tracking --enable-fast-install
> --disable-rdma-cm --with-pm=mpirun:hydra --with-rdma=gen2
> --with-device=ch3:mrail --enable-alloca --enable-hwloc --disable-fast
> --enable-g=dbg --enable-error-messages=all --enable-error-checking=all
> --prefix=/home/s9951545/apps.taurus/mvapich2/2.2-mpirun-dbg/
>
>
>
> --
> Regards,
> Maksym Planeta