[mvapich-discuss] Stuck in wait with blocking connections
Hari Subramoni
subramoni.1 at osu.edu
Thu Feb 18 11:30:06 EST 2016
Hello Maksym,
We're taking a look at this. We will get back to you soon.
Thx,
Hari.
On Wed, Feb 17, 2016 at 9:42 AM, Maksym Planeta
<mplaneta at os.inf.tu-dresden.de> wrote:
> Hi,
>
> I found a situation in which a program hangs in MPI_Wait while waiting
> for an MPI_Igather call to complete.
>
> Here is an example of a program which shows the effect:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
>
> int main(int argc, char **argv)
> {
>     int rank;
>     int size;
>     MPI_Comm world_dup;
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     printf("%d %d\n", __LINE__, rank);
>     fflush(stdout);
>     MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);
>     MPI_Barrier(world_dup);
>     int *array = calloc(size, sizeof(int));
>     MPI_Gather(&rank, 1, MPI_INT, array, 1, MPI_INT, 0, world_dup);
>     free(array);
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     printf("%d %d\n", __LINE__, rank);
>     fflush(stdout);
>     array = calloc(size, sizeof(int));
>     MPI_Request request;
>     MPI_Igather(&rank, 1, MPI_INT, array, 1, MPI_INT, 0, world_dup,
>                 &request);
>     printf("%d %d\n", __LINE__, rank);
>     MPI_Wait(&request, MPI_STATUS_IGNORE);
>     free(array);
>
>     printf("Hi %d\n", rank);
>     fflush(stdout);
>     MPI_Finalize();
> }
>
> To reproduce the hang, it was important to duplicate the
> MPI_COMM_WORLD communicator, use many processes per node, and use
> MPI_Igather. Adding fflush before MPI_Wait allows the program to continue.
>
> I tried this out for mvapich-2.2b with no further modifications.
>
> I was using the following srun command:
>
> srun --nodes=2 --overcommit --ntasks=384 --distribution=block
> --mem-per-cpu=2500 --cpu_bind=v,none --kill-on-bad-exit --mpi=pmi2
>
> I also used the following environment variables:
>
> export MV2_ON_DEMAND_THRESHOLD=1000
> export MV2_USE_BLOCKING=1
> export MV2_ENABLE_AFFINITY=0
> export MV2_USE_SHARED_MEM=0
> export MV2_RDMA_NUM_EXTRA_POLLS=1
> export MV2_USE_EAGER_FAST_SEND=0
> export MV2_USE_UD_HYBRID=0
> export MV2_SHMEM_BACKED_UD_CM=0
> export MV2_CM_MAX_SPIN_COUNT=1
> export MV2_SPIN_COUNT=1
>
> export MV2_DEBUG_SHOW_BACKTRACE=1
> export MV2_DEBUG_CORESIZE=unlimited
>
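When a hang depends on runtime settings like the ones above, it can help to confirm that every rank actually sees them, since the launcher has to propagate the environment. A minimal sketch, assuming a hypothetical helper report_env (not an MVAPICH2 or Slurm facility) that relies only on the standard getenv; the variable names are copied from the list above:

```c
#include <stdio.h>
#include <stdlib.h>

/* Variable names taken from the report; the array name is illustrative. */
static const char *const mv2_vars[] = {
    "MV2_ON_DEMAND_THRESHOLD", "MV2_USE_BLOCKING", "MV2_ENABLE_AFFINITY",
    "MV2_USE_SHARED_MEM", "MV2_RDMA_NUM_EXTRA_POLLS",
    "MV2_USE_EAGER_FAST_SEND", "MV2_USE_UD_HYBRID",
    "MV2_SHMEM_BACKED_UD_CM", "MV2_CM_MAX_SPIN_COUNT", "MV2_SPIN_COUNT",
    "MV2_DEBUG_SHOW_BACKTRACE", "MV2_DEBUG_CORESIZE",
};
#define MV2_VAR_COUNT (sizeof mv2_vars / sizeof mv2_vars[0])

/*
 * Hypothetical helper: print each named variable as this process sees it
 * and return how many of them are set.
 */
static int report_env(const char *const names[], size_t count, FILE *out)
{
    int set = 0;
    for (size_t i = 0; i < count; i++) {
        const char *value = getenv(names[i]);
        fprintf(out, "%s=%s\n", names[i], value ? value : "(unset)");
        if (value)
            set++;
    }
    return set;
}
```

Calling report_env(mv2_vars, MV2_VAR_COUNT, stderr) right after MPI_Init prints what the process sees. Writing to stderr leaves the stdout buffering, which appears to matter for this hang, untouched.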
> Compilation configuration:
>
> $ mpiname -a
> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
>
> Compilation
> CC: gcc -DNDEBUG -DNVALGRIND -O2
> CXX: g++ -DNDEBUG -DNVALGRIND -O2
> F77: gfortran -O2
> FC: gfortran -O2
>
> Configuration
> --enable-fortran=all --enable-cxx --with-rdma=gen2 --with-device=ch3:mrail
> --enable-alloca --enable-hwloc --disable-dependency-tracking
> --with-pmi=pmi2 --with-pm=slurm
> --with-slurm=/opt/slurm/15.08.6_20151221-0628/ --prefix=<path>
>
> --
> Regards,
> Maksym Planeta