[mvapich-discuss] Stuck in wait with blocking connections

Hari Subramoni subramoni.1 at osu.edu
Thu Feb 18 11:30:06 EST 2016


Hello Maksym,

We're taking a look at this. We will get back to you soon.

Thx,
Hari.

On Wed, Feb 17, 2016 at 9:42 AM, Maksym Planeta <
mplaneta at os.inf.tu-dresden.de> wrote:

> Hi,
>
> I found a situation where a program hangs in MPI_Wait while waiting for
> the completion of an MPI_Igather call.
>
> Here is an example of a program which shows the effect:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <mpi.h>
>
>
> int main(int argc, char **argv)
> {
>   int rank;
>   int size;
>   MPI_Comm world_dup;
>
>   MPI_Init(&argc, &argv);
>
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>   printf("%d %d\n", __LINE__, rank);
>   fflush(stdout);
>   MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);
>   MPI_Barrier(world_dup);
>   int *array = calloc(size, sizeof(int));
>   MPI_Gather(&rank, 1, MPI_INT, array, 1, MPI_INT, 0, world_dup);
>   free(array);
>
>   MPI_Barrier(MPI_COMM_WORLD);
>   printf("%d %d\n", __LINE__, rank);
>   fflush(stdout);
>   array = calloc(size, sizeof(int));
>   MPI_Request request;
>   MPI_Igather(&rank, 1, MPI_INT, array, 1, MPI_INT, 0, world_dup,
>                 &request);
>   printf("%d %d\n", __LINE__, rank);
>   MPI_Wait(&request, MPI_STATUS_IGNORE);
>   free(array);
>
>   printf("Hi %d\n", rank);
>   fflush(stdout);
>   MPI_Finalize();
> }
>
> To reproduce the hang it was important to duplicate the MPI_COMM_WORLD
> communicator, run many processes per node, and use MPI_Igather. Adding an
> fflush call before MPI_Wait allows the program to continue.
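>
> A variant that may help narrow this down (not in the original report; the
> polling loop is only a diagnostic sketch): replacing the blocking MPI_Wait
> with an MPI_Test loop keeps driving the progress engine from the
> application thread. If this version completes while the MPI_Wait version
> hangs, the request itself does finish and the problem is likely in how the
> blocking wait (MV2_USE_BLOCKING=1) is woken up.

```c
/* Diagnostic sketch, not part of the original report: same reproducer,
 * but the completion of the MPI_Igather request is polled with MPI_Test
 * instead of blocking in MPI_Wait. Each MPI_Test call advances the MPI
 * progress engine. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int rank, size, done = 0;
  MPI_Comm world_dup;
  MPI_Request request;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);

  int *array = calloc(size, sizeof(int));
  MPI_Igather(&rank, 1, MPI_INT, array, 1, MPI_INT, 0, world_dup,
              &request);

  /* Poll instead of blocking: if this loop terminates where MPI_Wait
   * does not, the hang is in the blocking-wait wakeup path rather than
   * in the collective itself. */
  while (!done)
    MPI_Test(&request, &done, MPI_STATUS_IGNORE);

  free(array);
  MPI_Comm_free(&world_dup);
  MPI_Finalize();
  return 0;
}
```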
>
> I tried this out for mvapich-2.2b with no further modifications.
>
> I was using the following srun command:
>
> srun --nodes=2 --overcommit --ntasks=384 --distribution=block
> --mem-per-cpu=2500 --cpu_bind=v,none --kill-on-bad-exit --mpi=pmi2
>
> I also used the following environment variables:
>
> export MV2_ON_DEMAND_THRESHOLD=1000
> export MV2_USE_BLOCKING=1
> export MV2_ENABLE_AFFINITY=0
> export MV2_USE_SHARED_MEM=0
> export MV2_RDMA_NUM_EXTRA_POLLS=1
> export MV2_USE_EAGER_FAST_SEND=0
> export MV2_USE_UD_HYBRID=0
> export MV2_SHMEM_BACKED_UD_CM=0
> export MV2_CM_MAX_SPIN_COUNT=1
> export MV2_SPIN_COUNT=1
>
> export MV2_DEBUG_SHOW_BACKTRACE=1
> export MV2_DEBUG_CORESIZE=unlimited
>
> Compilation configuration:
>
> $ mpiname -a
> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
>
> Compilation
> CC: gcc    -DNDEBUG -DNVALGRIND -O2
> CXX: g++   -DNDEBUG -DNVALGRIND -O2
> F77: gfortran   -O2
> FC: gfortran   -O2
>
> Configuration
> --enable-fortran=all --enable-cxx --with-rdma=gen2 --with-device=ch3:mrail
> --enable-alloca --enable-hwloc --disable-dependency-tracking
> --with-pmi=pmi2 --with-pm=slurm
> --with-slurm=/opt/slurm/15.08.6_20151221-0628/ --prefix=<path>
>
> --
> Regards,
> Maksym Planeta
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>