[mvapich-discuss] hang in iprobe in psm channel

Adam T. Moody moody20 at llnl.gov
Tue Oct 18 18:25:24 EDT 2016


Hello MVAPICH team,
We're seeing a hang in the PSM channel in MVAPICH2-2.2 with a simple 
iprobe reproducer.  This hang does not happen in PSM2. Running the 
example program below with two procs on two nodes will hang.

I found that it does not hang if one changes the reproducer to specify 
the source rank (instead of using MPI_ANY_SOURCE).

Also, I found that it does not hang if I change lines 319-320 of 
src/mpid/ch3/channels/psm/src/psm_queue.c from:
         if(unlikely(src == MPI_ANY_SOURCE))
             rtagsel = MQ_TAGSEL_ANY_SOURCE;
to:
         if(unlikely(src == MPI_ANY_SOURCE))
             rtagsel = rtagsel & MQ_TAGSEL_ANY_SOURCE;

It looks like the order of the ANY SOURCE and ANY TAG lines was 
switched going from MV2-2.1 to MV2-2.2.  People are reporting this hang 
starting with MV2-2.2.  It looks like it may have worked in MV2-2.1 
since ANY SOURCE was setting the value of rtagsel and ANY TAG was 
modifying this setting, whereas now the ANY SOURCE line overwrites any 
changes the ANY TAG line made to rtagsel.
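
To illustrate the ordering problem, here is a minimal standalone sketch of 
the two variants.  The mask values, the SKETCH_ names, and the two helper 
functions are hypothetical and chosen only for illustration; the real 
MQ_TAGSEL_* definitions in the PSM channel may differ, and I am assuming 
rtagsel starts out matching all bits and that the ANY TAG line narrows it 
with a bitwise AND.

```c
#include <stdint.h>

/* Hypothetical selector masks, NOT the real MQ_TAGSEL_* values. */
#define SKETCH_TAGSEL_ALL        UINT64_C(0xffffffffffffffff)
#define SKETCH_TAGSEL_ANY_TAG    UINT64_C(0xffff00000000ffff) /* ignore tag bits    */
#define SKETCH_TAGSEL_ANY_SOURCE UINT64_C(0xffffffffffff0000) /* ignore source bits */

/* Ordering as described for MV2-2.2: ANY TAG narrows the selector first,
 * then ANY SOURCE assigns a fresh value and discards that narrowing. */
static uint64_t tagsel_assign(int any_tag, int any_src)
{
    uint64_t rtagsel = SKETCH_TAGSEL_ALL;
    if (any_tag)
        rtagsel = rtagsel & SKETCH_TAGSEL_ANY_TAG;
    if (any_src)
        rtagsel = SKETCH_TAGSEL_ANY_SOURCE;           /* overwrites ANY TAG */
    return rtagsel;
}

/* Proposed change: ANY SOURCE also narrows with AND, so the result no
 * longer depends on which wildcard is handled first. */
static uint64_t tagsel_and(int any_tag, int any_src)
{
    uint64_t rtagsel = SKETCH_TAGSEL_ALL;
    if (any_tag)
        rtagsel = rtagsel & SKETCH_TAGSEL_ANY_TAG;
    if (any_src)
        rtagsel = rtagsel & SKETCH_TAGSEL_ANY_SOURCE; /* keeps ANY TAG */
    return rtagsel;
}
```

With both wildcards set, tagsel_assign(1, 1) still compares the tag bits 
while tagsel_and(1, 1) ignores both the tag and source bits.  The two 
functions agree whenever at most one wildcard is used, which would be 
consistent with the hang going away when the source rank is specified.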

Can you verify whether this is the right fix?
Thanks,
-Adam


#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] )
{
     int rank, nproc;
     int dest, tag, val;
     int flag = 0;
     MPI_Status status;

     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &nproc);

     if (nproc != 2) {
       fprintf(stderr, "# of procs must be 2\n");
       /* abort cleanly instead of exiting without MPI_Finalize */
       MPI_Abort(MPI_COMM_WORLD, 1);
     }

     if (rank == 0) {
       while (!flag) MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
       fprintf(stderr, "Rank %d probed   (tag: %d) from Rank %d\n", rank, status.MPI_TAG, status.MPI_SOURCE);
       MPI_Recv(&val, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD, &status);
       fprintf(stderr, "Rank %d received (val: %d, tag: %d) from Rank %d\n", rank, val, status.MPI_TAG, status.MPI_SOURCE);
     } else {
       dest = 0; tag = 1; val = 2;
       MPI_Send(&val, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
       fprintf(stderr, "Rank %d sent (val: %d, tag: %d) to Rank %d\n", rank, val, tag, dest);
     }

     MPI_Finalize();
     return 0;
}


