[mvapich-discuss] hang in iprobe in psm channel

Hari Subramoni subramoni.1 at osu.edu
Tue Oct 18 18:33:23 EDT 2016


Hi Adam,

Yes, this is the correct fix. John Westlund of Intel recently reported the
same issue on mvapich-discuss, and we were able to resolve it with a
variant of this fix. The change is already in the MVAPICH2 code base and
will be available in the next release.

Regards,
Hari.

On Oct 18, 2016 6:25 PM, "Adam T. Moody" <moody20 at llnl.gov> wrote:

> Hello MVAPICH team,
> We're seeing a hang in the PSM channel in MVAPICH2-2.2 with a simple
> iprobe reproducer.  This hang does not happen in PSM2. Running the example
> program below with two procs on two nodes will hang.
>
> I found that it does not hang if one changes the reproducer to specify the
> source rank (instead of using MPI_ANY_SOURCE).
>
> Also, I found that it does not hang if I change lines 319-320 of
> src/mpid/ch3/channels/psm/src/psm_queue.c from:
>         if(unlikely(src == MPI_ANY_SOURCE))
>             rtagsel = MQ_TAGSEL_ANY_SOURCE;
> to:
>         if(unlikely(src == MPI_ANY_SOURCE))
>             rtagsel = rtagsel & MQ_TAGSEL_ANY_SOURCE;
>
> It looks like the order of the ANY SOURCE and ANY TAG lines was switched
> going from MV2-2.1 to MV2-2.2.  People are reporting this hang starting
> with MV2-2.2.  It looks like it may have worked in MV2-2.1 since ANY SOURCE
> was setting the value of rtagsel and ANY TAG was modifying this setting,
> whereas now the ANY SOURCE line overwrites whatever the ANY TAG line
> already stored in rtagsel.
>
> Can you verify whether this is the right fix?
> Thanks,
> -Adam
>
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main( int argc, char *argv[] )
> {
>     int rank, nproc;
>     int dest, tag, val;
>     int flag = 0;
>     MPI_Status status;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &nproc);
>
>     if (nproc != 2) {
>       fprintf(stderr, "# of procs must be 2\n");
>       MPI_Finalize();
>       return 1;
>     }
>
>     if (rank == 0) {
>       while (!flag)
>         MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
>                    &flag, &status);
>       fprintf(stderr, "Rank %d probed   (tag: %d) from Rank %d\n",
>               rank, status.MPI_TAG, status.MPI_SOURCE);
>       MPI_Recv(&val, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
>                MPI_COMM_WORLD, &status);
>       fprintf(stderr,
>               "Rank %d received (val: %d, tag: %d) from Rank %d\n",
>               rank, val, status.MPI_TAG, status.MPI_SOURCE);
>     } else {
>       dest = 0; tag = 1; val = 2;
>       MPI_Send(&val, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>       fprintf(stderr, "Rank %d sent (val: %d, tag: %d) to Rank %d\n",
>               rank, val, tag, dest);
>     }
>
>     MPI_Finalize();
>     return 0;
> }