[mvapich-discuss] hang in iprobe in psm channel
Hari Subramoni
subramoni.1 at osu.edu
Tue Oct 18 18:33:23 EDT 2016
Hi Adam,
Yes, this is the correct fix. There was a recent post on mvapich-discuss
from John Westlund @ Intel, who hit the same issue, and we were able to
resolve it with a variant of this fix. We've already merged it into the
mvapich2 code base, and it will be available in the next release.
Regards,
Hari.
On Oct 18, 2016 6:25 PM, "Adam T. Moody" <moody20 at llnl.gov> wrote:
> Hello MVAPICH team,
> We're seeing a hang in the PSM channel in MVAPICH2-2.2 with a simple
> iprobe reproducer. This hang does not happen in PSM2. Running the example
> program below with two procs on two nodes will hang.
>
> I found that it does not hang if one changes the reproducer to specify the
> source rank (instead of using MPI_ANY_SOURCE).
>
> Also, I found that it does not hang if I change lines 319-320 of
> src/mpid/ch3/channels/psm/src/psm_queue.c from:
> if(unlikely(src == MPI_ANY_SOURCE))
> rtagsel = MQ_TAGSEL_ANY_SOURCE;
> to:
> if(unlikely(src == MPI_ANY_SOURCE))
> rtagsel = rtagsel & MQ_TAGSEL_ANY_SOURCE;
>
> It looks like the order of the ANY SOURCE and ANY TAG lines was switched
> between MV2-2.1 and MV2-2.2, and people are reporting this hang starting
> with MV2-2.2. It probably worked in MV2-2.1 because the ANY SOURCE line
> set the value of rtagsel and the ANY TAG line then modified that value,
> whereas now the ANY SOURCE line overwrites any bits already set in rtagsel.
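A minimal, self-contained model of why assigning the selector hangs while masking it does not. Everything here is hypothetical: the 64-bit tag layout, the SEL_* values, the ANY_* stand-ins, and the helper names are invented for illustration and are not MVAPICH2's real MQ_TAGSEL_* definitions. Only the assign-versus-AND pattern mirrors the two psm_queue.c lines quoted above. In this model, a selector bit set to 1 means "this bit must match".

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL match-tag layout, for illustration only (not MVAPICH2's):
 *   bits 48-63: context id, bits 32-47: source rank, bits 0-31: MPI tag. */
#define SEL_ALL        UINT64_C(0xFFFFFFFFFFFFFFFF) /* every bit must match */
#define SEL_ANY_TAG    UINT64_C(0xFFFFFFFF00000000) /* ignore tag bits      */
#define SEL_ANY_SOURCE UINT64_C(0xFFFF0000FFFFFFFF) /* ignore source bits   */

/* Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Pack (context, source, tag) into one 64-bit match tag. */
static uint64_t make_tag(uint16_t ctx, int src, int tag)
{
    assert(src >= ANY_SOURCE && src <= 0xFFFF); /* must fit the 16-bit field */
    return ((uint64_t)ctx << 48)
         | ((uint64_t)(uint16_t)src << 32)
         | (uint64_t)(uint32_t)tag;
}

/* 2.2-style ordering: ANY_TAG clears the tag bits, then ANY_SOURCE
 * *assigns* the selector and re-enables the tag bits. */
static uint64_t selector_buggy(int src, int tag)
{
    uint64_t sel = SEL_ALL;
    if (tag == ANY_TAG)
        sel = sel & SEL_ANY_TAG;
    if (src == ANY_SOURCE)
        sel = SEL_ANY_SOURCE;       /* overwrites: tag bits required again */
    return sel;
}

/* Proposed fix: ANY_SOURCE masks instead of assigning. */
static uint64_t selector_fixed(int src, int tag)
{
    uint64_t sel = SEL_ALL;
    if (tag == ANY_TAG)
        sel = sel & SEL_ANY_TAG;
    if (src == ANY_SOURCE)
        sel = sel & SEL_ANY_SOURCE; /* keeps the tag bits cleared */
    return sel;
}

/* A posted probe matches a message if every selected bit agrees. */
static int tag_matches(uint64_t msg, uint64_t want, uint64_t sel)
{
    return ((msg ^ want) & sel) == 0;
}
```

Consider a message from rank 1 with tag 1. A probe on (ANY_SOURCE, ANY_TAG) fails to match under selector_buggy, because the tag bits are required again and the wildcard's tag field never equals 1. A polling loop like the MPI_Iprobe loop in the reproducer would therefore spin forever. Under selector_fixed, the same probe matches. A probe that names source 1 explicitly matches even with the buggy ordering, which is consistent with the workaround described earlier.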
>
> Can you verify whether this is the right fix?
> Thanks,
> -Adam
>
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main( int argc, char *argv[] )
> {
>     int rank, nproc;
>     int dest, tag, val;
>     int flag = 0;
>     MPI_Status status;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &nproc);
>
>     /* The reproducer needs exactly one sender and one prober. */
>     if (nproc != 2) {
>         fprintf(stderr, "# of procs must be 2\n");
>         MPI_Abort(MPI_COMM_WORLD, 1);
>     }
>
>     if (rank == 0) {
>         /* Poll with a wildcard probe; this is where the PSM channel hangs. */
>         while (!flag) {
>             MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
>                        &flag, &status);
>         }
>         fprintf(stderr, "Rank %d probed (tag: %d) from Rank %d\n", rank,
>                 status.MPI_TAG, status.MPI_SOURCE);
>         MPI_Recv(&val, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
>                  MPI_COMM_WORLD, &status);
>         fprintf(stderr, "Rank %d received (val: %d, tag: %d) from Rank %d\n",
>                 rank, val, status.MPI_TAG, status.MPI_SOURCE);
>     } else {
>         dest = 0; tag = 1; val = 2;
>         MPI_Send(&val, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>         fprintf(stderr, "Rank %d sent (val: %d, tag: %d) to Rank %d\n",
>                 rank, val, tag, dest);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>