[mvapich-discuss] MPI_Send error

Hari Subramoni subramoni.1 at osu.edu
Wed Jun 1 14:43:52 EDT 2016


Hello Maksym,

Thank you for the report. We will take a look at it and get back to you.

Thx,
Hari.

On Wed, Jun 1, 2016 at 2:17 PM, Maksym Planeta <mplaneta at os.inf.tu-dresden.de> wrote:

> Hello,
>
> I have a problem with the attached program.
>
> When I start it with 2 processes, everything works:
>
> $ srun --mpi=pmi2 -n 2 ./copy -s 1
> 1: Send 1073741824
> 1: Send 1073741824
> Runtime = 1.523823
>
> But when I start it with 3 processes, an error occurs:
>
> $ srun --mpi=pmi2 -n 3 ./copy -s 1
> 1: Send 536870912
> 2: Send 536870912
> mlx5: taurusi5470: got completion with error:
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00008813 0802fd66 000527d0
> [taurusi5470:mpi_rank_1][handle_cqe] Send desc error in msg to 0,
> wc_opcode=0
> [taurusi5470:mpi_rank_1][handle_cqe] Msg from 0: wc.status=10,
> wc.wr_id=0x180f170, wc.opcode=0, vbuf->phead->type=32 =
> MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
> [taurusi5470:mpi_rank_1][handle_cqe]
> ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:543:
> [] Got completion with error 10, vendor code=0x88, dest rank=0
>
> [taurusi5469:mpi_rank_0][async_thread]
> ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1115:
> Got FATAL event 3
>
> srun: error: taurusi5470: task 1: Exited with exit code 252
> srun: error: taurusi5469: task 0: Exited with exit code 255
> ^Csrun: interrupt (one more within 1 sec to abort)
>
>
> Every process count other than 3 works. I also compiled the program
> with Open MPI 1.8.10 and found no such problem.
>
> So I assume this may be a bug in MVAPICH2.
>
> Here is the output of mpiname:
>
> $ mpiname -a
> MVAPICH2 2.2rc1 Tue Mar 29 22:00:00 EST 2016 ch3:mrail
>
> Compilation
> CC: gcc    -g -O0
> CXX: g++   -g -O0
> F77: gfortran -L/lib -L/lib   -g -O0
> FC: gfortran   -g -O0
>
> Configuration
> --enable-fortran=all --enable-cxx --enable-error-checking=all
> --enable-error-messages=none --enable-timing=none
> --enable-check-compiler-flags --enable-threads=multiple
> --enable-weak-symbols --disable-dependency-tracking --enable-fast-install
> --disable-rdma-cm --with-pm=mpirun:hydra --with-rdma=gen2
> --with-device=ch3:mrail --enable-alloca --enable-hwloc --disable-fast
> --enable-g=dbg --enable-error-messages=all --enable-error-checking=all
> --prefix=<prefix>
>
> --
> Regards,
> Maksym Planeta
>
>
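
A note on reading the numeric codes in the quoted log: the values in "wc.status=10" and "Got FATAL event 3" look like the libibverbs enums ibv_wc_status and ibv_event_type. The sketch below maps them to names. The tables are an assumption, written from the enum order in <infiniband/verbs.h> as recalled here and truncated to the first few entries, so check them against the header installed on your system. The decode() helper is hypothetical and not part of MVAPICH2 or libibverbs.

```python
import re

# Assumed to follow the order of enum ibv_wc_status in <infiniband/verbs.h>.
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
    6: "IBV_WC_MW_BIND_ERR",
    7: "IBV_WC_BAD_RESP_ERR",
    8: "IBV_WC_LOC_ACCESS_ERR",
    9: "IBV_WC_REM_INV_REQ_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    11: "IBV_WC_REM_OP_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Assumed to follow the order of enum ibv_event_type (first entries only).
ASYNC_EVENT = {
    0: "IBV_EVENT_CQ_ERR",
    1: "IBV_EVENT_QP_FATAL",
    2: "IBV_EVENT_QP_REQ_ERR",
    3: "IBV_EVENT_QP_ACCESS_ERR",
}


def decode(log_text):
    """Return (field, code, name) tuples for the codes found in log_text."""
    found = []
    # Completion status as printed by handle_cqe in the quoted log.
    for match in re.finditer(r"wc\.status=(\d+)", log_text):
        code = int(match.group(1))
        found.append(("wc.status", code, WC_STATUS.get(code, "unknown")))
    # Asynchronous event number as printed by async_thread in the quoted log.
    for match in re.finditer(r"Got FATAL event (\d+)", log_text):
        code = int(match.group(1))
        found.append(("FATAL event", code, ASYNC_EVENT.get(code, "unknown")))
    return found


if __name__ == "__main__":
    sample = "Msg from 0: wc.status=10, wc.opcode=0\nGot FATAL event 3\n"
    for field, code, name in decode(sample):
        print(f"{field} {code}: {name}")
```

If the tables are right, both codes are access errors: a remote access error on the sending rank and a queue pair access error on rank 0. That narrows where to look but does not by itself explain why only the 3-process run fails.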