[mvapich-discuss] MPI_Send error
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Wed Jun 1 14:17:28 EDT 2016
Hello,
I have a problem with the attached program.
When I start it with 2 processes everything goes OK:
$ srun --mpi=pmi2 -n 2 ./copy -s 1
1: Send 1073741824
1: Send 1073741824
Runtime = 1.523823
But when I start it with 3 processes, an error occurs:
$ srun --mpi=pmi2 -n 3 ./copy -s 1
1: Send 536870912
2: Send 536870912
mlx5: taurusi5470: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008813 0802fd66 000527d0
[taurusi5470:mpi_rank_1][handle_cqe] Send desc error in msg to 0, wc_opcode=0
[taurusi5470:mpi_rank_1][handle_cqe] Msg from 0: wc.status=10, wc.wr_id=0x180f170, wc.opcode=0, vbuf->phead->type=32 = MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
[taurusi5470:mpi_rank_1][handle_cqe] ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:543: [] Got completion with error 10, vendor code=0x88, dest rank=0
[taurusi5469:mpi_rank_0][async_thread] ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1115: Got FATAL event 3
srun: error: taurusi5470: task 1: Exited with exit code 252
srun: error: taurusi5469: task 0: Exited with exit code 255
^Csrun: interrupt (one more within 1 sec to abort)
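The numeric codes in the log above can be read against the libibverbs enums. The following is a minimal sketch, assuming that wc.status is a value of enum ibv_wc_status and that the "FATAL event" number is a value of enum ibv_event_type, both from <infiniband/verbs.h>. The helper names wc_status_name and async_event_name are hypothetical and are not part of MVAPICH2 or libibverbs.

```c
#include <stddef.h>

/* Names in the order of enum ibv_wc_status in <infiniband/verbs.h>.
 * Only values 0..13 are reproduced here; the real enum has more. */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",           /* 0 */
    "IBV_WC_LOC_LEN_ERR",       /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",     /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",    /* 3 */
    "IBV_WC_LOC_PROT_ERR",      /* 4 */
    "IBV_WC_WR_FLUSH_ERR",      /* 5 */
    "IBV_WC_MW_BIND_ERR",       /* 6 */
    "IBV_WC_BAD_RESP_ERR",      /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",    /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",   /* 9 */
    "IBV_WC_REM_ACCESS_ERR",    /* 10 */
    "IBV_WC_REM_OP_ERR",        /* 11 */
    "IBV_WC_RETRY_EXC_ERR",     /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR"  /* 13 */
};

/* First values of enum ibv_event_type, in header order.
 * Assumption: the "Got FATAL event N" line prints this enum. */
static const char *const async_event_names[] = {
    "IBV_EVENT_CQ_ERR",         /* 0 */
    "IBV_EVENT_QP_FATAL",       /* 1 */
    "IBV_EVENT_QP_REQ_ERR",     /* 2 */
    "IBV_EVENT_QP_ACCESS_ERR"   /* 3 */
};

/* Bounds-checked table lookup; returns "unknown" for codes not listed. */
static const char *lookup(const char *const *table, size_t n, int code)
{
    if (code < 0 || (size_t)code >= n)
        return "unknown";
    return table[code];
}

/* Hypothetical helper: name for a wc.status value from the log. */
const char *wc_status_name(int status)
{
    return lookup(wc_status_names,
                  sizeof wc_status_names / sizeof wc_status_names[0],
                  status);
}

/* Hypothetical helper: name for an async event number from the log. */
const char *async_event_name(int event)
{
    return lookup(async_event_names,
                  sizeof async_event_names / sizeof async_event_names[0],
                  event);
}
```

Under these assumptions, wc.status=10 reads as IBV_WC_REM_ACCESS_ERR on rank 1 and event 3 reads as IBV_EVENT_QP_ACCESS_ERR on rank 0. In a program linked against libibverbs, ibv_wc_status_str() and ibv_event_type_str() give descriptions for the same values.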
Any other number of processes works well; only 3 fails. I also compiled
the program with Open MPI 1.8.10 and found no such problem whatsoever.
Thus, I assume this may be a bug in MVAPICH.
Here is the output of mpiname:
$ mpiname -a
MVAPICH2 2.2rc1 Tue Mar 29 22:00:00 EST 2016 ch3:mrail
Compilation
CC: gcc -g -O0
CXX: g++ -g -O0
F77: gfortran -L/lib -L/lib -g -O0
FC: gfortran -g -O0
Configuration
--enable-fortran=all --enable-cxx --enable-error-checking=all
--enable-error-messages=none --enable-timing=none
--enable-check-compiler-flags --enable-threads=multiple
--enable-weak-symbols --disable-dependency-tracking
--enable-fast-install --disable-rdma-cm --with-pm=mpirun:hydra
--with-rdma=gen2 --with-device=ch3:mrail --enable-alloca --enable-hwloc
--disable-fast --enable-g=dbg --enable-error-messages=all
--enable-error-checking=all --prefix=<prefix>
--
Regards,
Maksym Planeta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copy.c
Type: text/x-csrc
Size: 4398 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20160601/4f1399e5/attachment.bin>