[mvapich-discuss] MPI_Send error

Maksym Planeta mplaneta at os.inf.tu-dresden.de
Wed Jun 1 14:17:28 EDT 2016


Hello,

I have a problem with the attached program (copy.c).

When I start it with 2 processes everything goes OK:

$ srun --mpi=pmi2 -n 2 ./copy -s 1
1: Send 1073741824
1: Send 1073741824
Runtime = 1.523823

But when I start it with 3 processes, it fails with an error:

$ srun --mpi=pmi2 -n 3 ./copy -s 1
1: Send 536870912
2: Send 536870912
mlx5: taurusi5470: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00008813 0802fd66 000527d0
[taurusi5470:mpi_rank_1][handle_cqe] Send desc error in msg to 0, wc_opcode=0
[taurusi5470:mpi_rank_1][handle_cqe] Msg from 0: wc.status=10, wc.wr_id=0x180f170, wc.opcode=0, vbuf->phead->type=32 = MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
[taurusi5470:mpi_rank_1][handle_cqe] ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:543: [] Got completion with error 10, vendor code=0x88, dest rank=0

[taurusi5469:mpi_rank_0][async_thread] ../mvapich2/src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1115: Got FATAL event 3

srun: error: taurusi5470: task 1: Exited with exit code 252
srun: error: taurusi5469: task 0: Exited with exit code 255
^Csrun: interrupt (one more within 1 sec to abort)


Any other number of processes works fine; only 3 fails. I also compiled
the program with Open MPI 1.8.10 and found no such problem whatsoever.

So I suspect this may be a bug in MVAPICH2.

Here is the output of mpiname:

$ mpiname -a
MVAPICH2 2.2rc1 Tue Mar 29 22:00:00 EST 2016 ch3:mrail

Compilation
CC: gcc    -g -O0
CXX: g++   -g -O0
F77: gfortran -L/lib -L/lib   -g -O0
FC: gfortran   -g -O0

Configuration
--enable-fortran=all --enable-cxx --enable-error-checking=all 
--enable-error-messages=none --enable-timing=none 
--enable-check-compiler-flags --enable-threads=multiple 
--enable-weak-symbols --disable-dependency-tracking 
--enable-fast-install --disable-rdma-cm --with-pm=mpirun:hydra 
--with-rdma=gen2 --with-device=ch3:mrail --enable-alloca --enable-hwloc 
--disable-fast --enable-g=dbg --enable-error-messages=all 
--enable-error-checking=all --prefix=<prefix>

-- 
Regards,
Maksym Planeta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copy.c
Type: text/x-csrc
Size: 4398 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20160601/4f1399e5/attachment.bin>