[mvapich-discuss] unrecognized protocol for send/recv over 8KB
Brian Budge
brian.budge at gmail.com
Thu Jan 3 20:46:15 EST 2008
Hi all -
I'm new to the list here -- hi! I have been using OpenMPI for a while, and
LAM before that, but new requirements keep pushing me to new
implementations. In particular, I am interested in using InfiniBand (with
OFED 1.2.5.1) in a multi-threaded environment, and MVAPICH seems to be the
library for that particular combination :)
In any case, I installed MVAPICH, I can boot the daemons, and the ring
speed test runs with no problems. When I run any program with mpirun,
however, I get an error when sending or receiving more than 8192 bytes.
For example, if I run the bandwidth test from the benchmarks page
(osu_bw.c), I get the following:
---------------------------------------------------------------
budge at burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
Thursday 06:16:00
burn
burn-3
# OSU MPI Bandwidth Test v3.0
# Size Bandwidth (MB/s)
1 1.24
2 2.72
4 5.44
8 10.18
16 19.09
32 29.69
64 65.01
128 147.31
256 244.61
512 354.32
1024 367.91
2048 451.96
4096 550.66
8192 598.35
[1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send
Internal Error: invalid error code ffffffff (Ring Index out of range) in
MPIDI_CH3_RndvSend:263
Fatal error in MPI_Waitall:
Other MPI error, error stack:
MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0,
status_array=0xdb3140) failed
(unknown)(): Other MPI error
rank 1 in job 4 burn_37156 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
---------------------------------------------------------------
I get a similar problem with the latency test; however, the protocol
complained about is different:
--------------------------------------------------------------------
budge at burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
Thursday 09:21:20
# OSU MPI Latency Test v3.0
# Size Latency (us)
0 3.93
1 4.07
2 4.06
4 3.82
8 3.98
16 4.03
32 4.00
64 4.28
128 5.22
256 5.88
512 8.65
1024 9.11
2048 11.53
4096 16.17
8192 25.67
[1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to
send
Internal Error: invalid error code ffffffff (Ring Index out of range) in
MPIDI_CH3_RndvSend:263
Fatal error in MPI_Recv:
Other MPI error, error stack:
MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1,
MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
(unknown)(): Other MPI error
rank 1 in job 5 burn_37156 caused collective abort of all ranks
--------------------------------------------------------------------
The protocol numbers (0 and 8126589) are consistent across multiple runs of
each program.
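In case it helps, the failure doesn't seem specific to the OSU benchmarks:
as far as I can tell, any single message over 8192 bytes hits it. Here is a
minimal sketch of the kind of program that fails for me (condensed from my
test code, so treat the exact buffer size as illustrative -- anything over
8 KB should do). Compile with mpicc and launch with "mpirun -np 2 ./a.out":

```c
/* Minimal reproducer sketch: one 16 KB point-to-point transfer,
 * which is past the 8192-byte mark where the errors start for me.
 * Messages of 8192 bytes or less complete fine. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    char buf[16384];            /* > 8192 bytes, so the failing path is taken */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        memset(buf, 'a', sizeof(buf));
        MPI_Send(buf, (int)sizeof(buf), MPI_CHAR, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %zu bytes\n", sizeof(buf));
    }

    MPI_Finalize();
    return 0;
}
```

With the buffer shrunk to 8192 bytes the same program runs cleanly, which
is why I suspect whatever switches over at that message size.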
Anyone have any ideas? If you need more info, please let me know.
Thanks,
Brian