[mvapich-discuss] unrecognized protocol for send/recv over 8KB

Brian Budge brian.budge at gmail.com
Thu Jan 3 20:46:15 EST 2008


Hi all -

I'm new to the list here... hi!  I have been using OpenMPI for a while, and
LAM before that, but new requirements keep pushing me toward new
implementations.  In particular, I am interested in using InfiniBand (via
OFED 1.2.5.1) in a multi-threaded environment.  It seems that MVAPICH is the
library for that particular combination :)
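
For context, "multi-threaded" here means requesting full thread support at
init time, along these lines (a minimal sketch; I'm assuming
MPI_THREAD_MULTIPLE is the level in question):
---------------------------------------------------------------
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for full thread support and check what the library grants. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("only got thread level %d\n", provided);

    MPI_Finalize();
    return 0;
}
---------------------------------------------------------------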

In any case, I installed MVAPICH, and I can boot the daemons and run the
ring speed test with no problems.  When I run any actual program with
mpirun, however, I get an error whenever a send or receive exceeds 8192
bytes.
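
The smallest thing I can think of that should hit this is a single
send/receive just over that boundary -- something like the following
(a hypothetical reproducer, not one of the OSU sources; the 16384-byte
MPI_CHAR message mirrors the MPI_Recv call in the latency trace below):
---------------------------------------------------------------
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int n = 16384;               /* > 8192 bytes */
    int rank;
    char *buf = malloc(n);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        memset(buf, 'x', n);
        MPI_Send(buf, n, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, n, MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d bytes OK\n", n);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
---------------------------------------------------------------
Anything at or below 8192 bytes completes fine; anything above appears to
die in the rendezvous path.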

For example, if I run the bandwidth test from the benchmarks page
(osu_bw.c), I get the following:
---------------------------------------------------------------
budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
Thursday 06:16:00
burn
burn-3
# OSU MPI Bandwidth Test v3.0
# Size        Bandwidth (MB/s)
1                         1.24
2                         2.72
4                         5.44
8                        10.18
16                       19.09
32                       29.69
64                       65.01
128                     147.31
256                     244.61
512                     354.32
1024                    367.91
2048                    451.96
4096                    550.66
8192                    598.35
[1][ch3_rndvtransfer.c:112] Unknown protocol 0 type from rndv req to send
Internal Error: invalid error code ffffffff (Ring Index out of range) in
MPIDI_CH3_RndvSend:263
Fatal error in MPI_Waitall:
Other MPI error, error stack:
MPI_Waitall(242): MPI_Waitall(count=64, req_array=0xdb21a0,
status_array=0xdb3140) failed
(unknown)(): Other MPI error
rank 1 in job 4  burn_37156   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9
---------------------------------------------------------------

I get a similar problem with the latency test; however, the protocol
complained about is different:
--------------------------------------------------------------------
budge@burn:~/tests/testMvapich2> mpirun -np 2 ./a.out
Thursday 09:21:20
# OSU MPI Latency Test v3.0
# Size            Latency (us)
0                         3.93
1                         4.07
2                         4.06
4                         3.82
8                         3.98
16                        4.03
32                        4.00
64                        4.28
128                       5.22
256                       5.88
512                       8.65
1024                      9.11
2048                     11.53
4096                     16.17
8192                     25.67
[1][ch3_rndvtransfer.c:112] Unknown protocol 8126589 type from rndv req to send
Internal Error: invalid error code ffffffff (Ring Index out of range) in
MPIDI_CH3_RndvSend:263
Fatal error in MPI_Recv:
Other MPI error, error stack:
MPI_Recv(186): MPI_Recv(buf=0xa8ff80, count=16384, MPI_CHAR, src=0, tag=1,
MPI_COMM_WORLD, status=0x7fff14c7bde0) failed
(unknown)(): Other MPI error
rank 1 in job 5  burn_37156   caused collective abort of all ranks
--------------------------------------------------------------------

The reported protocol values (0 and 8126589) are consistent across repeated
runs of each program.

Anyone have any ideas?  If you need more info, please let me know.

Thanks,
  Brian