[mvapich-discuss] OSU bw test hangs
Shan-ho Tsai
shtsai at uga.edu
Thu Jan 26 15:46:23 EST 2012
Hello,
I compiled and installed mvapich2 1.7 using gcc 4.1.2,
gcc 4.4.4 and PGI 11.8 on a 64-bit Linux RHEL5.7 node.
Our InfiniBand hardware is from QLogic, and we use the default
OpenFabrics software distributed with RHEL5.7.
The steps used to build it were, e.g.
./configure --prefix=/usr/local/mvapich2/1.7/gcc444 --with-rdma=gen2 \
    --enable-f77 --enable-fc --enable-cxx --enable-shared \
    --enable-sharedlibs=gcc CC=gcc44 F77=gfortran44 FC=gfortran44 \
    CXX=g++44
(or with the compilers replaced by pgcc, pgCC, pgf77 and pgf90, etc.)
make
make install
In each case there were no errors in the build.
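For reference, the per-compiler builds can be sketched as a small shell helper. This is only an illustration: `configure_cmd` and the `pgi118` directory name are hypothetical, and the function just prints each configure line rather than running it. The gcc 4.1.2 build would follow the same pattern with its own compiler names.

```shell
#!/bin/sh
# Print the configure command for one compiler set (nothing is executed).
# Usage: configure_cmd <prefix-subdir> <CC> <CXX> <F77> <FC>
configure_cmd() {
    echo "./configure --prefix=/usr/local/mvapich2/1.7/$1" \
         "--with-rdma=gen2 --enable-f77 --enable-fc --enable-cxx" \
         "--enable-shared --enable-sharedlibs=gcc" \
         "CC=$2 CXX=$3 F77=$4 FC=$5"
}

# GCC 4.4.4 build, as in the command above.
configure_cmd gcc444 gcc44 g++44 gfortran44 gfortran44
# PGI 11.8 build; "pgi118" is a hypothetical directory name.
configure_cmd pgi118 pgcc pgCC pgf77 pgf90
```

Each printed line would then be followed by the usual make and make install.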
The OSU benchmark tests (osu_latency and osu_bw) work
fine within a node. Across 2 nodes, osu_latency still
works, but osu_bw hangs after printing
# OSU MPI Bandwidth Test v3.4
# Size Bandwidth (MB/s)
The command used was
/usr/local/mvapich2/1.7-r5140/gcc444-gen2/bin/mpiexec -n 2 -f host /usr/local/mvapich2/1.7-r5140/gcc444-gen2/libexec/osu-micro-benchmarks/osu_bw
where 'host' has two lines with the node names
nodeA
nodeB
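To keep a hung test from blocking the terminal, the two-node runs can be wrapped in a time limit. A minimal sketch, assuming the GNU coreutils `timeout` command is available (it may not be on a stock RHEL5 install); `run_limited` is a hypothetical helper and the 60-second limit is arbitrary.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it finished.
# Usage: run_limited <seconds> <command> [args...]
run_limited() {
    limit=$1; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then      # GNU timeout exits with 124 when the limit is hit
        echo "HUNG after ${limit}s: $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

MV2=/usr/local/mvapich2/1.7-r5140/gcc444-gen2
# Only attempt the MPI runs where this install actually exists.
if [ -x "$MV2/bin/mpiexec" ]; then
    printf 'nodeA\nnodeB\n' > host      # same two-line host file as above
    for t in osu_latency osu_bw; do
        run_limited 60 "$MV2/bin/mpiexec" -n 2 -f host \
            "$MV2/libexec/osu-micro-benchmarks/$t"
    done
fi
```

With the behaviour described here, osu_latency would be expected to report OK and osu_bw to report HUNG once the limit expires.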
Running the above with strace stops at
read(6, "# OSU MPI Bandwidth Test v3.4\n# "..., 61) = 61
write(1, "# OSU MPI Bandwidth Test v3.4\n# "..., 61) = 61
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=6, events=POLLIN}, {fd=0, events=POLLIN}, {fd=7, events=POLLIN}], 9, -1
'top' on the nodes shows osu_bw using CPU time,
but the test makes no further progress.
I also tried building without the --with-rdma=gen2 option
to configure, but the same problem with the osu_bw test
occurs. It also occurs on an older cluster (64-bit RHEL4
Linux, with OFED 1.4). The problem also occurs with
mvapich2 1.7-r5140 (downloaded on 1/25/12).
Interestingly, mvapich2 1.6, built as above, appears to
work fine (osu_bw gave reasonable results) on this
cluster.
Any ideas what I might be doing wrong in the installation
or testing? Any suggestions on how I can troubleshoot this?
I would appreciate any help.
Thank you very much!
Shan-Ho
----------------------------------------------------
Shan-Ho Tsai
University of Georgia, Athens GA