[mvapich-discuss] OSU bw test hangs

Shan-ho Tsai shtsai at uga.edu
Fri Jan 27 09:32:30 EST 2012


Hi Dr. Perkins,
Thank you so much for your quick response. We had only
installed the OpenFabrics stack distributed with RHEL5, but it
sounds like we need the OFED+ distribution from QLogic, which
I presume contains the PSM libraries. We will try that.
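
Once the QLogic stack is in place, I assume a quick check along
these lines would confirm that the PSM library and header are
installed (the exact paths below are my guess):

  # look for the QLogic PSM runtime library and its header
  ls /usr/lib64/libpsm_infinipath.so* /usr/include/psm.h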

Thanks again!
Shan-Ho
----------------------------------------------------
Shan-Ho Tsai
University of Georgia, Athens GA

________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Thursday, January 26, 2012 4:07 PM
To: Shan-ho Tsai
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] OSU bw test hangs

Hello:
It looks like you are using gen2 with QLogic.  You should use the PSM
interface (as suggested by QLogic).  Please see our user guide for
details on how to build using this interface.  Hope this helps.

http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.7.html#x1-170004.8
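
As a rough sketch (assuming the ch3:psm device name described in the
guide; adjust the install prefix and compilers for your site), a PSM
build of the same source tree would look something like:

  ./configure --prefix=/usr/local/mvapich2/1.7/gcc444-psm \
      --with-device=ch3:psm \
      --enable-f77 --enable-fc --enable-cxx --enable-shared \
      CC=gcc44 F77=gfortran44 FC=gfortran44 CXX=g++44
  make
  make install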

On Thu, Jan 26, 2012 at 3:46 PM, Shan-ho Tsai <shtsai at uga.edu> wrote:
>
> Hello,
> I compiled and installed mvapich2 1.7 using gcc 4.1.2,
> gcc 4.4.4, and PGI 11.8 on a 64-bit Linux RHEL5.7 node.
> Our InfiniBand hardware is from QLogic, and we use the default
> OpenFabrics software distributed with RHEL5.7.
>
> The steps used to build it were, e.g.
>
> ./configure --prefix=/usr/local/mvapich2/1.7/gcc444 --with-rdma=gen2 \
>     --enable-f77 --enable-fc --enable-cxx --enable-shared \
>     --enable-sharedlibs=gcc \
>     CC=gcc44 F77=gfortran44 FC=gfortran44 CXX=g++44
>
> (or with the compilers replaced by pgcc, pgCC, pgf77 and pgf90, etc)
>
> make
> make install
>
> In each case there were no errors in the build.
>
> The OSU benchmark tests (osu_latency and osu_bw) work
> fine within a single node. Across two nodes, osu_latency
> still works fine, but osu_bw just hangs after printing
>
> # OSU MPI Bandwidth Test v3.4
> # Size        Bandwidth (MB/s)
>
> The command used was
>
> /usr/local/mvapich2/1.7-r5140/gcc444-gen2/bin/mpiexec -n 2 -f host /usr/local/mvapich2/1.7-r5140/gcc444-gen2/libexec/osu-micro-benchmarks/osu_bw
>
> where 'host' has two lines with the node names
>
> nodeA
> nodeB
>
> Running the above under strace shows it stopping at
>
> read(6, "# OSU MPI Bandwidth Test v3.4\n# "..., 61) = 61
> write(1, "# OSU MPI Bandwidth Test v3.4\n# "..., 61# OSU MPI Bandwidth Test v3.4# Size        Bandwidth (MB/s)
> ) = 61
> poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=8, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=6, events=POLLIN}, {fd=0, events=POLLIN}, {fd=7, events=POLLIN}], 9, -1
>
> 'top' on the nodes shows osu_bw consuming CPU time,
> but the test just hangs there.
>
> I also tried building without the --with-rdma=gen2 option
> at configure time, but the same problem with the osu_bw test
> occurs. It also occurs on an older cluster (64-bit RHEL4
> Linux, with OFED 1.4), and with mvapich2 1.7-r5140
> (downloaded on 1/25/12).
>
> Interestingly, mvapich2 1.6, built as above, appears to
> work fine (osu_bw gave reasonable results) on this
> cluster.
>
> Any ideas what I might be doing wrong in the installation
> or testing? Any suggestions on how I can troubleshoot this?
> I'd appreciate any help.
>
> Thank you very much!
> Shan-Ho
>
> ----------------------------------------------------
> Shan-Ho Tsai
> University of Georgia, Athens GA
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo




