[mvapich-discuss] OpenFabrics

Di Domenico, Michael mdidomenico at silverstorm.com
Fri Aug 4 17:40:36 EDT 2006


Looks like xhpl won't run either.  According to the switch, xhpl sends
about 40 packets and then hangs.  There are no errors on the switch, but
this shows up in dmesg on the host:

xhpl(7110): unaligned access to 0x600000000028402c, ip=0x400000000009e9f1
xhpl(7110): unaligned access to 0x6000000000284054, ip=0x400000000009ea00
xhpl(7110): unaligned access to 0x6000000000284024, ip=0x400000000009ea10
xhpl(7110): unaligned access to 0x6000000000284034, ip=0x400000000009ea11
xhpl(7110): floating-point assist fault at ip 2000000000186c12, isr 0000020000001001
xhpl(7110): floating-point assist fault at ip 40000000000203b1, isr 0000020000000008
xhpl(7110): floating-point assist fault at ip 40000000000207​21, isr 0000020000000008
xhpl(7110): floating-point assist fault at ip 40000000000207f1, isr 0000020000000008
[root@tse82 ~]#
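
One way to follow up on the unaligned-access messages is to map the
reported ip values back to functions in the binary. The sketch below is
only an illustration: the helper name unaligned_ips is made up, it assumes
xhpl was built with debug symbols, and whether addr2line resolves these
Itanium addresses cleanly has not been checked here.

```shell
#!/bin/sh
# Read kernel messages on stdin and print the instruction pointers from
# "unaligned access ... ip=0x..." lines, one per line, without duplicates.
# (unaligned_ips is a hypothetical helper name, not part of any tool.)
unaligned_ips() {
    sed -n 's/.*ip=\(0x[0-9a-f]*\).*/\1/p' | sort -u
}

# Usage (assumes ./xhpl carries debug symbols):
#   dmesg | unaligned_ips | xargs addr2line -f -e ./xhpl
```

The sed expression only looks for the ip=0x... token, so it works whether
or not the mail client wrapped the log lines.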

-----Original Message-----
From: Sayantan Sur [mailto:surs at cse.ohio-state.edu] 
Sent: Friday, August 04, 2006 5:17 PM
To: Di Domenico, Michael
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] OpenFabrics

Hi Michael,

Di Domenico, Michael wrote:

> I just built an OpenFabrics v1.0 cluster of two machines on a pair of 
> quad-processor Itanium servers. Both machines have Mellanox HCAs with 
> Mellanox firmware.
>
> I also downloaded the OSU version of mvapich from the website instead 
> of using the bundled version.
>
> Everything compiles fine, simple cpi tests work okay, netpipe runs 
> okay, so I'm pretty sure my fabric is okay.
>
> But when I try to run osu_latency, osu_bw, or osu_bibw tests, it just 
> stalls.
>
> How can I determine where the program is stalling?
>
To see where the programs are stalling, build MVAPICH with debugging 
symbols (by adding -ggdb to the CFLAGS in make.mvapich.gen2). Once the 
build is done, run the test as usual. To see which function is hanging, 
ssh to the node on which the test is running, find the process ID, and 
run the following commands:

$ gdb -p <PID>

(gdb) bt

This will show which function the test is hanging in.
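
If several ranks run on the same node, the attach-and-backtrace step can
be scripted. The following is a minimal sketch, not part of MVAPICH: the
function name dump_backtraces and the GDB override variable are invented
for illustration, and it relies only on gdb's standard -p, -batch and -ex
options.

```shell
#!/bin/sh
# Print a backtrace of every thread for each PID given as an argument.
# GDB can be overridden (for example GDB=echo) to preview the commands
# without attaching to anything. Both names are hypothetical.
dump_backtraces() {
    for pid in "$@"; do
        echo "=== PID $pid ==="
        "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
    done
}

# Usage on the node where the test hangs (assumes pgrep is installed and
# the benchmark process is named osu_latency):
#   dump_backtraces $(pgrep -x osu_latency)
```

Running in batch mode detaches from each process once the backtrace is
printed, so the hung job is left in place for further inspection.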

However, the tests should not hang in the first place. Can you run 
other benchmarks, such as IMB? Also, did you modify make.mvapich.gen2 at 
all before running these tests? If so, which flags did you use?

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs



