[mvapich-discuss] OpenFabrics

Sayantan Sur surs at cse.ohio-state.edu
Fri Aug 4 17:16:38 EDT 2006


Hi Michael,

Di Domenico, Michael wrote:

> I just built an OpenFabrics v1.0 cluster of two machines on a pair of 
> quad-processor Itanium servers. Both machines have Mellanox HCAs with 
> Mellanox firmware.
>
> I also downloaded the OSU version of mvapich from the website instead 
> of using the bundled version.
>
> Everything compiles fine, simple cpi tests work okay, netpipe runs 
> okay, so I’m pretty sure my fabric is okay.
>
> But when I try to run osu_latency, osu_bw, or osu_bibw tests, it just 
> stalls.
>
> How can I determine where the program is stalling?
>
To see where the programs are stalling, rebuild MVAPICH with debugging 
symbols by adding -ggdb to the CFLAGS in make.mvapich.gen2. Once the 
build is done, run the test as usual. When it hangs, ssh to the node on 
which the test is running, find the process ID, and execute the 
following commands:

$ gdb -p <PID>

(gdb) bt

This will show which function the test is hanging in.
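
If the job runs several processes per node, the same steps can be scripted. The following is only a minimal sketch, not part of MVAPICH: the function name dump_bt and the GDB override variable are hypothetical, and it assumes gdb is installed on the node and that you own the processes. It attaches to each PID in turn, prints a backtrace for every thread, and detaches.

```shell
#!/bin/sh
# Print a backtrace of every thread in each given process, then detach.
# dump_bt is a hypothetical helper name. GDB is a hypothetical override
# for the debugger binary; it defaults to plain "gdb".
dump_bt() {
    for pid in "$@"; do
        echo "=== backtrace for PID $pid ==="
        # -p attaches to a running process, -batch exits after the
        # commands given with -ex have run.
        "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt"
    done
}

# Example, on the node where the test hangs (osu_latency is the
# benchmark from the question; pgrep looks up its PIDs by name):
#   dump_bt $(pgrep -x osu_latency)
```

Comparing the backtraces from both nodes usually shows whether the ranks are waiting on each other or stuck in the same call.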

However, the tests should not hang in the first place. Can you run 
other benchmarks, such as IMB? Also, did you modify make.mvapich.gen2 at 
all before running these tests? If so, which flags did you use?

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


