[mvapich-discuss] Infiniband problems
Jens Glaser
jglaser at umn.edu
Tue Nov 6 12:51:42 EST 2012
Hi,
On Nov 3, 2012, at 11:26 PM, sreeram potluri wrote:
>
> Can you give us some more details of how you are running these tests?
>
> From your earlier email, we assume you are running them inter-node.
Yes, but I will also test whether the same failure occurs intra-node (the cluster is down at the moment, so I can't test right now).
> The following information will be helpful:
> the parameters mvapich2 was configured with (output from <mvapich2-install>/bin/mpiname -a)
The output is
MVAPICH2 1.9a Sat Sep 8 15:01:35 EDT 2012 ch3:mrail
Compilation
CC: gcc -g -DNDEBUG -DNVALGRIND -O2
CXX: g++ -g -DNDEBUG -DNVALGRIND -O2
F77: gfortran -g -O2
FC: gfortran -g -O2
Configuration
--with-cuda=/sw/kfs/cuda/4.2/linux_binary/ --with-cuda-libpath=/sw/kfs/cuda/4.2/linux_binary/lib64 --with-cuda-include=/sw/kfs/cuda/4.2/linux_binary/include/ --prefix=/nics/d/home/jglaser/local-kfs --enable-shared --enable-cuda --enable-g=dbg,meminit
> the complete command used to run the benchmark including any runtime parameters.
>
I didn't use any runtime parameters; I just ran
mpirun -np 2 ./osu_bibw
mpirun -np 2 ./osu_bw
or
mpirun -np 2 ./osu_bibw H H
mpirun -np 2 ./osu_bw H H
and it didn't produce any output. When the cluster is up again, I will also set MV2_USE_CUDA=1 and try again.
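For reference, a sketch of the planned re-test (MV2_USE_CUDA is a documented MVAPICH2 runtime parameter; the benchmark paths are placeholders for wherever the OSU micro-benchmarks were built):

```shell
# Enable MVAPICH2's CUDA-aware path before launching the benchmarks.
# The "H H" arguments ask the OSU benchmark to place both the send and
# receive buffers in host memory (use "D D" for device/GPU memory).
export MV2_USE_CUDA=1
mpirun -np 2 ./osu_bw H H
mpirun -np 2 ./osu_bibw H H
```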
Jens
> Best
> Sreeram Potluri
>
> On Sat, Nov 3, 2012 at 9:47 PM, Jens Glaser <jglaser at umn.edu> wrote:
> Hi,
>
> I suspect I am having some trouble with Infiniband support on the cluster (keeneland final system) I am using.
>
> The latency tests run, but osu_bibw and osu_bw hang.
>
> The system has Mellanox FDR adapters.
>
> This is the information:
>
> $ ibstat
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 2
> Firmware version: 2.10.5380
> Hardware version: 0
> Node GUID: 0x0002c903003ff800
> System image GUID: 0x0002c903003ff803
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 64
> LMC: 0
> SM lid: 264
> Capability mask: 0x02514868
> Port GUID: 0x0002c903003ff801
> Link layer: InfiniBand
> Port 2:
> State: Down
> Physical state: Disabled
> Rate: 40
> Base lid: 0
> LMC: 0
> SM lid: 0
> Capability mask: 0x02514868
> Port GUID: 0x0002c903003ff802
> Link layer: InfiniBand
>
> The library was configured with CUDA support, I am using the latest version (1.9a).
>
> Any ideas?
>
> Jens
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
>