[mvapich-discuss] Infiniband problems

Jens Glaser jglaser at umn.edu
Tue Nov 6 12:51:42 EST 2012


Hi,

On Nov 3, 2012, at 11:26 PM, sreeram potluri wrote:
> 
> Can you give us some more details of how you are running these tests? 
> 
> From your earlier email, we assume you are running them inter-node. 
Yes, but I will test whether the same failure occurs intra-node (the cluster is down at the moment, so I can't test right now).

> The following information will be helpful:

> the parameters mvapich2 was configured with (output from <mvapich2-install>/bin/mpiname -a)

The output is

MVAPICH2 1.9a Sat Sep  8 15:01:35 EDT 2012 ch3:mrail

Compilation
CC: gcc    -g -DNDEBUG -DNVALGRIND -O2
CXX: g++   -g -DNDEBUG -DNVALGRIND -O2
F77: gfortran   -g -O2 
FC: gfortran   -g -O2

Configuration
--with-cuda=/sw/kfs/cuda/4.2/linux_binary/ --with-cuda-libpath=/sw/kfs/cuda/4.2/linux_binary/lib64 --with-cuda-include=/sw/kfs/cuda/4.2/linux_binary/include/ --prefix=/nics/d/home/jglaser/local-kfs --enable-shared --enable-cuda --enable-g=dbg,meminit


> the complete command used to run the benchmark, including any runtime parameters.
> 

I didn't use any runtime parameters, but just ran
mpirun -np 2 ./osu_bibw
mpirun -np 2 ./osu_bw 
or

mpirun -np 2 ./osu_bibw H H
mpirun -np 2 ./osu_bw H H

and it didn't produce any output. When the cluster is up again, I will also set MV2_USE_CUDA=1 and try again.
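
Since the symptom is a hang with no output, a time limit around each run makes it easier to tell a hang from a plain failure. Below is a minimal sketch, assuming GNU coreutils `timeout` is available on the node; `run_with_limit` is a hypothetical helper name, and the 120-second limit in the usage comment is arbitrary. Whether MV2_USE_CUDA=1 set this way reaches the remote ranks depends on the launcher, so treat that part as an assumption too.

```shell
#!/bin/sh
# run_with_limit SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU coreutils `timeout` and reports
# whether it finished, failed, or had to be killed after SECONDS.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # `timeout` exits with 124 when the time limit was reached
        echo "HUNG (killed after ${limit}s): $*"
    elif [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $*"
    else
        echo "OK: $*"
    fi
    return "$status"
}

# Intended use on the cluster (assumed, not run here):
#   run_with_limit 120 mpirun -np 2 ./osu_bw H H
#   run_with_limit 120 env MV2_USE_CUDA=1 mpirun -np 2 ./osu_bw H H
```

A run that prints HUNG for osu_bw but OK for the latency tests would match what I described above.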

Jens

> Best
> Sreeram Potluri
> 
> On Sat, Nov 3, 2012 at 9:47 PM, Jens Glaser <jglaser at umn.edu> wrote:
> Hi,
> 
> I suspect I am having some trouble with Infiniband support on the cluster I am using (the Keeneland final system).
> 
> The latency tests run, but osu_bibw and osu_bw hang.
> 
> The system has Mellanox FDR adapters.
> 
> This is the information:
> 
> $ ibstat
> CA 'mlx4_0'
>         CA type: MT4099
>         Number of ports: 2
>         Firmware version: 2.10.5380
>         Hardware version: 0
>         Node GUID: 0x0002c903003ff800
>         System image GUID: 0x0002c903003ff803
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 56
>                 Base lid: 64
>                 LMC: 0
>                 SM lid: 264
>                 Capability mask: 0x02514868
>                 Port GUID: 0x0002c903003ff801
>                 Link layer: InfiniBand
>         Port 2:
>                 State: Down
>                 Physical state: Disabled
>                 Rate: 40
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02514868
>                 Port GUID: 0x0002c903003ff802
>                 Link layer: InfiniBand
> 
> The library was configured with CUDA support, and I am using the latest version (1.9a).
> 
> Any ideas?
> 
> Jens
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 
> 
> 
