[mvapich-discuss] MPI over hybrid infiniband cards

MICHAEL S DAVIS msdavis at s383.jpl.nasa.gov
Fri Apr 13 16:41:47 EDT 2012


Hello,

We have just upgraded our SGI ICE 8400 EX from 768 cores to over 1800 
cores.  The old system had a Mellanox Technologies MT26428 InfiniBand 
card, which looks like this with ibstat:

r1i0n0:~ # ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.0
        Hardware version: b0
        Node GUID: 0x003048fffff09498
        System image GUID: 0x003048fffff0949b
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 53
                LMC: 0
                SM lid: 84
                Capability mask: 0x02510868
                Port GUID: 0x003048fffff09499
                Link layer: IB
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 282
                LMC: 0
                SM lid: 76
                Capability mask: 0x02510868
                Port GUID: 0x003048fffff0949a
                Link layer: IB
r1i0n0:~ #   

The new cards have a different chipset and are supposed to be the 
same, but they look like this when we run ibstat:

r2i0n0:~ # ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.200
        Hardware version: b0
        Node GUID: 0x003048fffff4f18c
        System image GUID: 0x003048fffff4f18f
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 146
                LMC: 0
                SM lid: 84
                Capability mask: 0x02510868
                Port GUID: 0x003048fffff4f18d
                Link layer: IB
CA 'mlx4_1'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.200
        Hardware version: b0
        Node GUID: 0x003048fffff4f188
        System image GUID: 0x003048fffff4f18b
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 220
                LMC: 0
                SM lid: 76
                Capability mask: 0x02510868
                Port GUID: 0x003048fffff4f189
                Link layer: IB
r2i0n0:~ # 

Instead of showing up as one card (mlx4_0) with two ports, the new 
card looks like two cards with one port each (mlx4_0 and mlx4_1).
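
The split also shows up if I just list the device names (a quick 
check, assuming the standard libibverbs utilities are installed):

# print only the HCA names; the new nodes report both mlx4_0 and mlx4_1
ibv_devinfo -l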

I have been using mvapich2 1.5.1p1 for over a year.  Anything compiled 
and run entirely on the old cards or entirely on the new cards works, 
but jobs that span a combination of the two either fail in MPI_Init or 
hang forever.
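
The failure is easy to reproduce with a two-rank job that spans one 
old node and one new node (a sketch; the node names are the ones from 
the prompts above, and ./a.out stands in for any MPI binary):

# one old node and one new node in the same job
echo r1i0n0 >  mixed_hosts
echo r2i0n0 >> mixed_hosts
mpirun_rsh -np 2 -hostfile ./mixed_hosts ./a.out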

I have also tried the latest mvapich2 1.7 and mvapich2 1.8, and 
neither one seems to work.  Software compiled with SGI's MPT or 
OpenMPI seems to work across the cards.

I also tried forcing the MPI runs onto mlx4_0 port 1 with the 
following environment variables:
setenv          MV2_IBA_HCA         mlx4_0
setenv          MV2_DEFAULT_PORT    1

But that doesn't seem to help either.
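
The same parameters can also be passed directly on the mpirun_rsh 
command line, which should rule out the setenv values simply not 
reaching the remote nodes (a sketch; the hostfile and binary names 
are placeholders):

mpirun_rsh -np 16 -hostfile ./hosts \
    MV2_IBA_HCA=mlx4_0 MV2_DEFAULT_PORT=1 ./a.out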

Any ideas about what I could be doing wrong, or what I might try to 
fix this problem, would be greatly appreciated.

thanks
Mike
