[mvapich-discuss] Re: How to check if MVAPICH is using IB network but not Ethernet network?

Divi Venkateswarlu divi at ncat.edu
Sat Jun 7 09:07:12 EDT 2008


   
       one more thing to add:

        My IB hardware:   8-port Flextronics SDR switch
                          MHES18 Mellanox HCA cards

       ibchecknet shows the following
   
 [root at divilab bin]# ibchecknet

# Checking Ca: nodeguid 0x0002c90200244228

# Checking Ca: nodeguid 0x0002c90200244230

# Checking Ca: nodeguid 0x0002c902002740fc

# Checking Ca: nodeguid 0x0002c902002441a4

# Checking Ca: nodeguid 0x0002c902002441c4

# Checking Ca: nodeguid 0x0002c9020024422c

# Checking Ca: nodeguid 0x0002c902002441ac

# Checking Ca: nodeguid 0x0002c9020024418c

## Summary: 9 nodes checked, 0 bad nodes found
##          16 ports checked, 0 bad ports found
##          0 ports have errors beyond threshold

  ----- Original Message ----- 
  From: Divi Venkateswarlu 
  To: mvapich-discuss at cse.ohio-state.edu 
  Sent: Saturday, June 07, 2008 9:02 AM
  Subject: How to check if MVAPICH is using IB network but not Ethernet network?



         Hello all:
    
         Good morning! 
         I set up a 64-core cluster based on ROCKS-5.0 using eight Dell PE2900 boxes.
         All are dual-processor quad-core machines.

         I compiled MVAPICH-1.0 (using the Intel compiler) with the default parameters in make.mvapich.gen2.
         The IB stack is OFED-1.2.5.5.

          My MD program (PMEMD/AMBER) compiles without errors against the IFORT/MKL libraries, and
         I can run the code on all 64 cores, but the scaling from 16 to 32 to 64 cores is terrible. I am enclosing
         the benchmarks from a test run.

                # of CPUs/cores   Time (sec)   Nodes (load-balanced)   Scaling (%)
                       8              82               8                  100
                      16              49               8                   84
                      32              42               8                   49
                      64              39               8                   26
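(For anyone reproducing these numbers: the Scaling column matches the standard parallel-efficiency formula, observed speedup divided by ideal speedup relative to the 8-core baseline. A quick sketch, with a hypothetical helper name:)

```python
def parallel_efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency (%) relative to a baseline run:
    observed speedup (base_time / time) over ideal speedup (cores / base_cores)."""
    ideal = cores / base_cores
    observed = base_time / time
    return 100.0 * observed / ideal

# Multi-node benchmark above, with the 8-core run (82 s) as baseline:
for cores, t in [(8, 82), (16, 49), (32, 42), (64, 39)]:
    print(cores, round(parallel_efficiency(8, 82, cores, t)))
# -> 8 100 / 16 84 / 32 49 / 64 26, matching the Scaling column
```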

             In contrast, on a single box, I get reasonable scaling.

         # cores     Time (sec)
            2           284    (100%)
            4           164    (87%)
            8           107    (65%)

         Because of this, I suspect that the MPI traffic is not going over the IB network.

             MVAPICH is built using make.mvapich.gen2 with F77=ifort and CC=gcc

       The output of mpif77 -link_info is:

       /state/partition1/fc91052/bin/ifort -L/usr/local/ofed/lib64 -L/usr/local/mvapich/lib
       -lmpich -L/usr/local/ofed/lib64 -Wl,-rpath=/usr/local/ofed/lib64 -libverbs
       -libumad -lpthread -lpthread -lrt


       How can I be sure that the MPI traffic is going through the IB network rather than Ethernet?
       Are there any specific checks I should perform?
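(One common sanity check, not from the original mail, is to watch the HCA port traffic counters exposed under sysfs before and after an MPI run: if they barely move, the traffic is going elsewhere. The device name "mthca0" and the counter path below are assumptions for illustration; adjust for your HCA. Note that port_xmit_data counts 32-bit words, not bytes.)

```python
# Sketch: estimate bytes sent over an InfiniBand port between two counter
# readings. The sysfs path and device name "mthca0" are assumptions.
COUNTER = "/sys/class/infiniband/mthca0/ports/1/counters/port_xmit_data"

def xmit_bytes(before, after):
    """Bytes transmitted between two port_xmit_data readings.
    The counter is in 4-byte (32-bit) words, so the delta is multiplied by 4."""
    return (after - before) * 4

def read_counter(path=COUNTER):
    with open(path) as f:
        return int(f.read().strip())

# Usage around an MPI job (on one of the compute nodes):
#   before = read_counter()
#   ... run the mpirun job ...
#   after = read_counter()
#   print(xmit_bytes(before, after), "bytes sent over IB")
```

A near-zero delta while the job is running would confirm the suspicion that MPI traffic is not using the IB fabric.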

       Thanks a lot for your help.

        Divi