[mvapich-discuss] HPL Test results

Dhabaleswar Panda panda at cse.ohio-state.edu
Fri Jul 13 23:06:22 EDT 2007


Hi Michael, 

First of all, which device are you using for MVAPICH to run over IB:
the Gen2 device or the IPoIB device? You should use the Gen2 device
(which runs over native IB) to get maximum performance on IB. Please
refer to the MVAPICH user guide (accessible from the MVAPICH web page
-> support -> user guide) for how to configure and run MVAPICH on IB
using the Gen2 device.

Before running HPL, please run the standard OSU benchmarks and check
the performance numbers to verify that things are running on the Gen2
device. This should also show the performance difference between IB
and Ethernet.
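
For example, even a bare-bones MPI ping-pong along the following lines
(a sketch only, not the actual OSU benchmark code; the 1 MB message
size and iteration count are arbitrary choices) should make the
bandwidth gap between the Gen2 device and Ethernet visible:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* Minimal two-rank ping-pong.  This is NOT the OSU benchmark, just an
 * illustration of the kind of sanity check meant above. */
#define MSG_BYTES (1 << 20)   /* 1 MB per message */
#define ITERS     100

int main(int argc, char **argv)
{
    int rank, i;
    char *buf = malloc(MSG_BYTES);
    MPI_Status status;
    double t0, t1;

    memset(buf, 0, MSG_BYTES);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        /* Each iteration moves MSG_BYTES in each direction. */
        double mbytes = 2.0 * ITERS * MSG_BYTES / (1024.0 * 1024.0);
        printf("approx. bandwidth: %.1f MB/s\n", mbytes / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Compile it once with the mpicc from the MVAPICH bin directory and once
with the one from MPICH, and run each binary across two nodes. If the
two numbers come out similar, the IB build is most likely not actually
using the Gen2 device.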

Also, the MVAPICH 0.9.7 version is two years old. A lot of features
and optimizations have been added to MVAPICH since then. I would
strongly recommend moving to the latest released version, 0.9.9. It is
accessible from the MVAPICH web page and is also available with the
latest OFED 1.2 stack.

Thanks, 

DK


> We are currently running HPL tests on a 38-node cluster, each node consisting of a dual-core Xeon 3.4 GHz, 4 GB RAM, and Ethernet and IB interfaces. For IB we are using Mellanox cards: Dual 4X IB Port MT25208 InfiniHost III Ex (we are using only 1 of the 2 available IB ports on each node).
> 
> We are using MVAPICH 0.9.7 for the MPI runs over IB, and MPICH 1.2.7 for the MPI runs over Ethernet.
> 
> Separate HPL binaries were compiled for the IB and Ethernet interconnects using the appropriate mpicc found in each respective bin folder (i.e. mvapich/bin/mpicc, mpich/bin/mpicc). Note that HPL was compiled using GotoBLAS rather than CBLAS.
> 
> Rpeak = Clock * #Cores * Flops/Cycle/Core
> 	= 3.4 GHz * 2 cores * 2 flops/cycle
> 	= 13.6 GFLOPS per node
> 
> HPL:
> 
> Ns = SQRT(0.8 * NodeBytes * Nodes / 8)    (use ~80% of aggregate memory; 8 bytes per double)
> Nb = 160
> 
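A quick sketch of the arithmetic behind the quoted Ns and Rpeak figures
(assuming the 4 GB/node, 3.4 GHz dual-core, 2 flops/cycle numbers from
the node description above; the Ns it prints will not reproduce the
table exactly, since that depends on the precise memory figure used):

#include <stdio.h>
#include <math.h>

/* Sketch of the problem-size and peak formulas quoted above:
 *   Ns    = sqrt(0.8 * NodeBytes * Nodes / 8)   (80% of memory, 8 bytes/double)
 *   Rpeak = 3.4 GHz * 2 cores * 2 flops/cycle/core * Nodes
 * The 4 GB/node figure is an assumption taken from the node description. */
int main(void)
{
    const double node_bytes = 4.0 * 1024 * 1024 * 1024;  /* 4 GB per node */
    int nodes;

    for (nodes = 1; nodes <= 32; nodes *= 2) {
        double ns    = sqrt(0.8 * node_bytes * nodes / 8.0);
        double rpeak = 3.4 * 2 * 2 * nodes;   /* GFLOPS */
        printf("nodes=%2d  Ns ~ %6.0f  Rpeak = %6.1f GFLOPS\n",
               nodes, ns, rpeak);
    }
    return 0;
}

(Compile with -lm.)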
> Below are the initial results of the tests: 
> 
> Procs   Ns       GFLOPS-IB   GFLOPS-Eth   Rpeak (GFLOPS)   IB % Eff.   Eth % Eff.
> 1       20353    5.87        11.30        6.8              86.3        166.17
> 2       20353    10.09       9.95         13.6             74.19       73.14
> 4       28784    22.33       20.11        27.2             82.1        73.93
> 8       40707    40.35       34.59        54.4             74.17       63.58
> 16      57568    65.01       63.34        108.8            59.75       58.22
> 32      81414    74.57       68.52        217.6            34.27       31.49
> 64      115137   129.80      131.10       435.2            29.83       30.13
> 
> Does anyone have any ideas as to possible reasons for the above results?
> Suggestions of avenues that should be investigated?
> 
> Questions:
> 
> 1. On a single-processor run, Ethernet surpasses the theoretical maximum (Rpeak). How is this possible? I was under the impression that the interconnect is not used for single-proc runs, so how is it that the IB and Ethernet results are so drastically different? Also notice that Ethernet beats IB on the 64-proc run.
> 
> 2. How is it that IB is only slightly better than Ethernet for the 2-, 4-, 8-, 16-, and 32-proc runs?
> 
> 
> - Michael
> 


