[mvapich-discuss] HPL Test results

Michael Zebrowski MZebrowski at x-iss.com
Mon Jul 16 00:39:43 EDT 2007


Many thanks for the suggestions, DK.

Here is the output via 'mpirun_rsh -v':

OSU MVAPICH VERSION 0.9.7-mlx2.2.0 SingleRail
Build-ID: 646 TAG=mvapich-0.9.7-mlx2.2.0_20-09-2006-13_10

I am also explicitly declaring the device I want to use via the environment variable VIADEV_DEVICE:

mpirun_rsh -ssh -np 8 -hostfile machines VIADEV_DEVICE=mthca0 ~/hpl_libgoto_ib/bin/ib.gcc/xhpl
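
As a quick sanity check before a run (a minimal sketch, assuming the standard OFED diagnostic tools are installed), ibv_devinfo can confirm that the HCA passed via VIADEV_DEVICE is visible and that its port is up:

ibv_devinfo -d mthca0 | grep -E "hca_id|state"

The port should report PORT_ACTIVE before any IB run.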

Here is the output of mpichversion:

./mpichversion
MPICH Version:          1.2.7
MPICH Release date:     $Date: 2005/06/22 16:33:49$
MPICH Patches applied:  none
MPICH configure:        --enable-sharedlib --with-device=ch_gen2 --with-arch=LINUX -prefix=/var/tmp/OFED///usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0 --enable-f77 --enable-f90 -lib=-L/var/tmp/OFED//usr/local/ofed/lib64 -libverbs -lpthread
MPICH Device:           ch_gen2

Results of osu_benchmark/osu_latency (using mvapich):

# OSU MPI Latency Test (Version 2.2)
# Size          Latency (us)
0               0.81
1               0.82
2               0.82
4               0.83
8               0.77
16              0.80
32              0.85
64              0.82
128             0.87
256             1.02
512             1.32
1024            1.99
2048            3.27
4096            5.76
8192            10.76
16384           20.70
32768           40.73
65536           80.96
131072          142.71
262144          488.59
524288          1086.96
1048576         2161.83
2097152         4293.26
4194304         8605.62

- Michael


-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Dhabaleswar Panda
Sent: Friday, July 13, 2007 10:06 PM
To: Michael Zebrowski
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] HPL Test results

Hi Michael, 

First of all, which device are you using for MVAPICH to run over IB?
Are you using the Gen2 device or the IPoIB device? You should use the
Gen2 device (which runs over native IB) to get the maximum performance
on IB. Please refer to the MVAPICH user guide (accessible from the
MVAPICH web page -> Support -> User Guide) for how to configure and
run MVAPICH on IB using the Gen2 device.
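
As a quick check (a sketch only; the install prefix below is taken from a typical OFED install and may differ on your system), the mpichversion utility shipped with each MVAPICH build reports which device it was configured with:

/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpichversion | grep Device

A build configured with ch_gen2 runs over native IB; any other device will not give native IB performance.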

Before running HPL, please run the standard OSU benchmarks and check
the performance numbers to confirm that things are indeed running on
the Gen2 device. This should also show the performance difference
between IB and Ethernet.
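
For example (a minimal sketch; the host names and benchmark path are placeholders for your own nodes and install), the latency and bandwidth tests can be launched between two nodes with mpirun_rsh:

mpirun_rsh -ssh -np 2 node01 node02 ./osu_latency
mpirun_rsh -ssh -np 2 node01 node02 ./osu_bw

Over native IB (Gen2) the small-message latency should be a few microseconds, whereas over Gigabit Ethernet it is typically tens of microseconds.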

Also, the MVAPICH 0.9.7 version is two years old. A lot of features
and optimizations have been added to MVAPICH since then. I would
strongly recommend using the latest released version, 0.9.9. It is
accessible from the MVAPICH web page and is also available with the
latest OFED 1.2 stack.

Thanks, 

DK


> We are currently running HPL tests on a 38-node cluster, each node consisting of a 3.4 GHz dual-core Xeon, 4 GB RAM, and Ethernet and IB interfaces. For IB we are using Mellanox cards: MT25208 InfiniHost III Ex with dual 4X IB ports (we are using only 1 of the 2 available IB ports on each node).
> 
> We are using MVAPICH 0.9.7 for the MPI runs over IB, and MPICH 1.2.7 for the MPI runs over Ethernet.
> 
> Separate HPL binaries were compiled for the IB and Ethernet interconnects using the appropriate mpicc found in each MPI's respective bin folder (i.e., mvapich/bin/mpicc and mpich/bin/mpicc). Note that HPL was compiled against GotoBLAS rather than CBLAS.
> 
> Rpeak = Clock * #Cores * Flops/Cycle/Core
> 	= 3.4 * 2 * 2 = 13.6 GFLOPS per node
> 
> HPL:
> 
> Ns = SQRT(0.8 * NodeBytes * Nodes / 8)   (80% of total memory; 8 bytes per double-precision element)
> Nb = 160
> 
> Below are the initial results of the tests: 
> 
> Proc   Ns       GFLOPS-IB   GFLOPS-ETH   Rpeak (GFLOPS)   IB Efficiency (%)   Eth Efficiency (%)
> 1      20353    5.87        11.30        6.8              86.3                166.17
> 2      20353    10.09       9.95         13.6             74.19               73.14
> 4      28784    22.33       20.11        27.2             82.1                73.93
> 8      40707    40.35       34.59        54.4             74.17               63.58
> 16     57568    65.01       63.34        108.8            59.75               58.22
> 32     81414    74.57       68.52        217.6            34.27               31.49
> 64     115137   129.80      131.10       435.2            29.83               30.13
> 
> Does anyone have any ideas as to possible reasons for the above results?
> Suggestions of avenues that should be investigated?
> 
> Questions:
> 
> 1. On a single-processor run, Ethernet surpasses the theoretical maximum (Rpeak). How is this possible? I was under the impression that the interconnect is not used for single-proc runs, so how is it that the IB and Ethernet results are so drastically different? Also notice that Ethernet beats out IB on the 64-proc run.
> 
> 2. How is it that IB is only slightly better than Ethernet for the 2-, 4-, 8-, 16-, and 32-proc runs?
> 
> 
> - Michael
> 
