[mvapich-discuss] HPL Test results
Michael Zebrowski
MZebrowski at x-iss.com
Mon Jul 16 00:39:43 EDT 2007
Many thanks for the suggestions, DK.
Here is the output via 'mpirun_rsh -v':
OSU MVAPICH VERSION 0.9.7-mlx2.2.0 SingleRail
Build-ID: 646 TAG=mvapich-0.9.7-mlx2.2.0_20-09-2006-13_10
I am also explicitly declaring the device I want to use via the environment variable VIADEV_DEVICE:
mpirun_rsh -ssh -np 8 -hostfile machines VIADEV_DEVICE=mthca0 ~/hpl_libgoto_ib/bin/ib.gcc/xhpl
Here is the output of mpichversion:
./mpichversion
MPICH Version: 1.2.7
MPICH Release date: $Date: 2005/06/22 16:33:49$
MPICH Patches applied: none
MPICH configure: --enable-sharedlib --with-device=ch_gen2 --with-arch=LINUX -prefix=/var/tmp/OFED///usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0 --enable-f77 --enable-f90 -lib=-L/var/tmp/OFED//usr/local/ofed/lib64 -libverbs -lpthread
MPICH Device: ch_gen2
Results of osu_benchmark/osu_latency (using mvapich):
# OSU MPI Latency Test (Version 2.2)
# Size Latency (us)
0 0.81
1 0.82
2 0.82
4 0.83
8 0.77
16 0.80
32 0.85
64 0.82
128 0.87
256 1.02
512 1.32
1024 1.99
2048 3.27
4096 5.76
8192 10.76
16384 20.70
32768 40.73
65536 80.96
131072 142.71
262144 488.59
524288 1086.96
1048576 2161.83
2097152 4293.26
4194304 8605.62
- Michael
-----Original Message-----
From: mvapich-discuss-bounces at cse.ohio-state.edu [mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Dhabaleswar Panda
Sent: Friday, July 13, 2007 10:06 PM
To: Michael Zebrowski
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] HPL Test results
Hi Michael,
First of all, which device are you using for MVAPICH to run with IB:
the Gen2 device or the IPoIB device? You should use the Gen2 device
(which runs over native IB) to get the maximum performance on IB.
Please refer to the MVAPICH user guide (accessible from the MVAPICH
web page -> Support -> User Guide) for how to configure and run
MVAPICH on IB using the Gen2 device.
Before running HPL, please run the standard OSU benchmarks and check
the performance numbers to confirm that things are running on the
Gen2 device. This should also show the performance difference between
IB and Ethernet.
Also, the MVAPICH 0.9.7 version is two years old. A lot of features
and optimizations have been added to MVAPICH since then. I strongly
recommend that you use the latest released 0.9.9 version. It is
accessible from the MVAPICH web page and is also available with the
latest OFED 1.2 stack.
Thanks,
DK
> We are currently running HPL tests on a 38-node cluster, each node consisting of a dual-core Xeon 3.4 GHz, 4 GB RAM, and Ethernet and IB interfaces. For IB we are using Mellanox cards: dual 4X IB port MT25208 InfiniHost III Ex. (We are using only 1 of the 2 available IB ports on each node.)
>
> We are using mvapich 0.9.7 for the IB mpi runs, and mpich 1.2.7 for the mpi runs over ethernet.
>
> Separate HPL binaries were compiled for the IB and Ethernet interconnects using the appropriate mpicc found in the respective bin folders (i.e., mvapich/bin/mpicc, mpich/bin/mpicc). Note that HPL was compiled using GotoBLAS rather than CBLAS.
>
> RPeak = Clock * #Cores * Flops/Cycle/Core
>       = 3.4 * 2 * 2 = 13.6 GFLOPS per node
>
> HPL:
>
> Ns=SQRT(0.8*NodeBytes*Nodes/8)
> Nb=160
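The two sizing formulas above can be sanity-checked with a short script (a sketch only; the 3.4 GHz clock, 2 flops/cycle/core, and 4 GB/node figures are taken from the hardware description, and the round 4 GB assumption makes Ns come out slightly below the 20353 actually used):

```python
import math

# Machine parameters taken from the cluster description (assumptions).
CLOCK_GHZ = 3.4         # Xeon clock speed
FLOPS_PER_CYCLE = 2     # double-precision flops/cycle/core (SSE2)
NODE_BYTES = 4 * 10**9  # ~4 GB RAM per node

def rpeak_gflops(procs):
    """RPeak = Clock * #Procs * Flops/Cycle/Core, in GFLOPS."""
    return CLOCK_GHZ * FLOPS_PER_CYCLE * procs

def hpl_ns(nodes, mem_fraction=0.8):
    """Ns = sqrt(0.8 * NodeBytes * Nodes / 8): size the N x N matrix of
    8-byte doubles to fill about 80% of aggregate RAM."""
    return int(math.sqrt(mem_fraction * NODE_BYTES * nodes / 8))

print(rpeak_gflops(2))  # 13.6 GFLOPS for one dual-core node
print(hpl_ns(1))        # 20000, close to the Ns = 20353 used above
```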
>
> Below are the initial results of the tests:
>
> Proc       Ns   GFLOPS-IB   GFLOPS-ETH    Rpeak   IB % Efficiency   Eth % Efficiency
>    1    20353        5.87        11.30      6.8             86.3             166.17
>    2    20353       10.09         9.95     13.6             74.19             73.14
>    4    28784       22.33        20.11     27.2             82.1              73.93
>    8    40707       40.35        34.59     54.4             74.17             63.58
>   16    57568       65.01        63.34    108.8             59.75             58.22
>   32    81414       74.57        68.52    217.6             34.27             31.49
>   64   115137      129.80       131.10    435.2             29.83             30.13
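The efficiency columns in the table are measured GFLOPS divided by Rpeak, with Rpeak = 6.8 GFLOPS per processor (3.4 GHz * 2 flops/cycle/core, an assumption consistent with the Rpeak column); a quick check with values copied from the table:

```python
# (procs, GFLOPS over IB, GFLOPS over Ethernet) rows copied from the HPL table.
ROWS = [
    (1, 5.87, 11.30),
    (2, 10.09, 9.95),
    (4, 22.33, 20.11),
    (8, 40.35, 34.59),
    (16, 65.01, 63.34),
    (32, 74.57, 68.52),
    (64, 129.80, 131.10),
]

RPEAK_PER_PROC = 6.8  # GFLOPS: 3.4 GHz * 2 flops/cycle/core

for procs, ib, eth in ROWS:
    rpeak = RPEAK_PER_PROC * procs
    print(f"{procs:3d}  Rpeak={rpeak:6.1f}  "
          f"IB={100 * ib / rpeak:6.2f}%  Eth={100 * eth / rpeak:6.2f}%")
```

Both efficiency columns fall off sharply beyond 16 processors, on IB and Ethernet alike.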
>
> Does anyone have any ideas as to possible reasons for the above results?
> Suggestions of avenues that should be investigated?
>
> Questions:
>
> 1. On a single processor run, ethernet surpasses the theoretical maximum (rpeak). How is this possible? I was under the impression that interconnects are not being utilized for single proc runs, so how is it that the IB and Ethernet results are so drastically different? Also notice that Ethernet beats out IB on the 64 proc run.
>
> 2. How is it that IB is only slightly better than Ethernet for proc runs: 2,4,8,16,32?
>
>
> - Michael
>
> NOTICE:
> This message may contain privileged or otherwise confidential information.
> If you are not the intended recipient, please immediately advise the sender
> by reply email and delete the message and any attachments without using,
> copying or disclosing the contents.
>
>
>
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss