[mvapich-discuss] Announcing the release of MVAPICH2 0.9.5 with SRQ, integrated multi-rail and TotalView support

Eric A. Borisch eborisch at ieee.org
Thu Aug 31 10:37:53 EDT 2006


Morning all,

I've installed MVAPICH2 0.9.5 on my system (nodes are dual P4 3.4 GHz, with
2-port PCI-E SDR cards).

I can't seem to get the multi-rail functionality working. Multi-rail does
work, however, with the latest MVAPICH distribution on this system.

Here are the mpichversion/mpich2version outputs for the two builds:

MVAPICH2:

Version:           1.0.3
Device:            osu_ch3:mrail
Configure Options: --prefix=/share/apps/mvapich2 --with-device=osu_ch3:mrail
--with-rdma=vapi --with-pm=mpd --disable-romio --enable-sharedlibs=gcc
--with-mpe


MVAPICH (Multi-rail):

MPICH Version:          1.2.7
MPICH Release date:     $Date: 2005/06/22 16:33:49$
MPICH Patches applied:  none
MPICH configure:        --with-device=vapi_multirail --with-arch=LINUX
-prefix=/share/apps/mvapich_mr --without-romio --with-mpe --enable-sharedlib
-lib=-L/usr/lib64 -lmtl_common -lvapi -lmosal -lmpga -lpthread
MPICH Device:           vapi_multirail



Here are the IMB ping-pong invocations and results:

MVAPICH2:

> [eborisch@rt2 IMB-MPI]$ mpiexec -n 2 -env NUM_HCAS 1 -env NUM_PORTS 2
> ./IMB-MPI1_mvapich2_sharedlib pingpong
> #---------------------------------------------------
> #    Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
> #---------------------------------------------------
> # Date       : Thu Aug 31 09:24:47 2006
> # Machine    : x86_64
> # System     : Linux
> # Release    : 2.6.9-22.ELsmp
> # Version    : #1 SMP Sat Oct 8 21:32:36 BST 2005
>
> #
> # Minimum message length in bytes:   0
> # Maximum message length in bytes:   4194304
> #
> # MPI_Datatype                   :   MPI_BYTE
> # MPI_Datatype for reductions    :   MPI_FLOAT
> # MPI_Op                         :   MPI_SUM
> #
> #
>
> # List of Benchmarks to run:
>
> # PingPong
>
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> #---------------------------------------------------
>        #bytes #repetitions      t[usec]   Mbytes/sec
>             0         1000         3.99         0.00
>             1         1000         4.21         0.23
>             2         1000         4.09         0.47
>             4         1000         4.15         0.92
>             8         1000         4.21         1.81
>            16         1000         4.26         3.59
>            32         1000         4.31         7.09
>            64         1000         4.49        13.60
>           128         1000         4.70        25.95
>           256         1000         5.24        46.62
>           512         1000         6.57        74.36
>          1024         1000         8.01       121.86
>          2048         1000         9.50       205.52
>          4096         1000        12.69       307.80
>          8192         1000        24.90       313.75
>         16384         1000        33.25       469.99
>         32768         1000        50.26       621.73
>         65536          640        84.34       741.05
>        131072          320       152.60       819.16
>        262144          160       289.14       864.64
>        524288           80       561.59       890.33
>       1048576           40      1105.97       904.18
>       2097152           20      2199.28       909.39
>       4194304           10      4377.79       913.70
>

MVAPICH (Multi-Rail):

> /share/apps/mvapich_mr/bin/mpirun -np 2 -machinefile machfile
> IMB-MPI1_mvapich_mr pingpong
> #---------------------------------------------------
> #    Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
> #---------------------------------------------------
> # Date       : Thu Aug 31 09:25:13 2006
> # Machine    : x86_64
> # System     : Linux
> # Release    : 2.6.9-22.ELsmp
> # Version    : #1 SMP Sat Oct 8 21:32:36 BST 2005
>
> #
> # Minimum message length in bytes:   0
> # Maximum message length in bytes:   4194304
> #
> # MPI_Datatype                   :   MPI_BYTE
> # MPI_Datatype for reductions    :   MPI_FLOAT
> # MPI_Op                         :   MPI_SUM
> #
> #
>
> # List of Benchmarks to run:
>
> # PingPong
>
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> #---------------------------------------------------
>        #bytes #repetitions      t[usec]   Mbytes/sec
>             0         1000         4.36         0.00
>             1         1000         4.56         0.21
>             2         1000         4.48         0.43
>             4         1000         4.33         0.88
>             8         1000         4.77         1.60
>            16         1000         4.74         3.22
>            32         1000         4.50         6.79
>            64         1000         4.97        12.28
>           128         1000         4.82        25.30
>           256         1000         5.28        46.27
>           512         1000         6.65        73.47
>          1024         1000         7.93       123.18
>          2048         1000         9.55       204.58
>          4096         1000        12.99       300.76
>          8192         1000        28.35       275.55
>         16384         1000        33.36       468.32
>         32768         1000        44.43       703.39
>         65536          640        66.83       935.16
>        131072          320       111.99      1116.20
>        262144          160       200.89      1244.44
>        524288           80       378.64      1320.52
>       1048576           40       736.00      1358.70
>       2097152           20      1448.02      1381.19
>       4194304           10      2897.55      1380.48
>
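
For reference, the Mbytes/sec column in both runs is just the message size
divided by the measured one-way time. A minimal ping-pong sketch (my own
rough equivalent, not the IMB source; the 1 MB message size and repetition
count are only illustrative) that reproduces that calculation looks roughly
like this:

/*
 * Minimal ping-pong sketch showing how the t[usec] and Mbytes/sec columns
 * above are derived: time a round trip, halve it for the one-way latency,
 * and divide the message size by that time.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const int bytes = 1048576;   /* matches one row of the tables above */
    const int reps  = 40;
    int rank, i;
    char *buf;
    double t0, t1, one_way_usec, mbytes_sec;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(bytes);
    memset(buf, 0, bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        /* one-way time = round-trip time / 2 */
        one_way_usec = (t1 - t0) / reps / 2.0 * 1.0e6;
        /* IMB reports Mbytes/sec as (bytes / 2^20) / one-way time in sec */
        mbytes_sec = (bytes / 1048576.0) / (one_way_usec / 1.0e6);
        printf("%d bytes: %.2f usec, %.2f Mbytes/sec\n",
               bytes, one_way_usec, mbytes_sec);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Built with the mpicc from either install and launched the same way as IMB
above, this should produce numbers comparable to the 1048576-byte rows.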

Thanks in advance for any suggestions!
  Eric Borisch
  Mayo Clinic - Radiology Research


On 8/30/06, Dhabaleswar Panda <panda at cse.ohio-state.edu> wrote:
> The MVAPICH team is pleased to announce the availability of MVAPICH2
> 0.9.5 with the following NEW features:
>
>  - Shared Receive Queue (SRQ) and Adaptive RDMA support: These
>    features reduce memory usage of the MPI library significantly to
>    provide scalability without any degradation in performance.
>
>    Performance of applications and memory scalability using SRQ
>    and Adaptive RDMA support can be seen by visiting the following
>    URL:
>
>    http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html
>
>  - Integrated multi-rail communication support for both two-sided and
>    one-sided operations
>      - Multiple queue pairs per port
>      - Multiple ports per adapter
>      - Multiple adapters
>
>  - Support for TotalView debugger
>
>  - Auto-detection of Architecture and InfiniBand adapters
>
> More details on all features and supported platforms can be obtained
> by visiting the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich2_features.html
>
> MVAPICH2 0.9.5 continues to deliver excellent performance.  Sample
> performance numbers include:
>
>   - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
>       Two-sided operations:
>         - 2.97 microsec one-way latency (4 bytes)
>         - 1478 MB/sec unidirectional bandwidth
>         - 2658 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 5.08 microsec Put latency
>         - 1484 MB/sec unidirectional Put bandwidth
>         - 2658 MB/sec bidirectional Put bandwidth
>
>   - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR (Dual-rail):
>       Two-sided operations:
>         - 3.01 microsec one-way latency (4 bytes)
>         - 2346 MB/sec unidirectional bandwidth
>         - 2779 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 4.70 microsec Put latency
>         - 2389 MB/sec unidirectional Put bandwidth
>         - 2779 MB/sec bidirectional Put bandwidth
>
>   - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-DDR:
>       Two-sided operations:
>         - 2.71 microsec one-way latency (4 bytes)
>         - 1411 MB/sec unidirectional bandwidth
>         - 2238 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 4.28 microsec Put latency
>         - 1411 MB/sec unidirectional Put bandwidth
>         - 2238 MB/sec bidirectional Put bandwidth
>
>   - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
>       Two-sided operations:
>         - 4.81 microsec one-way latency (4 bytes)
>         - 981 MB/sec unidirectional bandwidth
>         - 1903 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 7.49 microsec Put latency
>         - 981 MB/sec unidirectional Put bandwidth
>         - 1903 MB/sec bidirectional Put bandwidth
>
>   - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-SDR:
>       Two-sided operations:
>         - 3.56 microsec one-way latency (4 bytes)
>         - 964 MB/sec unidirectional bandwidth
>         - 1846 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 6.85 microsec Put latency
>         - 964 MB/sec unidirectional Put bandwidth
>         - 1846 MB/sec bidirectional Put bandwidth
>
>   - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-DDR:
>       Two-sided operations:
>         - 3.18 microsec one-way latency (4 bytes)
>         - 1484 MB/sec unidirectional bandwidth
>         - 2635 MB/sec bidirectional bandwidth
>
>       One-sided operations:
>         - 5.41 microsec Put latency
>         - 1485 MB/sec unidirectional Put bandwidth
>         - 2635 MB/sec bidirectional Put bandwidth
>
> Performance numbers for all other platforms, system configurations and
> operations can be viewed by visiting the `Performance' section of the
> project's web page.
>
> With the ADI-3-level design, MVAPICH2 0.9.5 delivers similar
> performance for two-sided operations compared to MVAPICH 0.9.8.
> Performance comparison between MVAPICH2 0.9.5 and MVAPICH 0.9.8 for
> sample applications can be seen by visiting the following URL:
>
>   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html
>
> Organizations and users who are interested in getting the best
> performance for both two-sided and one-sided operations, and who also
> want to exploit the `multi-threading' and `integrated multi-rail'
> capabilities, may migrate from the MVAPICH code base to the MVAPICH2
> code base.
>
> To download the MVAPICH2 0.9.5 package and access the anonymous
> SVN, please visit the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/
>
> A stripped-down version of this release is also available in the
> OpenIB SVN.
>
> All feedback, including bug reports and hints for performance tuning,
> is welcome. Please post it to the mvapich-discuss mailing list.
>
> Thanks,
>
> MVAPICH Team at OSU/NBCL
>
> ======================================================================
> MVAPICH/MVAPICH2 project is currently supported with funding from
> U.S. National Science Foundation, U.S. DOE Office of Science,
> Mellanox, Intel, Cisco Systems, Sun Microsystems and Linux Networx;
> and with equipment support from Advanced Clustering, AMD, Apple,
> Appro, Dell, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm
> and Sun Microsystems. Another technology partner is Etnus.
> ======================================================================
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at mail.cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



-- 
Eric A. Borisch
eborisch at ieee.org