[mvapich-discuss] Announcing the release of MVAPICH2 0.9.5 with SRQ, integrated multi-rail and TotalView support

Abhinav Vishnu vishnu at cse.ohio-state.edu
Thu Aug 31 12:39:40 EDT 2006


Hi Eric,

Thanks for trying out MVAPICH2 0.9.5 and reporting this issue to us.
We are also glad to know that you are seeing excellent performance with
MVAPICH using the multi-rail device.

For MVAPICH2, the integrated multi-rail support has been implemented in the
OpenIB/Gen2 device, whereas your current build is configured with
--with-rdma=vapi. May I suggest that you rebuild MVAPICH2 with the Gen2
device and let us know if you still see performance issues.
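
As a rough guide, a minimal rebuild sketch is below. It reuses the options
from your existing configure line but swaps the RDMA layer from vapi to gen2;
the new install prefix (/share/apps/mvapich2-gen2) and the rebuilt benchmark
name (IMB-MPI1) are just placeholders, and the exact option names should be
checked against the MVAPICH2 0.9.5 user guide for your setup:

    # Reconfigure MVAPICH2 0.9.5 for the OpenIB/Gen2 device instead of VAPI
    ./configure --prefix=/share/apps/mvapich2-gen2 \
                --with-device=osu_ch3:mrail --with-rdma=gen2 \
                --with-pm=mpd --disable-romio --enable-sharedlibs=gcc --with-mpe
    make && make install

    # Re-run the same ping-pong test with the multi-rail settings from your
    # original invocation (one HCA, two ports)
    mpiexec -n 2 -env NUM_HCAS 1 -env NUM_PORTS 2 ./IMB-MPI1 pingpong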

Thanks again for trying MVAPICH2. Please keep us posted.

Regards,

- Abhinav

-------------------------------
Abhinav Vishnu,
Graduate Research Associate,
Department Of Comp. Sc. & Engg.
The Ohio State University.
-------------------------------

On Thu, 31 Aug 2006, Eric A. Borisch wrote:

> Morning all,
>
> I've installed MVAPICH2 0.9.5 on my system (nodes are dual P4 3.4 GHz, PCI-E
> SDR 2-port cards).
>
> I can't seem to get the multi-rail functionality working. It works, however,
> with the latest MVAPICH distribution on this system.
>
> Here are the mpich[2]version outputs on the two builds:
>
> MVAPICH2:
>
> Version:           1.0.3
> Device:            osu_ch3:mrail
> Configure Options: --prefix=/share/apps/mvapich2 --with-device=osu_ch3:mrail
> --with-rdma=vapi --with-pm=mpd --disable-romio --enable-sharedlibs=gcc
> --with-mpe
>
>
> MVAPICH (Multi-rail):
>
> MPICH Version:          1.2.7
> MPICH Release date:     $Date: 2005/06/22 16:33:49$
> MPICH Patches applied:  none
> MPICH configure:        --with-device=vapi_multirail --with-arch=LINUX
> -prefix=/share/apps/mvapich_mr --without-romio --with-mpe --enable-sharedlib
> -lib=-L/usr/lib64 -lmtl_common -lvapi -lmosal -lmpga -lpthread
> MPICH Device:           vapi_multirail
>
>
>
> Here are the IMB ping-pong invocations and results:
>
> MVAPICH2:
>
> > [eborisch at rt2 IMB-MPI]$ mpiexec -n 2 -env NUM_HCAS 1 -env NUM_PORTS 2
> > ./IMB-MPI1_mvapich2_sharedlib pingpong
> > #---------------------------------------------------
> > #    Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
> > #---------------------------------------------------
> > # Date       : Thu Aug 31 09:24:47 2006
> > # Machine    : x86_64
> > # System     : Linux
> > # Release    : 2.6.9-22.ELsmp
> > # Version    : #1 SMP Sat Oct 8 21:32:36 BST 2005
> >
> > #
> > # Minimum message length in bytes:   0
> > # Maximum message length in bytes:   4194304
> > #
> > # MPI_Datatype                   :   MPI_BYTE
> > # MPI_Datatype for reductions    :   MPI_FLOAT
> > # MPI_Op                         :   MPI_SUM
> > #
> > #
> >
> > # List of Benchmarks to run:
> >
> > # PingPong
> >
> > #---------------------------------------------------
> > # Benchmarking PingPong
> > # #processes = 2
> > #---------------------------------------------------
> >        #bytes #repetitions      t[usec]   Mbytes/sec
> >             0         1000         3.99         0.00
> >             1         1000         4.21         0.23
> >             2         1000         4.09         0.47
> >             4         1000         4.15         0.92
> >             8         1000         4.21         1.81
> >            16         1000         4.26         3.59
> >            32         1000         4.31         7.09
> >            64         1000         4.49        13.60
> >           128         1000         4.70        25.95
> >           256         1000         5.24        46.62
> >           512         1000         6.57        74.36
> >          1024         1000         8.01       121.86
> >          2048         1000         9.50       205.52
> >          4096         1000        12.69       307.80
> >          8192         1000        24.90       313.75
> >         16384         1000        33.25       469.99
> >         32768         1000        50.26       621.73
> >         65536          640        84.34       741.05
> >        131072          320       152.60       819.16
> >        262144          160       289.14       864.64
> >        524288           80       561.59       890.33
> >       1048576           40      1105.97       904.18
> >       2097152           20      2199.28       909.39
> >       4194304           10      4377.79       913.70
> >
>
> MVAPICH (Multi-Rail):
>
> > /share/apps/mvapich_mr/bin/mpirun -np 2 -machinefile machfile
> > IMB-MPI1_mvapich_mr pingpong
> > #---------------------------------------------------
> > #    Intel (R) MPI Benchmark Suite V2.3, MPI-1 part
> > #---------------------------------------------------
> > # Date       : Thu Aug 31 09:25:13 2006
> > # Machine    : x86_64
> > # System     : Linux
> > # Release    : 2.6.9-22.ELsmp
> > # Version    : #1 SMP Sat Oct 8 21:32:36 BST 2005
> >
> > #
> > # Minimum message length in bytes:   0
> > # Maximum message length in bytes:   4194304
> > #
> > # MPI_Datatype                   :   MPI_BYTE
> > # MPI_Datatype for reductions    :   MPI_FLOAT
> > # MPI_Op                         :   MPI_SUM
> > #
> > #
> >
> > # List of Benchmarks to run:
> >
> > # PingPong
> >
> > #---------------------------------------------------
> > # Benchmarking PingPong
> > # #processes = 2
> > #---------------------------------------------------
> >        #bytes #repetitions      t[usec]   Mbytes/sec
> >             0         1000         4.36         0.00
> >             1         1000         4.56         0.21
> >             2         1000         4.48         0.43
> >             4         1000         4.33         0.88
> >             8         1000         4.77         1.60
> >            16         1000         4.74         3.22
> >            32         1000         4.50         6.79
> >            64         1000         4.97        12.28
> >           128         1000         4.82        25.30
> >           256         1000         5.28        46.27
> >           512         1000         6.65        73.47
> >          1024         1000         7.93       123.18
> >          2048         1000         9.55       204.58
> >          4096         1000        12.99       300.76
> >          8192         1000        28.35       275.55
> >         16384         1000        33.36       468.32
> >         32768         1000        44.43       703.39
> >         65536          640        66.83       935.16
> >        131072          320       111.99      1116.20
> >        262144          160       200.89      1244.44
> >        524288           80       378.64      1320.52
> >       1048576           40       736.00      1358.70
> >       2097152           20      1448.02      1381.19
> >       4194304           10      2897.55      1380.48
> >
>
> Thanks in advance for any suggestions!
>   Eric Borisch
>   Mayo Clinic - Radiology Research
>
>
> On 8/30/06, Dhabaleswar Panda <panda at cse.ohio-state.edu> wrote:
> > The MVAPICH team is pleased to announce the availability of MVAPICH2
> > 0.9.5 with the following NEW features:
> >
> >  - Shared Receive Queue (SRQ) and Adaptive RDMA support: These
> >    features reduce memory usage of the MPI library significantly to
> >    provide scalability without any degradation in performance.
> >
> >    Performance of applications and memory scalability using SRQ
> >    and Adaptive RDMA support can be seen by visiting the following
> >    URL:
> >
> >    http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html
> >
> >  - Integrated multi-rail communication support for both two-sided and
> >    one-sided operations
> >      - Multiple queue pairs per port
> >      - Multiple ports per adapter
> >      - Multiple adapters
> >
> >  - Support for TotalView debugger
> >
> >  - Auto-detection of Architecture and InfiniBand adapters
> >
> > More details on all features and supported platforms can be obtained
> > by visiting the following URL:
> >
> > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich2_features.html
> >
> > MVAPICH2 0.9.5 continues to deliver excellent performance.  Sample
> > performance numbers include:
> >
> >   - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR:
> >       Two-sided operations:
> >         - 2.97 microsec one-way latency (4 bytes)
> >         - 1478 MB/sec unidirectional bandwidth
> >         - 2658 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 5.08 microsec Put latency
> >         - 1484 MB/sec unidirectional Put bandwidth
> >         - 2658 MB/sec bidirectional Put bandwidth
> >
> >   - OpenIB/Gen2 on EM64T with PCI-Ex and IBA-DDR (Dual-rail):
> >       Two-sided operations:
> >         - 3.01 microsec one-way latency (4 bytes)
> >         - 2346 MB/sec unidirectional bandwidth
> >         - 2779 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 4.70 microsec Put latency
> >         - 2389 MB/sec unidirectional Put bandwidth
> >         - 2779 MB/sec bidirectional Put bandwidth
> >
> >   - OpenIB/Gen2 on Opteron with PCI-Ex and IBA-DDR:
> >       Two-sided operations:
> >         - 2.71 microsec one-way latency (4 bytes)
> >         - 1411 MB/sec unidirectional bandwidth
> >         - 2238 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 4.28 microsec Put latency
> >         - 1411 MB/sec unidirectional Put bandwidth
> >         - 2238 MB/sec bidirectional Put bandwidth
> >
> >   - Solaris uDAPL/IBTL on Opteron with PCI-Ex and IBA-SDR:
> >       Two-sided operations:
> >         - 4.81 microsec one-way latency (4 bytes)
> >         - 981 MB/sec unidirectional bandwidth
> >         - 1903 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 7.49 microsec Put latency
> >         - 981 MB/sec unidirectional Put bandwidth
> >         - 1903 MB/sec bidirectional Put bandwidth
> >
> >   - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-SDR:
> >       Two-sided operations:
> >         - 3.56 microsec one-way latency (4 bytes)
> >         - 964 MB/sec unidirectional bandwidth
> >         - 1846 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 6.85 microsec Put latency
> >         - 964 MB/sec unidirectional Put bandwidth
> >         - 1846 MB/sec bidirectional Put bandwidth
> >
> >   - OpenIB/Gen2 uDAPL on EM64T with PCI-Ex and IBA-DDR:
> >       Two-sided operations:
> >         - 3.18 microsec one-way latency (4 bytes)
> >         - 1484 MB/sec unidirectional bandwidth
> >         - 2635 MB/sec bidirectional bandwidth
> >
> >       One-sided operations:
> >         - 5.41 microsec Put latency
> >         - 1485 MB/sec unidirectional Put bandwidth
> >         - 2635 MB/sec bidirectional Put bandwidth
> >
> > Performance numbers for all other platforms, system configurations and
> > operations can be viewed by visiting the `Performance' section of the
> > project's web page.
> >
> > With the ADI-3-level design, MVAPICH2 0.9.5 delivers similar
> > performance for two-sided operations compared to MVAPICH 0.9.8.
> > Performance comparison between MVAPICH2 0.9.5 and MVAPICH 0.9.8 for
> > sample applications can be seen by visiting the following URL:
> >
> >   http://nowlab.cse.ohio-state.edu/projects/mpi-iba/perf-apps.html
> >
> > Organizations and users who are interested in getting the best performance
> > for both two-sided and one-sided operations, and who also want to exploit
> > the `multi-threading' and `integrated multi-rail' capabilities, may migrate
> > from the MVAPICH code base to the MVAPICH2 code base.
> >
> > For downloading the MVAPICH2 0.9.5 package and accessing the anonymous
> > SVN, please visit the following URL:
> >
> > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/
> >
> > A stripped-down version of this release is also available at the
> > OpenIB SVN.
> >
> > All feedback, including bug reports and hints for performance tuning,
> > is welcome. Please post it to the mvapich-discuss mailing list.
> >
> > Thanks,
> >
> > MVAPICH Team at OSU/NBCL
> >
> > ======================================================================
> > The MVAPICH/MVAPICH2 project is currently supported with funding from the
> > U.S. National Science Foundation, the U.S. DOE Office of Science,
> > Mellanox, Intel, Cisco Systems, Sun Microsystems and Linux Networx,
> > and with equipment support from Advanced Clustering, AMD, Apple,
> > Appro, Dell, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm
> > and Sun Microsystems. Other technology partners include Etnus.
> > ======================================================================
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at mail.cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
>
>
>
> --
> Eric A. Borisch
> eborisch at ieee.org
>
