[Mvapich-discuss] Announcing the release of MPI4Spark 0.2

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Aug 31 23:36:52 EDT 2023


The OSU High-Performance Big Data (HiBD) team is pleased to announce
the release of MPI4Spark 0.2, a custom version of the Apache Spark
package that exploits high-performance MPI communication for Big Data
applications on modern HPC clusters with InfiniBand, Intel Omni-Path,
RoCE, and HPE Slingshot interconnects. The MPI communication backend
in MPI4Spark uses the MVAPICH2-J Java bindings for MVAPICH2. The
MPI4Spark design enables performance portability for Spark workloads
across HPC clusters with different interconnects.

This release of the MPI4Spark package is equipped with the following
features:

* MPI4Spark 0.2 Features:

   - Based on Apache Spark 3.0.0
   - Support for the YARN Cluster Manager
   - Compliant with user-level Apache Spark APIs and packages
     (a minimal example follows the feature list)
   - High performance design that utilizes MPI-based communication
      - Utilizes MPI point-to-point operations
      - Relies on MPI Dynamic Process Management (DPM) features
        for launching executor processes
      - Relies on Multiple-Program-Multiple-Data (MPMD) launcher mode for
        launching executors when using the YARN cluster manager
   - Built on top of the MVAPICH2-J Java bindings for the MVAPICH2
     family of MPI libraries
   - Tested with
      - OSU HiBD-Benchmarks (GroupBy and SortBy)
      - Intel HiBench Suite (Micro Benchmarks, Machine Learning,
        and Graph Workloads)
      - Mellanox InfiniBand adapters (EDR 100G and HDR 200G)
      - HPC systems with Intel OPA and Cray Slingshot interconnects
      - Various multi-core platforms
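
Since MPI4Spark is compliant with user-level Apache Spark APIs, a
standard Spark application, such as the GroupBy/SortBy patterns used
in the OSU HiBD-Benchmarks, is expected to run unmodified on
MPI4Spark. The following is a minimal, hypothetical Scala sketch of
such a workload (the class, key, and value names are illustrative and
not part of the MPI4Spark package); only the launcher and cluster
configuration would differ between stock Spark and MPI4Spark:

    // Minimal GroupBy/SortBy-style Spark job (hypothetical sketch).
    import org.apache.spark.sql.SparkSession

    object GroupBySortByExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("GroupBySortByExample")
          .getOrCreate()
        val sc = spark.sparkContext

        // Synthetic (key, value) pairs, analogous to the GroupBy/SortBy
        // micro-benchmark pattern.
        val pairs = sc.parallelize(0 until 1000000)
          .map(i => (i % 1024, i.toLong))

        // GroupBy: aggregate values per key.
        val grouped = pairs.groupByKey().mapValues(_.sum)

        // SortBy: order the aggregated results by key; under MPI4Spark
        // the shuffle traffic would be carried over the MPI backend.
        val sorted = grouped.sortByKey()

        println(s"Distinct keys: ${sorted.count()}")
        spark.stop()
      }
    }

The sketch targets the Spark 3.0.0 API level that MPI4Spark 0.2 is
based on.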

To download the MPI4Spark 0.2 package and the associated user guide,
please visit the following URL:

http://hibd.cse.ohio-state.edu

Sample performance numbers for MPI4Spark on these benchmarks can be
viewed under the 'Performance' tab of the above website.

All questions, feedback and bug reports are welcome. Please post to
rdma-spark-discuss at lists.osu.edu.

Thanks,

The High-Performance Big Data (HiBD) Team
http://hibd.cse.ohio-state.edu

PS: The number of organizations using the HiBD stacks has crossed 360
(from 39 countries). Similarly, the number of downloads from the HiBD
site has crossed 47,700.  The HiBD team would like to thank all its
users and organizations!!



