[Hidl-discuss] Announcing the release of ParaInfer-X v1.0 for High-Performance Parallel Inference

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Nov 9 17:42:39 EST 2023


The High-Performance Deep Learning (HiDL) team is pleased to announce
the release of ParaInfer-X v1.0, which is a collection of parallel inference techniques
that can facilitate the deployment of emerging AI models on edge devices and HPC clusters.

This package leverages high-performance GPU kernels to maximize computational
throughput, intelligent scheduling strategies to balance load across resources,
and distributed communication libraries that enable seamless data exchange and
coordination for large-scale inference. ParaInfer-X v1.0 introduces a temporal
fusion framework, named Flover, that batches multiple requests on the fly
during LLM generation, a technique also known as in-flight batching.
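
The idea behind temporal fusion/in-flight batching can be sketched as follows. This is an illustrative toy scheduler, not Flover's actual implementation; the names `Request`, `inflight_batching`, and `fake_decode_step` are hypothetical, and the decode step is a stand-in for a fused GPU forward pass:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: list                 # token ids of the prompt
    max_new_tokens: int
    generated: list = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens

def fake_decode_step(batch):
    # Stand-in for one fused forward pass over the whole batch:
    # emit one new token per active request.
    return [len(r.generated) for r in batch]

def inflight_batching(incoming, max_batch=4):
    """Run decode steps over a rolling batch: finished requests are
    evicted and queued requests join mid-generation, so the batch
    stays full instead of waiting for the slowest request."""
    active, finished = [], []
    while incoming or active:
        # Admit queued requests into free batch slots (temporal fusion).
        while incoming and len(active) < max_batch:
            active.append(incoming.pop(0))
        # One decode step for every active request, regardless of
        # when each request joined the batch.
        for req, tok in zip(active, fake_decode_step(active)):
            req.generated.append(tok)
        # Evict finished requests so new ones can take their slots.
        finished += [r for r in active if r.done()]
        active = [r for r in active if not r.done()]
    return finished
```

The key contrast with static batching is that a short request leaving the batch immediately frees a slot for a queued request, instead of the whole batch waiting for its longest member.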

The new features available with this release of the ParaInfer-X package are as follows:

  *   Based on FasterTransformer
  *   (NEW) Support for inference of various large language models:
     *   (NEW) GPT-J 6B
     *   (NEW) LLaMA 7B
     *   (NEW) LLaMA 13B
     *   (NEW) LLaMA 33B
     *   (NEW) LLaMA 65B
  *   (NEW) Support for persistent model inference stream
  *   (NEW) Support for temporal fusion/in-flight batching of multiple requests
  *   (NEW) Support for multiple GPU tensor parallelism
  *   (NEW) Support for asynchronous memory reordering for evicting finished requests
  *   (NEW) Support for float32, float16, bfloat16 for model inference
  *   Compatible with
     *   (NEW) NVIDIA GPU A100 and V100
     *   (NEW) CUDA [11.2, 11.3, 11.4, 11.6]
     *   (NEW) GCC >= 8.5.0
     *   (NEW) CMAKE >= 3.18
     *   (NEW) Intel oneTBB >= v2020.0
     *   (NEW) Customized CUDA kernels
  *   (NEW) Support for visualization output of inference progress
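
One listed feature, asynchronous memory reordering for evicting finished requests, can be illustrated with a minimal sketch. The function `compact_batch` below is hypothetical and synchronous; it only shows the compaction idea, whereas the release overlaps such reordering asynchronously with ongoing decode steps:

```python
import numpy as np

def compact_batch(kv_cache, alive_mask):
    """Compact per-request cache slots so live requests occupy a
    contiguous prefix of the batch dimension. Freed slots at the
    tail can then be reused by newly admitted requests."""
    alive = np.flatnonzero(alive_mask)
    # np.ndarray fancy indexing copies the surviving rows, so the
    # assignment below is safe even though source and destination
    # refer to the same buffer.
    kv_cache[:len(alive)] = kv_cache[alive]
    return len(alive)  # number of occupied slots after compaction
```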

The ParaInfer-X package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/Flover

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for ParaInfer-X using inference
benchmarks can be viewed by visiting the "Performance" tab
of the above website.

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88
(from 21 countries). The HiDL team would like to thank all its users
and organizations!




