<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Mangal;
panose-1:0 0 4 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"Segoe UI";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:Aptos;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.contentpasted0
{mso-style-name:contentpasted0;}
span.markw2t7fp904
{mso-style-name:markw2t7fp904;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1366445118;
mso-list-template-ids:1446524036;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">The High-Performance Deep Learning (HiDL) team is pleased to announce</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">the release of </span></span><span class="markw2t7fp904"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">ParaInfer</span></span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">-X
v1.0, which </span></span>is a collection of parallel inference techniques <o:p>
</o:p></p>
<p class="MsoNormal">that can facilitate the deployment of emerging AI models on edge devices and HPC clusters.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This package leverages highly performant GPU kernels that maximize computational throughput,
<o:p></o:p></p>
<p class="MsoNormal">intelligent scheduling strategies that ensure optimal load balancing across resources,
<o:p></o:p></p>
<p class="MsoNormal">and sophisticated distributed communication libraries that facilitate large-scale
<o:p></o:p></p>
<p class="MsoNormal">inference by enabling seamless data exchange and coordination among
<o:p></o:p></p>
<p class="MsoNormal">distributed systems. ParaInfer-X v1.0 proposes a temporal fusion framework,
<o:p></o:p></p>
<p class="MsoNormal">named Flover, to smartly batch multiple requests during LLM generation,
<o:p></o:p></p>
<p class="MsoNormal">which is also known as temporal fusion/in-flight batching.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal" style="background:white"><span style="color:black">The new features available with this release of the ParaInfer-X package are as follows:<o:p></o:p></span></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Based on NVIDIA FasterTransformer<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for inference of various large language models:<o:p></o:p></li><ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>GPT-J 6B<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>LLaMA 7B<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>LLaMA 13B<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>LLaMA 33B<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>LLaMA 65B<o:p></o:p></li></ul>
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for persistent model inference stream<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for temporal fusion/in-flight batching of multiple requests<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for multi-GPU tensor parallelism<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for asynchronous memory reordering for evicting finished requests<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for float32, float16, and bfloat16 for model inference<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Compatible with <o:p></o:p></li><ul type="circle">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>NVIDIA GPU A100 and V100<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>CUDA [11.2, 11.3, 11.4, 11.6] <o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>GCC >= 8.5.0<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>CMAKE >= 3.18<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>Intel oneTBB >= v2020.0<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level2 lfo1">
<span style="color:red">(NEW) </span>Customized CUDA kernels<o:p></o:p></li></ul>
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
<span style="color:red">(NEW) </span>Support for visualization output of inference progress<o:p></o:p></li></ul>
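<p class="MsoNormal">The asynchronous memory reordering feature listed above can be pictured with a minimal slot-compaction sketch (the function name and data layout are hypothetical, not ParaInfer-X's actual API): when a request finishes, the last active slot is swapped into its place so that live KV-cache entries stay contiguous. A real implementation would perform this as an asynchronous device-side copy overlapped with the next decode step.</p>

```python
def evict_finished(kv_cache, request_ids, finished_idx):
    """Evict a finished request from a batched KV cache by swapping the
    last active slot into its position, keeping live slots contiguous.

    kv_cache:    list of per-request cache blocks (stand-ins for tensors)
    request_ids: parallel list mapping slot index -> request id
    """
    last = len(request_ids) - 1
    if finished_idx != last:
        # Move the last active slot into the vacated position.
        kv_cache[finished_idx] = kv_cache[last]
        request_ids[finished_idx] = request_ids[last]
    # Drop the now-duplicated final slot.
    kv_cache.pop()
    request_ids.pop()
    return kv_cache, request_ids
```

<p class="MsoNormal">Compaction keeps the active batch dense, so subsequent decode kernels run over a contiguous, smaller batch instead of skipping dead slots.</p>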
<p class="MsoNormal" style="background:white"><span style="color:black">The ParaInfer-X package is open source and is hosted at the following URL:</span><o:p></o:p></p>
<p class="MsoNormal" style="background:white"><o:p> </o:p></p>
<p class="MsoNormal" style="background:white"><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white"><a href="https://github.com/OSU-Nowlab/Flover">https://github.com/OSU-Nowlab/Flover</a>
</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span style="font-size:11.5pt;font-family:"Segoe UI",sans-serif;color:#242424"><o:p> </o:p></span></p>
<p class="MsoNormal" style="background:white"><span class="contentpasted0"><span style="font-family:"Segoe UI",sans-serif;color:#242424">For associated release information, please visit the following URL:</span></span><span style="font-size:11.5pt;font-family:"Segoe UI",sans-serif;color:#242424"><o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span style="font-size:11.5pt;font-family:"Segoe UI",sans-serif;color:#242424"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt;background:white"><span style="font-family:"Aptos",sans-serif;color:black;background:white"><a href="http://hidl.cse.ohio-state.edu">http://hidl.cse.ohio-state.edu</a></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="background:white"><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">Sample performance numbers for </span></span><span class="markw2t7fp904"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">ParaInfer-X
</span></span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">using inference
</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">benchmarks can be viewed by visiting the 'Performance' tab</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
</span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">of the above website.</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
<br>
</span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">All questions, feedback, and bug reports are welcome. Please post to</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
</span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white"><a href="mailto:hidl-discuss@lists.osu.edu">hidl-discuss@lists.osu.edu</a>.</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
<br>
</span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">Thanks,</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
<br>
</span><span class="contentpasted0"><span style="font-family:"Aptos",sans-serif;color:#242424;background:white">The High-Performance Deep Learning (HiDL) Team</span></span><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><br>
<a href="http://hidl.cse.ohio-state.edu/" target="_blank"><span style="font-size:11.0pt;background:white">http://hidl.cse.ohio-state.edu</span></a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black">PS: The number of organizations using the HiDL stacks has crossed 88<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black">(from 21 countries). The HiDL team would like to thank all its users<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:12.0pt;font-family:"Aptos",sans-serif;color:black">and organizations!!</span><o:p></o:p></p>
<p class="MsoNormal" style="background:white"><o:p> </o:p></p>
<p class="MsoNormal" style="background:white"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>