Here's a site that walks through the math:

https://medium.com/ai-fusion-labs/retentive-networks-retnet-explained-the-much-awaited-transformers-killer-is-here-6c17e3e8add8
________________________________
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> CaCL <cacl-bounces+schuler=ling.osu.edu@lists.osu.edu> on behalf of Oh, Byung-Doh via CaCL <cacl@lists.osu.edu><br>
<b>Sent:</b> Thursday, August 31, 2023 2:32:57 PM<br>
<b>To:</b> cacl@lists.osu.edu <cacl@lists.osu.edu><br>
<b>Subject:</b> [CaCL] 9/7: Retentive Network: A Successor to Transformer for Large Language Models</font>
<div> </div>
</div>
<div dir="ltr">
<div class="x_elementToProof" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Hello everyone,</div>
<div class="x_elementToProof" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="x_elementToProof" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
Next week, we'll discuss the following paper on Retentive Network:</div>
<div class="x_elementToProof" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div class="x_elementToProof x_ContentPasted0" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<b>Retentive Network: A Successor to Transformer for Large Language Models</b><br>
</div>
<div class="x_elementToProof">
<div class="x_ContentPasted1" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<a href="https://urldefense.com/v3/__https://arxiv.org/pdf/2307.08621.pdf__;!!KGKeukY!wVnlcLm6OYgby9mskEjEVwuxODKw4np1MdzKBXRGQYdbLsBcN5EvAk8puzKon_4vT7_Czz8cEmdCdsRLzc4$" id="LPlnk280120" class="x_OWAAutoLink" data-loopstyle="linkonly">https://arxiv.org/pdf/2307.08621.pdf</a></div>
<div class="x_ContentPasted2" style="font-family:Calibri,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
In this work, we propose Retentive Network (RETNET) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and
 attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent
 representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each
 chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RETNET achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties
 make RETNET a strong successor to Transformer for large language models. Code will be available at
<a href="https://urldefense.com/v3/__https://aka.ms/retnet__;!!KGKeukY!wVnlcLm6OYgby9mskEjEVwuxODKw4np1MdzKBXRGQYdbLsBcN5EvAk8puzKon_4vT7_Czz8cEmdCtufTM5M$" id="LPlnk417065" data-loopstyle="linkonly" class="x_OWAAutoLink">
https://aka.ms/retnet</a>.<br>
</div>
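
To make the equivalence of the three computation paradigms concrete before the meeting, here is a minimal NumPy sketch of single-head retention. It follows the paper's notation (decay gamma, decay mask D, state update S_n = gamma * S_{n-1} + K_n^T V_n) but omits gating, normalization, per-head decay rates, and the xPos-style rotation, so treat it as an illustrative reimplementation under those simplifying assumptions, not the released code:

import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4        # sequence length, head dimension
gamma = 0.9        # exponential decay factor

X = rng.standard_normal((T, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Parallel form (training): Retention(X) = (Q K^T * D) V,
# where D[n, m] = gamma^(n-m) for n >= m and 0 otherwise (causal decay mask).
n = np.arange(T)
D = np.where(n[:, None] >= n[None, :], gamma ** (n[:, None] - n[None, :]), 0.0)
out_parallel = (Q @ K.T * D) @ V

# Recurrent form (O(1)-per-token inference):
# S_n = gamma * S_{n-1} + K_n^T V_n;  o_n = Q_n S_n.
S = np.zeros((d, d))
out_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])   # update the d x d state
    out_recurrent[t] = Q[t] @ S            # read out with the current query

# Chunkwise recurrent form (long sequences): parallel inside each chunk,
# while a decayed state R carries information across chunk boundaries.
B = 3                                      # chunk size (illustrative); assumes B divides T
R = np.zeros((d, d))
out_chunk = np.zeros((T, d))
for c in range(T // B):
    s = slice(c * B, (c + 1) * B)
    Qc, Kc, Vc = Q[s], K[s], V[s]
    inner = (Qc @ Kc.T * D[:B, :B]) @ Vc                       # within-chunk, parallel
    cross = (gamma ** (np.arange(B) + 1))[:, None] * (Qc @ R)  # decayed past chunks
    out_chunk[s] = inner + cross
    R = gamma ** B * R + Kc.T @ (gamma ** (B - 1 - np.arange(B))[:, None] * Vc)

print(np.allclose(out_parallel, out_recurrent))  # True: recurrent matches parallel
print(np.allclose(out_parallel, out_chunk))      # True: chunkwise matches parallel

All three loops compute the same outputs; replacing softmax attention with the fixed decay mask D is what makes the exact recurrent rewriting possible.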

Best,
Byung-Doh
<div id="x_Signature">
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div></div>
<div></div>
<div></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span style="font-family:"Lucida Sans Unicode","Lucida Grande",sans-serif; font-size:10pt">=================</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span style="font-size:10pt"></span><span style="font-size:11pt"></span><span style="font-family:"Lucida Sans Unicode","Lucida Grande",sans-serif; font-size:10pt"><b>Byung-Doh Oh</b> (he/him/his)</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span style="font-size:10pt"></span><span style="font-size:11pt"></span><span style="font-family:"Lucida Sans Unicode","Lucida Grande",sans-serif; font-size:10pt">Ph.D. Candidate</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span style="font-size:10pt"></span><span style="font-size:11pt"></span><span style="font-family:"Lucida Sans Unicode","Lucida Grande",sans-serif; font-size:10pt">Department of Linguistics</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<span style="font-size:10pt"></span><span style="font-size:11pt"></span><span style="font-family:"Lucida Sans Unicode","Lucida Grande",sans-serif; font-size:10pt">The Ohio State University</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
</div>
</div>
</div>
</div>