[CaCL] 4/11: Birth of a Transformer: A Memory Viewpoint

Oh, Byung-Doh oh.531 at buckeyemail.osu.edu
Fri Apr 5 09:57:36 EDT 2024


I forgot to mention that on 4/11 we'll be meeting in Oxley 102, due to planned construction in our usual meeting room.


=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

________________________________
From: Oh, Byung-Doh <oh.531 at buckeyemail.osu.edu>
Sent: Thursday, April 4, 2024 2:11 PM
To: cacl at lists.osu.edu <cacl at lists.osu.edu>
Subject: 4/11: Birth of a Transformer: A Memory Viewpoint

Hi everyone,

Next week, we'll discuss the following paper:

Birth of a Transformer: A Memory Viewpoint
https://arxiv.org/abs/2306.00802

Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. By a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.
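
If you'd like something concrete to poke at before the meeting, here is a minimal sketch (Python/NumPy) of the two ideas in the abstract. This is not the authors' code; the vocabulary size, embedding dimension, and all names are illustrative assumptions. The first part samples sequences that follow a global bigram model except at a few sequence-specific "trigger -> output" pairs (the in-context bigrams an induction head has to pick up); the second stores token associations in a weight matrix as a sum of outer products and retrieves them, i.e., the "weight matrices as associative memories" view.

import numpy as np

rng = np.random.default_rng(0)
V = 20   # vocabulary size (illustrative)
T = 30   # sequence length (illustrative)

# Global bigram model: row v is P(next token | current token = v).
global_bigram = rng.dirichlet(np.ones(V), size=V)

def sample_sequence(n_triggers=2):
    """Follow the global bigram model, except that each trigger token is
    always followed by a fixed, sequence-specific output token."""
    triggers = rng.choice(V, size=n_triggers, replace=False)
    outputs = {int(q): int(rng.integers(V)) for q in triggers}
    seq = [int(rng.integers(V))]
    for _ in range(T - 1):
        prev = seq[-1]
        if prev in outputs:
            seq.append(outputs[prev])                              # in-context bigram
        else:
            seq.append(int(rng.choice(V, p=global_bigram[prev])))  # global bigram
    return seq

# Weight matrix as an associative memory: store (input -> output) token
# pairs as a sum of outer products of near-orthonormal random embeddings;
# W @ E_in[v] is then approximately E_out[u] for each stored pair (v, u).
d = 128
E_in = rng.standard_normal((V, d)) / np.sqrt(d)
E_out = rng.standard_normal((V, d)) / np.sqrt(d)
pairs = [(3, 7), (5, 1), (8, 2)]   # illustrative associations
W = sum(np.outer(E_out[u], E_in[v]) for v, u in pairs)

for v, u in pairs:
    scores = E_out @ (W @ E_in[v])   # decode by nearest output embedding
    print(f"{v} -> {int(np.argmax(scores))} (stored: {u})")

Because the random embeddings are nearly orthonormal, the cross-terms in W @ E_in[v] are small, so retrieval recovers the stored output token; this outer-product construction is the associative-memory view the abstract refers to.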

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University
