From schuler.77 at osu.edu  Thu Sep  7 08:54:30 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Thu, 7 Sep 2023 12:54:30 +0000
Subject: [CaCL] 9/7: Retentive Network: A Successor to Transformer for Large Language Models

Here's a site that walks through the math:
https://medium.com/ai-fusion-labs/retentive-networks-retnet-explained-the-much-awaited-transformers-killer-is-here-6c17e3e8add8

________________________________
From: CaCL on behalf of Oh, Byung-Doh via CaCL
Sent: Thursday, August 31, 2023 2:32:57 PM
To: cacl at lists.osu.edu
Subject: [CaCL] 9/7: Retentive Network: A Successor to Transformer for Large Language Models

Hello everyone,

Next week, we'll discuss the following paper on Retentive Network:

Retentive Network: A Successor to Transformer for Large Language Models
https://arxiv.org/pdf/2307.08621.pdf

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. We then propose the retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory usage without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded in parallel while the chunks are summarized recurrently. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. These intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University
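
To make the parallel and recurrent paradigms in the abstract concrete, here is a minimal NumPy sketch of single-head retention. It assumes a scalar decay gamma and omits RetNet's xPos-style rotations, multi-scale heads, and group normalization, so it illustrates the recurrence-attention connection rather than the paper's full implementation; the function names are ours.

import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Parallel form: (Q K^T * D) V, where D[n, m] = gamma**(n - m)
    # for n >= m and 0 otherwise (a causal decay mask).
    T = Q.shape[0]
    idx = np.arange(T)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    # Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n; out_n = Q_n S_n.
    # The state S has fixed size, giving O(1) cost per decoding step.
    S = np.zeros((Q.shape[1], V.shape[1]))
    out = []
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])
        out.append(Q[t] @ S)
    return np.stack(out)

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 5, 4))
# The two paradigms compute identical outputs:
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))

The assert checks the key identity: the O(T^2) parallel form and the O(1)-per-step recurrent form produce the same outputs, which is what lets RetNet train like a Transformer but decode like an RNN.
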
From clark.3664 at buckeyemail.osu.edu  Fri Sep  8 09:58:41 2023
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Fri, 8 Sep 2023 13:58:41 +0000
Subject: [CaCL] Reading for 9/14

Hi CaCL members,

Here's the reading for our next meeting, on 9/14.

Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers (Li and Lu 2023)

Paper: https://aclanthology.org/2023.acl-long.285.pdf
Zoom: https://osu.zoom.us/j/93495474520?pwd=NWFRZzF6QnZrQWdheThxbWJyNjJ4dz09

Abstract: Recent advancements in pre-trained language models (PLMs) have demonstrated that these models possess some degree of syntactic awareness. To leverage this knowledge, we propose a novel chart-based method for extracting parse trees from masked language models (LMs) without the need to train separate parsers. Our method computes a score for each span based on the distortion of contextual representations resulting from linguistic perturbations. We design a set of perturbations motivated by the linguistic concept of constituency tests, and use these to score each span by aggregating the distortion scores. To produce a parse tree, we use chart parsing to find the tree with the minimum score. Our method consistently outperforms previous state-of-the-art methods on English with masked LMs, and also demonstrates superior performance in a multilingual setting, outperforming the state of the art in 6 out of 8 languages. Notably, although our method involves no parameter updates or extensive hyperparameter search, its performance can even surpass some unsupervised parsing methods that require fine-tuning. Our analysis highlights that the distortion of contextual representations resulting from syntactic perturbations can serve as an effective indicator of constituency across languages.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From oh.531 at buckeyemail.osu.edu  Sun Sep 17 16:19:07 2023
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Sun, 17 Sep 2023 20:19:07 +0000
Subject: [CaCL] 9/21: Syntax and geometry of information

Dear CaCL members,

This week, we'll be discussing the following paper:

Syntax and Geometry of Information (Bailly et al., 2023)
https://aclanthology.org/2023.acl-long.590.pdf

This paper presents an information-theoretical model of syntactic generalization. We study syntactic generalization from the perspective of the capacity to disentangle semantic and structural information, emulating the human capacity to assign a grammaticality judgment to semantically nonsensical sentences. In order to isolate the structure, we propose to represent the probability distribution behind a corpus as the product of the probability of a semantic context and the probability of a structure, the latter being independent of the former. We further elaborate the notion of abstraction as a relaxation of this independence property, based on measures of structural and contextual information for a given representation. We test abstraction as an optimization objective on the task of inducing syntactic categories from natural language data and show that it significantly outperforms alternative methods. Furthermore, we find that when syntax-unaware optimization objectives succeed in the task, their success is mainly due to an implicit disentanglement process rather than to the model structure. On the other hand, syntactic categories can be deduced in a principled way from the independence between structure and context.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

From schuler.77 at osu.edu  Thu Sep 21 11:30:36 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Thu, 21 Sep 2023 15:30:36 +0000
Subject: [CaCL] may be a little late

Hi all,

I may be a little late to CaCL, as my previous meeting is on the other side of campus. Feel free to start without me.

wm

From schuler.77 at osu.edu  Wed Sep 27 19:36:29 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Wed, 27 Sep 2023 23:36:29 +0000
Subject: [CaCL] CACL: lecture notes for linear algebra notation

Hi all,

In CaCL tomorrow I'll be going over the following lecture notes on linear algebra notation:
https://www.asc.ohio-state.edu/schuler.77/courses/5523/5523LN08linalg.pdf

wm
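
A footnote on the Li and Lu (2023) paper from the 9/14 meeting above: the final step of their method, finding the tree whose spans have the minimum total score, is a small dynamic program. Below is a minimal sketch of that chart-parsing step; the random placeholder scores stand in for the paper's actual span scores (aggregated masked-LM representation distortions under constituency-test perturbations), and the function name is ours.

import numpy as np

def min_score_tree(score, i, j, memo):
    # Return (cost, tree) for the binary tree over span (i, j) whose
    # spans have the minimum total score, via CKY-style chart parsing.
    if (i, j) in memo:
        return memo[(i, j)]
    if j - i == 1:                      # single word: no split to choose
        result = (score[(i, j)], i)
    else:
        cost, k = min((min_score_tree(score, i, k, memo)[0]
                       + min_score_tree(score, k, j, memo)[0], k)
                      for k in range(i + 1, j))
        result = (cost + score[(i, j)],
                  (min_score_tree(score, i, k, memo)[1],
                   min_score_tree(score, k, j, memo)[1]))
    memo[(i, j)] = result
    return result

words = "the cat sat on the mat".split()
n = len(words)
rng = np.random.default_rng(0)
# Placeholder scores; the paper derives score[(i, j)] from how much
# perturbing span (i, j) distorts the LM's contextual representations.
score = {(i, j): rng.random() for i in range(n) for j in range(i + 1, n + 1)}
print(min_score_tree(score, 0, n, {}))
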
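And a footnote on Bailly et al. (2023) from the 9/21 meeting: the independence assumption in the abstract, p(context, structure) = p(context) * p(structure), can be made concrete as zero mutual information between structure and context. The sketch below only illustrates that idea on toy counts (the names and numbers are ours); it is not the paper's abstraction objective, which relaxes this independence.

import numpy as np

def mutual_information(joint):
    # I(S; C) in nats from a joint count/probability table over
    # (structure, context). Independence corresponds to I(S; C) = 0.
    joint = joint / joint.sum()
    ps = joint.sum(axis=1, keepdims=True)   # marginal over structures
    pc = joint.sum(axis=0, keepdims=True)   # marginal over contexts
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (ps @ pc)[nz])).sum())

# Toy counts of (syntactic category, context) co-occurrences.
counts = np.array([[8.0, 2.0],
                   [2.0, 8.0]])
print(mutual_information(counts))   # > 0: structure leaks contextual info
print(mutual_information(np.outer(counts.sum(1), counts.sum(0))))   # ~0: independent
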