From lin.4434 at buckeyemail.osu.edu Thu Feb 1 14:46:43 2024
From: lin.4434 at buckeyemail.osu.edu (Lin, Yi Chien)
Date: Thu, 1 Feb 2024 19:46:43 +0000
Subject: [CaCL] Reading for Next Week (2/8)
Message-ID:

Hi all,

Next week (2/8) we will be discussing "Dissociating language and thought in large language models" (Mahowald et al., 2023).

Abstract: Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence - knowledge of linguistic rules and patterns - and functional linguistic competence - understanding and using language in the world. We ground this distinction in human neuroscience, showing that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. In short, LLMs are good models of language but incomplete models of human thought.

Link to paper: https://arxiv.org/abs/2301.06627

Best,
Yi-Chien

From oh.531 at buckeyemail.osu.edu Thu Feb 8 14:11:52 2024
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 8 Feb 2024 19:11:52 +0000
Subject: [CaCL] 2/15: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Message-ID:

Hi everyone,

Next week, we'll discuss the following paper:

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
https://arxiv.org/abs/2312.00752

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
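For concreteness before the discussion, here is a minimal NumPy sketch of the selective recurrence the abstract describes: the state-space parameters B and C and the step size Delta are computed from the current token, so the model can decide token by token whether to carry its state forward or overwrite it. This is a simplified illustration with invented names and shapes, not the authors' implementation; it omits the hardware-aware parallel scan and the gated block that wraps the recurrence in the full model.

    # Minimal sketch of a selective state-space recurrence in the spirit of
    # Mamba's S6 layer (heavily simplified; names, shapes, and the
    # discretization are illustrative, not the paper's exact formulation).
    import numpy as np

    def softplus(z):
        return np.log1p(np.exp(z))

    def selective_ssm(x, A, W_B, W_C, W_delta):
        """x: (T, D) input sequence.
        A: (D, N) diagonal state matrix (input-independent, kept negative).
        W_B, W_C: (N, D) projections making B_t and C_t functions of the token.
        W_delta: (D, D) projection making the step size Delta_t input-dependent."""
        T, D = x.shape
        N = A.shape[1]
        h = np.zeros((D, N))                      # one length-N state per channel
        y = np.zeros((T, D))
        for t in range(T):
            xt = x[t]                             # current token, (D,)
            delta = softplus(W_delta @ xt)        # (D,): per-channel step size
            B_t = W_B @ xt                        # (N,): input-dependent input proj.
            C_t = W_C @ xt                        # (N,): input-dependent output proj.
            A_bar = np.exp(delta[:, None] * A)    # (D, N): discretized transition
            B_bar = delta[:, None] * B_t[None, :]
            # Selective recurrence: per token, A_bar near 1 propagates the state
            # and A_bar near 0 forgets it, giving content-based gating.
            h = A_bar * h + B_bar * xt[:, None]
            y[t] = h @ C_t                        # (D,): read out the state
        return y

    # Toy usage:
    rng = np.random.default_rng(0)
    T, D, N = 16, 8, 4
    x = rng.standard_normal((T, D))
    A = -np.exp(rng.standard_normal((D, N)))      # negative, so exp(delta*A) in (0, 1)
    y = selective_ssm(x, A,
                      rng.standard_normal((N, D)) / np.sqrt(D),
                      rng.standard_normal((N, D)) / np.sqrt(D),
                      rng.standard_normal((D, D)) / np.sqrt(D))
    print(y.shape)                                # (16, 8)

The only point of the sketch is that A_bar, B_bar, and C_t change at every time step, which is exactly what a fixed-kernel convolution (as in earlier, non-selective SSMs) cannot express.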
Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

From court.22 at buckeyemail.osu.edu Tue Feb 20 11:10:03 2024
From: court.22 at buckeyemail.osu.edu (Court, Sara)
Date: Tue, 20 Feb 2024 16:10:03 +0000
Subject: [CaCL] This Week: Patel and Pavlick (2022) - Mapping Language Models to Grounded Conceptual Spaces
Message-ID:

Hi all,

This week in CaCL we'll be discussing Patel and Pavlick (2022) "Mapping Language Models to Grounded Conceptual Spaces". Here's the link to the paper: https://openreview.net/pdf?id=gJcEM8sxHK

Abstract: A fundamental criticism of text-only language models (LMs) is their lack of grounding - that is, the ability to tie a word for which they have learned a representation to its referent in the non-linguistic world. However, despite this limitation, large pre-trained LMs have been shown to have a remarkable grasp of the conceptual structure of language, as demonstrated by their ability to answer questions, generate fluent text, or make inferences about entities, objects, and properties that they have never physically observed. In this work we investigate the extent to which the rich conceptual structure that LMs learn indeed reflects the conceptual structure of the non-linguistic world - which is something that LMs have never observed. We do this by testing whether the LMs can learn to map an entire conceptual domain (e.g., direction or colour) onto a grounded world representation given only a small number of examples. For example, we show a model what the word "left" means using a textual depiction of a grid world, and assess how well it can generalise to related concepts, for example, the word "right", in a similar grid world. We investigate a range of generative language models of varying sizes (including GPT-2 and GPT-3), and see that although the smaller models struggle to perform this mapping, the largest model can not only learn to ground the concepts that it is explicitly taught, but appears to generalise to several instances of unseen concepts as well. Our results suggest an alternative means of building grounded language models: rather than learning grounded representations "from scratch", it is possible that large text-only models learn a sufficiently rich conceptual structure that could allow them to be grounded in a data-efficient way.

See you Thursday,
Sara

From clark.3664 at buckeyemail.osu.edu Thu Feb 22 15:37:55 2024
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 22 Feb 2024 20:37:55 +0000
Subject: [CaCL] Reading for 2/29
Message-ID:

Hi CaCL members,

On 2/29 we will discuss "Holographic CCG Parsing" by Yamaki et al. (2023).

Paper: https://aclanthology.org/2023.acl-long.15.pdf

Abstract: We propose a method for formulating CCG as a recursive composition in a continuous vector space. Recent CCG supertagging and parsing models generally demonstrate high performance, yet rely on black-box neural architectures to implicitly model phrase structure dependencies. Instead, we leverage the method of holographic embeddings as a compositional operator to explicitly model the dependencies between words and phrase structures in the embedding space. Experimental results revealed that holographic composition effectively improves the supertagging accuracy to achieve state-of-the-art parsing performance when using a C&C parser. The proposed span-based parsing algorithm using holographic composition achieves performance comparable to state-of-the-art neural parsing with Transformers. Furthermore, our model can semantically and syntactically infill text at the phrase level due to the decomposability of holographic composition.
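For anyone unfamiliar with holographic embeddings, here is a small NumPy illustration of the kind of compositional operator the abstract refers to: circular convolution composes two vectors into one of the same dimensionality, and circular correlation approximately inverts the composition, which is the "decomposability" the last sentence mentions. This is a generic sketch of the operator family (in the spirit of holographic reduced representations), not the paper's model; all names are illustrative.

    # Holographic composition via circular convolution, and its approximate
    # inverse via circular correlation (both computed with FFTs).
    import numpy as np

    def circ_conv(a, b):
        """Circular convolution: compose two d-dim vectors into one d-dim vector."""
        return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

    def circ_corr(a, b):
        """Circular correlation: approximately undo a composition with respect to a."""
        return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    rng = np.random.default_rng(0)
    d = 1024
    word = rng.normal(0, 1 / np.sqrt(d), d)      # e.g. a word embedding
    context = rng.normal(0, 1 / np.sqrt(d), d)   # e.g. a phrase/category embedding

    phrase = circ_conv(word, context)            # compose in the embedding space

    # Decomposability: correlating the composition with one operand (noisily)
    # recovers the other; real systems add a clean-up step, e.g. a nearest-
    # neighbour search over a vocabulary of candidate vectors.
    recovered = circ_corr(word, phrase)
    print(cosine(recovered, context))                            # well above chance
    print(cosine(recovered, rng.normal(0, 1 / np.sqrt(d), d)))   # near zero for an unrelated vector

Because the composed vector has the same dimensionality as its parts, the operation can be applied recursively over a parse, and the correlation step is what allows a composed phrase representation to be "unbound" into its constituents.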
----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From court.22 at buckeyemail.osu.edu Mon Feb 26 06:06:53 2024
From: court.22 at buckeyemail.osu.edu (Court, Sara)
Date: Mon, 26 Feb 2024 11:06:53 +0000
Subject: [CaCL] Visual Guide to Mamba
Message-ID:

Hi all,

I haven't had a chance to look at this closely yet, but I remember at least some folks were wishing for a more visual explanation of the Mamba model we discussed recently:

https://maartengrootendorst.substack.com/p/a-visual-guide-to-mamba-and-state

- S

From oh.531 at buckeyemail.osu.edu Thu Feb 29 17:14:59 2024
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 29 Feb 2024 22:14:59 +0000
Subject: [CaCL] 3/7: Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Message-ID:

Hello everyone,

Next week, we'll discuss the following paper (which will apparently appear in ICLR '24):

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
https://arxiv.org/pdf/2309.07311.pdf

Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in pretraining when models abruptly acquire SAS, concurrent with a steep drop in loss. This breakthrough precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by manipulating SAS during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits during training, and that briefly suppressing SAS improves model quality. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University
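As a concrete picture of the "specific heads track specific syntactic relations" idea behind SAS, here is a small sketch of a standard head-dependency alignment diagnostic: for each attention head, count how often the token it attends to most strongly is the current token's dependency head. This is illustrative only; it is not the paper's SAS measure, and the array names are hypothetical.

    # Head-dependency alignment check for one sentence (illustrative sketch).
    import numpy as np

    def head_dependency_alignment(attn, dep_heads):
        """attn: (L, H, T, T) attention weights; attn[l, h, i, j] is the weight
        token i puts on token j. dep_heads: (T,) index of each token's syntactic
        head, with -1 marking the root. Returns an (L, H) array of scores in [0, 1]."""
        valid = dep_heads >= 0                    # ignore the root token
        predicted = attn.argmax(axis=-1)          # (L, H, T): most-attended token
        hits = predicted[:, :, valid] == dep_heads[valid]
        return hits.mean(axis=-1)

    # Toy usage with random "attention" and a tiny made-up gold parse:
    rng = np.random.default_rng(0)
    attn = rng.random((12, 12, 5, 5))
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1, like softmax output
    dep_heads = np.array([1, -1, 1, 4, 1])        # toy dependency heads; -1 is the root
    scores = head_dependency_alignment(attn, dep_heads)
    print(scores.max())                           # random attention stays near chance (1/T)

In a trained MLM, a handful of heads score far above chance on particular relations; the paper tracks when during pretraining that structure appears and what happens when it is promoted or suppressed.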