From lin.4434 at buckeyemail.osu.edu Thu Mar 7 14:33:21 2024
From: lin.4434 at buckeyemail.osu.edu (Lin, Yi Chien)
Date: Thu, 7 Mar 2024 19:33:21 +0000
Subject: [CaCL] Reading for 3/28: Contrastive Decoding: Open-ended Text Generation as Optimization
Message-ID:

Hi All,

CaCL will not meet next week (3/14) or the week after (3/21). Our next meeting will be on 3/28, when we will discuss "Contrastive Decoding: Open-ended Text Generation as Optimization" (Li et al., 2023).

Paper: https://aclanthology.org/2023.acl-long.687/

Abstract: Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g. OPT-13B) and a small LM (called the amateur, e.g. OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.

Best,
Yi-Chien

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oh.531 at buckeyemail.osu.edu Fri Mar 22 16:44:33 2024
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Fri, 22 Mar 2024 20:44:33 +0000
Subject: [CaCL] Venue of interest? (free registration)
Message-ID:

Highlights in the Language Sciences Conference 2024
https://hils2024.nl

The Language in Interaction Consortium (LiI) is organizing the Highlights in the Language Sciences Conference 2024, celebrating the conclusion of our 10-year Gravitation Programme and the advances made in language-related disciplines including genetics, neuroscience, psychology, linguistics, and computational modeling. The conference will take place 8-11 July 2024 at Radboud University in Nijmegen. We are putting together an exciting programme with leading experts in the relevant fields of research. Confirmed speakers include David Poeppel (NYU; Strüngmann Institute, Frankfurt), Ghislaine Dehaene-Lambertz (CNRS Paris), Vera Demberg (Universität des Saarlandes), Uri Hasson (Princeton University), Barbara Kaup (University of Tübingen), and Tal Linzen (NYU). Registration and attendance are free of charge; poster abstract submission and registration are now open.

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From schuler.77 at osu.edu Fri Mar 22 20:15:55 2024
From: schuler.77 at osu.edu (Schuler, William)
Date: Sat, 23 Mar 2024 00:15:55 +0000
Subject: [CaCL] Venue of interest? (free registration)
In-Reply-To:
References:
Message-ID:

Yes, that's certainly a credible speaker list!
Get Outlook for iOS

________________________________
From: CaCL on behalf of Oh, Byung-Doh via CaCL
Sent: Friday, March 22, 2024 4:44:33 PM
To: cacl at lists.osu.edu
Subject: [CaCL] Venue of interest? (free registration)

Highlights in the Language Sciences Conference 2024
https://hils2024.nl

The Language in Interaction Consortium (LiI) is organizing the Highlights in the Language Sciences Conference 2024, celebrating the conclusion of our 10-year Gravitation Programme and the advances made in language-related disciplines including genetics, neuroscience, psychology, linguistics, and computational modeling. The conference will take place 8-11 July 2024 at Radboud University in Nijmegen. We are putting together an exciting programme with leading experts in the relevant fields of research. Confirmed speakers include David Poeppel (NYU; Strüngmann Institute, Frankfurt), Ghislaine Dehaene-Lambertz (CNRS Paris), Vera Demberg (Universität des Saarlandes), Uri Hasson (Princeton University), Barbara Kaup (University of Tübingen), and Tal Linzen (NYU). Registration and attendance are free of charge; poster abstract submission and registration are now open.

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From lin.4434 at buckeyemail.osu.edu Sun Mar 24 16:41:05 2024
From: lin.4434 at buckeyemail.osu.edu (Lin, Yi Chien)
Date: Sun, 24 Mar 2024 20:41:05 +0000
Subject: [CaCL] Reading for 3/28: Contrastive Decoding: Open-ended Text Generation as Optimization
In-Reply-To:
References:
Message-ID:

Hi all,

This is a friendly reminder that the paper we will discuss this week (3/28) is "Contrastive Decoding: Open-ended Text Generation as Optimization" (Li et al., 2023). Please find the abstract and the link to the paper below.

Best,
Yi-Chien

From: CaCL on behalf of Lin, Yi Chien via CaCL
Sent: Thursday, March 7, 2024 2:33 PM
To: cacl at lists.osu.edu
Subject: [CaCL] Reading for 3/28: Contrastive Decoding: Open-ended Text Generation as Optimization

Hi All,

CaCL will not meet next week (3/14) or the week after (3/21). Our next meeting will be on 3/28, when we will discuss "Contrastive Decoding: Open-ended Text Generation as Optimization" (Li et al., 2023).

Paper: https://aclanthology.org/2023.acl-long.687/

Abstract: Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g. OPT-13B) and a small LM (called the amateur, e.g. OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.

Best,
Yi-Chien

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
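The contrastive objective and plausibility constraint described in the abstract above can be written down in a few lines. The sketch below is an illustration based only on the abstract, not the authors' released implementation; the function name, the choice of alpha = 0.1, and the assumption that the expert and amateur share a vocabulary and that the logits are 1-D next-token scores are all illustrative.

import math
import torch

def contrastive_decoding_step(expert_logits: torch.Tensor,
                              amateur_logits: torch.Tensor,
                              alpha: float = 0.1) -> int:
    """Pick the next token by maximizing log p_expert - log p_amateur,
    restricted to tokens the expert itself considers plausible."""
    # Next-token log-probabilities under the large (expert) and small
    # (amateur) LMs, each a 1-D tensor over a shared vocabulary.
    expert_logprobs = torch.log_softmax(expert_logits, dim=-1)
    amateur_logprobs = torch.log_softmax(amateur_logits, dim=-1)

    # Plausibility constraint: keep only tokens whose expert probability is
    # at least alpha times that of the expert's single most likely token.
    plausible = expert_logprobs >= expert_logprobs.max() + math.log(alpha)

    # Contrastive objective: expert minus amateur log-probability, with
    # implausible tokens excluded from the argmax.
    scores = expert_logprobs - amateur_logprobs
    scores[~plausible] = float("-inf")
    return int(torch.argmax(scores).item())

In a full generation loop, expert_logits and amateur_logits would be the final-position logits from, e.g., OPT-13B and OPT-125M run on the same prefix, with the selected token appended to the prefix before the next step.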
From clark.3664 at buckeyemail.osu.edu Thu Mar 28 16:14:32 2024
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 28 Mar 2024 20:14:32 +0000
Subject: [CaCL] Reading for 4/4
Message-ID:

Hi CaCL members,

On 4/4 we will discuss "Pushdown Layers: Encoding Recursive Structure in Transformer Language Models" by Murty et al. (2023).

Paper: https://aclanthology.org/2023.emnlp-main.195.pdf

Abstract: Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive state via a stack tape that tracks estimated depths of every token in an incremental parse of the observed prefix. Transformer LMs with Pushdown Layers are syntactic language models that autoregressively and synchronously update this stack tape as they predict new tokens, in turn using the stack tape to softly modulate attention over tokens; for instance, learning to "skip" over closed constituents. When trained on a corpus of strings annotated with silver constituency parses, Transformers equipped with Pushdown Layers achieve dramatically better and 3-5x more sample-efficient syntactic generalization, while maintaining similar perplexities. Pushdown Layers are a drop-in replacement for standard self-attention. We illustrate this by finetuning GPT2-medium with Pushdown Layers on an automatically parsed WikiText-103, leading to improvements on several GLUE text classification tasks.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
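The stack-tape mechanism described in the abstract above can be pictured as ordinary causal self-attention whose scores receive a learned additive bias indexed by each token's estimated depth in the incremental parse. The sketch below illustrates only that idea and is not the authors' Pushdown Layers implementation; the single attention head, the scalar per-depth bias, and the assumption that token depths are supplied externally are all simplifications.

import torch
import torch.nn as nn

class DepthBiasedSelfAttention(nn.Module):
    """Single-head causal self-attention whose scores are softly modulated
    by a learned additive bias looked up from each key token's stack depth."""

    def __init__(self, d_model: int, max_depth: int = 64):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.depth_bias = nn.Embedding(max_depth, 1)  # one scalar bias per depth
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, depths: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) hidden states
        # depths: (batch, seq) integer depth of each token in the incremental parse
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale  # (batch, seq, seq)

        # Depth-dependent bias on each key position, broadcast over queries;
        # the model can learn, e.g., to down-weight tokens inside closed constituents.
        scores = scores + self.depth_bias(depths).transpose(-2, -1)

        # Causal mask: every position attends only to the observed prefix.
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))

        return self.out(torch.softmax(scores, dim=-1) @ v)

In the actual model the depths come from a stack tape that the LM updates autoregressively as it predicts tokens (supervised with silver constituency parses); passing them in externally is enough to show how recursive state can softly steer attention without otherwise changing the Transformer.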