From oh.531 at buckeyemail.osu.edu Fri Oct 7 10:22:50 2022
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Fri, 7 Oct 2022 14:22:50 +0000
Subject: [CaCL] Next two meetings (10/13, 10/20) canceled
Message-ID:

Dear CaCLers,

Next week (10/13), we will not meet due to Autumn Break. The following week (10/20), student members are encouraged to attend the graduate student lunch that is part of the department program review (12:30-1:15pm, Oxley 122).

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Student
Department of Linguistics
The Ohio State University

From cheung.179 at buckeyemail.osu.edu Mon Oct 24 19:43:14 2022
From: cheung.179 at buckeyemail.osu.edu (Cheung, Willy)
Date: Mon, 24 Oct 2022 23:43:14 +0000
Subject: [CaCL] paper for 10/27
Message-ID:

Hi all,

For this week in CaCL, I'll be leading discussion on Niu and Penn (2020), "Grammaticality and Language Modelling": https://aclanthology.org/2020.eval4nlp-1.11/

Abstract: Ever since Pereira (2000) provided evidence against Chomsky's (1957) conjecture that statistical language modelling is incommensurable with the aims of grammaticality prediction as a research enterprise, a new area of research has emerged that regards statistical language models as "psycholinguistic subjects" and probes their ability to acquire syntactic knowledge. The advent of The Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2019) has earned a spot on the leaderboard for acceptability judgements, and the polemic between Lau et al. (2017) and Sprouse et al. (2018) has raised fundamental questions about the nature of grammaticality and how acceptability judgements should be elicited. All the while, we are told that neural language models continue to improve. That is not an easy claim to test at present, however, because there is almost no agreement on how to measure their improvement when it comes to grammaticality and acceptability judgements. The GLUE leaderboard bundles CoLA together with a Matthews correlation coefficient (MCC), although probably because CoLA's seminal publication was using it to compute inter-rater reliabilities. Researchers working in this area have used other accuracy and correlation scores, often driven by a need to reconcile and compare various discrete and continuous variables with each other. The score that we will advocate for in this paper, the point biserial correlation, in fact compares a discrete variable (for us, acceptability judgements) to a continuous variable (for us, neural language model probabilities). The only previous work in this area to choose the PBC that we are aware of is Sprouse et al. (2018a), and that paper actually applied it backwards (with some justification) so that the language model probability was treated as the discrete binary variable by setting a threshold. With the PBC in mind, we will first reappraise some recent work in syntactically targeted linguistic evaluations (Hu et al., 2020), arguing that while their experimental design sets a new high watermark for this topic, their results may not prove what they have claimed. We then turn to the task-independent assessment of language models as grammaticality classifiers. Prior to the introduction of the GLUE leaderboard, the vast majority of this assessment was essentially anecdotal, and we find the use of the MCC in this regard to be problematic. We conduct several studies with PBCs to compare several popular language models. We also study the effects of several variables such as normalization and data homogeneity on PBC.
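As a quick illustration of the point biserial correlation the abstract advocates, here is a minimal sketch (not from the paper itself; it assumes SciPy and uses made-up acceptability judgements and language model log probabilities):

import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data: binary acceptability judgements (1 = acceptable, 0 = not)
# and the log probabilities a language model assigned to the same sentences.
judgements = np.array([1, 1, 0, 1, 0, 0, 1, 0])
log_probs = np.array([-12.3, -15.1, -48.7, -10.9, -41.2, -39.8, -14.4, -35.0])

# The PBC relates the discrete variable directly to the continuous one,
# so no threshold on the model scores is needed.
r, p_value = pointbiserialr(judgements, log_probs)
print(f"PBC r = {r:.3f}, p = {p_value:.3g}")

Contrast this with the MCC, which would require binarizing the model scores before any comparison can be made.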
From white.1240 at osu.edu Thu Oct 27 12:54:11 2022
From: white.1240 at osu.edu (White, Michael)
Date: Thu, 27 Oct 2022 16:54:11 +0000
Subject: [CaCL] paper for 10/27
In-Reply-To:
References:
Message-ID:

Is there a delay in starting the zoom for today's meeting, or has the zoom link changed?

-Mike

From: CaCL on behalf of Cheung, Willy via CaCL
Date: Monday, October 24, 2022 at 7:43 PM
To: cacl at lists.osu.edu
Subject: [CaCL] paper for 10/27

Hi all,

For this week in CaCL, I'll be leading discussion on Niu and Penn 2020 - https://aclanthology.org/2020.eval4nlp-1.11/
From lewis.2799 at buckeyemail.osu.edu Thu Oct 27 14:21:34 2022
From: lewis.2799 at buckeyemail.osu.edu (Lewis, Ash)
Date: Thu, 27 Oct 2022 18:21:34 +0000
Subject: [CaCL] Paper for Next Week 11/3
Message-ID:

Hi all,

Next week I'll be leading discussion on the following paper:

Prefix-Tuning: Optimizing Continuous Prompts for Generation (Li and Liang, 2021)

Abstract: Fine-tuning is the de facto way of leveraging large pretrained language models for downstream tasks. However, fine-tuning modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, which we call the prefix. Prefix-tuning draws inspiration from prompting for language models, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We show that by modifying only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics that are unseen during training.

See you all on Thursday!
Ash

Ash Lewis (she/her/hers)
PhD Student, Department of Linguistics
The Ohio State University
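To make the core idea in the abstract above concrete, here is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation (the paper injects the prefix into every layer's key/value activations); it only shows the frozen-model-plus-trainable-prefix idea at the embedding layer:

import torch
import torch.nn as nn

class PrefixTuned(nn.Module):
    def __init__(self, lm, prefix_len=10, embed_dim=768):
        super().__init__()
        self.lm = lm
        # Freeze every parameter of the pretrained language model.
        for p in self.lm.parameters():
            p.requires_grad = False
        # The only trainable parameters: one continuous vector per "virtual token".
        self.prefix = nn.Parameter(0.02 * torch.randn(prefix_len, embed_dim))

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, embed_dim) embeddings of the real input tokens.
        batch_size = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch_size, -1, -1)
        # Subsequent tokens can attend to the prefix as if it were part of the input.
        return self.lm(torch.cat([prefix, input_embeds], dim=1))

Only self.prefix receives gradients, so each task stores just a small prefix (on the order of 0.1% of the parameters in the paper's experiments) rather than a full fine-tuned copy of the model.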