From clark.3664 at buckeyemail.osu.edu  Wed Feb  1 21:38:47 2023
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 2 Feb 2023 02:38:47 +0000
Subject: [CaCL] babyLM shared task
Message-ID:

The new BabyLM shared task might be of interest to people on this list.

Call for papers: https://arxiv.org/pdf/2301.11796.pdf
Official site: https://babylm.github.io/

Description: We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small-scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children. The task has three tracks, two of which restrict the training data to pre-released datasets of 10M and 100M words and are dedicated to explorations of approaches such as architectural variations, self-supervised objectives, or curriculum learning. The final track only restricts the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality (i.e., data from sources other than text is welcome). We will release a shared evaluation pipeline which scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From schuler.77 at osu.edu  Tue Feb  7 19:17:40 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Wed, 8 Feb 2023 00:17:40 +0000
Subject: [CaCL] reading for cacl
Message-ID:

Hi all,

Let's read this for Thursday -- an introduction to CCG:

https://homepages.inf.ed.ac.uk/steedman/papers/ccg/moravcsik2.pdf

wm
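As a concrete companion to the reading: CCG derivations are driven by a small set of combinatory rules, the two simplest being forward application (X/Y Y => X) and backward application (Y X\Y => X). Below is a minimal Python sketch of these two rules, using nested tuples as a toy category encoding; the representation and the example sentence are illustrative choices, not taken from Steedman's paper.

```python
# A CCG category is either an atomic string ("NP", "S") or a tuple
# (result, slash, argument); e.g. (("S", "\\", "NP"), "/", "NP")
# encodes the transitive-verb category (S\NP)/NP.

def forward_apply(fn, arg):
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(fn, tuple) and fn[1] == "/" and fn[2] == arg:
        return fn[0]
    return None

def backward_apply(arg, fn):
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(fn, tuple) and fn[1] == "\\" and fn[2] == arg:
        return fn[0]
    return None

# Toy derivation of "Kim saw Lee": the verb first combines with the
# object NP on its right, and the result then combines with the
# subject NP on its left, yielding S.
saw = (("S", "\\", "NP"), "/", "NP")   # (S\NP)/NP
vp = forward_apply(saw, "NP")          # -> ("S", "\\", "NP"), i.e. S\NP
s = backward_apply("NP", vp)           # -> "S"
print(vp, s)
```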
From schuler.77 at osu.edu  Wed Feb  8 20:00:12 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Thu, 9 Feb 2023 01:00:12 +0000
Subject: [CaCL] Fwd: [Ling-Faculty] FW: [Folli] lecturer in Computational Linguistics in UCL
In-Reply-To: <1E8CA3AD-A3BD-4457-AA7B-F2566FC44BEA@osu.edu>
References: <1E8CA3AD-A3BD-4457-AA7B-F2566FC44BEA@osu.edu>
Message-ID:

may be of interest

________________________________
From: Ling-Faculty on behalf of Levine, Robert via Ling-Faculty
Sent: Thursday, January 26, 2023 7:52 AM
To: _ASC LING lingac
Cc: Levine, Robert
Subject: [Ling-Faculty] FW: [Folli] lecturer in Computational Linguistics in UCL

FYI.

--- Bob

Dear colleagues,

The Linguistics department of University College London (UCL) is recruiting a lecturer (permanent academic post) in Computational Linguistics. For more details of the position, see here: https://www.ucl.ac.uk/work-at-ucl/search-ucl-jobs/details?jobId=3786&jobTitle=Lecturer%20in%20Linguistics%20(Computational%20Linguistics)

Best,
Mehrnoosh

From schuler.77 at osu.edu  Mon Feb 13 20:54:37 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Tue, 14 Feb 2023 01:54:37 +0000
Subject: [CaCL] paper for thursday on TAG
Message-ID:

Hi all,

Let's read this for Thursday:

https://repository.upenn.edu/cgi/viewcontent.cgi?article=1463&context=cis_reports

wm

From oh.531 at buckeyemail.osu.edu  Fri Feb 17 10:32:29 2023
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Fri, 17 Feb 2023 15:32:29 +0000
Subject: [CaCL] CaCL 2/23: Intro to Combinatory Categorial Grammar part 2
Message-ID:

Hi everyone,

Next week, we will continue our discussion of CCG based on Steedman (2022): https://homepages.inf.ed.ac.uk/steedman/papers/ccg/moravcsik2.pdf

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Student
Department of Linguistics
The Ohio State University

From clark.3664 at buckeyemail.osu.edu  Thu Feb 23 14:17:56 2023
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 23 Feb 2023 19:17:56 +0000
Subject: [CaCL] Reading for 3/2
Message-ID:

Hi everyone,

In next week's CaCL meeting we will discuss Meister and Cotterell (2021).

Title: Language Model Evaluation Beyond Perplexity
Link: https://aclanthology.org/2021.acl-long.414.pdf

Abstract: We propose an alternate approach to quantifying how well language models learn natural language: we ask how well they match the statistical tendencies of natural language. To answer this question, we analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained. We provide a framework, paired with significance tests, for evaluating the fit of language models to these trends. We find that neural language models appear to learn only a subset of the tendencies considered, but align much more closely with empirical trends than proposed theoretical distributions (when present). Further, the fit to different distributions is highly dependent on both model architecture and generation strategy. As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type-token relationship of natural language than text produced using standard ancestral sampling; text from LSTMs reflects the natural language distributions over length, stopwords, and symbols surprisingly well.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University
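The abstract above contrasts nucleus sampling with standard ancestral sampling. For readers unfamiliar with the former, here is a minimal NumPy sketch of nucleus (top-p) sampling over a toy next-token distribution; the function name, the cutoff p = 0.9, and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Top-p (nucleus) sampling: draw from the smallest set of tokens
    whose cumulative probability mass reaches p, renormalized.
    `probs` is a 1-D array over the vocabulary that sums to 1."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                    # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # nucleus size
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()     # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=renorm))

# Ancestral sampling draws from the full distribution, tail included;
# nucleus sampling truncates the low-probability tail before drawing.
toy_probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
rng = np.random.default_rng(0)
ancestral_draw = int(rng.choice(len(toy_probs), p=toy_probs))
nucleus_draw = nucleus_sample(toy_probs, p=0.9, rng=rng)
```

Discarding the low-probability tail suppresses rare-token draws that ancestral sampling occasionally produces, which is consistent with the abstract's observation that nucleus-sampled text tracks the type-token relationship of natural language more closely.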