From schuler.77 at osu.edu Mon Aug 21 15:57:17 2023
From: schuler.77 at osu.edu (Schuler, William)
Date: Mon, 21 Aug 2023 19:57:17 +0000
Subject: [CaCL] Cognitive and Computational Approaches to Language (CaCL) Discussion Group: Thursdays 12:45 in Oxley 122
In-Reply-To:
References:
Message-ID:

Hello all,

The reading group on Cognitive and Computational Approaches to Language (CaCL) will meet Thursdays at 12:45, starting this week, in Oxley 122. The Zoom link is accessible via the Carmen site and requires name.# authentication. If you do not want to enroll but would still like to attend, please send me an email so that I can either make you a Carmen guest or send you a Zoom invitation.

Topics include: broad-coverage computational models of sentence processing in human memory, computational models of human memory, statistical modeling of human linguistic performance data, neural sentence processing models, Bayesian and neural grammar induction models, experimental techniques in neurolinguistics and brain imaging, and much more.

The first session will be an organizational meeting, during which we will vote on papers to discuss (feel free to suggest any papers you'd like to discuss!). We will read papers and give software tutorials and practice talks all semester. Please join us!

Also, if you are a student this semester and are planning to attend, please enroll for 1-3 credits: CaCL is listed as "LING 7890.12".

You can also sign up for the CaCL mailing list at https://lists.osu.edu/mailman/listinfo/cacl, which continues to serve the regular CaCL reading group throughout the academic year.

Hope to see you there!
William

From clark.3664 at buckeyemail.osu.edu Thu Aug 24 15:49:12 2023
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 24 Aug 2023 19:49:12 +0000
Subject: [CaCL] Reading for 8/31
Message-ID:

Hi CaCL members,

In our meeting next week on 8/31, we will discuss "Meaning without reference in large language models" by Piantadosi and Hill (2023).

Zoom: https://osu.zoom.us/j/93495474520?pwd=NWFRZzF6QnZrQWdheThxbWJyNjJ4dz09
Paper: https://arxiv.org/pdf/2208.02957.pdf

Abstract: The widespread success of large language models (LLMs) has been met with skepticism that they possess anything like human concepts or meanings. Contrary to claims that LLMs possess no meaning whatsoever, we argue that they likely capture important aspects of meaning, and moreover work in a way that approximates a compelling account of human cognition in which meaning arises from conceptual role. Because conceptual role is defined by the relationships between internal representational states, meaning cannot be determined from a model's architecture, training data, or objective function, but only by examination of how its internal states relate to each other. This approach may clarify why and how LLMs are so successful and suggest how they can be made more human-like.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From oh.531 at buckeyemail.osu.edu Mon Aug 28 04:42:40 2023
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Mon, 28 Aug 2023 08:42:40 +0000
Subject: [CaCL] Reading for 8/31
In-Reply-To:
References:
Message-ID:

I think this may be a good (optional) companion read to this week's paper:

Do Language Models Refer?
https://arxiv.org/pdf/2308.05576.pdf

What do language models (LMs) do with language?
Everyone agrees that they produce sequences of (mostly) coherent sentences. But are they saying anything with those strings, or simply babbling in a convincing simulacrum of language use? This is a vague question, and there are many ways of making it precise. Here we will address one aspect of the question, namely, whether LMs' words refer: that is, whether the outputs of LMs achieve "word-to-world" connections. There is prima facie reason to think they do not, since LMs do not interact with the world in the way that ordinary language users do. Drawing on insights from the externalist tradition in philosophy of language, we argue that appearances are misleading and that there is good reason to think that LMs can refer.

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University

From oh.531 at buckeyemail.osu.edu Thu Aug 31 14:32:57 2023
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 31 Aug 2023 18:32:57 +0000
Subject: [CaCL] 9/7: Retentive Network: A Successor to Transformer for Large Language Models
Message-ID:

Hello everyone,

Next week, we'll discuss the following paper on Retentive Network:

Retentive Network: A Successor to Transformer for Large Language Models
https://arxiv.org/pdf/2307.08621.pdf

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. We then propose the retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory usage without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded in parallel while the chunks themselves are summarized recurrently.
Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. These intriguing properties make RetNet a strong successor to the Transformer for large language models. Code will be available at https://aka.ms/retnet.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Candidate
Department of Linguistics
The Ohio State University
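
To make the retention mechanism in the abstract above concrete before the meeting, here is a minimal single-head NumPy sketch of its parallel and recurrent forms. This is our own illustration, not the authors' code at https://aka.ms/retnet: it omits the paper's xPos-style rotations, multi-scale decay heads, group normalization, and gating, and the function names and decay value gamma are illustrative choices.

```python
# Minimal single-head sketch of RetNet-style retention (not the authors' code).
# Shows that the parallel and recurrent computation paradigms from the
# abstract produce the same output.
import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Parallel form (used for training): (Q K^T * D) V, where the decay mask
    # D[n, m] = gamma^(n - m) for n >= m and 0 otherwise, i.e. causal
    # attention with exponentially decaying weights and no softmax.
    T = Q.shape[0]
    idx = np.arange(T)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    # Recurrent form (used for inference): S_n = gamma * S_{n-1} + K_n^T V_n,
    # out_n = Q_n S_n. The state S is a fixed-size d-by-d matrix, so the
    # per-token cost is O(1) in sequence length.
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    out = np.zeros((T, V.shape[1]))
    for t in range(T):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
print(np.allclose(retention_parallel(Q, K, V, 0.9),
                  retention_recurrent(Q, K, V, 0.9)))  # True
```

The final check verifies the equivalence behind the paper's low-cost inference claim: unrolling S_n gives out_n = sum over m <= n of gamma^(n-m) (Q_n . K_m) V_m, exactly the masked matrix product of the parallel form, yet the recurrent form carries only a fixed-size state between tokens.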