From schuler.77 at osu.edu Mon Jan 10 21:07:53 2022
From: schuler.77 at osu.edu (Schuler, William)
Date: Tue, 11 Jan 2022 02:07:53 +0000
Subject: [CaCL] CaCL organizational meeting this week *on-line*
Message-ID:

Hello mailing-list CaCLers,

Although it is officially in-person, CaCL will meet *exclusively on-line this week* using the Zoom link below (which is the same one accessible from the Carmen site, but is *different* from the one used last semester because it is linked to the 7890.12 course). As usual, CaCL will also continue to support on-line participation for those who find it convenient, using this same link (historically this has included most participants):

https://osu.zoom.us/j/95536921111?pwd=TG9YdVZ0Wk45R2hCdHhTYk5ubkhIQT09

The first meeting will be organizational; during it we will vote on papers to discuss in the following weeks. Please add any papers you would like to discuss to the paper selection form below (which is also accessible from the Carmen site, and which is *different* from the one used in the previous semester, because the owner of that other link has now graduated!):

https://docs.google.com/spreadsheets/d/1_GQV39sobbik4kKS5tIuCeEuRGRLcGvfgxq4rJt_tOU/edit#gid=0

See you on Thursday!

wm
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From clark.3664 at buckeyemail.osu.edu Fri Jan 14 09:02:26 2022
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Fri, 14 Jan 2022 14:02:26 +0000
Subject: [CaCL] Reading for 1/20
Message-ID:

Hi CaCLers,

For our next meeting on 1/20, we will be discussing "On logical inference over brains, behaviour, and artificial neural networks" by Olivia Guest and Andrea Martin.

Paper link: https://psyarxiv.com/tbmcg/
Zoom link: https://osu.zoom.us/j/95536921111?pwd=TG9YdVZ0Wk45R2hCdHhTYk5ubkhIQT09

Abstract: In the cognitive, computational, and neurosciences, we often reason about what models (viz., formal and/or computational) represent, learn, or "know", as well as what algorithm they instantiate. The putative goal of such reasoning is to generalize claims about the model in question to claims about the mind and brain. This reasoning process typically presents as inference about the representations, processes, or algorithms the human mind and brain instantiate. Such inference is often based on a model's performance on a task, and whether that performance approximates human behaviour or brain activity. The model in question is often an artificial neural network (ANN) model, though the problems we discuss are generalizable to all reasoning over models. Arguments typically take the form "the brain does what the ANN does because the ANN reproduced the pattern seen in brain activity" or "cognition works this way because the ANN learned to approximate task performance." Then, the argument concludes that models achieve this outcome by doing what people do or having the capacities people have. At first blush, this might appear as a form of modus ponens, a valid deductive logical inference rule. However, as we explain in this article, this is not the case, and thus, this form of argument eventually results in affirming the consequent, a logical or inferential fallacy. We discuss what this means broadly for research in cognitive science, neuroscience, and psychology; what it means for models when they lose the ability to mediate between theory and data in a meaningful way; and what this means for the logic, the metatheoretical calculus, our fields deploy in high-level scientific inference.
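A quick way to see the contrast the abstract draws: modus ponens infers Q from "P implies Q" together with P, whereas the arguments described above infer P from "P implies Q" together with Q. The small truth-table check below is an illustrative sketch only (not from the paper; reading P as "the model does what people do" and Q as "the model reproduces the behavioural/brain pattern" is one possible gloss, not the authors' wording):

    # Illustrative sketch: contrast modus ponens with affirming the consequent
    # by brute-force truth-table search for counterexamples.
    from itertools import product

    def implies(p, q):
        return (not p) or q

    assignments = list(product([True, False], repeat=2))

    # Modus ponens: from (P -> Q) and P, conclude Q.
    mp_counterexamples = [(p, q) for p, q in assignments
                          if implies(p, q) and p and not q]

    # Affirming the consequent: from (P -> Q) and Q, conclude P.
    ac_counterexamples = [(p, q) for p, q in assignments
                          if implies(p, q) and q and not p]

    print("modus ponens counterexamples:", mp_counterexamples)
    print("affirming the consequent counterexamples:", ac_counterexamples)

Modus ponens yields no counterexample, while affirming the consequent yields one (P=False, Q=True): the pattern can be reproduced even when the model does not do what people do, which is the gap the paper points at.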
----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University
https://christian-clark.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cheung.179 at buckeyemail.osu.edu Fri Jan 21 09:45:01 2022
From: cheung.179 at buckeyemail.osu.edu (Cheung, Willy)
Date: Fri, 21 Jan 2022 14:45:01 +0000
Subject: [CaCL] Reading for 1/27
Message-ID:

Hi CaCLers,

For our next meeting on 1/27, we will be discussing "Analysis Methods in Neural Language Processing: A Survey" by Yonatan Belinkov and James Glass.

Paper link: https://aclanthology.org/Q19-1004.pdf
Zoom link: https://osu.zoom.us/j/95536921111?pwd=TG9YdVZ0Wk45R2hCdHhTYk5ubkhIQT09

Abstract: The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From white.1240 at osu.edu Fri Jan 21 16:19:32 2022
From: white.1240 at osu.edu (White, Michael)
Date: Fri, 21 Jan 2022 21:19:32 +0000
Subject: [CaCL] Reading for 1/27
In-Reply-To:
References:
Message-ID:

There is also this more recent CL squib by Belinkov that I was the editor for; this is the early access version:

https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00422/107571/Probing-Classifiers-Promises-Shortcomings-and

From: CaCL on behalf of Cheung, Willy via CaCL
Date: Friday, January 21, 2022 at 9:45 AM
To: cacl at lists.osu.edu
Subject: [CaCL] Reading for 1/27

Hi CaCLers,

For our next meeting on 1/27, we will be discussing "Analysis Methods in Neural Language Processing: A Survey" by Yonatan Belinkov and James Glass.

Paper link: https://aclanthology.org/Q19-1004.pdf
Zoom link: https://osu.zoom.us/j/95536921111?pwd=TG9YdVZ0Wk45R2hCdHhTYk5ubkhIQT09

Abstract: The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.
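Both the survey and the probing-classifiers squib linked above discuss a common analysis recipe: train a small supervised classifier on a model's internal representations and ask whether some linguistic property can be decoded from them. The sketch below is an illustration of that recipe only, with scikit-learn and random toy arrays standing in for real model activations and annotations:

    # Illustrative probing-classifier sketch; the array shapes, toy random data,
    # and choice of logistic regression are assumptions for the example.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Stand-ins for per-token hidden states from a language model and a binary
    # linguistic label for each token (e.g., noun vs. not-noun).
    hidden_states = rng.normal(size=(2000, 768))
    labels = rng.integers(0, 2, size=2000)

    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=0)

    # The probe is deliberately simple; high held-out accuracy is usually read as
    # evidence that the property is (linearly) decodable from the representation.
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", probe.score(X_test, y_test))

With the random stand-in data the probe should hover near chance; what above-chance accuracy on real representations does and does not license one to conclude is the kind of question the squib takes up.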
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oh.531 at buckeyemail.osu.edu Thu Jan 27 14:15:37 2022
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 27 Jan 2022 19:15:37 +0000
Subject: [CaCL] CaCL 2/3: A Mathematical Framework for Transformer Circuits
Message-ID:

Hello everyone,

Next week we'll be discussing the following HTML-format paper. I had thought this was from OpenAI, but apparently it's from a group called Anthropic.

A Mathematical Framework for Transformer Circuits (Elhage et al., 2021):
https://transformer-circuits.pub/2021/framework/index.html

REVERSE ENGINEERING RESULTS

To explore the challenge of reverse engineering transformers, we reverse engineer several toy, attention-only models. In doing so we find:

* Zero layer transformers model bigram statistics. The bigram table can be accessed directly from the weights.
* One layer attention-only transformers are an ensemble of bigram and "skip-trigram" (sequences of the form "A... B C") models. The bigram and skip-trigram tables can be accessed directly from the weights, without running the model. These skip-trigrams can be surprisingly expressive. This includes implementing a kind of very simple in-context learning.
* Two layer attention-only transformers can implement much more complex algorithms using compositions of attention heads. These compositional algorithms can also be detected directly from the weights. Notably, two layer models use attention head composition to create "induction heads", a very general in-context learning algorithm.
* One layer and two layer attention-only transformers use very different algorithms to perform in-context learning. Two layer attention heads use qualitatively more sophisticated inference-time algorithms (in particular, a special type of attention head we call an induction head) to perform in-context learning, forming an important transition point that will be relevant for larger models.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Student
Department of Linguistics
The Ohio State University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
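As a concrete illustration of the first bullet above, the toy sketch below (not the authors' code; the random weights, tiny vocabulary, and omission of layer norm are simplifying assumptions) shows how a zero-layer transformer's next-token logits reduce to a bigram table that can be read directly off the embedding and unembedding matrices:

    # Illustrative sketch of "zero layer transformers model bigram statistics".
    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, d_model = 10, 16

    W_E = rng.normal(size=(vocab_size, d_model))   # embedding: token -> residual stream
    W_U = rng.normal(size=(d_model, vocab_size))   # unembedding: residual stream -> logits

    # With no attention or MLP layers, the logits for the next token depend only
    # on the current token: logits[t] = W_E[t] @ W_U. The whole model is thus a
    # vocab_size x vocab_size table of bigram logits, readable from the weights
    # without running the model.
    bigram_logits = W_E @ W_U

    current_token = 3
    print("most likely next token after token 3:",
          int(np.argmax(bigram_logits[current_token])))

The paper's one- and two-layer results extend this same weight-reading viewpoint to attention heads, via what it calls their QK and OV circuits.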