From clark.3664 at buckeyemail.osu.edu Sun Sep 4 17:02:33 2022
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Sun, 4 Sep 2022 21:02:33 +0000
Subject: [CaCL] Reading for 9/8
Message-ID:

Hi CaCLers,

Our reading for this Thursday (9/8) is Caucheteux et al. (2021).

Link: https://arxiv.org/pdf/2103.01620.pdf

Title: Disentangling Syntax and Semantics in the Brain with Deep Networks

Abstract: The activations of language transformers like GPT-2 have been shown to map linearly onto brain activity during speech comprehension. However, the nature of these activations remains largely unknown, and they presumably conflate distinct linguistic classes. Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four combinatorial classes: lexical, compositional, syntactic, and semantic representations. We then introduce a statistical method to decompose, through the lens of GPT-2's activations, the brain activity of 345 subjects recorded with functional magnetic resonance imaging (fMRI) as they listened to ~4.6 hours of narrated text. The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, encompassing the bilateral temporal, parietal, and prefrontal cortices. Second, contrary to previous claims, syntax and semantics are not associated with separate modules but instead appear to share a common and distributed neural substrate. Overall, this study introduces a versatile framework to isolate the distributed representations of linguistic constructs in brain activity.
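For orientation before Thursday: the "linear mapping" here is an encoding model, i.e., a regularized linear regression from model activations to per-voxel brain activity, scored by correlation on held-out data. Below is a minimal sketch with random stand-in data and hypothetical variable names (the paper's actual pipeline also aligns activations to word timing and convolves with a hemodynamic response function); it assumes NumPy and scikit-learn.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 1000, 768, 50   # 768 = GPT-2 hidden size
X = rng.standard_normal((n_samples, n_features))  # stand-in for GPT-2 activations
W = 0.1 * rng.standard_normal((n_features, n_voxels))
Y = X @ W + rng.standard_normal((n_samples, n_voxels))  # stand-in for BOLD signal

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# "Brain score": per-voxel correlation between predicted and observed activity
scores = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean encoding score: {np.mean(scores):.3f}")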
----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From oh.531 at buckeyemail.osu.edu Thu Sep 8 14:09:21 2022
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 8 Sep 2022 18:09:21 +0000
Subject: [CaCL] CaCL 9/15: Shared computational principles for language processing in humans and deep language models
Message-ID:

Hi everyone,

Next week, we'll be reading the following article by Goldstein et al. (2022):

Shared computational principles for language processing in humans and deep language models
https://www.nature.com/articles/s41593-022-01026-4

Departing from traditional linguistic models, advances in deep learning have given rise to a new type of predictive (autoregressive) deep language model (DLM). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min podcast while their brain responses were recorded using electrocorticography (ECoG). We provide empirical evidence that the human brain and autoregressive DLMs share three fundamental computational principles as they process the same natural narrative: (1) both are engaged in continuous next-word prediction before word onset; (2) both match their pre-onset predictions to the incoming word to calculate post-onset surprise; (3) both rely on contextual embeddings to represent words in natural contexts. Together, our findings suggest that autoregressive DLMs provide a new and biologically feasible computational framework for studying the neural basis of language.
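A note for discussion: the paper's "post-onset surprise" is essentially next-word surprisal under an autoregressive model, -log p(word | context). Here is a minimal sketch of computing token-level surprisal with GPT-2, assuming the Hugging Face transformers library (the paper aligns predictions to word onsets in recorded speech, so treat this only as an approximation of their setup).

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The quick brown fox jumps over the lazy dog",
          return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # (1, seq_len, vocab_size)

# Surprisal of token t is -log p(token_t | tokens_<t)
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = ids[0, 1:]
surprisal = -log_probs[torch.arange(targets.size(0)), targets]

for token, s in zip(tok.convert_ids_to_tokens(targets.tolist()), surprisal):
    print(f"{token:>12s}  {s.item():6.2f} nats")

----
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Student
Department of Linguistics
The Ohio State University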
From cheung.179 at buckeyemail.osu.edu Wed Sep 21 11:28:01 2022
From: cheung.179 at buckeyemail.osu.edu (Cheung, Willy)
Date: Wed, 21 Sep 2022 15:28:01 +0000
Subject: [CaCL] paper for Thursday 9/22 (tomorrow)
Message-ID:

Hi CaCLers,

Sorry for the very late posting of the paper. This Thursday we will discuss Tran et al. (2022).

Title: PLEX: Towards Reliability Using Pretrained Large Model Extensions

Link: https://arxiv.org/pdf/2207.07411.pdf

Abstract: A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large model extensions (plex) for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks and simplifies the traditional protocol, as it improves out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.

From clark.3664 at buckeyemail.osu.edu Thu Sep 22 14:09:29 2022
From: clark.3664 at buckeyemail.osu.edu (Clark, Christian)
Date: Thu, 22 Sep 2022 18:09:29 +0000
Subject: [CaCL] Reading for 9/29
Message-ID:

Hi CaCLers,

Our reading for next Thursday (9/29) will be Srivastava et al. (2022).

Link: https://arxiv.org/pdf/2206.04615.pdf

Title: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

----
Christian Clark
Ph.D. Student
Department of Linguistics
The Ohio State University

From oh.531 at buckeyemail.osu.edu Thu Sep 29 14:07:21 2022
From: oh.531 at buckeyemail.osu.edu (Oh, Byung-Doh)
Date: Thu, 29 Sep 2022 18:07:21 +0000
Subject: [CaCL] 10/6: LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Message-ID:

Hello everyone,

Next week we'll be discussing the following paper by Dettmers et al. (2022).

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
https://arxiv.org/pdf/2208.07339.pdf

Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for the feed-forward and attention projection layers in transformers, which cuts the memory needed for inference in half while retaining full-precision performance. With our method, a 175B-parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible by understanding and working around properties of highly systematic emergent features in transformer language models that dominate attention and transformer predictive performance. To cope with these features, we develop a two-part quantization procedure, LLM.int8(). We first use vector-wise quantization with separate normalization constants for each inner product in the matrix multiplication to quantize most of the features. For the emergent outliers, however, we also include a new mixed-precision decomposition scheme, which isolates the outlier feature dimensions into a 16-bit matrix multiplication while more than 99.9% of values are still multiplied in 8-bit. Using LLM.int8(), we show empirically that it is possible to perform inference in LLMs with up to 175B parameters without any performance degradation. This result makes such models much more accessible, for example making it possible to use OPT-175B/BLOOM on a single server with consumer GPUs. We open-source our software.

Best,
Byung-Doh

=================
Byung-Doh Oh (he/him/his)
Ph.D. Student
Department of Linguistics
The Ohio State University
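To make the two ingredients of LLM.int8() concrete: below is a minimal NumPy sketch of vector-wise int8 quantization combined with a mixed-precision decomposition that routes outlier feature dimensions through a float matmul. The threshold, shapes, and injected outlier are illustrative only, not the paper's exact recipe or kernel implementation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64)).astype(np.float32)   # activations
W = rng.standard_normal((64, 32)).astype(np.float32)  # weights
X[:, 7] *= 60.0  # inject an "emergent outlier" feature dimension

# Mixed-precision decomposition: pull outlier columns of X (and the matching
# rows of W) out of the int8 path and keep them in float.
outliers = np.abs(X).max(axis=0) > 6.0
X_fp, W_fp = X[:, outliers], W[outliers, :]
X_in, W_in = X[:, ~outliers], W[~outliers, :]

# Vector-wise quantization: one scale per row of X and per column of W,
# so each output element gets its own pair of normalization constants.
sx = np.abs(X_in).max(axis=1, keepdims=True) / 127.0
sw = np.abs(W_in).max(axis=0, keepdims=True) / 127.0
Xq = np.round(X_in / sx).astype(np.int8)
Wq = np.round(W_in / sw).astype(np.int8)

# Int8 matmul accumulated in int32, dequantized by the outer product of the
# scales, plus the float path for the outlier dimensions.
Y = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw) + X_fp @ W_fp
rel_err = np.abs(Y - X @ W).max() / np.abs(X @ W).max()
print(f"{int(outliers.sum())} outlier dim(s); max relative error: {rel_err:.4f}")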