<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof ContentPasted0">

Hi all, <br>

<br>

This week in Cacl, I'll be leading the discussion on Niu and Penn (2020), "Grammaticality and Language Modelling": <a href="https://aclanthology.org/2020.eval4nlp-1.11/" id="LPlnkOWALinkPreview">https://aclanthology.org/2020.eval4nlp-1.11/</a><br>

<div class="_Entity _EType_OWALinkPreview _EId_OWALinkPreview _EReadonly_1">

<div id="LPBorder_GTaHR0cHM6Ly9hY2xhbnRob2xvZ3kub3JnLzIwMjAuZXZhbDRubHAtMS4xMS8." class="LPBorder118321" style="width: 100%; margin-top: 16px; margin-bottom: 16px; position: relative; max-width: 800px; min-width: 424px;">

<table id="LPContainer118321" role="presentation" style="padding: 12px 36px 12px 12px; width: 100%; border-width: 1px; border-style: solid; border-color: rgb(200, 200, 200); border-radius: 2px;">

<tbody>

<tr valign="top" style="border-spacing: 0px;">

<td>

<div id="LPImageContainer118321" style="position: relative; margin-right: 12px; height: 160px; overflow: hidden;">

<a target="_blank" id="LPImageAnchor118321" href="https://aclanthology.org/2020.eval4nlp-1.11/"><img id="LPThumbnailImageId118321" alt="" height="160" style="display: block;" width="160" src="https://aclanthology.org/thumb/2020.eval4nlp-1.11.jpg"></a></div>

</td>

<td style="width: 100%;">

<div id="LPTitle118321" style="font-size: 21px; font-weight: 300; margin-right: 8px; font-family: wf_segoe-ui_light, 'Segoe UI Light', 'Segoe WP Light', 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif; margin-bottom: 12px;">

<a target="_blank" id="LPUrlAnchor118321" href="https://aclanthology.org/2020.eval4nlp-1.11/" style="text-decoration: none; color: var(--themePrimary);">Grammaticality and Language Modelling - ACL Anthology</a></div>


<div id="LPMetadata118321" style="font-size: 14px; font-weight: 400; color: rgb(166, 166, 166); font-family: wf_segoe-ui_normal, 'Segoe UI', 'Segoe WP', Tahoma, Arial, sans-serif;">

aclanthology.org</div>

</td>

</tr>

</tbody>

</table>


</div>

</div>

<div class="card bg-light mb-2 mb-lg-3" style="box-sizing:border-box;display:flex;flex-direction:column;min-width:0px;overflow-wrap:break-word;background-color:rgb(248, 249, 250) !important;background-clip:border-box;border:1px solid rgba(0, 0, 0, 0.125);border-radius:0.25rem;margin-bottom:1rem !important;color:rgb(33, 37, 41);font-family:-apple-system, 'system-ui', 'segoe ui', Roboto, 'helvetica neue', Arial, 'noto sans', sans-serif, 'apple color emoji', 'segoe ui emoji', 'segoe ui symbol', 'noto color emoji';text-align:left">

<span class="card-body acl-abstract" style="box-sizing:border-box;flex:1 1 auto;padding:1.25rem"><span style="box-sizing:border-box" class="ContentPasted1">Ever since Pereira (2000) provided evidence against Chomsky’s (1957) conjecture that statistical language

 modelling is incommensurable with the aims of grammaticality prediction as a research enterprise, a new area of research has emerged that regards statistical language models as “psycholinguistic subjects” and probes their ability to acquire syntactic knowledge.

 The advent of The Corpus of Linguistic Acceptability (CoLA) (Warstadt et al., 2019) has earned a spot on the leaderboard for acceptability judgements, and the polemic between Lau et al. (2017) and Sprouse et al. (2018) has raised fundamental questions about

 the nature of grammaticality and how acceptability judgements should be elicited. All the while, we are told that neural language models continue to improve. That is not an easy claim to test at present, however, because there is almost no agreement on how

 to measure their improvement when it comes to grammaticality and acceptability judgements. The GLUE leaderboard bundles CoLA together with a Matthews correlation coefficient (MCC), although probably because CoLA’s seminal publication was using it to compute

 inter-rater reliabilities. Researchers working in this area have used other accuracy and correlation scores, often driven by a need to reconcile and compare various discrete and continuous variables with each other. The score that we will advocate for in this

 paper, the point biserial correlation, in fact compares a discrete variable (for us, acceptability judgements) to a continuous variable (for us, neural language model probabilities). The only previous work in this area to choose the PBC that we are aware of

 is Sprouse et al. (2018a), and that paper actually applied it backwards (with some justification) so that the language model probability was treated as the discrete binary variable by setting a threshold. With the PBC in mind, we will first reappraise some

 recent work in syntactically targeted linguistic evaluations (Hu et al., 2020), arguing that while their experimental design sets a new high watermark for this topic, their results may not prove what they have claimed. We then turn to the task-independent

 assessment of language models as grammaticality classifiers. Prior to the introduction of the GLUE leaderboard, the vast majority of this assessment was essentially anecdotal, and we find the use of the MCC in this regard to be problematic. We conduct several

 studies with PBCs to compare several popular language models. We also study the effects of several variables such as normalization and data homogeneity on PBC.</span></span></div>
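<div>
<br>
For anyone who wants a concrete feel for the two scores named in the abstract before the discussion, here is a minimal Python sketch. It is not taken from the paper: the function names, the toy labels, the log-probability-like scores and the threshold are all made up for illustration. It computes a point biserial correlation (binary judgements against continuous scores) and a Matthews correlation coefficient (binary judgements against thresholded scores).<br>
<br>
</div>
<pre>
```python
import math
from statistics import fmean, pstdev


def point_biserial(labels, scores):
    """Point biserial correlation between 0/1 labels and continuous scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    n = len(scores)
    spread = pstdev(scores)  # population standard deviation over all scores
    return (fmean(pos) - fmean(neg)) / spread * math.sqrt(len(pos) * len(neg) / n**2)


def matthews(labels, predictions):
    """Matthews correlation coefficient between two 0/1 sequences."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 1 = judged acceptable, 0 = judged unacceptable.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical model scores (think of them as log-probabilities).
scores = [-1.0, -2.0, -3.0, -4.0, -5.0, -6.0]
# Same ranking, but the two classes sit closer together near the boundary.
tighter_scores = [-1.0, -2.0, -3.4, -3.6, -5.0, -6.0]

threshold = -3.5  # assumed cut-off, needed only for the MCC
predictions = [1 if s >= threshold else 0 for s in scores]
tighter_predictions = [1 if s >= threshold else 0 for s in tighter_scores]

print("PBC:", point_biserial(labels, scores), point_biserial(labels, tighter_scores))
print("MCC:", matthews(labels, predictions), matthews(labels, tighter_predictions))
```
</pre>
<div>
One difference this toy example makes visible: the MCC only sees the thresholded decisions, so it is identical for both score lists, whereas the PBC uses the raw scores and changes when the two classes move closer together. Whether that matches the paper's own argument is something we can check in the discussion.<br>
</div>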

<br class="ContentPasted1">



</div>

</body>

</html>