[CaCL] Tuesday's CaCL

King, David L. king.2138 at buckeyemail.osu.edu
Fri May 31 08:48:46 EDT 2019


Read it! Do it!

A Probabilistic Generative Model of Linguistic Typology
Johannes Bjerva, Yova Kementchedjhieva, Ryan Cotterell, Isabelle Augenstein
(Submitted on 26 Mar 2019 (v1: https://arxiv.org/abs/1903.10950v1), last revised 15 May 2019 (this version, v3))
In the principles-and-parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry: we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, outperforming several baselines on the task of predicting held-out features. Furthermore, we show that language embeddings pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and also theoretical implications: the results confirm what linguists have hypothesised, i.e., that there are significant correlations between typological features and languages.
https://arxiv.org/abs/1903.10950
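If you want something concrete to poke at before Tuesday, here is a minimal sketch (mine, not the authors' code) of the core idea as I read the abstract: Bernoulli (logistic) matrix factorisation over a languages-by-features binary matrix, one member of the exponential family, with some cells held out during training and then predicted from the learned language and feature embeddings. All sizes and data below are toy assumptions, just to make the mechanics runnable.

# Sketch of exponential-family (here Bernoulli) matrix factorisation for
# typological feature prediction. Toy data, not the paper's model or dataset.
import numpy as np

rng = np.random.default_rng(0)

n_langs, n_feats, dim = 50, 30, 8          # toy sizes, not from the paper
true_L = rng.normal(size=(n_langs, dim))   # "true" language embeddings
true_F = rng.normal(size=(n_feats, dim))   # "true" feature embeddings
probs = 1 / (1 + np.exp(-(true_L @ true_F.T)))
X = (rng.random((n_langs, n_feats)) < probs).astype(float)  # observed 0/1 features

mask = rng.random(X.shape) < 0.8           # 80% of cells observed, 20% held out

# Fit low-rank logits by gradient ascent on the masked Bernoulli log-likelihood.
L = 0.01 * rng.normal(size=(n_langs, dim))
F = 0.01 * rng.normal(size=(n_feats, dim))
lr, reg = 0.05, 0.01
for _ in range(500):
    P = 1 / (1 + np.exp(-(L @ F.T)))       # predicted P(feature = 1)
    G = mask * (X - P)                     # gradient of log-likelihood w.r.t. logits
    L += lr * (G @ F - reg * L)
    F += lr * (G.T @ L - reg * F)

P = 1 / (1 + np.exp(-(L @ F.T)))
held_out = ~mask
acc = ((P > 0.5) == X)[held_out].mean()
print(f"held-out feature prediction accuracy: {acc:.2f}")

The actual paper goes further (per-feature exponential-family likelihoods, and pre-trained language embeddings that let the model generalise to languages with no observed features at all), but the masked factorisation above is the piece the baselines are compared against.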

