Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning

Dan Garrette; Chris Dyer; Jason Baldridge; Noah Smith

doi:10.1609/aaai.v29i1.9516

Authors

Dan Garrette, University of Texas at Austin
Chris Dyer, Carnegie Mellon University
Jason Baldridge, University of Texas at Austin
Noah Smith, Carnegie Mellon University

DOI:

https://doi.org/10.1609/aaai.v29i1.9516

Keywords:

natural language processing, machine learning, computational linguistics, parsing, grammar, ccg, combinatory categorial grammar

Abstract

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
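
To make the abstract's two central ideas concrete, the following is a minimal illustrative sketch, not the model from the paper. It shows (1) categories drawn from a recursively defined set and combined by function application, and (2) a prior that assigns lower probability to more complex categories. The class names, the helper functions, and the constants P_TERM, P_FWD, and P_ATOM are all assumptions introduced here for illustration; the actual prior and its parameters are defined in the paper itself.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S, NP, or N."""
    name: str


@dataclass(frozen=True)
class Complex:
    """A complex category: result, slash direction, argument."""
    result: "Category"
    slash: str  # "/" looks right for its argument, "\\" looks left
    argument: "Category"


Category = Union[Atom, Complex]

# Hypothetical parameters, chosen only for illustration.
P_TERM = 0.7                              # probability of stopping at an atom
P_FWD = 0.5                               # probability a slash is forward
P_ATOM = {"S": 0.4, "NP": 0.4, "N": 0.2}  # distribution over atoms


def prior(cat: Category) -> float:
    """Recursive prior: every extra slash multiplies in more factors
    below one, so simpler categories receive higher probability."""
    if isinstance(cat, Atom):
        return P_TERM * P_ATOM[cat.name]
    p_slash = P_FWD if cat.slash == "/" else 1.0 - P_FWD
    return (1.0 - P_TERM) * p_slash * prior(cat.result) * prior(cat.argument)


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    # Forward application: X/Y followed by Y yields X.
    if isinstance(left, Complex) and left.slash == "/" and left.argument == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    # Backward application: Y followed by X\Y yields X.
    if isinstance(right, Complex) and right.slash == "\\" and right.argument == left:
        return right.result
    return None


S, NP = Atom("S"), Atom("NP")
intransitive = Complex(S, "\\", NP)           # S\NP, e.g. "sleeps"
transitive = Complex(intransitive, "/", NP)   # (S\NP)/NP, e.g. "sees"

for cat in (NP, intransitive, transitive):
    print(cat, prior(cat))
```

Under these assumed parameters the prior decreases from NP to S\NP to (S\NP)/NP, which is the kind of bias toward simpler categories the abstract describes. A uniform prior, by contrast, would treat all three categories as equally likely.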
