Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
Keywords:natural language processing, machine learning, computational linguistics, parsing, grammar, ccg, combinatory categorial grammar
Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Categories are selected from a large, recursively-defined set; this leads to high word-to-category ambiguity, which is one of the primary factors that make learning CCG parsers difficult, especially in the face of little data. Previous work has shown that learning sequence models for CCG tagging can be improved by using linguistically-motivated prior probability distributions over potential categories. We extend this approach to the task of learning a CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.