Detecting Information-Dense Texts in Multiple News Domains

Yinfei Yang; Ani Nenkova

doi:10.1609/aaai.v28i1.8943

Authors

Yinfei Yang Amazon Inc.
Ani Nenkova University of Pennsylvania

DOI:

https://doi.org/10.1609/aaai.v28i1.8943

Keywords:

Summarization, Lead informativeness prediction, news analysis

Abstract

We introduce the task of identifying information-dense texts, which report important factual information in a direct, succinct manner. We describe a procedure that allows us to automatically label a large training corpus of New York Times texts. We train a classifier based on lexical, discourse, and unlexicalized syntactic features and test its performance on a set of manually annotated articles from the business, U.S. international relations, sports, and science domains. Our results indicate that the task is feasible and that both syntactic and lexical features are highly predictive for the distinction. We observe considerable variation in prediction accuracy across domains and find that domain-specific models are more accurate.

Published

2014-06-21

How to Cite

Yang, Y., & Nenkova, A. (2014). Detecting Information-Dense Texts in Multiple News Domains. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://doi.org/10.1609/aaai.v28i1.8943

Issue

Vol. 28 No. 1 (2014): Twenty-Eighth AAAI Conference on Artificial Intelligence

Section

Main Track: NLP and Text Mining