Detecting Information-Dense Texts in Multiple News Domains


  • Yinfei Yang, Amazon Inc.
  • Ani Nenkova, University of Pennsylvania



Summarization, Lead informativeness prediction, news analysis


We introduce the task of identifying information-dense texts, which report important factual information in a direct, succinct manner. We describe a procedure that allows us to automatically label a large training corpus of New York Times texts. We train a classifier based on lexical, discourse and unlexicalized syntactic features and test its performance on a set of manually annotated articles from the business, U.S. international relations, sports and science domains. Our results indicate that the task is feasible and that both syntactic and lexical features are highly predictive of the distinction. We observe considerable variation in prediction accuracy across domains and find that domain-specific models are more accurate.




How to Cite

Yang, Y., & Nenkova, A. (2014). Detecting Information-Dense Texts in Multiple News Domains. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1).