Extracting Meta Statements from the Blogosphere
Information extraction systems have recently been proposed for organizing and exploring content in large online text corpora as information networks. In such networks, the nodes are named entities (e.g., people, organizations), while the edges correspond to statements indicating relations among those entities. To date, such systems extract rather primitive networks, capturing only those relations that are expressed by direct statements. In many applications, it is useful to also extract more subtle relations, which are often expressed as meta statements in the text. These can, for instance, provide the context for a statement (e.g., “Google acquired YouTube in October 2006”) or describe the repercussions of a statement (e.g., “The US condemned Russia’s invasion of Georgia”). In this work, we report on a system for extracting relations expressed in both direct statements and meta statements. We propose a method based on Conditional Random Fields that exploits syntactic features to extract both kinds of statements seamlessly. We follow the Open Information Extraction paradigm, where a classifier is trained to recognize any type of relation instead of specific ones. Finally, our results show substantial improvements over a state-of-the-art information extraction system, both in terms of accuracy and, especially, recall.
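To make the distinction between direct and meta statements concrete, the following is a minimal sketch of one possible representation, in which a meta statement is simply a statement that takes another statement as an argument. The `Statement` class, its field names, the `render` helper, and the relation labels (e.g., "invaded", "in") are illustrative assumptions and are not taken from the system described here.

```python
from dataclasses import dataclass
from typing import Union

# An argument is either a named entity (or value) given as a string,
# or another statement. This nesting is an assumed encoding.
Argument = Union[str, "Statement"]


@dataclass(frozen=True)
class Statement:
    """A relation between two arguments (hypothetical representation)."""
    arg1: Argument
    relation: str
    arg2: Argument

    def is_meta(self) -> bool:
        # A meta statement has at least one argument that is itself a statement.
        return isinstance(self.arg1, Statement) or isinstance(self.arg2, Statement)


def render(arg: Argument) -> str:
    """Render an argument as a nested (arg1, relation, arg2) tuple string."""
    if isinstance(arg, Statement):
        return f"({render(arg.arg1)}, {arg.relation}, {render(arg.arg2)})"
    return arg


# Direct statements: edges between named entities.
acquisition = Statement("Google", "acquired", "YouTube")
invasion = Statement("Russia", "invaded", "Georgia")

# Meta statements: context for, or a reaction to, a direct statement.
dated = Statement(acquisition, "in", "October 2006")
condemnation = Statement("US", "condemned", invasion)

for s in (acquisition, dated, condemnation):
    print(render(s), "meta" if s.is_meta() else "direct")
```

Under this encoding, a network built only from direct statements would keep the edges Google-YouTube and Russia-Georgia, while the meta statements attach a date to the first edge and a third party's reaction to the second.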