SEEFT: Planned Social Event Discovery and Attribute Extraction by Fusing Twitter and Web Content
Social events are among the most popular topics in social media. Automatically identifying planned social events and extracting structured information, such as event title, date, and location, would enable more effective indexing, display, and search of social events. However, the informal and noisy language used in social media can degrade the quality of event extraction, resulting in broken titles and incorrect or missing attributes, which makes the resulting event databases unsuitable for realistic applications. Previous work has mostly focused on event identification and categorization in Twitter. Yet event title extraction, arguably one of the most useful and difficult tasks in this domain, has never been investigated. In this paper, we address the task of identifying planned social events and extracting structured information (titles, dates, locations) about them, and introduce SEEFT, a social event extraction system that uses social media content to discover events. To extract the event title and other attributes, SEEFT fuses the original social media content with the content of other tweets and webpages. Experiments over multiple popular event types and more than a thousand event instances show that SEEFT significantly outperforms the previous state-of-the-art system in event identification. Moreover, by fusing information from multiple sources, SEEFT is able to extract event titles with high accuracy, providing the foundation for practical applications such as event discovery, search, and recommendation.