Catching the Long-Tail: Extracting Local News Events from Twitter
Keywords:Twitter, Local News, Event Detection, Information Extraction, Event Correlation
Twitter, used in 200 countries with over 250 milliontweets a day, is a rich source of local news from aroundthe world. Many events of local importance are first reportedon Twitter, including many that never reach newschannels. Further, there are often only a few tweetsreporting each such event, in contrast with the largervolumes that follow events of wider significance. Eventhough such events may be primarily of local importance,they can also be of critical interest to some specificbut possibly far flung entities: For example, a firein a supplier’s factory half-way around the world maybe of interest even from afar. In this paper we describehow this ‘long tail’ of events can be detected in spite oftheir sparsity.We then extract and correlate informationfrom multiple tweets describing the same event. Ourgeneric architecture for converting a tweet-stream intoevent-objects uses locality sensitive hashing, classification,boosting, information extraction and clustering.Our results, based on millions of tweets monitored overmany months, appear to validate our approach and architecture:We achieved success-rates in the 80% rangefor event detection and 76% on event-correlation; we also reduced tweet-comparisons by 80% using LSH.