Discovering Serendipitous Information from Wikipedia by Using Its Network Structure
Many researchers conducted studies on extracting relevant information from web documents. However, there are few studies on extracting serendipitous information. We propose methods to discover unexpected information from Wikipedia by using its network structure, for example, the distance between two categories. We evaluated two methods: a classification-based method using support vector machines (SVMs), and a ranking-based method using regression. We demonstrate advantages of regression over classification.