Interactive Rare-Category-of-Interest Mining from Large Datasets
In the era of big data, rare category data examples are often of key importance despite their scarcity; e.g., rare bird audio is usually more valuable than common bird audio. However, existing efforts on rare category mining consider only the statistical characteristics of rare category data examples, while ignoring their ‘true’ interestingness to the user. Moreover, current approaches are unable to support real-time user interactions due to the prohibitive computational cost of answering a single user query.
In this paper, we contribute a new model named IRim, which can interactively mine rare category data examples of interest over large datasets. The mining process is carried out in two steps, namely rare category detection (RCD) followed by rare category exploration (RCE). In RCD, by introducing an offline phase and high-level knowledge abstractions, IRim reduces the time complexity of answering a user query from quadratic to logarithmic. In RCE, by proposing a collaborative-reconstruction-based approach, we are able to explicitly encode both user preference and rare category characteristics. Extensive experiments on five diverse real-world datasets show that our method achieves response times on the order of seconds for user interactions and significantly outperforms state-of-the-art competitors in accuracy and number of queries. As a side contribution, we construct and release two benchmark datasets which, to our knowledge, are the first public datasets tailored to the rare category mining task.