Diversity Maximization in the Presence of Outliers

Daichi Amagata

doi:10.1609/aaai.v37i10.26454

Authors

Daichi Amagata Osaka University

DOI:

https://doi.org/10.1609/aaai.v37i10.26454

Keywords:

SO: Other Foundations of Search & Optimization, CSO: Constraint Optimization, DMKM: Scalability, Parallel & Distributed Systems

Abstract

Given a set X of n points in a metric space, the problem of diversity maximization is to extract a set S of k points from X so that the diversity of S is maximized. This problem is essential in AI-related fields, such as web search, databases, recommender systems, and data mining. Although there have been extensive studies of this problem, these studies assume that X is clean. This usually does not hold, because real-world datasets usually contain outliers. The state-of-the-art algorithm for the diversity maximization problem is based on furthest point retrieval, which is too sensitive to outliers. We therefore address the problem of diversity maximization with outliers and propose two algorithms with performance guarantee. The first algorithm runs in O((k+z)n) time, guarantees 1/2-approximation, and returns no outliers, where z is the number of outliers. The second algorithm runs in O(kz) time (which is independent of n), guarantees 1/6(1+epsilon)-approximation, and returns no outliers with constant probability. We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms.

Diversity Maximization in the Presence of Outliers

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information