Diversity Maximization in the Presence of Outliers

Authors

  • Daichi Amagata Osaka University

DOI:

https://doi.org/10.1609/aaai.v37i10.26454

Keywords:

SO: Other Foundations of Search & Optimization, CSO: Constraint Optimization, DMKM: Scalability, Parallel & Distributed Systems

Abstract

Given a set X of n points in a metric space, the problem of diversity maximization is to extract a set S of k points from X so that the diversity of S is maximized. This problem is essential in AI-related fields, such as web search, databases, recommender systems, and data mining. Although there have been extensive studies of this problem, these studies assume that X is clean. This usually does not hold, because real-world datasets usually contain outliers. The state-of-the-art algorithm for the diversity maximization problem is based on furthest point retrieval, which is too sensitive to outliers. We therefore address the problem of diversity maximization with outliers and propose two algorithms with performance guarantee. The first algorithm runs in O((k+z)n) time, guarantees 1/2-approximation, and returns no outliers, where z is the number of outliers. The second algorithm runs in O(kz) time (which is independent of n), guarantees 1/6(1+epsilon)-approximation, and returns no outliers with constant probability. We conduct experiments on real datasets to demonstrate the effectiveness and efficiency of our algorithms.

Downloads

Published

2023-06-26

How to Cite

Amagata, D. (2023). Diversity Maximization in the Presence of Outliers. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 12338-12345. https://doi.org/10.1609/aaai.v37i10.26454

Issue

Section

AAAI Technical Track on Search and Optimization