PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Data Subset Selection

Authors

  • Suraj Kothawade University of Texas at Dallas
  • Vishal Kaushal Indian Institute of Technology, Bombay
  • Ganesh Ramakrishnan Indian Institute of Technology, Bombay
  • Jeff Bilmes University of Washington, Seattle
  • Rishabh Iyer University of Texas at Dallas; Indian Institute of Technology, Bombay

DOI:

https://doi.org/10.1609/aaai.v36i9.21264

Keywords:

Search And Optimization (SO), Data Mining & Knowledge Management (DMKM), Machine Learning (ML), Computer Vision (CV)

Abstract

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which include focusing on or targeting certain data points while avoiding others. Examples of such problems include: i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and ii) guided summarization, where data (e.g., image collection, text, document or video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset, like diversity or representation, and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, thereby reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state-of-the-art in targeted learning and in guided image-collection summarization. PRISM is available as a part of the SUBMODLIB (https://github.com/decile-team/submodlib) and TRUST (https://github.com/decile-team/trust) toolkits.
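The abstract gives no formulas, so the following is only an illustrative sketch of guided subset selection, not PRISM's parameterized measures and not the SUBMODLIB or TRUST API. It assumes the standard submodular mutual information I_f(A; Q) = f(A) + f(Q) - f(A ∪ Q) with a facility-location function f and a Gaussian similarity kernel; every function name below is hypothetical. A simple greedy heuristic picks the points of A that score highest against a query set Q.

```python
import math


def similarity(x, y):
    """Gaussian (RBF) similarity in (0, 1]; an assumed kernel, not from the paper."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def facility_location(points, subset):
    """f(A) = sum over all points i of max_{j in A} s(i, j), with f(empty) = 0."""
    if not subset:
        return 0.0
    return sum(max(similarity(p, points[j]) for j in subset) for p in points)


def mutual_information(points, subset, query):
    """I_f(A; Q) = f(A) + f(Q) - f(A union Q)."""
    union = sorted(set(subset) | set(query))
    return (facility_location(points, subset)
            + facility_location(points, query)
            - facility_location(points, union))


def greedy_select(points, candidates, query, budget):
    """Greedily add the candidate that most increases I_f(A; Q)."""
    selected = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda c: mutual_information(points, selected + [c], query),
        )
        selected.append(best)
    return selected


# Toy data: indices 0-1 form one cluster, 2-3 another; index 4 is the query point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
picked = greedy_select(points, candidates=[0, 1, 2, 3], query=[4], budget=1)
print(picked)
```

On this toy data the first pick comes from the cluster closest to the query point, which is the "targeting" behavior the abstract describes. Trading this off against diversity, or steering away from a set of points, would need the parameterized measures from the paper itself.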

Published

2022-06-28

How to Cite

Kothawade, S., Kaushal, V., Ramakrishnan, G., Bilmes, J., & Iyer, R. (2022). PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Data Subset Selection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(9), 10238-10246. https://doi.org/10.1609/aaai.v36i9.21264

Section

AAAI Technical Track on Search and Optimization