ImageSet2Text: Describing Sets of Images Through Text

Authors

  • Piera Riccio, University of Amsterdam
  • Francesco Galati, Independent Researcher
  • Kajetan Schweighofer, Johannes Kepler University Linz
  • Noa Garcia, The University of Osaka
  • Nuria M. Oliver, ELLIS Alicante

DOI:

https://doi.org/10.1609/aaai.v40i11.37826

Abstract

In the era of large-scale visual data, understanding collections of images is a challenging yet important task. To this end, we introduce ImageSet2Text, a novel method to automatically generate natural language descriptions of image sets. Based on large language models, visual-question answering chains, an external lexical graph, and CLIP-based verification, ImageSet2Text iteratively extracts key concepts from image subsets and organizes them into a structured concept graph. We conduct extensive experiments evaluating the quality of the generated descriptions in terms of accuracy, completeness, and user satisfaction. We also examine the method's behavior through ablation studies, scalability assessments, and failure analyses. Results demonstrate that, by combining data-driven AI with symbolic representations, ImageSet2Text reliably summarizes large image collections for a wide range of applications.
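The iterative loop sketched in the abstract — propose concepts, verify them against the image set, and grow a concept graph — can be illustrated with a minimal, hypothetical sketch. This is not the authors' implementation: the candidate concepts, the tag-based `verify` stand-in for CLIP-based verification, and the acceptance threshold are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptGraph:
    """Directed graph of verified concepts (parent -> refinement)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add(self, parent: str, child: str) -> None:
        self.nodes.update({parent, child})
        self.edges.add((parent, child))


def verify(concept: str, images: list, threshold: float = 0.5) -> bool:
    """Stand-in for CLIP-based verification: accept a concept if it applies
    to at least `threshold` of the images. Here each 'image' is mocked as a
    set of tags; the real method scores image-text similarity with CLIP."""
    hits = sum(1 for tags in images if concept in tags)
    return hits / len(images) >= threshold


def build_graph(images: list, candidates: dict, root: str = "image set") -> ConceptGraph:
    """Iteratively refine concepts, keeping only those the image set supports.
    `candidates` maps a parent concept to proposed refinements (in the paper,
    these would come from VQA chains and an external lexical graph)."""
    graph = ConceptGraph()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child in candidates.get(parent, []):
            if verify(child, images):
                graph.add(parent, child)
                frontier.append(child)  # refine accepted concepts further
    return graph


# Toy run: three mocked images; "cat" appears too rarely and is rejected.
images = [{"animal", "dog"}, {"animal", "dog"}, {"animal", "cat"}]
candidates = {"image set": ["animal"], "animal": ["dog", "cat"]}
graph = build_graph(images, candidates)
```

In this toy run, "animal" and "dog" pass verification and enter the graph, while "cat" (supported by only one of three images) is pruned, mirroring how verification keeps the description faithful to the whole set.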

Published

2026-03-14

How to Cite

Riccio, P., Galati, F., Schweighofer, K., Garcia, N., & Oliver, N. M. (2026). ImageSet2Text: Describing Sets of Images Through Text. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8731–8739. https://doi.org/10.1609/aaai.v40i11.37826

Section

AAAI Technical Track on Computer Vision VIII