ImageSet2Text: Describing Sets of Images Through Text

Authors

  • Piera Riccio, University of Amsterdam
  • Francesco Galati, Independent Researcher
  • Kajetan Schweighofer, Johannes Kepler University Linz
  • Noa Garcia, The University of Osaka
  • Nuria M. Oliver, ELLIS Alicante

DOI:

https://doi.org/10.1609/aaai.v40i11.37826

Abstract

In the era of large-scale visual data, understanding collections of images is a challenging yet important task. To this end, we introduce ImageSet2Text, a novel method to automatically generate natural language descriptions of image sets. Based on large language models, visual-question answering chains, an external lexical graph, and CLIP-based verification, ImageSet2Text iteratively extracts key concepts from image subsets and organizes them into a structured concept graph. We conduct extensive experiments evaluating the quality of the generated descriptions in terms of accuracy, completeness, and user satisfaction. We also examine the method's behavior through ablation studies, scalability assessments, and failure analyses. Results demonstrate that, by combining data-driven AI with symbolic representations, ImageSet2Text reliably summarizes large image collections for a wide range of applications.
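The iterative loop sketched in the abstract — propose concepts, verify them against the image set, and grow a concept graph — can be illustrated with a minimal, hypothetical sketch. This is not the authors' implementation: the candidate concepts, the tag-based `verify` stand-in for CLIP-based verification, and the acceptance threshold are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptGraph:
    """Directed graph of verified concepts (parent -> refinement)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add(self, parent: str, child: str) -> None:
        self.nodes.update({parent, child})
        self.edges.add((parent, child))


def verify(concept: str, images: list, threshold: float = 0.5) -> bool:
    """Stand-in for CLIP-based verification: accept a concept if it applies
    to at least `threshold` of the images. Here each 'image' is mocked as a
    set of tags; the real method scores image-text similarity with CLIP."""
    hits = sum(1 for tags in images if concept in tags)
    return hits / len(images) >= threshold


def build_graph(images: list, candidates: dict, root: str = "image set") -> ConceptGraph:
    """Iteratively refine concepts, keeping only those the image set supports.
    `candidates` maps a parent concept to proposed refinements (in the paper,
    these would come from VQA chains and an external lexical graph)."""
    graph = ConceptGraph()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child in candidates.get(parent, []):
            if verify(child, images):
                graph.add(parent, child)
                frontier.append(child)  # refine accepted concepts further
    return graph


# Toy run: three mocked images; "cat" appears too rarely and is rejected.
images = [{"animal", "dog"}, {"animal", "dog"}, {"animal", "cat"}]
candidates = {"image set": ["animal"], "animal": ["dog", "cat"]}
graph = build_graph(images, candidates)
```

In this toy run, "animal" and "dog" pass verification and enter the graph, while "cat" (supported by only one of three images) is pruned, mirroring how verification keeps the description faithful to the whole set.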

Published

2026-03-14

How to Cite

Riccio, P., Galati, F., Schweighofer, K., Garcia, N., & Oliver, N. M. (2026). ImageSet2Text: Describing Sets of Images Through Text. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8731–8739. https://doi.org/10.1609/aaai.v40i11.37826

Section

AAAI Technical Track on Computer Vision VIII