MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes

Authors

  • Liu Liu, Massachusetts Institute of Technology
  • Alexandra Schild, Hasso Plattner Institute
  • Marco Cipriano, Hasso Plattner Institute
  • Fatimeh Al Ghannam, Massachusetts Institute of Technology
  • Freya Tan, Massachusetts Institute of Technology
  • Gerard de Melo, Hasso Plattner Institute
  • Andres Sevtsuk, Massachusetts Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i45.41239

Abstract

Understanding group-level social interactions in public spaces is crucial for urban planning, informing the design of socially vibrant and inclusive environments. Detecting such interactions from images involves interpreting subtle visual cues such as relations, proximity, and co-movement – semantically complex signals that go beyond traditional object detection. To address this challenge, we introduce a social group region detection task, which requires inferring and spatially grounding visual regions defined by abstract interpersonal relations. We propose MINGLE (Modeling INterpersonal Group-Level Engagement), a modular three-stage pipeline that integrates: (1) off-the-shelf human detection and depth estimation, (2) VLM-based reasoning to classify pairwise social affiliation, and (3) a lightweight spatial aggregation algorithm to localize socially connected groups. To support this task and encourage future research, we present a new dataset of 100K urban street-view images annotated with bounding boxes and labels for both individuals and socially interacting groups. The annotations combine human-created labels and outputs from the MINGLE pipeline, ensuring semantic richness and broad coverage of real-world scenarios.
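
The abstract does not specify how stage (3) aggregates pairwise decisions into group regions. As a rough illustration only, the sketch below assumes that stage (1) yields one bounding box per person and stage (2) yields a list of index pairs judged socially affiliated; it then merges affiliated people with a union-find structure and reports the enclosing box of each group of two or more. The function name `group_regions`, the box format, and the choice of connected components are assumptions made for this example, not the authors' method, and the depth estimates from stage (1) are not used here.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def group_regions(boxes: List[Box],
                  affiliated_pairs: List[Tuple[int, int]]) -> List[Box]:
    """Merge affiliated people into groups and return one enclosing box per group.

    boxes: one box per detected person (assumed output of a human detector).
    affiliated_pairs: index pairs (i, j) that a pairwise classifier marked
        as socially affiliated (assumed output of the VLM stage).
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union the two endpoints of every affiliated pair.
    for a, b in affiliated_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect person indices by their group root.
    members: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        members.setdefault(find(i), []).append(i)

    regions: List[Box] = []
    for indices in members.values():
        if len(indices) < 2:
            continue  # a lone individual does not form a group region
        x0, y0, x1, y1 = zip(*(boxes[i] for i in indices))
        regions.append((min(x0), min(y0), max(x1), max(y1)))
    return regions


if __name__ == "__main__":
    people = [(0, 0, 10, 20), (8, 0, 18, 22), (50, 0, 60, 20),
              (100, 5, 110, 25), (108, 0, 120, 30)]
    pairs = [(0, 1), (3, 4)]
    print(group_regions(people, pairs))
```

Connected components treat affiliation as transitive: if A is linked to B and B to C, all three land in one region even when A and C were never classified as a pair. A stricter aggregation rule would be needed if that behavior is undesirable.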

Published

2026-03-14

How to Cite

Liu, L., Schild, A., Cipriano, M., Ghannam, F. A., Tan, F., de Melo, G., & Sevtsuk, A. (2026). MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38935-38942. https://doi.org/10.1609/aaai.v40i45.41239

Section

AAAI Special Track on AI for Social Impact I