LoGoSeg: Integrating Local and Global Features for Open-Vocabulary Semantic Segmentation

Authors

  • Junyang Chen (Southeast University; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University)
  • Xiangbo Lv (Southeast University; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University; Lenovo Research)
  • Zhiqiang Kou (Southeast University; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University)
  • Xingdong Sheng (Lenovo Research)
  • Ning Xu (Southeast University; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University)
  • Yiguo Qiao (Southeast University; Key Laboratory of Computer Network and Information Integration (Ministry of Education), Southeast University)

DOI:

https://doi.org/10.1609/aaai.v40i4.37279

Abstract

Open-vocabulary semantic segmentation (OVSS) extends traditional closed-set segmentation by enabling pixel-wise annotation for both seen and unseen categories using arbitrary textual descriptions. While existing methods leverage vision-language models (VLMs) like CLIP, their reliance on image-level pretraining often results in imprecise spatial alignment, leading to mismatched segmentations in ambiguous or cluttered scenes. Moreover, most existing approaches lack strong object priors and region-level constraints, which can lead to object hallucination or missed detections, further degrading performance. To address these challenges, we propose LoGoSeg, an efficient single-stage framework that integrates three key innovations: (i) an object existence prior that dynamically weights relevant categories through global image-text similarity, effectively reducing hallucinations; (ii) a region-aware alignment module that establishes precise region-level visual-textual correspondences; and (iii) a dual-stream fusion mechanism that combines local structural information with global semantic context. Unlike prior works, LoGoSeg eliminates the need for external mask proposals, additional backbones, or extra datasets, which keeps the framework efficient. Extensive experiments on six benchmarks (A-847, PC-459, A-150, PC-59, PAS-20, and PAS-20b) demonstrate its competitive performance and strong generalization in open-vocabulary settings.
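
To make idea (i) concrete, the following is a minimal illustrative sketch of how a global image-text similarity could be used as an existence prior over categories. It is not the authors' implementation: the function names (`existence_prior`, `segment_with_prior`), the softmax temperature, the toy 2-D embeddings, and the choice to fuse prior and per-pixel scores by multiplication are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def existence_prior(global_image_emb, text_embs, temperature=0.1):
    """Hypothetical prior: per-category weight from the similarity between
    the whole-image embedding and each category's text embedding."""
    sims = [cosine(global_image_emb, t) for t in text_embs]
    return softmax(sims, temperature)


def segment_with_prior(pixel_embs, global_image_emb, text_embs, temperature=0.1):
    """Assign each pixel the category maximizing (local score * prior).
    Multiplicative fusion is an assumption, not the paper's stated rule."""
    prior = existence_prior(global_image_emb, text_embs, temperature)
    labels = []
    for pixel in pixel_embs:
        local = softmax([cosine(pixel, t) for t in text_embs], temperature)
        fused = [l * w for l, w in zip(local, prior)]
        labels.append(max(range(len(fused)), key=fused.__getitem__))
    return labels


# Toy example: two categories with orthogonal text embeddings.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
global_image_emb = [1.0, 0.1]      # the image as a whole resembles category 0
ambiguous_pixel = [0.70, 0.72]     # locally, slightly closer to category 1
labels = segment_with_prior([ambiguous_pixel], global_image_emb, text_embs)
```

In this toy setting the ambiguous pixel would be labeled category 1 from local similarity alone, but the prior pulls it to category 0 because the image as a whole shows little evidence of category 1. This is the kind of hallucination suppression the abstract describes, shown here only under the stated assumptions.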

Published

2026-03-14

How to Cite

Chen, J., Lv, X., Kou, Z., Sheng, X., Xu, N., & Qiao, Y. (2026). LoGoSeg: Integrating Local and Global Features for Open-Vocabulary Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(4), 2886–2894. https://doi.org/10.1609/aaai.v40i4.37279

Issue

Section

AAAI Technical Track on Computer Vision I