Multi-View Dynamic Reflection Prior for Video Glass Surface Detection

Authors

  • Fang Liu, City University of Hong Kong
  • Yuhao Liu, City University of Hong Kong
  • Jiaying Lin, City University of Hong Kong
  • Ke Xu, City University of Hong Kong
  • Rynson W.H. Lau, City University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v38i4.28148

Keywords:

CV: Video Understanding & Activity Analysis, CV: Applications, CV: Scene Analysis & Understanding

Abstract

Recent research has shown significant interest in image-based glass surface detection (GSD). However, detecting glass surfaces in dynamic scenes remains largely unexplored due to the lack of a high-quality dataset and an effective video glass surface detection (VGSD) method. In this paper, we propose the first VGSD approach. Our key observation is that reflections frequently appear on glass surfaces, but they change dynamically as the camera moves. Based on this observation, we propose to reduce the excessive dependence on a single, uncertain reflection by jointly modeling temporal and spatial reflection cues. To this end, we propose VGSD-Net with two novel modules: a Location-aware Reflection Extraction (LRE) module for position-aware reflection feature extraction, and a Context-enhanced Reflection Integration (CRI) module for integrating spatial-temporal reflection cues. We have also created the first large-scale video glass surface dataset (VGSD-D), consisting of 19,166 image frames with accurately annotated glass masks extracted from 297 videos. Extensive experiments demonstrate that VGSD-Net outperforms state-of-the-art approaches adapted from related fields. Code and dataset will be available at https://github.com/fawnliu/VGSD.
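
The intuition behind combining reflection cues over time can be illustrated with a toy sketch. This is not VGSD-Net and does not reproduce the LRE or CRI modules; it only shows, under simplifying assumptions, why pooling evidence across frames is less fragile than trusting one frame. The function names `fuse_reflection_maps` and `to_mask`, the per-pixel confidence maps, the plain temporal mean, and the 0.5 threshold are all hypothetical choices made for illustration, and the frames are assumed to be already aligned, which real camera motion would not guarantee.

```python
from typing import List

# Hypothetical representation: one per-frame reflection confidence map,
# with values in [0, 1] (higher means "more likely a reflection here").
Map = List[List[float]]


def fuse_reflection_maps(frames: List[Map]) -> Map:
    """Average per-pixel reflection confidence over all frames.

    Assumption for this sketch: frames are pre-aligned, so pixel (y, x)
    refers to the same scene point in every frame.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                fused[y][x] += frame[y][x] / len(frames)
    return fused


def to_mask(conf: Map, threshold: float = 0.5) -> List[List[int]]:
    """Binarize a confidence map into a 0/1 mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in conf]


# Three toy 2x2 frames. The left column is "glass"; in the second frame the
# reflection at the top-left pixel has almost vanished.
frames = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.1], [0.9, 0.0]],
    [[0.8, 0.0], [0.7, 0.1]],
]

single_frame_mask = to_mask(frames[1])           # misses the top-left pixel
fused_mask = to_mask(fuse_reflection_maps(frames))  # recovers the full left column
```

In this toy case the second frame alone marks only the bottom-left pixel, while the temporally fused map marks the whole left column. A learned model would replace the simple mean with trained spatial and temporal integration.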

Published

2024-03-24

How to Cite

Liu, F., Liu, Y., Lin, J., Xu, K., & Lau, R. W. (2024). Multi-View Dynamic Reflection Prior for Video Glass Surface Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3594-3602. https://doi.org/10.1609/aaai.v38i4.28148

Issue

Section

AAAI Technical Track on Computer Vision III