Amodal Scene Analysis via Holistic Occlusion Relation Inference and Generative Mask Completion

Authors

  • Bowen Zhang, The University of Adelaide
  • Qing Liu, Adobe Research
  • Jianming Zhang, Adobe Research
  • Yilin Wang, Adobe Research
  • Liyang Liu, The University of Adelaide
  • Zhe Lin, Adobe Research
  • Yifan Liu, The University of Adelaide

DOI:

https://doi.org/10.1609/aaai.v38i7.28526

Keywords:

CV: Segmentation, CV: Scene Analysis & Understanding

Abstract

Amodal scene analysis entails interpreting the occlusion relationships among scene elements and inferring the possible shapes of their invisible parts. Existing methods typically frame this task as an extended instance segmentation problem or a pair-wise object de-occlusion problem. In this work, we propose a new framework, which comprises a Holistic Occlusion Relation Inference (HORI) module followed by an instance-level Generative Mask Completion (GMC) module. Unlike previous approaches, which rely on mask completion results for occlusion reasoning, our HORI module directly predicts an occlusion relation matrix in a single pass. This approach is much more efficient than the pair-wise de-occlusion process, and it naturally handles mutual occlusion, a common but often neglected situation. Moreover, we formulate the mask completion task as a generative process and use a diffusion-based GMC module for instance-level mask completion. This improves mask completion quality and provides multiple plausible solutions. We further introduce a large-scale amodal segmentation dataset with high-quality human annotations, including mutual occlusions. Experiments on our dataset and two public benchmarks demonstrate the advantages of our method. Code is publicly available at https://github.com/zbwxp/Amodal-AAAI.
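
To make the idea of an occlusion relation matrix concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a convention in which relation[i][j] being true means instance i occludes instance j; the abstract does not specify the convention, and the function names and the three-instance example are hypothetical. Under this assumption, mutual occlusion shows up as a pair of entries that are both set, which a strict front-to-back ordering cannot represent.

```python
def mutual_occlusion_pairs(relation):
    """Return pairs (i, j) with i < j where each instance occludes the other.

    Assumed convention (hypothetical): relation[i][j] is truthy when
    instance i occludes instance j.
    """
    n = len(relation)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if relation[i][j] and relation[j][i]
    ]


def occluders_of(relation, j):
    """Return the indices of all instances that occlude instance j."""
    return [i for i in range(len(relation)) if relation[i][j]]


# Hypothetical 3-instance scene: instances 0 and 1 occlude each other,
# and instance 1 also occludes instance 2.
relation = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
]

print(mutual_occlusion_pairs(relation))
print(occluders_of(relation, 2))
```

Because the whole matrix is available at once, every pair of instances can be queried without re-running a pair-wise de-occlusion step for each pair.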

Published

2024-03-24

How to Cite

Zhang, B., Liu, Q., Zhang, J., Wang, Y., Liu, L., Lin, Z., & Liu, Y. (2024). Amodal Scene Analysis via Holistic Occlusion Relation Inference and Generative Mask Completion. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6997-7005. https://doi.org/10.1609/aaai.v38i7.28526

Issue

Section

AAAI Technical Track on Computer Vision VI