CondDiff-AMO: Integrating Conditional Diffusion Mechanism for Unified Amodal Mask Generation

Authors

  • CaiJie Zhao, University of Macau
  • Bob Zhang, University of Macau

DOI:

https://doi.org/10.1609/aaai.v40i15.38308

Abstract

Amodal segmentation, which estimates the full extent of partially occluded objects, is a critical capability for visual intelligence. Existing methods are limited in efficiency and precision because they rely on auxiliary information or two-stage architectures. Furthermore, they generalize poorly and therefore fall short of practical requirements. To overcome these challenges, we propose a new paradigm, CondDiff-AMO, that treats amodal segmentation as a denoising problem by leveraging diffusion models. Methodologically, the framework introduces three key innovations that adapt to the characteristics of the task and unlock the potential of diffusion models for amodal segmentation: a masking strategy in the forward process, an adaptive transformer for conditional feature extraction, and visual-guided sampling. In the forward process, a progressive masking strategy converts ground-truth masks to visible masks, simulating the amodal segmentation process to enhance reasoning about occluded areas. For the architectural design, a pyramid network with feature refinement extracts adaptive and representative conditional priors, improving guidance in the denoising process of the diffusion model. In the sampling stage, the visible mask is combined with an ensemble strategy, restricting prediction to the occluded part. Experiments on five well-known datasets under supervised and zero-shot learning confirm that CondDiff-AMO outperforms state-of-the-art methods.
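The abstract gives no formulas, so the following is only a toy sketch of two of the ideas it names: a forward process that progressively masks a ground-truth amodal mask down to the visible mask, and a sampling step that combines an ensemble of predictions with the visible mask. The function names, the linear masking schedule, the random choice of which occluded pixels to hide, and the mean-and-threshold ensemble are all assumptions made for illustration. They are not the authors' implementation.

```python
import random


def progressive_mask(amodal, visible, t, num_steps, rng):
    """Hypothetical forward step (assumed linear schedule).

    Hides a fraction t / num_steps of the occluded pixels, so t = 0 returns
    the amodal mask and t = num_steps returns the visible mask.
    Assumes the visible mask is a subset of the amodal mask.
    """
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    # Occluded pixels are in the amodal mask but not in the visible mask.
    occluded = [
        (r, c)
        for r, row in enumerate(amodal)
        for c, value in enumerate(row)
        if value == 1 and visible[r][c] == 0
    ]
    rng.shuffle(occluded)  # assumption: pixels are hidden in random order
    num_hidden = round(len(occluded) * t / num_steps)
    masked = [row[:] for row in amodal]
    for r, c in occluded[:num_hidden]:
        masked[r][c] = 0
    return masked


def visible_guided_ensemble(samples, visible, threshold=0.5):
    """Hypothetical sampling step (assumed mean-and-threshold ensemble).

    Averages several binary predictions per pixel, thresholds the mean, and
    keeps every visible pixel set, so only the occluded part is predicted.
    """
    count = len(samples)
    result = []
    for r, visible_row in enumerate(visible):
        row = []
        for c, is_visible in enumerate(visible_row):
            mean = sum(sample[r][c] for sample in samples) / count
            row.append(1 if is_visible == 1 or mean >= threshold else 0)
        result.append(row)
    return result


# Toy 1 x 4 masks: two visible pixels, two occluded pixels.
amodal = [[1, 1, 1, 1]]
visible = [[1, 1, 0, 0]]
rng = random.Random(0)
start = progressive_mask(amodal, visible, 0, 4, rng)
end = progressive_mask(amodal, visible, 4, 4, rng)
samples = [[[0, 1, 1, 0]], [[0, 0, 1, 0]], [[1, 0, 1, 1]]]
fused = visible_guided_ensemble(samples, visible)
```

In this toy setting, `start` equals the amodal mask and `end` equals the visible mask, which mirrors the described conversion from ground-truth masks to visible masks. Forcing visible pixels to stay set in `visible_guided_ensemble` is one simple way to confine the prediction to the occluded region.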

Published

2026-03-14

How to Cite

Zhao, C., & Zhang, B. (2026). CondDiff-AMO: Integrating Conditional Diffusion Mechanism for Unified Amodal Mask Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 13079–13087. https://doi.org/10.1609/aaai.v40i15.38308

Issue

Section

AAAI Technical Track on Computer Vision XII