HSRDiff: A Hierarchical Self-Regulation Diffusion Model for Stochastic Semantic Segmentation
DOI:
https://doi.org/10.1609/aaai.v39i2.32163
Abstract
In safety-critical domains such as medical diagnostics and autonomous driving, a single image is sometimes insufficient evidence to resolve the inherent ambiguity of vision problems. Multiple plausible hypotheses consistent with the image semantics may therefore be needed to reflect the true distribution of targets and to support downstream tasks. However, balancing and improving the diversity and consistency of segmentation predictions remains challenging under high-dimensional output spaces and potentially multimodal distributions. This paper presents Hierarchical Self-Regulation Diffusion (HSRDiff), a unified framework that models the joint probability distribution over entire label maps. Our model self-regulates the balance between the two prediction modes, label and noise, in a novel "differentiation to unification" pipeline, and dynamically fits the optimal path for modeling the aleatoric uncertainty rooted in the observations. In addition, we preserve high-fidelity reconstruction of delicate image structures by leveraging hierarchical multi-scale condition priors. We validate HSRDiff in three different semantic scenarios; experimental results show that HSRDiff outperforms the compared methods by a considerable margin.
Published
2025-04-11
How to Cite
Yang, H., Yang, C., An, Z., Huang, L., & Xu, Y. (2025). HSRDiff: A Hierarchical Self-Regulation Diffusion Model for Stochastic Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 1701–1709. https://doi.org/10.1609/aaai.v39i2.32163
Section
AAAI Technical Track on Computer Vision I
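The abstract's "self-regulation" between the two standard diffusion parameterizations, predicting the clean label versus predicting the noise, can be illustrated with a minimal sketch. The blend weight `w` and the helper `blended_x0_estimate` below are hypothetical illustrations, not the paper's actual mechanism (which learns the balance dynamically); the conversion between the two modes follows the standard DDPM forward process.

```python
import numpy as np

def blended_x0_estimate(x_t, x0_pred, eps_pred, alpha_bar_t, w):
    """Illustrative self-regulation: convex blend of the two standard
    diffusion prediction modes (clean-label x0 vs. noise eps).
    NOTE: `w` is a hypothetical fixed weight; HSRDiff balances the
    two modes dynamically rather than with a constant."""
    # The eps-parameterization implies x0 = (x_t - sqrt(1 - abar) * eps) / sqrt(abar)
    x0_from_eps = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Convex combination of the label-mode and noise-mode estimates
    return w * x0_pred + (1.0 - w) * x0_from_eps

# Standard DDPM forward process q(x_t | x_0): x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))   # toy "label map"
eps = rng.standard_normal((4, 4))  # injected Gaussian noise
abar = 0.7                         # cumulative noise schedule value at step t
x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

# With perfectly consistent predictions, any blend weight recovers x0 exactly
x0_hat = blended_x0_estimate(x_t, x0, eps, abar, w=0.5)
```

When the two predictions are mutually consistent, both branches yield the same estimate, so the blend is exact; in practice the two heads disagree, and weighting between them is where a balancing mechanism matters.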