PhysDiff: Physiology-based Dynamicity Disentangled Diffusion Model for Remote Physiological Measurement
DOI:
https://doi.org/10.1609/aaai.v39i6.32704
Abstract
Recent works on remote PhotoPlethysmoGraphy (rPPG) estimation typically use techniques like CNNs and Transformers to encode implicit features from facial videos for prediction. These methods learn to directly map facial videos to the static values of rPPG signals, overlooking the inherent dynamic characteristics of the rPPG sequence. Moreover, the rPPG signal is extremely weak and highly susceptible to interference from various sources of noise, including illumination conditions, head movements, and variations in skin tone. To address these limitations, we propose a Physiology-based Dynamicity Disentangled Diffusion (PhysDiff) model designed specifically for robust rPPG estimation. PhysDiff leverages the diffusion model to learn the distribution of the quasi-periodic rPPG signal and uses a dynamicity disentanglement strategy to capture two dynamic characteristics of the temporal rPPG signal, i.e., trend and amplitude. This disentanglement is motivated by the underlying dynamic physiological processes of vasodilation and vasoconstriction, ensuring a more precise representation of the rPPG signal. The disentangled components are then used as pivotal conditions in the proposed spatial-temporal hybrid denoiser for rPPG reconstruction. In addition, we introduce a periodicity-based multi-hypothesis selection strategy at inference, which compares the natural periodicity of multiple generated rPPG hypotheses and selects the most favorable one as the final prediction. Extensive experiments on four datasets demonstrate that PhysDiff significantly outperforms prior methods in both intra-dataset and cross-dataset testing.
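The periodicity-based multi-hypothesis selection described above can be illustrated with a minimal sketch. The abstract does not state how periodicity is measured, so this example assumes a simple stand-in: the fraction of heart-rate-band spectral power concentrated at the dominant frequency. The function names `periodicity_score` and `select_hypothesis`, the 40 to 180 bpm band, and the 30 Hz sampling rate are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def periodicity_score(signal, fs, min_bpm=40.0, max_bpm=180.0):
    """Assumed periodicity measure: share of in-band spectral power
    that falls on the single strongest frequency bin."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset before the FFT
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Keep only frequencies in a plausible heart-rate band (assumption).
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    band_power = power[band]
    total = band_power.sum()
    if total == 0:
        return 0.0
    return float(band_power.max() / total)


def select_hypothesis(hypotheses, fs):
    """Return the index of the most periodic hypothesis and all scores."""
    scores = [periodicity_score(h, fs) for h in hypotheses]
    return int(np.argmax(scores)), scores


# Toy usage: three synthetic "hypotheses" sampled at an assumed 30 Hz.
rng = np.random.default_rng(0)
fs = 30.0
t = np.arange(300) / fs                       # 10 seconds of samples
clean = np.sin(2 * np.pi * 1.2 * t)           # 1.2 Hz, i.e. 72 bpm
noisy = clean + rng.normal(scale=2.0, size=t.size)
noise_only = rng.normal(size=t.size)

best, scores = select_hypothesis([noise_only, clean, noisy], fs)
```

In this toy setup the clean sinusoid concentrates almost all of its in-band power in one bin, so it receives the highest score. A real selection criterion may differ (for example, one based on autocorrelation), and the hypotheses would come from repeated sampling of the diffusion model rather than from synthetic signals.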
Published
2025-04-11
How to Cite
Qian, W., Su, G., Guo, D., Zhou, J., Li, X., Hu, B., … Wang, M. (2025). PhysDiff: Physiology-based Dynamicity Disentangled Diffusion Model for Remote Physiological Measurement. Proceedings of the AAAI Conference on Artificial Intelligence, 39(6), 6568–6576. https://doi.org/10.1609/aaai.v39i6.32704
Issue
Section
AAAI Technical Track on Computer Vision V