Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation
DOI:
https://doi.org/10.1609/aaai.v38i5.28295
Keywords:
CV: Segmentation, ML: Unsupervised & Self-Supervised Learning
Abstract
Existing unsupervised video object segmentation methods typically suffer severe performance degradation when tested on out-of-distribution videos. The primary reason is that real-world test data may not satisfy the independent and identically distributed (i.i.d.) assumption, leading to domain shift. In this paper, we propose a generalizable Fourier augmentation method, applied during training, to improve the generalization ability of the model. To achieve this, we apply the Fast Fourier Transform (FFT) to the intermediate spatial-domain features in each layer to obtain the corresponding frequency representations, which comprise amplitude components (encoding scene-aware styles such as the texture, color, and contrast of the scene) and phase components (encoding rich semantics). We produce a variety of style features via Gaussian sampling to augment the training data, thereby improving the generalization capability of the model. To further improve cross-domain generalization, we design a phase feature update strategy that applies an exponential moving average to phase features from past frames in an online manner, which can help the model learn cross-domain-invariant features. Extensive experiments show that the proposed method achieves state-of-the-art performance on popular benchmarks.
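The amplitude/phase decomposition described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' implementation: the abstract does not specify how the Gaussian sampling or the moving average is parameterized, so the function names, the per-channel multiplicative noise, the `sigma` and `momentum` values, and the choice to average unit phasors rather than raw phase angles are all assumptions made for illustration.

```python
import numpy as np


def split_amp_phase(feat):
    """Apply a 2-D FFT over the spatial axes of a (C, H, W) feature map."""
    spec = np.fft.fft2(feat, axes=(-2, -1))
    return np.abs(spec), np.angle(spec)


def merge_amp_phase(amp, phase):
    """Recombine amplitude and phase and return to the spatial domain."""
    spec = amp * np.exp(1j * phase)
    return np.fft.ifft2(spec, axes=(-2, -1)).real


def augment_amplitude(amp, rng, sigma=0.1):
    """Perturb the 'style' (amplitude) with Gaussian noise.

    Assumption: one multiplicative scale per channel, drawn from N(1, sigma^2).
    The absolute value keeps the amplitude non-negative.
    """
    scale = rng.normal(loc=1.0, scale=sigma, size=(amp.shape[0], 1, 1))
    return amp * np.abs(scale)


def ema_phase(prev_phasor, phase, momentum=0.9):
    """Exponential moving average over phase features from past frames.

    Assumption: the average is taken over unit phasors exp(i * phase) so that
    angles near +pi and -pi do not cancel; the paper may do this differently.
    """
    phasor = np.exp(1j * phase)
    if prev_phasor is None:  # first frame: nothing to average with yet
        return phasor
    return momentum * prev_phasor + (1.0 - momentum) * phasor


def fourier_augment(feat, rng, prev_phasor=None, sigma=0.1, momentum=0.9):
    """Return an augmented feature map and the updated phase state."""
    amp, phase = split_amp_phase(feat)
    amp_aug = augment_amplitude(amp, rng, sigma)
    phasor = ema_phase(prev_phasor, phase, momentum)
    out = merge_amp_phase(amp_aug, np.angle(phasor))
    return out, phasor


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = None
    for _ in range(3):  # three hypothetical video frames
        frame_feat = rng.standard_normal((4, 8, 8))
        aug, state = fourier_augment(frame_feat, rng, prev_phasor=state)
        print(aug.shape)
```

In this sketch the phase state returned for one frame is passed back in for the next, which mirrors the online update the abstract describes; in a real network the same steps would presumably operate on batched tensors inside each layer during training.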
Published
2024-03-24
How to Cite
Song, H., Su, T., Zheng, Y., Zhang, K., Liu, B., & Liu, D. (2024). Generalizable Fourier Augmentation for Unsupervised Video Object Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4918-4924. https://doi.org/10.1609/aaai.v38i5.28295
Issue
Section
AAAI Technical Track on Computer Vision IV