Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation
DOI:
https://doi.org/10.1609/aaai.v38i4.28166
Keywords:
CV: Biometrics, Face, Gesture & Pose, CV: Multi-modal Vision
Abstract
Previous 3D hand pose estimation methods primarily rely on a single modality, either RGB or depth, and the comprehensive utilization of the dual modalities has not been extensively explored. RGB and depth data provide complementary information and thus can be fused to enhance the robustness of 3D hand pose estimation. However, there exist two problems in applying existing fusion methods to 3D hand pose estimation: redundancy of dense feature fusion and ambiguity of visual features. First, pixel-wise feature interactions introduce high computational costs and ineffective calculations on invalid pixels. Second, visual features suffer from ambiguity due to color and texture similarities, as well as depth holes and noise caused by frequent hand movements, which interferes with modeling cross-modal correlations. In this paper, we propose Keypoint-Fusion for RGB-D based 3D hand pose estimation, which leverages the unique advantages of the dual modalities to mutually eliminate feature ambiguity and performs cross-modal feature fusion in a more efficient way. Specifically, we focus cross-modal fusion on sparse yet informative spatial regions (i.e., keypoints). Meanwhile, by explicitly extracting relatively more reliable information as disambiguation evidence, the depth modality provides 3D geometric information for RGB feature pixels, and the RGB modality complements the precise edge information lost due to depth noise. Keypoint-Fusion achieves state-of-the-art performance on two challenging hand datasets, significantly decreasing the error compared with previous single-modal methods.
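The contrast the abstract draws between dense pixel-wise fusion and fusion restricted to keypoints can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of plain scaled dot-product attention, the residual sum, the feature-map size, and the choice of 21 keypoints are all assumptions made for illustration only.

```python
import numpy as np

def sample_at_keypoints(feat, kpts):
    """Gather features at integer keypoint locations.

    feat: (C, H, W) feature map; kpts: (K, 2) array of (row, col).
    Returns a (K, C) array. Hypothetical helper, not from the paper.
    """
    rows, cols = kpts[:, 0], kpts[:, 1]
    return feat[:, rows, cols].T

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def keypoint_cross_attention(rgb_feat, depth_feat, kpts):
    """RGB keypoint features attend to depth keypoint features.

    Assumed fusion rule: scaled dot-product attention plus a residual.
    """
    q = sample_at_keypoints(rgb_feat, kpts)     # (K, C) queries from RGB
    kv = sample_at_keypoints(depth_feat, kpts)  # (K, C) keys/values from depth
    attn = softmax(q @ kv.T / np.sqrt(q.shape[1]), axis=-1)  # (K, K)
    return q + attn @ kv                        # (K, C) fused features

rng = np.random.default_rng(0)
C, H, W, K = 8, 16, 16, 21  # assumed sizes; 21 is a common hand-joint count
rgb_feat = rng.standard_normal((C, H, W))
depth_feat = rng.standard_normal((C, H, W))
kpts = np.stack([rng.integers(0, H, K), rng.integers(0, W, K)], axis=1)

fused = keypoint_cross_attention(rgb_feat, depth_feat, kpts)

# Number of pairwise interactions: every pixel with every pixel vs. keypoints only.
dense_pairs = (H * W) ** 2
sparse_pairs = K ** 2
print(fused.shape, dense_pairs, sparse_pairs)
```

Under these assumed sizes, the attention matrix shrinks from one entry per pixel pair to one entry per keypoint pair, which is the efficiency argument the abstract makes; how keypoints are located and how disambiguation evidence is extracted are left out of the sketch.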
Published
2024-03-24
How to Cite
Liu, X., Ren, P., Gao, Y., Wang, J., Sun, H., Qi, Q., Zhuang, Z., & Liao, J. (2024). Keypoint Fusion for RGB-D Based 3D Hand Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3756-3764. https://doi.org/10.1609/aaai.v38i4.28166
Issue
Section
AAAI Technical Track on Computer Vision III