Hyperbolic Hierarchical Alignment Reasoning Network for Text-3D Retrieval

Authors

  • Wenrui Li Harbin Institute of Technology
  • Yidan Lu Harbin Institute of Technology
  • Yeyu Chai Harbin Institute of Technology
  • Rui Zhao Nanyang Technological University
  • Hengyu Man Harbin Institute of Technology
  • Xiaopeng Fan Harbin Institute of Technology; Peng Cheng Laboratory; Harbin Institute of Technology Suzhou Research Institute

DOI:

https://doi.org/10.1609/aaai.v40i8.37576

Abstract

With the daily influx of 3D data on the internet, text-3D retrieval has gained increasing attention. However, current methods face two major challenges: Hierarchy Representation Collapse (HRC) and Redundancy-Induced Saliency Dilution (RISD). HRC compresses abstract-to-specific and whole-to-part hierarchies in Euclidean embeddings, while RISD averages noisy fragments, obscuring critical semantic cues and diminishing the model’s ability to distinguish hard negatives. To address these challenges, we introduce the Hyperbolic Hierarchical Alignment Reasoning Network (H2ARN) for text-3D retrieval. H2ARN embeds both text and 3D data in a Lorentz-model hyperbolic space, where exponential volume growth inherently preserves hierarchical distances. A hierarchical ordering loss constructs a shrinking entailment cone around each text vector, ensuring that the matched 3D instance falls within the cone, while an instance-level contrastive loss jointly enforces separation from non-matching samples. To tackle RISD, we propose a contribution-aware hyperbolic aggregation module that leverages Lorentzian distance to assess the relevance of each local feature and applies contribution-weighted aggregation guided by hyperbolic geometry, enhancing discriminative regions while suppressing redundancy without additional supervision. We also release the expanded T3DR-HIT v2 benchmark, which contains 8,935 text-to-3D pairs, 2.6 times the original size, covering both fine-grained cultural artefacts and complex indoor scenes.
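
The Lorentz-model distance and distance-guided weighting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit negative curvature, a softmax over negative distances as the contribution score, and a Lorentzian centroid for aggregation. The names lift, lorentz_distance, weighted_aggregate, and the temperature parameter are hypothetical and do not come from the paper.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lift(v):
    """Lift a Euclidean vector onto the unit hyperboloid (curvature -1 assumed)."""
    return [math.sqrt(1.0 + sum(a * a for a in v))] + list(v)

def lorentz_distance(x, y):
    """Geodesic distance between two points on the unit hyperboloid."""
    # Clamp guards against round-off pushing the argument below 1.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def weighted_aggregate(local_points, reference, temperature=1.0):
    """Assumed scheme: softmax weights from negative distance to a reference
    point, then a Lorentzian centroid of the weighted local points."""
    scores = [-lorentz_distance(p, reference) / temperature for p in local_points]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(reference)
    summed = [sum(w * p[i] for w, p in zip(weights, local_points)) for i in range(dim)]
    # Rescale the weighted sum so it lies on the hyperboloid again.
    norm = math.sqrt(-lorentz_inner(summed, summed))
    return [c / norm for c in summed], weights
```

In this sketch, local features that lie closer to the reference point receive larger weights, and the aggregated point remains on the hyperboloid. How H2ARN actually scores contributions and aggregates features is specified in the paper itself.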

Published

2026-03-14

How to Cite

Li, W., Lu, Y., Chai, Y., Zhao, R., Man, H., & Fan, X. (2026). Hyperbolic Hierarchical Alignment Reasoning Network for Text-3D Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6477-6485. https://doi.org/10.1609/aaai.v40i8.37576

Section

AAAI Technical Track on Computer Vision V