GeoNum: Bridging Numerical Continuity and Language Semantics via Geometric Embedding

Authors

  • Shengkai Jin, Beihang University
  • Tianyu Chen, Beihang University
  • Chonghan Gao, Beihang University
  • Jun Han, Beihang University

DOI:

https://doi.org/10.1609/aaai.v40i27.39401

Abstract

Large language models excel at semantic reasoning yet struggle with numerical tasks because tokenization disrupts geometric continuity. Traditional methods fragment numerically close values into inconsistent token sequences, severing the correspondence between numerical proximity and representational similarity, which is essential for numerical cognition. We introduce GeoNum, a geometrically coherent numerical embedding based on polar coordinate decomposition. By encoding integer magnitudes through classification and fractional components via trigonometric regression, GeoNum constructs a continuous manifold where numerical distance is preserved geometrically. A three-stage framework progressively integrates GeoNum into pretrained language models via self-supervised pretraining, projection alignment, and efficient adaptation. Experimental results across diverse arithmetic benchmarks demonstrate consistent gains in high-precision accuracy and improved interpolation and extrapolation, underscoring the promising benefits of geometric continuity for numerical modeling in large language models.
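
As an illustration of the idea in the abstract, the following is a minimal sketch of a polar-style decomposition, not the authors' implementation. It assumes the integer part of a number is treated as a discrete class label and the fractional part is mapped to an angle on the unit circle, so that it can be regressed as a (cos, sin) pair. The function names `encode` and `decode` and the choice of `floor` for the integer part are hypothetical; the abstract does not specify these details.

```python
import math


def encode(x: float) -> tuple[int, float, float]:
    """Split x into an integer class label and a point on the unit circle.

    Assumption for illustration: the integer part is floor(x) and the
    fractional part f in [0, 1) is mapped to the angle 2*pi*f.
    """
    k = math.floor(x)                 # integer part, used as a class label
    frac = x - k                      # fractional part in [0, 1)
    theta = 2.0 * math.pi * frac      # angle on the unit circle
    return k, math.cos(theta), math.sin(theta)


def decode(k: int, c: float, s: float) -> float:
    """Invert encode: recover the angle with atan2, then rebuild the number."""
    theta = math.atan2(s, c) % (2.0 * math.pi)   # map angle back to [0, 2*pi)
    return k + theta / (2.0 * math.pi)


if __name__ == "__main__":
    for value in (3.25, 0.5, -1.75):
        k, c, s = encode(value)
        print(value, "->", (k, round(c, 6), round(s, 6)), "->", decode(k, c, s))
```

In this sketch, fractional parts that are close on the number line map to nearby points on the circle, including across an integer boundary (for example, 0.99 and 1.01), although the integer class label still changes there. How GeoNum itself handles such boundaries is not stated in the abstract.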

Published

2026-03-14

How to Cite

Jin, S., Chen, T., Gao, C., & Han, J. (2026). GeoNum: Bridging Numerical Continuity and Language Semantics via Geometric Embedding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22426-22434. https://doi.org/10.1609/aaai.v40i27.39401

Section

AAAI Technical Track on Machine Learning IV