Dynamic Semantic Tokenization for Time Series via Elastic Sampling on Physics-aware Perception
DOI:
https://doi.org/10.1609/aaai.v40i28.39517
Abstract
Despite the remarkable success of semantic token learning in NLP and vision domains, token-level representation mechanisms face fundamental challenges when extended to continuous time series analysis. We identify a core limitation: the intrinsic absence of semantically meaningful tokenization boundaries within time series, which differs substantially from discrete text tokens and presents unique complexities compared to spatially coherent image patches. While existing works mechanically apply fixed-length partitioning, recent evidence from time series foundation models reveals performance ceilings in prediction tasks under such paradigms. This paper introduces a novel tokenization framework known as physics-aware tokenization (PATK), designed to implement adaptive time-frequency tokenization via distribution-sensitive sampling strategies. Key innovations include: 1) A Rate-of-Variation (RoV) distribution is structured to encompass multi-scale temporal dynamics in the time domain, alongside a Spectral Energy Intensity (SEI) distribution devised to reveal global seasonal patterns within the frequency domain; 2) Physics-aware hidden Markov modeling (PA-HMM) is then established to adaptively break down continuous time series into distinct tokens with elastic lengths, responding to physics-aware probabilities sampled from the RoV and SEI distributions. The proposed PATK integrates with both conventional Transformers and advanced large-scale time series models (including LLM-transferred methods and pretrained time series foundation models). Simulations across various datasets demonstrate that PATK excels in classification and forecasting tasks, showing notable adaptability in modeling long-term dependencies, stronger resilience against disturbances, and robustness to missing data events.
Published
2026-03-14
How to Cite
Liao, H., Yang, Z., Xia, J., Sun, Y., Zhang, Y., Li, S., & Liu, Y. (2026). Dynamic Semantic Tokenization for Time Series via Elastic Sampling on Physics-aware Perception. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23460-23468. https://doi.org/10.1609/aaai.v40i28.39517
Issue
Section
AAAI Technical Track on Machine Learning V
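Illustrative sketch
The abstract describes elastic-length tokens driven by a time-domain Rate-of-Variation (RoV) distribution and a frequency-domain Spectral Energy Intensity (SEI) distribution. The following is a minimal sketch of that general idea, not the authors' PATK or PA-HMM: the function names, the use of normalized absolute first differences for RoV, the normalized power spectrum for SEI, and the deterministic equal-mass cutting rule (in place of HMM-based sampling) are all assumptions made here for illustration.

```python
import cmath
import math


def rov_distribution(x):
    """Assumed RoV stand-in: absolute first differences normalized to sum to 1."""
    diffs = [abs(b - a) for a, b in zip(x, x[1:])]
    total = sum(diffs)
    if total == 0:
        # A constant series has no variation; fall back to a uniform distribution.
        return [1.0 / len(diffs)] * len(diffs)
    return [d / total for d in diffs]


def sei_distribution(x):
    """Assumed SEI stand-in: normalized power at frequencies k = 1 .. n // 2 (naive DFT)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return [1.0 / len(power)] * len(power)
    return [p / total for p in power]


def elastic_tokens(x, n_tokens):
    """Cut x into at most n_tokens contiguous segments of roughly equal RoV mass.

    Hypothetical deterministic rule: a boundary is placed each time the
    cumulative RoV mass crosses the next multiple of 1 / n_tokens, so slowly
    varying stretches yield long tokens and fast-varying stretches short ones.
    """
    rov = rov_distribution(x)
    boundaries = [0]
    cumulative = 0.0
    target = 1.0 / n_tokens
    for i, p in enumerate(rov):
        cumulative += p
        if len(boundaries) < n_tokens and cumulative >= target * len(boundaries):
            boundaries.append(i + 1)
    boundaries.append(len(x))
    return [x[a:b] for a, b in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Eight flat samples followed by eight rapidly alternating samples.
    series = [0.0] * 8 + [0.0, 1.0] * 4
    print([len(tok) for tok in elastic_tokens(series, 4)])
```

On this toy series the flat stretch is absorbed into a single long token while the oscillating stretch is split into short ones, which is the qualitative behavior that fixed-length partitioning cannot provide. The SEI stand-in is computed but not used for cutting here; how the paper combines the two distributions inside PA-HMM is not specified in the abstract.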