BiHiTo: Biomolecular Hierarchy-inspired Tokenization

Authors

  • Ruochong Zheng, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China
  • Yutian Liu, School of Computer Science, Peking University, Beijing, China
  • Yian Zhao, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China
  • Zhiwei Nie, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China
  • Xuehan Hou, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China
  • Chang Liu, Department of Automation, Tsinghua University, Beijing, China
  • Siwei Ma, School of Computer Science, Peking University, Beijing, China
  • Youdong Mao, School of Physics, Peking University, Beijing, China; Center for Quantitative Biology, Peking University, Beijing, China; National Biomedical Imaging Center, Peking University, Beijing, China; Peking-Tsinghua Joint Center for Life Sciences, Peking University, Beijing, China
  • Jie Chen, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China

DOI:

https://doi.org/10.1609/aaai.v40i34.40119

Abstract

Three-dimensional atomic arrangements of biomolecules are key to demystifying biological functions. The rapid expansion of accessible structural data, driven by advances in AI for science, highlights the critical challenge of efficiently modeling large-scale biomolecular structures, which are high-dimensional systems shaped by biological assembly principles. To address this, we introduce BiHiTo, a multi-level Biomolecular Hierarchy-inspired Tokenizer that intrinsically mimics natural biological assembly hierarchies. Specifically, we design a multi-codebook quantizer that mirrors the natural hierarchy of biomolecular structure, enabling simultaneous capture of representations spanning atomic motifs to global conformational variations. This hierarchical alignment markedly improves the biological interpretability and reconstruction fidelity of biomolecular structure. Extensive experiments demonstrate that BiHiTo delivers state-of-the-art performance and robust generalization across molecular dynamics trajectories and macromolecular complexes, facilitating advances in structure generation and dynamic conformation exploration. When reconstructing the CASP14 test set and the out-of-distribution (OOD) FastFolding protein multi-conformation test set, our method reduces RMSD by 17% and 51%, respectively, compared to Bio2Token.
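
As a rough illustration of the multi-codebook idea and of the RMSD metric quoted above, the following is a minimal sketch and not BiHiTo's implementation. It assumes a residual (coarse-to-fine) scheme, which the abstract does not specify, and every name and value in it (`quantize`, `dequantize`, `rmsd`, the toy codebooks `COARSE` and `FINE`, the three "atoms") is hypothetical.

```python
import math

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def quantize(vec, codebooks):
    """Assumed residual scheme: each level encodes what earlier levels left over."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def dequantize(indices, codebooks):
    """Sum the selected entry from every level to rebuild the vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def rmsd(points_a, points_b):
    """Root-mean-square deviation between two equally sized, pre-aligned point sets."""
    sq = sum(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in zip(points_a, points_b))
    return math.sqrt(sq / len(points_a))

# Hypothetical toy codebooks: a coarse level and a fine level.
COARSE = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
FINE = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Hypothetical 3D coordinates standing in for atoms.
atoms = [(4.0, 1.0, 0.0), (0.0, 4.0, 1.0), (1.0, 0.0, 0.0)]

coarse_only = [dequantize(quantize(a, [COARSE]), [COARSE]) for a in atoms]
two_level = [dequantize(quantize(a, [COARSE, FINE]), [COARSE, FINE]) for a in atoms]

print(rmsd(atoms, coarse_only))  # error with the coarse codebook alone
print(rmsd(atoms, two_level))    # error after adding the fine codebook
```

In this toy setting, each atom becomes one token index per level, and adding the finer codebook can only lower or preserve the reconstruction RMSD for these inputs. Real structure tokenizers learn their codebooks and operate on encoder features rather than raw coordinates.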

Published

2026-03-14

How to Cite

Zheng, R., Liu, Y., Zhao, Y., Nie, Z., Hou, X., Liu, C., … Chen, J. (2026). BiHiTo: Biomolecular Hierarchy-inspired Tokenization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28848–28856. https://doi.org/10.1609/aaai.v40i34.40119

Section

AAAI Technical Track on Machine Learning XI