Explainable Oracle Bone Script Recognition via Multimodal Pictographic Reasoning

Authors

  • Yin Wu, The Hong Kong University of Science and Technology (Guangzhou)
  • Zhengxuan Zhang, The Hong Kong University of Science and Technology (Guangzhou)
  • Jiayu Chen, The Hong Kong University of Science and Technology (Guangzhou)
  • Chang Xu, The Hong Kong University of Science and Technology (Guangzhou)
  • Yuyu Luo, The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology
  • Nan Tang, The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology
  • Hui Xiong, The Hong Kong University of Science and Technology (Guangzhou); The Hong Kong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i46.41296

Abstract

Oracle Bone Script, East Asia's earliest mature writing system from over 3,500 years ago, encodes ancient cognition through visual metaphors, yet remains largely undeciphered and inaccessible, severing modern society from its cultural roots. Traditional AI methods, while accurate in classification, treat glyphs as opaque data, neglecting their pictographic essence and failing to foster public understanding—exacerbating a heritage crisis amid linguistic evolution. We pioneer a paradigm shift toward AI-driven cultural democratization, introducing OracleVis, the first human-validated multimodal dataset of glyph-image-explanation triplets, curated through expert collaborations to overcome data scarcity, bias, and incompleteness in archaeological sources. Building on this, OBS-VM, an explainability-centric multimodal large language model fine-tuned on Qwen2-VL-7B, models pictographic reasoning by balancing semantic fidelity with interpretive transparency, transforming black-box predictions into cognition-aligned narratives. Rigorous evaluations, including benchmarks and a user study with 24 non-experts, reveal our system's superiority: it outperforms GPT-4o in pictographic rationality (3.79 vs. 3.58 in human evaluation) and achieves a 35.3% relative improvement in recognition accuracy, while interactive learning boosts knowledge gains (+5.5 vs. +1.7), interest (+1.9 vs. +0.4), and confidence (+2.0 vs. +0.3) over static methods. This work illuminates AI's potential to bridge ancient wisdom and contemporary audiences, redefining heritage preservation as an inclusive, socially impactful endeavor that turns cultural alienation into enlightened engagement.

Published

2026-03-14

How to Cite

Wu, Y., Zhang, Z., Chen, J., Xu, C., Luo, Y., Tang, N., & Xiong, H. (2026). Explainable Oracle Bone Script Recognition via Multimodal Pictographic Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39460–39467. https://doi.org/10.1609/aaai.v40i46.41296