Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning

Authors

  • Zhengyuan Pan School of Film, Xiamen University, Xiamen, China
  • Yanhao Chen School of Film, Xiamen University, Xiamen, China
  • Zhongquan Jian School of Computer and Data Science, Minjiang University, Fuzhou, China
  • Wanru Zhao School of Informatics, Xiamen University, Xiamen, China
  • Haonan Ma University of Chinese Academy of Sciences, Beijing, China
  • Meihong Wang School of Informatics, Xiamen University, Xiamen, China
  • Qingqiang Wu School of Film, Xiamen University, Xiamen, China; School of Informatics, Xiamen University, Xiamen, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, Xiamen University, China

DOI:

https://doi.org/10.1609/aaai.v40i29.39656

Abstract

Recent research reveals that a minority of high-entropy tokens significantly influence the reasoning quality of large language models (LLMs). Inspired by this, we propose Prototype Entropy Alignment (PEA), a reinforcement learning framework that models effective reasoning not as a single path but as a collection of learnable "entropy signatures." PEA identifies these signatures by clustering the uncertainty patterns of expert trajectories into a diverse and continuously updated set of prototypes. The model is then rewarded for aligning its own reasoning process with these evolving targets, creating a self-improvement loop. Rather than replacing traditional outcome-based rewards, PEA provides a complementary, process-oriented signal. Our experiments show that this synergy is crucial: PEA substantially boosts performance on creative and general reasoning tasks and, when combined with outcome rewards, achieves SOTA results on structured tasks such as mathematics. By rewarding alignment with diverse and evolving reasoning structures, PEA offers a robust, verifier-free pathway to enhancing the adaptability of LLM reasoning.
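The pipeline described in the abstract, clustering the per-token uncertainty patterns of expert trajectories into prototypes and rewarding alignment with the nearest one, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fixed-length binned featurization (`entropy_signature`), the plain k-means prototype fit, and the exponential distance-based reward are all hypothetical design choices chosen for concreteness.

```python
import numpy as np

def entropy_signature(token_probs, num_bins=8):
    """Summarize a trajectory's per-token entropies as a fixed-length
    'entropy signature' (hypothetical featurization: mean entropy within
    each of num_bins positional segments)."""
    ents = np.array([-(p * np.log(p + 1e-12)).sum() for p in token_probs])
    # Pool entropies into equal positional segments so trajectories of
    # different lengths map to the same signature dimension.
    segments = np.array_split(ents, num_bins)
    return np.array([s.mean() if len(s) else 0.0 for s in segments])

def fit_prototypes(signatures, k=2, iters=50, seed=0):
    """Plain k-means over expert signatures; the resulting centers stand
    in for PEA's 'diverse and continuously updated set of prototypes'."""
    rng = np.random.default_rng(seed)
    X = np.asarray(signatures, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return centers

def alignment_reward(signature, prototypes, temp=1.0):
    """Process-oriented reward in (0, 1]: highest when the policy's
    signature lies on its nearest prototype, decaying with distance."""
    d_min = ((prototypes - signature) ** 2).sum(-1).min()
    return float(np.exp(-d_min / temp))
```

In a full RL loop, `alignment_reward` would be combined with an outcome-based reward (e.g. answer correctness) as the paper's complementary signal, and the prototype set would be periodically refit as the policy improves.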

Published

2026-03-14

How to Cite

Pan, Z., Chen, Y., Jian, Z., Zhao, W., Ma, H., Wang, M., & Wu, Q. (2026). Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24709-24717. https://doi.org/10.1609/aaai.v40i29.39656

Section

AAAI Technical Track on Machine Learning VI