Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning
DOI:
https://doi.org/10.1609/aaai.v40i29.39656
Abstract
Recent research reveals that a minority of high-entropy tokens significantly influence the reasoning quality of large language models (LLMs). Inspired by this, we propose Prototype Entropy Alignment (PEA), a reinforcement learning framework that models effective reasoning not as a single path but as a collection of learnable "entropy signatures." PEA identifies these signatures by clustering the uncertainty patterns of expert trajectories into a diverse and continuously updated set of prototypes. The model is then rewarded for aligning its own reasoning process with these evolving targets, creating a self-improvement loop. Rather than replacing traditional outcome-based rewards, PEA provides a complementary, process-oriented signal. Our experiments show that this synergy is crucial: PEA substantially boosts performance on creative and general reasoning tasks and, when combined with outcome rewards, achieves state-of-the-art results on structured tasks such as mathematics. By rewarding alignment with diverse and evolving reasoning structures, PEA offers a robust, verifier-free pathway to enhance the adaptability of LLM reasoning.
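To make the mechanism described in the abstract concrete, the sketch below illustrates one possible reading of it: resample each trajectory's per-token entropy series to a fixed-length signature, cluster expert signatures into prototypes, and reward the policy by its similarity to the nearest prototype. Every design choice here (the signature length, k-means clustering, the cosine-similarity reward) is an assumption for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of prototype entropy alignment; all parameter choices
# (signature length, prototype count, cosine reward) are hypothetical.
import numpy as np
from scipy.interpolate import interp1d
from sklearn.cluster import KMeans


def entropy_signature(token_entropies, length=64):
    """Resample a per-token entropy series to a fixed-length signature."""
    xs = np.linspace(0.0, 1.0, num=len(token_entropies))
    resample = interp1d(xs, token_entropies, kind="linear")
    return resample(np.linspace(0.0, 1.0, num=length))


def fit_prototypes(expert_entropy_series, n_prototypes=8):
    """Cluster expert entropy signatures into a prototype set."""
    sigs = np.stack([entropy_signature(s) for s in expert_entropy_series])
    return KMeans(n_clusters=n_prototypes, n_init=10).fit(sigs).cluster_centers_


def alignment_reward(token_entropies, prototypes):
    """Process reward: cosine similarity to the nearest prototype."""
    sig = entropy_signature(token_entropies)
    sims = prototypes @ sig / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(sig) + 1e-8
    )
    return float(sims.max())
```

In training, such a reward would be added to, not substituted for, the outcome-based reward, matching the abstract's framing of PEA as a complementary process-oriented signal; periodically refitting the prototypes on fresh trajectories would give the "continuously updated" target set.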
Published
2026-03-14
How to Cite
Pan, Z., Chen, Y., Jian, Z., Zhao, W., Ma, H., Wang, M., & Wu, Q. (2026). Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24709-24717. https://doi.org/10.1609/aaai.v40i29.39656
Issue
Vol. 40 No. 29
Section
AAAI Technical Track on Machine Learning VI