Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning
DOI:
https://doi.org/10.1609/aaai.v40i29.39656
Abstract
Recent research reveals that a minority of high-entropy tokens significantly influence the reasoning quality of large language models (LLMs). Inspired by this, we propose Prototype Entropy Alignment (PEA), a reinforcement learning framework that models effective reasoning not as a single path but as a collection of learnable "entropy signatures." PEA identifies these signatures by clustering the uncertainty patterns of expert trajectories into a diverse and continuously updated set of prototypes. The model is then rewarded for aligning its own reasoning process with these evolving targets, creating a self-improvement loop. Rather than replacing traditional outcome-based rewards, PEA provides a complementary, process-oriented signal. Our experiments show that this synergy is crucial: PEA substantially boosts performance on creative and general reasoning tasks and, when combined with outcome rewards, achieves state-of-the-art results on structured tasks such as mathematics. By rewarding alignment with diverse and evolving reasoning structures, PEA offers a robust, verifier-free pathway to enhance the adaptability of LLM reasoning.
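To make the mechanism described in the abstract concrete, the sketch below illustrates one possible reading of it: resample each trajectory's per-token entropy series to a fixed-length signature, cluster expert signatures into prototypes, and reward the policy by its similarity to the nearest prototype. Every design choice here (the signature length, k-means clustering, the cosine-similarity reward) is an assumption for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of prototype entropy alignment; all parameter choices
# (signature length, prototype count, cosine reward) are hypothetical.
import numpy as np
from scipy.interpolate import interp1d
from sklearn.cluster import KMeans


def entropy_signature(token_entropies, length=64):
    """Resample a per-token entropy series to a fixed-length signature."""
    xs = np.linspace(0.0, 1.0, num=len(token_entropies))
    resample = interp1d(xs, token_entropies, kind="linear")
    return resample(np.linspace(0.0, 1.0, num=length))


def fit_prototypes(expert_entropy_series, n_prototypes=8):
    """Cluster expert entropy signatures into a prototype set."""
    sigs = np.stack([entropy_signature(s) for s in expert_entropy_series])
    return KMeans(n_clusters=n_prototypes, n_init=10).fit(sigs).cluster_centers_


def alignment_reward(token_entropies, prototypes):
    """Process reward: cosine similarity to the nearest prototype."""
    sig = entropy_signature(token_entropies)
    sims = prototypes @ sig / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(sig) + 1e-8
    )
    return float(sims.max())
```

In training, such a reward would be added to, not substituted for, the outcome-based reward, matching the abstract's framing of PEA as a complementary process-oriented signal; periodically refitting the prototypes on fresh trajectories would give the "continuously updated" target set.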
Published
2026-03-14
How to Cite
Pan, Z., Chen, Y., Jian, Z., Zhao, W., Ma, H., Wang, M., & Wu, Q. (2026). Prototype Entropy Alignment: Reinforcing Structured Uncertainty in LLM Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24709-24717. https://doi.org/10.1609/aaai.v40i29.39656
Issue
Vol. 40 No. 29
Section
AAAI Technical Track on Machine Learning VI