CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards

Authors

  • Zhiming Lin, Nankai University
  • Kai Zhao, Western Sydney University
  • Sophie Zhang, Shanghai High School International Division
  • Peilai Yu, Ludwig Maximilian University of Munich
  • Canran Xiao, Shenzhen Campus of Sun Yat-sen University

DOI:

https://doi.org/10.1609/aaai.v40i28.39534

Abstract

Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10–13 F1 points and strong LLM fine-tunes by 5–8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines.
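The abstract's cluster-consensus reward (scoring each sampled correction by its agreement with the other candidates) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the similarity function, normalization, and use of stdlib `SequenceMatcher` as a stand-in for a semantic similarity model are all assumptions.

```python
# Hypothetical sketch of a cluster-consensus reward for sampled correction
# candidates. SequenceMatcher is a cheap stdlib proxy; the paper presumably
# uses a learned semantic-similarity measure instead.
from difflib import SequenceMatcher


def consensus_rewards(candidates):
    """Score each candidate correction by its mean similarity to the others.

    Candidates that agree with the cluster consensus receive higher reward;
    outlier corrections receive lower reward. Returns one score per candidate.
    """
    n = len(candidates)
    if n < 2:
        return [0.0] * n
    rewards = []
    for i, cand in enumerate(candidates):
        sims = [
            SequenceMatcher(None, cand, other).ratio()
            for j, other in enumerate(candidates)
            if j != i
        ]
        rewards.append(sum(sims) / len(sims))
    return rewards
```

In a PPO loop, such scores would serve as the scalar reward for each sampled output, requiring no gold-standard corrections.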

Published

2026-03-14

How to Cite

Lin, Z., Zhao, K., Zhang, S., Yu, P., & Xiao, C. (2026). CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23612–23620. https://doi.org/10.1609/aaai.v40i28.39534

Section

AAAI Technical Track on Machine Learning V