(1) Wu, M.; Zhang, Z.; Dong, Q.; Xi, Z.; Zhao, J.; Jin, S.; Fan, X.; Zhou, Y.; Lv, H.; Zhang, M. Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination. AAAI 2026, 40, 33944-33952.