Wu, M., Zhang, Z., Dong, Q., Xi, Z., Zhao, J., Jin, S., … Zhang, Q. (2026). Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33944–33952. https://doi.org/10.1609/aaai.v40i40.40687