Can Go AIs Be Adversarially Robust?

Tom Tseng; Euan McLean; Kellin Pelrine; Tony Tong Wang; Adam Gleave

doi:10.1609/aaai.v39i26.34980

Authors

Tom Tseng FAR AI
Euan McLean FAR AI
Kellin Pelrine McGill University
Tony Tong Wang Massachusetts Institute of Technology
Adam Gleave FAR AI

Abstract

Prior work found that superhuman Go AIs like KataGo can be defeated by simple adversarial strategies. In this paper, we study whether defenses can improve KataGo's worst-case performance. We test three natural defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand adaptive attacks. In particular, we are able to train new adversaries that reliably defeat our defended agents by causing them to blunder in ways humans would not. Our results suggest that building robust AI systems is challenging even for superhuman systems in narrow domains like Go.

Published

2025-04-11

How to Cite

Tseng, T., McLean, E., Pelrine, K., Wang, T. T., & Gleave, A. (2025). Can Go AIs Be Adversarially Robust? Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), 27662–27670. https://doi.org/10.1609/aaai.v39i26.34980

Issue

Vol. 39 No. 26: AAAI-25 Special Track on AI Alignment

Section

AAAI Technical Track on AI Alignment