On the Computational, Informational, and Physical Foundations for AI Safety

Authors

  • Robin Young, University of Cambridge

DOI:

https://doi.org/10.1609/aies.v8i3.36802

Abstract

Current approaches to AI safety predominantly focus on specifying correct behavior through software, data, and rules. This work argues that this approach faces limitations that are theoretically fundamental, not merely practical. I present a multi-layered analysis of this paradigm, demonstrating its inherent barriers from the perspectives of computational complexity, information theory, and physical engineering. In ongoing work, I prove that even simplified forms of semantic self-verification are computationally intractable (NP-complete). I use information theory to show that any specification of an external, ambiguous concept like "harm" is necessarily incomplete. To address these limits, I develop a framework for reasoning about verifiable, physically enforced safety bounds that are independent of software state.

Published

2025-10-15

How to Cite

Young, R. (2025). On the Computational, Informational, and Physical Foundations for AI Safety. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(3), 2944–2946. https://doi.org/10.1609/aies.v8i3.36802