On the Computational, Informational, and Physical Foundations for AI Safety
DOI: https://doi.org/10.1609/aies.v8i3.36802
Abstract
Current approaches to AI safety predominantly focus on specifying correct behavior through software, data, and rules. This work argues that this approach faces theoretically fundamental, and not merely practical, limitations. I present a multi-layered analysis of this paradigm, demonstrating its inherent barriers from the perspectives of computational complexity, information theory, and physical engineering. In ongoing work, I prove that even simplified forms of semantic self-verification are computationally intractable (NP-complete). I use information theory to show that any specification of an external, ambiguous concept like "harm" is necessarily incomplete. To address these limits, I develop a framework for reasoning about verifiable, physically-enforced safety bounds that are independent of software state.
Published
2025-10-15
How to Cite
Young, R. (2025). On the Computational, Informational, and Physical Foundations for AI Safety. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(3), 2944–2946. https://doi.org/10.1609/aies.v8i3.36802
Section
Student Abstracts 25