Towards Scalable Web Accessibility Audit with MLLMs as Copilots

Ming Gu; Ziwei Wang; Sicen Lai; Zirui Gao; Sheng Zhou; Jiajun Bu

doi:10.1609/aaai.v40i45.41193

Authors

Ming Gu: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; College of Computer Science and Technology, Zhejiang University
Ziwei Wang: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; College of Computer Science and Technology, Zhejiang University
Sicen Lai: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; School of Software Technology, Zhejiang University
Zirui Gao: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; College of Computer Science and Technology, Zhejiang University
Sheng Zhou: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; School of Software Technology, Zhejiang University
Jiajun Bu: Zhejiang Key Laboratory of Accessible Perception and Intelligent Systems, Zhejiang University; College of Computer Science and Technology, Zhejiang University

Abstract

Ensuring web accessibility is crucial for advancing social welfare, justice, and equality in digital spaces, yet the vast majority of website user interfaces remain non-compliant, due in part to the resource-intensive and unscalable nature of current auditing practices. While WCAG-EM offers a structured methodology for site-level conformance evaluation, it demands substantial human effort and lacks practical support for execution at scale. In this work, we present AAA, an auditing framework that operationalizes WCAG-EM through a human-AI partnership model. AAA is anchored by two key innovations: GRASP, a graph-based multimodal sampling method that ensures representative page coverage via learned embeddings of visual, textual, and relational cues; and MaC, a copilot strategy based on multimodal large language models (MLLMs) that supports auditors through cross-modal reasoning and intelligent assistance in high-effort tasks. Together, these components enable scalable, end-to-end web accessibility auditing, empowering human auditors with AI-enhanced assistance for real-world impact. We further contribute four novel datasets designed for benchmarking core stages of the audit pipeline. Extensive experiments demonstrate the effectiveness of our methods and indicate that small-scale language models, when fine-tuned, can serve as capable experts.
