GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents

Authors

  • Zheng Wu, Shanghai Jiao Tong University
  • Pengzhou Cheng, Shanghai Jiao Tong University
  • Zongru Wu, Shanghai Jiao Tong University
  • Lingzhong Dong, Shanghai Jiao Tong University
  • Zhuosheng Zhang, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v40i40.40692

Abstract

Graphical user interface (GUI) agents have recently emerged as an intriguing paradigm for human-computer interaction, capable of automatically executing user instructions to operate intelligent terminal devices. However, when they encounter out-of-distribution (OOD) instructions that violate environmental constraints or exceed their current capabilities, GUI agents may suffer task breakdowns or even pose security threats. Effective OOD detection for GUI agents is therefore essential. Traditional OOD detection methods perform suboptimally in this domain because of the complex embedding space and evolving GUI environments. In this work, we observe that the in-distribution input semantic space of GUI agents exhibits a clustering pattern with respect to the distance from the centroid. Based on this finding, we propose GEM, a novel method that fits a Gaussian mixture model over input embedding distances extracted from the GUI agent, which reflect its capability boundary. Evaluated on eight datasets spanning smartphones, computers, and web browsers, our method achieves an average accuracy improvement of 23.70% over the best-performing baseline while increasing training time by only 4.9% and testing time by only 6.5%. We also demonstrate experimentally that GEM can improve the step-wise success rate by 9.40% by requesting assistance from a cloud model when it encounters OOD samples. Further analysis across nine different backbones verifies the generalization ability of our method.
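
The core idea in the abstract (a Gaussian mixture model fitted over the distances of in-distribution input embeddings from their centroid) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of Euclidean distance, the fixed number of mixture components, the log-likelihood scoring rule, and the 5% quantile threshold are all assumptions made for illustration, and the embeddings are synthetic rather than extracted from a GUI agent.

```python
import math
import random


def centroid(vectors):
    """Mean vector of the in-distribution (ID) embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(v, c):
    """Euclidean distance to the centroid (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, c)))


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100, seed=0):
    """Fit a 1-D Gaussian mixture with plain EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    means = rng.sample(xs, k)
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs) + 1e-6
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each distance.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means, and variances (with a small floor).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj + 1e-6
    return weights, means, variances


def log_likelihood(x, gmm):
    weights, means, variances = gmm
    density = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(density + 1e-300)


def fit_detector(id_embeddings, k=2, quantile=0.05):
    """Fit the mixture on ID distances; threshold at a low quantile of ID scores (assumed rule)."""
    c = centroid(id_embeddings)
    dists = [distance(v, c) for v in id_embeddings]
    gmm = fit_gmm_1d(dists, k)
    scores = sorted(log_likelihood(d, gmm) for d in dists)
    threshold = scores[int(quantile * len(scores))]
    return c, gmm, threshold


def is_ood(embedding, detector):
    """Flag an input whose centroid distance is unlikely under the fitted mixture."""
    c, gmm, threshold = detector
    return log_likelihood(distance(embedding, c), gmm) < threshold


# Synthetic stand-in for embeddings that would come from a GUI agent's encoder.
rng = random.Random(42)
id_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
detector = fit_detector(id_embeddings)
far_away = [10.0] * 8  # hypothetical OOD input, far from the ID centroid
print(is_ood(far_away, detector))
```

In a deployment like the one the abstract describes, an input flagged by `is_ood` could be routed to a stronger cloud model instead of being executed locally. The number of components and the threshold would need to be chosen on held-out data.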

Published

2026-03-14

How to Cite

Wu, Z., Cheng, P., Wu, Z., Dong, L., & Zhang, Z. (2026). GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33989–33997. https://doi.org/10.1609/aaai.v40i40.40692

Issue

Vol. 40 No. 40 (2026)

Section

AAAI Technical Track on Natural Language Processing V