On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks

Authors

  • Ting Bi Huazhong University of Science and Technology
  • Chenghang Ye Hubei University
  • Zheyu Yang Hubei University
  • Ziyi Zhou Hubei University
  • Cui Tang Hubei University
  • Zui Tao Huazhong University of Science and Technology
  • Jun Zhang Huazhong University of Science and Technology
  • Kailong Wang Huazhong University of Science and Technology
  • Liting Zhou Dublin City University
  • Yang Yang Hubei University
  • Tianlong Yu Hubei University

DOI:

https://doi.org/10.1609/aaai.v40i45.41164

Abstract

Augmented Reality (AR) and Multimodal Large Language Models (LLMs) are rapidly evolving, providing unprecedented capabilities for human-computer interaction. However, their integration introduces a new attack surface for Social Engineering (SE). In this paper, we systematically investigate, for the first time, the feasibility of orchestrating AR-driven social engineering attacks using multimodal LLMs via our proposed SEAR framework, which operates through three key phases: (1) AR-based social context synthesis, which fuses multimodal inputs (visual, auditory, and environmental cues); (2) role-based multimodal RAG (Retrieval-Augmented Generation), which dynamically retrieves and integrates social context; and (3) ReInteract social engineering agents, which execute adaptive multiphase attack strategies through inference-interaction loops. To verify SEAR, we conducted an IRB-approved study with 60 participants and built a novel dataset of 180 annotated conversations across different social scenarios (e.g., coffee shops, networking events). Our results show that SEAR is highly effective at eliciting high-risk behaviors (e.g., 93.3% of participants were susceptible to email phishing). The framework was particularly effective in building trust, with 85% of targets willing to accept an attacker's call after an interaction. We also identified notable limitations, such as authenticity gaps. This work provides a proof of concept for AR-LLM-driven social engineering attacks and insights for developing defenses against next-generation AR/LLM-based SE threats.
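The three phases the abstract describes can be sketched as a minimal pipeline. This is a hypothetical illustration only, not the authors' implementation: the names (`SocialContext`, `synthesize_context`, `retrieve_profile`, `interact`), the cue-overlap retrieval heuristic, and the placeholder turn loop (standing in for real multimodal LLM calls) are all assumptions for exposition.

```python
from dataclasses import dataclass

@dataclass
class SocialContext:
    """Phase 1 output: fused multimodal cues from the AR device."""
    visual: list      # e.g., objects or badges seen in the scene
    auditory: list    # e.g., overheard conversation topics
    environment: str  # e.g., "coffee shop", "networking event"

def synthesize_context(visual, auditory, environment):
    # Phase 1 (AR-based social context synthesis): fuse the
    # modality streams into one context object.
    return SocialContext(visual, auditory, environment)

def retrieve_profile(context, knowledge_base):
    # Phase 2 (role-based multimodal RAG, sketched): rank stored
    # role profiles by how many of their cues match the context.
    cues = set(context.visual) | set(context.auditory) | {context.environment}
    return max(knowledge_base, key=lambda p: len(cues & set(p["cues"])))

def interact(context, profile, max_turns=3):
    # Phase 3 (ReInteract agent loop, placeholder): each turn would
    # query an LLM with the role and context; here we just record
    # the prompts that such a loop would issue.
    transcript = []
    for turn in range(max_turns):
        transcript.append(f"[{profile['role']}@{context.environment}] turn {turn}")
    return transcript
```

A real system would replace the retrieval heuristic with embedding search over a multimodal index and the turn loop with adaptive LLM inference, but the phase boundaries stay the same.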

Published

2026-03-14

How to Cite

Bi, T., Ye, C., Yang, Z., Zhou, Z., Tang, C., Tao, Z., Zhang, J., Wang, K., Zhou, L., Yang, Y., & Yu, T. (2026). On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38252-38260. https://doi.org/10.1609/aaai.v40i45.41164

Section

AAAI Special Track on AI for Social Impact I