On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks

Authors

  • Ting Bi Huazhong University of Science and Technology
  • Chenghang Ye Hubei University
  • Zheyu Yang Hubei University
  • Ziyi Zhou Hubei University
  • Cui Tang Hubei University
  • Zui Tao Huazhong University of Science and Technology
  • Jun Zhang Huazhong University of Science and Technology
  • Kailong Wang Huazhong University of Science and Technology
  • Liting Zhou Dublin City University
  • Yang Yang Hubei University
  • Tianlong Yu Hubei University

DOI:

https://doi.org/10.1609/aaai.v40i45.41164

Abstract

Augmented Reality (AR) and Multimodal Large Language Models (LLMs) are rapidly evolving, providing unprecedented capabilities for human-computer interaction. However, their integration introduces a new attack surface for Social Engineering (SE). In this paper, we systematically investigate, for the first time, the feasibility of orchestrating AR-driven social engineering attacks using multimodal LLMs via our proposed SEAR framework, which operates through three key phases: (1) AR-based social context synthesis, which fuses multimodal inputs (visual, auditory, and environmental cues); (2) role-based multimodal RAG (Retrieval-Augmented Generation), which dynamically retrieves and integrates social context; and (3) ReInteract social engineering agents, which execute adaptive multiphase attack strategies through inference-interaction loops. To verify SEAR, we conducted an IRB-approved study with 60 participants and built a novel dataset of 180 annotated conversations across different social scenarios (e.g., coffee shops, networking events). Our results show that SEAR is highly effective at eliciting high-risk behaviors (e.g., 93.3% of participants were susceptible to email phishing). The framework was particularly effective in building trust, with 85% of targets willing to accept an attacker's call after an interaction. We also identified notable limitations, such as authenticity gaps. This work provides a proof of concept for AR-LLM-driven social engineering attacks and insights for developing defenses against next-generation AR/LLM-based SE threats.
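The three phases the abstract describes can be sketched as a minimal pipeline. This is a hypothetical illustration only, not the authors' implementation: the names (`SocialContext`, `synthesize_context`, `retrieve_profile`, `interact`), the cue-overlap retrieval heuristic, and the placeholder turn loop (standing in for real multimodal LLM calls) are all assumptions for exposition.

```python
from dataclasses import dataclass

@dataclass
class SocialContext:
    """Phase 1 output: fused multimodal cues from the AR device."""
    visual: list      # e.g., objects or badges seen in the scene
    auditory: list    # e.g., overheard conversation topics
    environment: str  # e.g., "coffee shop", "networking event"

def synthesize_context(visual, auditory, environment):
    # Phase 1 (AR-based social context synthesis): fuse the
    # modality streams into one context object.
    return SocialContext(visual, auditory, environment)

def retrieve_profile(context, knowledge_base):
    # Phase 2 (role-based multimodal RAG, sketched): rank stored
    # role profiles by how many of their cues match the context.
    cues = set(context.visual) | set(context.auditory) | {context.environment}
    return max(knowledge_base, key=lambda p: len(cues & set(p["cues"])))

def interact(context, profile, max_turns=3):
    # Phase 3 (ReInteract agent loop, placeholder): each turn would
    # query an LLM with the role and context; here we just record
    # the prompts that such a loop would issue.
    transcript = []
    for turn in range(max_turns):
        transcript.append(f"[{profile['role']}@{context.environment}] turn {turn}")
    return transcript
```

A real system would replace the retrieval heuristic with embedding search over a multimodal index and the turn loop with adaptive LLM inference, but the phase boundaries stay the same.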

Published

2026-03-14

How to Cite

Bi, T., Ye, C., Yang, Z., Zhou, Z., Tang, C., Tao, Z., Zhang, J., Wang, K., Zhou, L., Yang, Y., & Yu, T. (2026). On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38252-38260. https://doi.org/10.1609/aaai.v40i45.41164

Section

AAAI Special Track on AI for Social Impact I