Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract)

Authors

  • Da Cheng Gu, University of Technology Sydney
  • Wei Liu, University of Technology Sydney

DOI:

https://doi.org/10.1609/aaai.v39i28.35257

Abstract

State-of-the-art large language models (LLMs) are designed with robust safeguards to prevent the disclosure of harmful information and dangerous procedures. However, "jailbreaking" techniques can circumvent these protections by exploiting vulnerabilities in the models. This paper introduces a novel method, Hex Injection, which exploits a specific weakness in how LLMs handle encoded text: the models decode it and act on the concealed dangerous instructions. Hex Injection differs from traditional methods in that it combines encoded instructions with plaintext prompts to elicit unsafe content more effectively. Our approach involves encoding potentially malicious prompts in hexadecimal and integrating them with plaintext prompts. We observe a 94% average success rate (ASR) with a combination of plaintext, encoded, and role-play prompts for Llama 3 and 3.1 models, and an 86% ASR for the Gemma 2 model. This research not only advances the understanding of LLM security but also offers valuable insights for improving safety mechanisms in artificial intelligence systems.
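The abstract relies on two mechanical ideas: representing text as hexadecimal, and reporting a success rate over a set of trials. The sketch below illustrates only those two ideas, using a harmless string. It is not the authors' code. The function names (`to_hex`, `from_hex`, `success_rate`) and the UTF-8 encoding are assumptions made for illustration, and the outcome list is made up rather than taken from the paper.

```python
def to_hex(text: str) -> str:
    """Encode text as a hexadecimal string (UTF-8 bytes, two hex digits per byte)."""
    return text.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Decode a hexadecimal string back to text, assuming UTF-8 bytes."""
    return bytes.fromhex(hex_text).decode("utf-8")


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of trials marked successful; the outcomes are supplied by the caller."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    sample = "hello world"            # benign illustrative string
    encoded = to_hex(sample)
    print(encoded)                    # 68656c6c6f20776f726c64
    print(from_hex(encoded))          # hello world

    # Hypothetical outcomes for illustration only, not results from the paper.
    print(success_rate([True, True, False, True]))  # 0.75
```

A safety filter could use the same round trip in the other direction, decoding hex-looking spans so that the plaintext can be inspected before the model acts on it. That use is a suggestion here, not something the abstract describes.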

Published

2025-04-11

How to Cite

Gu, D. C., & Liu, W. (2025). Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29377–29378. https://doi.org/10.1609/aaai.v39i28.35257