CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information

Authors

  • Kaifan Zhang (School of Electronic Engineering, Xidian University, Xi’an, China)
  • Lihuo He (School of Electronic Engineering, Xidian University, Xi’an, China)
  • Xin Jiang (School of Electronic Engineering, Xidian University, Xi’an, China)
  • Wen Lu (School of Electronic Engineering, Xidian University, Xi’an, China)
  • Di Wang (School of Computer Science and Technology, Xidian University, Xi’an, China)
  • Xinbo Gao (School of Electronic Engineering, Xidian University, Xi’an, China; Chongqing University of Posts and Telecommunications, Chongqing, China)

DOI:

https://doi.org/10.1609/aaai.v39i13.33587

Abstract

Electroencephalogram (EEG) signals have attracted significant attention from researchers for decoding visual stimuli because they are non-invasive and have high temporal resolution. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals; as a result, critical multimodal information in EEG is lost. To address this limitation, this paper proposes CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains a modality expert encoder for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space; combined with a pretrained generative model, this allows the framework to reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate additional modalities. Extensive experiments demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively.
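The abstract does not state the objective used to train each modality expert encoder. A common way to align a new encoder with CLIP embeddings is a symmetric contrastive (InfoNCE) loss of the kind CLIP itself uses. The minimal sketch below assumes that objective; the function names, the pairing convention, and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits: List[Vector]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(eeg: List[Vector], modality: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss between paired EEG and modality embeddings.

    Hypothetical illustration: row i of `eeg` is assumed to be paired with
    row i of `modality` (e.g. a CLIP embedding of the stimulus shown during
    that EEG trial). This is not claimed to be the paper's exact loss.
    """
    assert len(eeg) == len(modality) and len(eeg) > 0
    e = [l2_normalize(v) for v in eeg]
    m = [l2_normalize(v) for v in modality]
    # logits[i][j] = cos(e_i, m_j) / temperature
    logits = [[sum(a * b for a, b in zip(ei, mj)) / temperature for mj in m]
              for ei in e]
    transposed = [list(col) for col in zip(*logits)]
    # Average the EEG->modality and modality->EEG directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))
```

With correctly paired embeddings the loss is small, and shuffling the pairs raises it, which is the signal that pulls each EEG embedding toward the embedding of its own stimulus. Under this assumption, one such loss would be computed per modality, with a separate expert encoder for each.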


Published

2025-04-11

How to Cite

Zhang, K., He, L., Jiang, X., Lu, W., Wang, D., & Gao, X. (2025). CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information. Proceedings of the AAAI Conference on Artificial Intelligence, 39(13), 14486-14493. https://doi.org/10.1609/aaai.v39i13.33587

Issue

Vol. 39 No. 13

Section

AAAI Technical Track on Humans and AI