CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information
DOI:
https://doi.org/10.1609/aaai.v39i13.33587
Abstract
Electroencephalogram (EEG) signals have attracted significant attention from researchers due to their non-invasive nature and high temporal sensitivity in decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. This results in the loss of critical multimodal information in EEG. To address this limitation, this paper proposes CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains a modality expert encoder for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior that maps the EEG embedding space to the CLIP embedding space; using a pretrained generative model, the proposed framework can then reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively.
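The pipeline sketched in the abstract — per-modality expert encoders feeding a diffusion prior that maps EEG embeddings into the CLIP space, whose output conditions a frozen generative model — can be illustrated at a high level. The following is a minimal NumPy sketch of the data flow only; all dimensions, variable names, and the linear stand-ins for the trained encoders and the diffusion prior are hypothetical and do not reflect the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened EEG trial, shared embedding, CLIP embedding.
EEG_DIM, EMB_DIM, CLIP_DIM = 63 * 250, 512, 768

# Stand-ins for the trained modality expert encoders (e.g. image, text):
# each extracts one modality's information from the same EEG input.
expert_encoders = {
    "image": rng.standard_normal((EEG_DIM, EMB_DIM)) * 0.01,
    "text": rng.standard_normal((EEG_DIM, EMB_DIM)) * 0.01,
}

# Stand-in for the diffusion prior: here just a linear map from the
# EEG embedding space to the CLIP embedding space.
prior = rng.standard_normal((EMB_DIM, CLIP_DIM)) * 0.01

def decode(eeg: np.ndarray) -> np.ndarray:
    """Fuse per-modality EEG embeddings and map them into CLIP space."""
    embeddings = [eeg @ W for W in expert_encoders.values()]
    fused = np.mean(embeddings, axis=0)  # simple fusion stand-in
    clip_embedding = fused @ prior       # "diffusion prior" stand-in
    # In the real framework, this CLIP-space embedding conditions a
    # frozen pretrained generative model that reconstructs the stimulus.
    return clip_embedding

eeg_trial = rng.standard_normal(EEG_DIM)
print(decode(eeg_trial).shape)  # (768,)
```

Because the generative model is conditioned only through the CLIP embedding space, it stays frozen, and supporting an additional modality amounts to training one more expert encoder.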
Published
2025-04-11
How to Cite
Zhang, K., He, L., Jiang, X., Lu, W., Wang, D., & Gao, X. (2025). CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information. Proceedings of the AAAI Conference on Artificial Intelligence, 39(13), 14486-14493. https://doi.org/10.1609/aaai.v39i13.33587
Issue
Section
AAAI Technical Track on Humans and AI