EVOKE: Efficient and High-Fidelity EEG-to-Video Reconstruction via Decoupling Implicit Neural Representation
DOI:
https://doi.org/10.1609/aaai.v40i7.37472

Abstract
Visual neural decoding is an important research topic at the intersection of cognitive neuroscience and machine learning. While recent progress has been made in EEG-based neural decoding, reconstructing dynamic visual content remains challenging. Current EEG decoding models either rely on pre-trained encoders for feature extraction or employ graph neural networks to embed spatio-temporal information, resulting in limited representational power and high complexity. We propose EVOKE, an innovative framework for zero-shot decoding of high-fidelity videos from EEG signals. EVOKE employs Implicit Neural Representations (INRs) to perform complete spatial modeling of EEG and continuously decouples information in the EEG-INR perceptual space. Additionally, we construct a Hierarchical-aware Attention Module (HAM) that decodes EEG from three feature anchors (visual, semantic, and motion) and progressively controls task inference. The Motion Attention Flow (MAF) we developed overcomes the limitations of capturing motion features in dynamic stimuli, creating a more robust representation that enhances reconstruction consistency. Comprehensive experiments demonstrate that EVOKE achieves state-of-the-art performance (0.353 SSIM, 0.715 CLIP-pcc). We provide an effective method for converting brain activity into rich visual experiences and set a new benchmark for brain multimodal generation.

Published
2026-03-14
How to Cite
Jing, H., Yang, P., Jiang, D., Liu, Z., Zheng, N., & Ma, Y. (2026). EVOKE: Efficient and High-Fidelity EEG-to-Video Reconstruction via Decoupling Implicit Neural Representation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5539-5547. https://doi.org/10.1609/aaai.v40i7.37472
Section
AAAI Technical Track on Computer Vision IV