VBF++: Variational Bayesian Fusion with Context-Aware Priors and Recommendation-Guided Adversarial Refinement for Multimodal Video Recommendation
DOI:
https://doi.org/10.1609/aaai.v40i17.38469Abstract
Multimodal video recommendation systems face fundamental challenges in determining optimal fusion strategies across diverse content types and user preferences. Existing methods suffer from two critical limitations: (1) their fusion strategies are guided by context-agnostic priors that ignore the semantic structure of content, assuming the same simple distribution (typically a standard multivariate Gaussian prior) governs optimal fusion for all video types, and (2) their optimization objectives, particularly the Evidence Lower Bound (ELBO), are misaligned with the final recommendation goal, optimizing for feature reconstruction rather than ranking performance. To address these fundamental issues, this work proposes VBF++, a novel framework that introduces context-aware structured priors and recommendation-guided adversarial refinement. First, the method designs context-aware priors that learn cluster-specific distributions based on video semantic categories, replacing uninformative priors with structured, content-aware prior distributions. Second, it introduces a Recommendation-Guided Adversarial Refinement (RAR) paradigm that explicitly steers the learning process towards generating recommendation-optimal fusion strategies, resolving the objective misalignment inherent in variational learning. Enhanced with domain-adaptive meta-learning, extensive experiments on three real-world datasets demonstrate consistent improvements of 4.7-8.3 percent in Precision@10 over state-of-the-art methods. Analysis reveals that learned fusion strategies exhibit semantically meaningful patterns, prioritizing visual features for action content, acoustic information for music videos, and textual descriptions for documentary material.Published
2026-03-14
How to Cite
Cao, Z., Liu, R., & Chen, Y. (2026). VBF++: Variational Bayesian Fusion with Context-Aware Priors and Recommendation-Guided Adversarial Refinement for Multimodal Video Recommendation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(17), 14520–14528. https://doi.org/10.1609/aaai.v40i17.38469
Issue
Section
AAAI Technical Track on Data Mining & Knowledge Management I