Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval

Authors

  • Yingyi Zhang (School of Economics and Management, Dalian University of Technology; Department of Data Science, City University of Hong Kong)
  • Pengyue Jia (Department of Data Science, City University of Hong Kong)
  • Derong Xu (School of Artificial Intelligence and Data Science, University of Science and Technology of China; Department of Data Science, City University of Hong Kong)
  • Yi Wen (Department of Data Science, City University of Hong Kong)
  • Xianneng Li (School of Economics and Management, Dalian University of Technology)
  • Yichao Wang (Huawei Technologies Ltd.)
  • Wenlin Zhang (Department of Data Science, City University of Hong Kong)
  • Xiaopeng Li (Department of Data Science, City University of Hong Kong)
  • Weinan Gan (Huawei Technologies Ltd.)
  • Huifeng Guo (Huawei Technologies Ltd.)
  • Yong Liu (Huawei Technologies Ltd.)
  • Xiangyu Zhao (Department of Data Science, City University of Hong Kong)

DOI:

https://doi.org/10.1609/aaai.v40i19.38679

Abstract

Retrieval-Augmented Generation (RAG) critically depends on effective query expansion to retrieve relevant information. However, existing expansion methods adopt uniform strategies that overlook user-specific semantics, ignoring individual expression styles, preferences, and historical context. In practice, the same query text can express very different intentions for different users. This representational rigidity limits the ability of current RAG systems to generalize in personalized settings. Specifically, we identify two core challenges for personalization: 1) user expression styles are inherently diverse, making it difficult for standard expansions to preserve personalized intent; and 2) user corpora induce heterogeneous semantic structures, varying in topical focus and lexical organization, which hinder the effective anchoring of expanded queries within each user's corpus space. To address these challenges, we propose Personalize Before Retrieve (PBR), a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consists of two components: P-PRF, which uses user history to generate stylistically aligned pseudo feedback that simulates the user's expression style, and P-Anchor, which performs graph-based structure alignment over user corpora to capture their structure. Together, they produce personalized query representations tailored for retrieval. Experiments on two personalized benchmarks show that PBR consistently outperforms strong baselines, with gains of up to 10% on PersonaBench across retrievers. Our findings demonstrate the value of modeling personalization before retrieval to close the semantic gap in user-adaptive RAG systems.
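To make the general idea of expanding a query with pseudo feedback before retrieval more concrete, here is a minimal Python sketch under stated assumptions. It is not PBR's P-PRF or P-Anchor: it swaps in classic Rocchio-style interpolation, where the query embedding is moved toward the centroid of pseudo-feedback embeddings (assumed here to come from text generated in the user's style) and the result is matched against the user's corpus by cosine similarity. The embeddings are toy vectors, and `expand_query`, `retrieve`, and the mixing weight `alpha` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def expand_query(query: Sequence[float],
                 feedback: Sequence[Sequence[float]],
                 alpha: float = 0.7) -> Vector:
    """Hypothetical Rocchio-style expansion (a stand-in, not PBR's P-PRF).

    Interpolates the query embedding with the centroid of pseudo-feedback
    embeddings; `feedback` is assumed to embed text written in the user's style.
    """
    if not feedback:
        return normalize(query)
    dim = len(query)
    centroid = [sum(f[i] for f in feedback) / len(feedback) for i in range(dim)]
    mixed = [alpha * q + (1 - alpha) * c for q, c in zip(query, centroid)]
    return normalize(mixed)


def retrieve(query_vec: Sequence[float],
             corpus: Sequence[Sequence[float]],
             k: int = 2) -> List[int]:
    """Return indices of the k user-corpus documents most similar to query_vec."""
    scored = [(cosine(query_vec, doc), idx) for idx, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by index
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Toy 3-dimensional embeddings (illustrative only, not from the paper).
    query = [1.0, 0.0, 0.0]
    user_corpus = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
    user_style_feedback = [[0.0, 1.0, 0.0]]

    print(retrieve(query, user_corpus, k=1))
    expanded = expand_query(query, user_style_feedback, alpha=0.5)
    print(retrieve(expanded, user_corpus, k=1))
```

With these toy vectors, the unexpanded query ranks document 0 first, while the expanded query, pulled toward the direction suggested by the user's feedback, ranks document 1 first. PBR additionally aligns the query with the graph structure of the user corpus, which this sketch does not attempt to model.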

Published

2026-03-14

How to Cite

Zhang, Y., Jia, P., Xu, D., Wen, Y., Li, X., Wang, Y., … Zhao, X. (2026). Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16406–16414. https://doi.org/10.1609/aaai.v40i19.38679

Issue

Vol. 40, No. 19

Section

AAAI Technical Track on Data Mining & Knowledge Management III