ComLQ: Benchmarking Complex Logical Queries in Information Retrieval
DOI:
https://doi.org/10.1609/aaai.v40i40.40706Abstract
Information retrieval (IR) systems play a critical role in navigating information overload across various applications. Existing IR benchmarks primarily focus on simple queries that are semantically analogous to single- and multi-hop relations, overlooking complex logical queries involving first-order logic operations such as conjunction (∧), disjunction (∨), and negation (¬). Thus, these benchmarks can not be used to sufficiently evaluate the performance of IR models on complex queries in real-world scenarios. To address this problem, we propose a novel method leveraging large language models (LLMs) to construct a new IR dataset ComLQ for Complex Logical Queries, which comprises 2,909 queries and 11,251 candidate passages. A key challenge in constructing the dataset lies in capturing the underlying logical structures within unstructured text. Therefore, by designing the subgraph-guided prompt with the subgraph indicator, an LLM (such as GPT-4o) is guided to generate queries with specific logical structures based on selected passages. All query-passage pairs in ComLQ are ensured structure conformity and evidence distribution through expert annotation. To better evaluate whether retrievers can handle queries with negation, we further propose a new evaluation metric, Log-Scaled Negation Consistency (LSNC@K). As a supplement to standard relevance-based metrics (such as nDCG and mAP), LSNC@K measures whether top-K retrieved passages violate negation conditions in queries. Our experimental results under zero-shot settings demonstrate existing retrieval models' limited performance on complex logical queries, especially on queries with negation, exposing their inferior capabilities of modeling exclusion. In summary, our ComLQ offers a comprehensive and fine-grained exploration, paving the way for future research on complex logical queries in IR.Downloads
Published
2026-03-14
How to Cite
Xu, G., Yin, Z., Zhang, L., Liang, J., Lu, W., Zhang, X., … Yang, D. (2026). ComLQ: Benchmarking Complex Logical Queries in Information Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34115–34123. https://doi.org/10.1609/aaai.v40i40.40706
Issue
Section
AAAI Technical Track on Natural Language Processing V