RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector

Authors

  • Zhensheng Wang: School of Artificial Intelligence, Beijing Normal University, Beijing
  • Wenmian Yang: Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai
  • Kun Zhou: School of Artificial Intelligence, Beijing Normal University, Beijing
  • Yiquan Zhang: Elmleaf Ltd., Shanghai
  • Weijia Jia: Institute of Artificial Intelligence and Future Networks, Beijing Normal University, Zhuhai; BNU-UIC Institute of Artificial Intelligence and Future Networks, Beijing Normal University (Zhuhai); Guangdong Key Lab of AI and Multi-Modal Data Processing, BNU-HKBU United International College, Zhuhai, Guangdong, PR China

DOI:

https://doi.org/10.1609/aaai.v39i24.34734

Abstract

The real estate market relies heavily on structured data, such as property details, market trends, and price fluctuations. However, the lack of specialized tabular question answering datasets in this domain limits the development of automated question answering systems. To fill this gap, we introduce RETQA, the first large-scale, open-domain Chinese tabular question answering dataset for real estate. RETQA comprises 4,932 tables and 20,762 question-answer pairs spanning 16 sub-fields within three major domains: property information, real estate company financial information, and land auction information. Compared with existing tabular question answering datasets, RETQA is more challenging for three key reasons: long-table structures, open-domain retrieval, and multi-domain queries. To tackle these challenges, we propose the SLUTQA framework, which integrates large language models with spoken language understanding tasks to improve retrieval and answering accuracy. Extensive experiments demonstrate that SLUTQA, applied through in-context learning, significantly improves the performance of large language models on RETQA. RETQA and SLUTQA provide essential resources for advancing tabular question answering research in the real estate domain and address critical challenges in open-domain and long-table question answering.
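The abstract does not spell out how SLUTQA couples large language models with spoken language understanding (SLU). In SLU, the standard tasks are intent detection and slot filling; the sketch below assumes that reading and illustrates, with hypothetical keyword rules standing in for an LLM, how a detected domain (the intent) and extracted slot values might narrow an open-domain pool of tables before a prompt is assembled for in-context learning. Every name here (`Table`, `DOMAIN_KEYWORDS`, `detect_intent`, `fill_slots`, `retrieve`, `build_prompt`), the domain labels, and the toy rows are illustrative assumptions, not RETQA's schema or the authors' implementation.

```python
from dataclasses import dataclass


# Hypothetical table record; RETQA's real schema is not described in the abstract.
@dataclass
class Table:
    domain: str        # assumed domain label: "property", "finance", or "land_auction"
    city: str          # assumed slot used for filtering
    rows: list[dict]


# Hypothetical keyword lexicons standing in for an LLM-based intent detector.
DOMAIN_KEYWORDS = {
    "land_auction": ["land", "auction"],
    "finance": ["revenue", "profit", "company"],
    "property": ["price", "apartment", "project"],
}
CITIES = ["Beijing", "Shanghai", "Zhuhai"]


def detect_intent(question: str) -> str:
    """Toy intent detection: return the domain with the most keyword hits.

    Ties (including zero hits everywhere) go to the first domain listed.
    """
    q = question.lower()
    scores = {d: sum(kw in q for kw in kws) for d, kws in DOMAIN_KEYWORDS.items()}
    return max(scores, key=scores.get)


def fill_slots(question: str) -> dict:
    """Toy slot filling: record the first known city mentioned in the question."""
    slots = {}
    for city in CITIES:
        if city in question:
            slots["city"] = city
            break
    return slots


def retrieve(question: str, tables: list[Table]) -> list[Table]:
    """Keep tables whose domain matches the intent and whose city matches the slot.

    If no city slot is found, the city filter is skipped.
    """
    intent = detect_intent(question)
    slots = fill_slots(question)
    return [
        t for t in tables
        if t.domain == intent and ("city" not in slots or t.city == slots["city"])
    ]


def build_prompt(question: str, tables: list[Table]) -> str:
    """Serialize the retrieved tables and the question into a plain-text LLM prompt."""
    lines = []
    for i, t in enumerate(tables):
        lines.append(f"Table {i} ({t.domain}, {t.city}):")
        for row in t.rows:
            lines.append("; ".join(f"{k}={v}" for k, v in row.items()))
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Usage with toy data (English text for readability; RETQA itself is in Chinese).
tables = [
    Table("land_auction", "Beijing", [{"parcel": "A1", "price": 52000}]),
    Table("property", "Beijing", [{"project": "P1", "avg_price": 65000}]),
    Table("land_auction", "Shanghai", [{"parcel": "B7", "price": 48000}]),
]
q = "What was the winning price of the land auction for parcel A1 in Beijing?"
prompt = build_prompt(q, retrieve(q, tables))
print(prompt)
```

In this toy run only the Beijing land-auction table reaches the prompt. Swapping the keyword rules for LLM-based intent and slot extraction would keep the same flow; the point of the design is that filtering by domain and slot first keeps irrelevant domains and unrelated long tables out of the answering prompt.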

Published

2025-04-11

How to Cite

Wang, Z., Yang, W., Zhou, K., Zhang, Y., & Jia, W. (2025). RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25452–25460. https://doi.org/10.1609/aaai.v39i24.34734

Issue

Vol. 39 No. 24

Section

AAAI Technical Track on Natural Language Processing III