Rethink Representation Learning for Questionnaire Data

Authors

  • Guanhua Ye Beijing University of Posts and Telecommunications
  • Jifeng He Beijing University of Posts and Telecommunications
  • Yan Li Beijing University of Posts and Telecommunications
  • Junping Du Beijing University of Posts and Telecommunications
  • Zhe Xue Beijing University of Posts and Telecommunications
  • Yingxia Shao Beijing University of Posts and Telecommunications
  • Meiyu Liang Beijing University of Posts and Telecommunications
  • Yawen Li Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v40i33.40000

Abstract

Questionnaire data serve as a valuable resource across numerous scientific domains, offering insights into human behavior, health, and social trends. Traditional downsampling-based representation learning methods—such as standardization and one-hot encoding—reformat these data into tabular structures that inherently discard semantic richness and obscure inter-sample and inter-feature relationships. Consequently, advanced deep learning models often underperform compared to simpler approaches like gradient-boosted decision trees (GBDT), due to their limited capacity to extract meaningful representations from semantically sparse inputs. To address this limitation, we introduce SemantiQ, a novel upsampling-based representation learning framework that embeds questionnaire responses into a unified semantic space. Leveraging Retrieval-Augmented Generation (RAG) in conjunction with large language models (LLMs), SemantiQ transforms question text, option text, and external knowledge into semantically enriched natural language statements. These statements are then encoded into semantic embeddings, which are further refined through a three-stage training mechanism and test-time training (TTT), enabling the model to capture complex sample- and feature-wise dependencies. Extensive experiments on multiple real-world datasets demonstrate that SemantiQ consistently outperforms state-of-the-art baselines.
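
The contrast the abstract draws between tabular encoding and semantic statements can be illustrated with a minimal sketch. This is not the SemantiQ implementation: the questionnaire item, the function names `one_hot` and `verbalize`, and the fixed sentence template are all hypothetical, and the template merely stands in for the LLM- and RAG-based statement generation the abstract describes. The embedding, three-stage training, and test-time training steps are omitted.

```python
from typing import Dict, List

# Hypothetical questionnaire item; the id, text, and options are illustrative only.
QUESTION = {
    "id": "q1",
    "text": "How often do you exercise per week?",
    "options": ["Never", "1-2 times", "3 or more times"],
}


def one_hot(question: Dict, answer_index: int) -> List[int]:
    """Tabular encoding: records only which option was picked, not what it means."""
    return [1 if i == answer_index else 0 for i in range(len(question["options"]))]


def verbalize(question: Dict, answer_index: int) -> str:
    """Template-based stand-in (an assumption) for LLM/RAG statement generation."""
    option = question["options"][answer_index]
    return f'Asked "{question["text"]}", the respondent answered "{option}".'


encoded = one_hot(QUESTION, 1)
statement = verbalize(QUESTION, 1)
print(encoded)
print(statement)
```

The one-hot vector drops the question and option text entirely, whereas the statement keeps both and could be passed to any text encoder to obtain a semantic embedding.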

Published

2026-03-14

How to Cite

Ye, G., He, J., Li, Y., Du, J., Xue, Z., Shao, Y., … Li, Y. (2026). Rethink Representation Learning for Questionnaire Data. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27782–27790. https://doi.org/10.1609/aaai.v40i33.40000

Section

AAAI Technical Track on Machine Learning X