A Hybrid Probabilistic Approach for Table Understanding


  • Kexuan Sun University of Southern California
  • Harsha Rayudu University of Southern California
  • Jay Pujara University of Southern California




Other Foundations of Data Mining & Knowledge Mana, Applications, Probabilistic Graphical Models, Neuro-Symbolic AI (NSAI)


Tables of data are used to record vast amounts of socioeconomic, scientific, and governmental information. Although humans create tables according to underlying organizational principles, AI systems struggle to understand their contents. This paper introduces an end-to-end system for table understanding, the process of capturing the relational structure of data in tables. We introduce models that identify cell types, group cells into blocks of data that serve a similar functional role, and predict the relationships between these blocks. Our approach is hybrid and neuro-symbolic: it combines embedded representations learned from thousands of tables with probabilistic constraints that capture regularities in how humans organize tables. This neuro-symbolic model is better able to capture positional invariants of headers and enforce homogeneity of data types. One limitation in this research area is the lack of rich datasets for evaluating end-to-end table understanding, so we introduce a new benchmark dataset comprising 431 diverse tables from data.gov. The evaluation results show that our system achieves state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up to 7% in macro F1 score.
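
The reported metric, macro F1, is the unweighted mean of the per-class F1 scores, so rare cell types count as much as frequent ones. The following is a minimal sketch of that computation and is not the authors' evaluation code; the function name `macro_f1` and the labels `header`, `data`, and `metadata` are hypothetical, since the abstract does not list the actual label set.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was missed

    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical cell-type labels for six cells of one table (illustration only).
gold = ["header", "header", "data", "data", "data", "metadata"]
pred = ["header", "data", "data", "data", "data", "metadata"]
score = macro_f1(gold, pred)
```

In this toy case the per-class F1 scores are 2/3 for `header`, 6/7 for `data`, and 1 for `metadata`, so the macro average is 53/63, roughly 0.84.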




How to Cite

Sun, K., Rayudu, H., & Pujara, J. (2021). A Hybrid Probabilistic Approach for Table Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 35(5), 4366-4374. https://doi.org/10.1609/aaai.v35i5.16562



AAAI Technical Track on Data Mining and Knowledge Management