An Empirical Study of Content Understanding in Conversational Question Answering
With a lot of work about context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to the recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark dataset reflect models' content understanding? 2) Do the models well utilize the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, as well as an attack to verify the models' capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models. With deep investigation of the task, it is believed that this work can benefit the future progress of conversation comprehension. The source code is available at https://github.com/MiuLab/CQA-Study.