Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos


  • Kyungjae Lee Yonsei University
  • Nan Duan Microsoft Research Asia
  • Lei Ji University of Chinese Academy of Science
  • Jason Li Microsoft STCA Multimedia Group
  • Seung-won Hwang Yonsei University



We study the problem of non-factoid QA on instructional videos. Existing work focuses on either the visual or the textual modality of video content to find answers matching the question. However, neither is flexible enough for our problem setting of non-factoid answers with varying lengths. Motivated by this, we propose a two-stage model: (a) multimodal segmentation of the video into span candidates and (b) length-adaptive ranking of the candidates with respect to the question. First, for segmentation, we propose Segmenter, which generates span candidates of diverse lengths by considering both the textual and visual modalities. Second, for ranking, we propose Ranker, which scores the candidates by dynamically combining two models with complementary strengths on short and long spans, respectively. Experimental results demonstrate that our model achieves state-of-the-art performance.
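
The two-stage idea described in the abstract can be illustrated with a toy, text-only sketch. Nothing below is taken from the paper: the function names (`segment`, `length_gate`, `rank`), the word-overlap scorers, and the linear length gate are all hypothetical stand-ins, chosen only to show how candidates of diverse length might be enumerated and then scored by a length-dependent mix of two scorers. The actual Segmenter and Ranker are learned multimodal models.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) sentence indices


def segment(num_sentences: int, max_len: int) -> List[Span]:
    """Enumerate contiguous spans of 1..max_len sentences.

    Hypothetical stand-in for a segmenter: it simply lists every
    contiguous window instead of predicting boundaries.
    """
    return [(start, start + n - 1)
            for n in range(1, max_len + 1)
            for start in range(0, num_sentences - n + 1)]


def _tokens(text: str) -> set:
    return set(text.lower().split())


def jaccard(question: str, text: str) -> float:
    """Overlap normalised by the union; tends to favour short, focused spans."""
    q, t = _tokens(question), _tokens(text)
    return len(q & t) / len(q | t) if (q | t) else 0.0


def coverage(question: str, text: str) -> float:
    """Fraction of question words found in the span; tends to favour long spans."""
    q, t = _tokens(question), _tokens(text)
    return len(q & t) / len(q) if q else 0.0


def length_gate(n: int, max_len: int) -> float:
    """Assumed linear gate: 0.0 for one-sentence spans, 1.0 for the longest."""
    return (n - 1) / (max_len - 1) if max_len > 1 else 0.0


def rank(question: str, sentences: List[str],
         max_len: int = 3) -> List[Tuple[Span, float]]:
    """Score every candidate span and return them sorted best-first."""
    scored = []
    for start, end in segment(len(sentences), max_len):
        text = " ".join(sentences[start:end + 1])
        g = length_gate(end - start + 1, max_len)
        # Length-adaptive mix: short spans lean on jaccard, long ones on coverage.
        score = (1 - g) * jaccard(question, text) + g * coverage(question, text)
        scored.append(((start, end), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    transcript = ["open the file menu", "click export",
                  "choose pdf and save", "close the app"]
    for span, score in rank("how to export pdf", transcript):
        print(span, round(score, 3))
```

In this sketch the gate is a fixed function of span length; a learned gate conditioned on the question and the candidate would be closer in spirit to "dynamically combining" two models, but that design is not specified here.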




How to Cite

Lee, K., Duan, N., Ji, L., Li, J., & Hwang, S.-w. (2020). Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8147-8154.



AAAI Technical Track: Natural Language Processing