Gesture Annotation With a Visual Search Engine for Multimodal Communication Research


  • Sergiy Turchyn Case Western Reserve University
  • Inés Olza Moreno Institute for Culture and Society, University of Navarra
  • Cristóbal Pagán Cánovas Institute for Culture and Society, University of Navarra
  • Francis Steen University of California-Los Angeles
  • Mark Turner Case Western Reserve University
  • Javier Valenzuela University of Murcia
  • Soumya Ray Case Western Reserve University



multimodal communication, gesture recognition, computer vision


Human communication is multimodal and includes elements such as gesture and facial expression along with spoken language. Modern technology makes it feasible to capture all such aspects of communication in natural settings. As a result, as has already happened in fields such as genetics, astronomy and neuroscience, scholars in areas such as linguistics and communication studies are on the verge of a data-driven revolution. These new approaches need support from machine learning and artificial intelligence in the form of tools that help process the vast data repositories involved. The Distributed Little Red Hen Lab project is an international team of interdisciplinary researchers building a large-scale infrastructure for data-driven multimodal communication research. In this paper, we describe a machine learning system developed as part of this project to automatically annotate a large database of television program videos. The annotations mark regions where people or speakers are on screen, along with body part motions including head, hand and shoulder motion. We also annotate a specific class of gestures known as timeline gestures. An existing gesture annotation tool, ELAN, can be used with these annotations to quickly locate gestures of interest. Finally, we provide an update mechanism for the system based on human feedback. We empirically evaluate the accuracy of the system and present data from pilot human studies to show its effectiveness at aiding gesture scholars in their work.




How to Cite

Turchyn, S., Olza Moreno, I., Pagán Cánovas, C., Steen, F., Turner, M., Valenzuela, J., & Ray, S. (2018). Gesture Annotation With a Visual Search Engine for Multimodal Communication Research. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).