Gesture Annotation With a Visual Search Engine for Multimodal Communication Research
Keywords:multimodal communication, gesture recognition, computer vision
Human communication is multimodal and includes elements such as gesture and facial expression along with spoken language. Modern technology makes it feasible to capture all such aspects of communication in natural settings. As a result, similar to fields such as genetics, astronomy and neuroscience, scholars in areas such as linguistics and communication studies are on the verge of a data-driven revolution in their fields. These new approaches require analytical support from machine learning and artificial intelligence to develop tools to help process the vast data repositories. The Distributed Little Red Hen Lab project is an international team of interdisciplinary researchers building a large-scale infrastructure for data-driven multimodal communications research. In this paper, we describe a machine learning system developed to automatically annotate a large database of television program videos as part of this project. The annotations mark regions where people or speakers are on screen along with body part motions including head, hand and shoulder motion. We also annotate a specific class of gestures known as timeline gestures. An existing gesture annotation tool, ELAN, can be used with these annotations to quickly locate gestures of interest. Finally, we provide an update mechanism for the system based on human feedback. We empirically evaluate the accuracy of the system as well as present data from pilot human studies to show its effectiveness at aiding gesture scholars in their work.