Real-Time Coordination in Human-Robot Interaction Using Face and Voice


Gabriel Skantze, Royal Institute of Technology (KTH)



When humans interact and collaborate with each other, they coordinate their turn-taking behaviors using verbal and nonverbal signals expressed in the face and voice. If future robots are to engage in social interaction with humans, they must be able to generate and understand these behaviors. In this article, I give an overview of several studies showing two things. First, humans interacting with a humanlike robot use the same coordination signals typically found in studies of human-human interaction. Second, these cues can be automatically detected and combined to facilitate real-time coordination. The studies also show that humans react naturally to such signals when a robot uses them, without being given any special instructions. They follow the robot's gaze to disambiguate referring expressions, they conform when the robot uses gaze to select the next speaker, and they respond naturally to subtle cues such as gaze aversion, breathing, facial gestures, and hesitation sounds.




How to Cite

Skantze, G. (2017). Real-Time Coordination in Human-Robot Interaction Using Face and Voice. AI Magazine, 37(4), 19-31.