A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video

Authors

  • Young Chol Song, University of Rochester
  • Henry Kautz, University of Rochester

DOI:

https://doi.org/10.1609/aaai.v26i1.8430

Abstract

We are developing a testbed for learning by demonstration that combines spoken language and sensor data in a natural real-world environment. Microsoft Kinect RGB-Depth cameras allow us to infer high-level visual features, such as the relative position of objects in space, with greater precision and less training than traditional systems require. Speech is recognized and parsed using a “deep” parsing system, so that language features are available at the word, syntactic, and semantic levels. We collected an initial data set of 10 episodes of 7 individuals demonstrating how to “make tea”, and created a “gold standard” hand annotation of the actions performed in each. Finally, we are constructing “baseline” HMM-based activity recognition models over the visual and language features, so that we will be ready to evaluate our future work on deeper and more structured models against them.
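To make the baseline concrete, below is a minimal sketch (not the authors' implementation) of the kind of HMM-based activity recognizer the abstract describes. It assumes hidden states correspond to hand-annotated actions and that each time step emits a discrete observation built from one visual feature (the object nearest the demonstrator's hand, derived from the RGB-Depth data) paired with one language feature (the most recent spoken verb). The action names, observation vocabulary, and probabilities are illustrative placeholders, not values from the paper.

```python
# Illustrative HMM baseline: decode an action sequence from joint
# (visual, language) observations with the Viterbi algorithm.
# All names and numbers below are hypothetical, chosen for the "make tea" task.
import numpy as np

ACTIONS = ["fill-kettle", "boil-water", "steep-tea"]           # hidden states
OBSERVATIONS = [("kettle", "fill"), ("kettle", "boil"),        # (object, verb) symbols
                ("cup", "steep"), ("cup", "pour")]

# Illustrative model parameters (each row sums to 1).
start_p = np.array([0.8, 0.1, 0.1])
trans_p = np.array([[0.7, 0.3, 0.0],      # fill-kettle -> ...
                    [0.0, 0.7, 0.3],      # boil-water  -> ...
                    [0.0, 0.1, 0.9]])     # steep-tea   -> ...
emit_p = np.array([[0.70, 0.20, 0.05, 0.05],
                   [0.20, 0.70, 0.05, 0.05],
                   [0.05, 0.05, 0.50, 0.40]])

EPS = 1e-12  # avoid log(0) for impossible transitions

def viterbi(obs_idx):
    """Return the most likely action sequence for a list of observation indices."""
    n_states, T = len(ACTIONS), len(obs_idx)
    log_delta = np.full((T, n_states), -np.inf)
    backptr = np.zeros((T, n_states), dtype=int)
    log_delta[0] = np.log(start_p + EPS) + np.log(emit_p[:, obs_idx[0]] + EPS)
    for t in range(1, T):
        for j in range(n_states):
            scores = log_delta[t - 1] + np.log(trans_p[:, j] + EPS)
            backptr[t, j] = int(np.argmax(scores))
            log_delta[t, j] = scores[backptr[t, j]] + np.log(emit_p[j, obs_idx[t]] + EPS)
    # Backtrack from the best final state.
    path = [int(np.argmax(log_delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    return [ACTIONS[s] for s in reversed(path)]

# Example: a demonstrator fills the kettle, boils water, then steeps the tea.
obs = [OBSERVATIONS.index(o) for o in
       [("kettle", "fill"), ("kettle", "boil"), ("cup", "steep"), ("cup", "pour")]]
print(viterbi(obs))  # ['fill-kettle', 'boil-water', 'steep-tea', 'steep-tea']
```

In practice the transition and emission probabilities would be estimated from the gold-standard annotations of the tea-making episodes rather than set by hand, and the observation alphabet would be built from the actual visual and parsed-language features.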

Published

2021-09-20

How to Cite

Song, Y. C., & Kautz, H. (2021). A Testbed for Learning by Demonstration from Natural Language and RGB-Depth Video. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 2457-2458. https://doi.org/10.1609/aaai.v26i1.8430