Automated Construction of Visual-Linguistic Knowledge via Concept Learning from Cartoon Videos

Jung-Woo Ha; Kyung-Min Kim; Byoung-Tak Zhang

doi:10.1609/aaai.v29i1.9225

Authors

Jung-Woo Ha, Seoul National University
Kyung-Min Kim, Seoul National University
Byoung-Tak Zhang, Seoul National University

DOI:

https://doi.org/10.1609/aaai.v29i1.9225

Keywords:

Deep Concept Hierarchy, Multimodal Concept Learning, Hypergraphs, Graph Monte Carlo, Visual-Linguistic Knowledge, Vision-Language Translation, Cartoon Videos

Abstract

Learning mutually grounded vision-language knowledge is a foundational task for cognitive systems and human-level artificial intelligence. Most knowledge-learning techniques focus on single-modal representations in a static environment with a fixed set of data. Here, we explore a more ecologically plausible setting by using a stream of cartoon videos to build vision-language concept hierarchies continuously. This approach is motivated by the literature on cognitive development in early childhood. We present the deep concept hierarchy (DCH) model, which enables the progressive abstraction of concept knowledge at multiple levels. We develop a stochastic method for graph construction, i.e., a graph Monte Carlo algorithm, to efficiently search the huge compositional space of vision-language concepts. The concept hierarchies are built incrementally and can handle concept drift, which allows them to be deployed in lifelong learning environments. Using a series of approximately 200 episodes of educational cartoon videos, we demonstrate the emergence and evolution of the concept hierarchies as the video stories unfold. We also present an application of the deep concept hierarchies to context-dependent translation between vision and language, i.e., the transcription of a visual scene into text and the generation of visual imagery from text.
