Data-Efficient and Contact-Rich Manipulation Through Diffusion Augmentation and Vision-Language Models

Daniel Seita

doi:10.1609/aaai.v40i47.41353

Authors

Daniel Seita, University of Southern California

DOI: https://doi.org/10.1609/aaai.v40i47.41353

Abstract

Recent progress in robot learning has produced impressive results, yet many systems still need large datasets of demonstrations and remain less effective in clutter or with highly deformable objects. This talk presents work on data-efficient manipulation using (i) diffusion-based augmentation, which synthesizes geometrically consistent images and action labels to reduce the number of demonstrations required, and (ii) Vision-Language Models (VLMs), which inject high-level semantics into contact-rich motion planning in clutter. We also introduce ManipBench, a benchmark that evaluates VLMs' abilities for low-level manipulation. Together, these efforts move the community toward robot manipulators that can learn and operate with fewer demonstrations across cluttered, real-world environments.

Published

2026-03-14

How to Cite

Seita, D. (2026). Data-Efficient and Contact-Rich Manipulation Through Diffusion Augmentation and Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 39830–39830. https://doi.org/10.1609/aaai.v40i47.41353

Issue

Vol. 40 No. 47: AAAI-26 New Faculty Highlights, Journal Track, IAAI-26 and EAAI-26 Main Track

Section

New Faculty Highlights