Learning Diffusion Policy from Primitive Skills for Robot Manipulation

Authors

  • Zhihao Gu, Department of Computer Science, School of Computing and Data Science, The University of Hong Kong
  • Ming Yang, School of Software, Beihang University
  • Difan Zou, Department of Computer Science, School of Computing and Data Science, The University of Hong Kong
  • Dong Xu, Department of Computer Science, School of Computing and Data Science, The University of Hong Kong

DOI

https://doi.org/10.1609/aaai.v40i22.38889

Abstract

Diffusion policies have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global, task-level instructions to produce short-term control signals, and this mismatch in granularity can lead to misaligned actions. We conjecture that primitive skills, that is, fine-grained, short-horizon manipulations such as "move up" and "open the gripper", provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned diffusion policy that integrates interpretable skill learning with conditional action planning. SDP abstracts eight primitive skills that are reusable across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on these representations, a lightweight router network assigns a primitive skill to each state, and the assigned skill is used to construct a single-skill policy that generates skill-aligned actions. By decomposing complex tasks into sequences of primitive skills and selecting a single-skill policy for each state, SDP ensures skill-consistent behavior across diverse tasks. Extensive experiments on two challenging simulation benchmarks and in real-world robot deployments demonstrate that SDP consistently outperforms state-of-the-art methods, providing a new paradigm for skill-based robot learning with diffusion policies.
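
The routing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear router with a hard argmax choice, and each single-skill policy is a hypothetical stand-in that returns a fixed action rather than running a diffusion model. Only "move up" and "open the gripper" are skill names taken from the abstract; the other six names, and the functions route, make_skill_policy, and act, are invented here for illustration.

```python
# Eight primitive skills. Only the first two names appear in the abstract;
# the remaining six are hypothetical placeholders.
SKILLS = [
    "move up", "open the gripper", "move down", "move left",
    "move right", "move forward", "move backward", "close the gripper",
]


def route(features, weights):
    """Assumed linear router: score every skill, return the argmax index."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def make_skill_policy(skill_index, action_dim=3):
    """Hypothetical stand-in for a single-skill policy.

    A real skill-conditioned diffusion policy would denoise an action
    sequence; this toy version returns a fixed one-hot action instead.
    """
    def policy(features):
        action = [0.0] * action_dim
        action[skill_index % action_dim] = 1.0
        return action
    return policy


def act(features, weights, policies):
    """Pick one skill for the current state and run only that skill's policy."""
    k = route(features, weights)
    return SKILLS[k], policies[k](features)


if __name__ == "__main__":
    # Identity weights make the router select the skill whose feature is largest.
    weights = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
    policies = [make_skill_policy(i) for i in range(len(SKILLS))]
    state_features = [0.0] * 8
    state_features[1] = 1.0
    print(act(state_features, weights, policies))
```

The point of the sketch is the control flow: exactly one skill is chosen per state, and only that skill's policy produces the action, which is what keeps the generated behavior tied to a single primitive at each step.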

Published

2026-03-14

How to Cite

Gu, Z., Yang, M., Zou, D., & Xu, D. (2026). Learning Diffusion Policy from Primitive Skills for Robot Manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18262–18270. https://doi.org/10.1609/aaai.v40i22.38889

Section

AAAI Technical Track on Intelligent Robotics