Learning Diffusion Policy from Primitive Skills for Robot Manipulation
DOI: https://doi.org/10.1609/aaai.v40i22.38889

Abstract
Diffusion policies have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global instructions to produce short-term control signals, which can lead to misaligned action generation. We conjecture that primitive skills, i.e., fine-grained, short-horizon manipulations such as "move up" and "open the gripper", provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned diffusion policy that integrates interpretable skill learning with conditional action planning. SDP abstracts eight reusable primitive skills across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on these representations, a lightweight router network assigns a desired primitive skill to each state, which is then used to construct a single-skill policy that generates skill-aligned actions. By decomposing complex tasks into a sequence of primitive skills and selecting a single-skill policy for each state, SDP ensures skill-consistent behavior across diverse tasks. Extensive experiments on two challenging simulation benchmarks and real-world robot deployments demonstrate that SDP consistently outperforms state-of-the-art methods, providing a new paradigm for skill-based robot learning with diffusion policies.

Published: 2026-03-14
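The abstract describes a two-stage pipeline: a router assigns one of eight primitive skills to the current state, and a single-skill diffusion policy then generates the action. The sketch below is a minimal, stdlib-only illustration under stated assumptions, not the SDP implementation. Everything beyond the abstract is hypothetical: the linear router, the float feature vector standing in for the vision-language model's discrete representation, the six placeholder skill names (only "move up" and "open the gripper" come from the abstract), and the toy noise predictor, which is exact only for a dataset with one prototype action per skill. The sampler is standard DDPM ancestral sampling with a linear beta schedule and sigma_t^2 = beta_t; the paper's actual network, schedule, and conditioning mechanism may differ.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical skill vocabulary: the abstract names "move up" and "open the
# gripper"; the other six labels are placeholders for the remaining skills.
SKILLS: List[str] = ["move_up", "open_gripper"] + [f"skill_{i}" for i in range(2, 8)]

# Hypothetical noise-predictor signature: eps(x_t, t, skill_id) -> noise estimate.
EpsModel = Callable[[List[float], int, int], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(features: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Toy linear router: one logit per skill, return the most probable skill index."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(logits)
    return max(range(len(probs)), key=lambda k: probs[k])


def linear_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear beta schedule with per-step alphas and cumulative products (alpha-bar)."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_action(eps_model: EpsModel, skill_id: int, action_dim: int,
                  T: int, rng: random.Random) -> List[float]:
    """DDPM ancestral sampling conditioned on a single routed skill."""
    betas, alphas, alpha_bars = linear_schedule(T)
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(T)):
        eps = eps_model(x, t, skill_id)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise on the final step
    return x


def make_toy_eps_model(prototypes: List[List[float]], T: int) -> EpsModel:
    """Hypothetical stand-in for a trained network: the exact noise predictor
    when each skill's training data is a single prototype action."""
    _, _, alpha_bars = linear_schedule(T)

    def eps_model(x: List[float], t: int, skill_id: int) -> List[float]:
        mu = prototypes[skill_id]
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * mi) / math.sqrt(1.0 - ab) for xi, mi in zip(x, mu)]

    return eps_model


if __name__ == "__main__":
    T, dim = 50, 3
    prototypes = [[float(k), -float(k), 0.5 * k] for k in range(len(SKILLS))]
    eps_model = make_toy_eps_model(prototypes, T)
    # Toy router weights: row k responds only to feature k.
    n = len(SKILLS)
    weights = [[1.0 if j == k else 0.0 for j in range(n)] for k in range(n)]
    features = [0.0] * n
    features[1] = 5.0  # representation pointing at "open_gripper"
    skill = route(features, weights)
    action = sample_action(eps_model, skill, dim, T, random.Random(0))
    print(SKILLS[skill], [round(a, 3) for a in action])  # open_gripper [1.0, -1.0, 0.5]
```

Because the toy noise predictor is exact, the final reverse step returns the routed skill's prototype action regardless of the sampled noise; a trained network would only approximate this. Routing before sampling means every denoising step sees exactly one skill label, which is one plausible reading of the abstract's "single-skill policy".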
How to Cite
Gu, Z., Yang, M., Zou, D., & Xu, D. (2026). Learning Diffusion Policy from Primitive Skills for Robot Manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18262–18270. https://doi.org/10.1609/aaai.v40i22.38889
Issue: Vol. 40, No. 22
Section: AAAI Technical Track on Intelligent Robotics