DGV: Fusing Dynamic Graphs and Vision-Language Models for Collaborative Dual-Arm Task Planning

Authors

  • Yapeng Pang, East China Normal University
  • Junjie Xu, East China Normal University
  • Zhidong Qiao, Harbin Institute of Technology
  • Peng Du, Zhejiang University
  • Xinyu Zhang, East China Normal University

DOI:

https://doi.org/10.1609/icaps.v36i1.42895

Abstract

Dual-arm collaborative manipulation in dynamic, unstructured environments is challenging: it requires real-time handling of high-dimensional physical constraints together with dynamic scene understanding and adaptation to high-level natural language instructions. To address these challenges, we propose the Dynamic Graph Vision-Language Model (DGV), a novel dynamic task planning framework that integrates graph neural networks (GNNs) and vision-language models (VLMs). DGV first uses a pre-trained VLM to combine perceptual and semantic processing, accurately extracting object states and complex manipulation intents from the environment. This information is then encoded into a dynamic spatio-temporal graph that models the robot's kinematic structure, environmental object relations, and temporal dependencies in a single, unified representation. To cope with rapid environmental changes, we propose a real-time local subgraph update mechanism that adjusts actions immediately and replans efficiently from fresh visual feedback, improving dynamic adaptability. Using the updated graph structure, DGV performs efficient reasoning to generate continuous, stable, and robust dual-arm collaborative motion sequences. Experiments in simulation and on real-world robot platforms show that DGV achieves a task success rate nearly 20% higher than current state-of-the-art methods, along with better dynamic adaptability and robustness.
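
The local subgraph update idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SceneGraph` class, the `near_radius` threshold, the node names, and the distance-based "near" relation are all hypothetical choices made for illustration, and the temporal dimension of the graph is omitted. The sketch only shows the general technique of recomputing the edges incident to a changed node instead of rebuilding the whole graph.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): nodes carry 3D positions."""
    near_radius: float = 0.3                        # assumed "near" threshold in metres
    positions: dict = field(default_factory=dict)   # node name -> (x, y, z)
    kinematic: set = field(default_factory=set)     # fixed robot-structure edges
    spatial: set = field(default_factory=set)       # dynamic "near" edges

    def add_node(self, name, pos):
        self.positions[name] = tuple(pos)

    def add_kinematic_edge(self, a, b):
        self.kinematic.add(frozenset((a, b)))

    def _relate(self, a, b):
        # Add or drop the "near" edge between a and b based on distance.
        edge = frozenset((a, b))
        if dist(self.positions[a], self.positions[b]) <= self.near_radius:
            self.spatial.add(edge)
        else:
            self.spatial.discard(edge)

    def rebuild_spatial(self):
        """Full rebuild: recompute every pairwise 'near' relation."""
        self.spatial.clear()
        names = list(self.positions)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                self._relate(a, b)

    def update_node(self, name, new_pos):
        """Local update: move one node and recompute only its incident edges.

        Returns the nodes whose spatial relations changed, which a planner
        could use to decide which part of a plan to revisit.
        """
        before = {e for e in self.spatial if name in e}
        self.positions[name] = tuple(new_pos)
        for other in self.positions:
            if other != name:
                self._relate(name, other)
        after = {e for e in self.spatial if name in e}
        return {n for e in before ^ after for n in e}


# Usage: a cup moves from the left gripper's neighbourhood to the right one's.
g = SceneGraph()
g.add_node("torso", (0.3, 0.0, 1.0))
g.add_node("left_gripper", (0.0, 0.0, 0.5))
g.add_node("right_gripper", (0.6, 0.0, 0.5))
g.add_node("cup", (0.1, 0.0, 0.5))
g.add_kinematic_edge("torso", "left_gripper")
g.add_kinematic_edge("torso", "right_gripper")
g.rebuild_spatial()

affected = g.update_node("cup", (0.55, 0.0, 0.5))
print(sorted(affected))
```

In this sketch the kinematic edges are never touched by an update, and the cost of `update_node` grows with the number of nodes rather than the number of node pairs. A real system would presumably bound the neighbourhood further and feed the affected set into a learned graph model; those parts are outside what the abstract specifies.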

Published

2026-06-08

How to Cite

Pang, Y., Xu, J., Qiao, Z., Du, P., & Zhang, X. (2026). DGV: Fusing Dynamic Graphs and Vision-Language Models for Collaborative Dual-Arm Task Planning. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 747–756. https://doi.org/10.1609/icaps.v36i1.42895