TCoT: Trajectory Chain-of-Thoughts for Robotic Manipulation with Failure Recovery in Vision-Language-Action Model

Xiang Li; Ya-Li Li; Yuan Wang; Huaqiang Wang; Shengjin Wang

doi:10.1609/aaai.v40i8.37577

Authors

Xiang Li, Department of Electronic Engineering, Tsinghua University, China; Beijing National Research Center for Information Science and Technology (BNRist), China
Ya-Li Li, Department of Electronic Engineering, Tsinghua University, China; Beijing National Research Center for Information Science and Technology (BNRist), China
Yuan Wang, Department of Electronic Engineering, Tsinghua University, China; Beijing National Research Center for Information Science and Technology (BNRist), China
Huaqiang Wang, Department of Electronic Engineering, Tsinghua University, China; Beijing National Research Center for Information Science and Technology (BNRist), China
Shengjin Wang, Department of Electronic Engineering, Tsinghua University, China; Beijing National Research Center for Information Science and Technology (BNRist), China

DOI:

https://doi.org/10.1609/aaai.v40i8.37577

Abstract

Recent advances in vision-language-action (VLA) models have demonstrated impressive generalization for robotic manipulation. However, these models typically map visual and linguistic inputs directly to subsequent actions, with no intermediate task planning and no ability to detect or recover from failures. As a result, they cannot effectively decompose complex tasks, recognize problems, or correct erroneous actions, which ultimately results in complete task failure. This significantly limits both their performance on long-horizon tasks and their generalization. To this end, we introduce TCoT (Trajectory Chain-of-Thought), a unified VLA framework that augments this direct mapping with trajectory planning and with failure detection and recovery. TCoT uses hierarchical trajectories as a precise and compact representation of CoT reasoning for manipulation: global planning provides a high-level, goal-oriented trajectory that guides the robot toward its task objective, while local planning makes real-time adjustments in response to dynamic changes. Moreover, we design a Global-Local Switching Recovery algorithm that detects failures and recovers from them effectively. Experimental results show that TCoT surpasses state-of-the-art methods in both real and simulated scenarios and exhibits superior generalization capabilities.
