Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
DOI:
https://doi.org/10.1609/aaai.v40i22.38906
Abstract
Task scheduling has become increasingly critical for embodied AI, where agents need to follow natural language instructions and execute actions efficiently in 3D physical worlds. Existing datasets for task planning in 3D environments often simplify the problem, lacking operations research knowledge for task scheduling and 3D grounding for real-world applications. In this work, we propose Operations Research Knowledge-based 3D Grounded Task Scheduling (OKS3D), a new task that requires the synergy of language understanding, 3D grounding, and efficiency optimization for embodied agents. OKS3D reflects real-world demands by requiring agents to generate efficient, step-by-step schedules that are grounded in 3D space. To facilitate research on OKS3D, we construct a large-scale dataset called OKS3D-60K, comprising 60K tasks across 4K real-world scenes. Furthermore, we propose GRANT, an embodied multi-modal large language model equipped with a simple yet effective scheduling token mechanism to generate efficient task schedules and grounded actions. Extensive experiments on the OKS3D-60K dataset validate the effectiveness of GRANT across language understanding, 3D grounding, and scheduling efficiency.
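The abstract gives no algorithmic details of GRANT itself. As a generic illustration of the efficiency objective OKS3D targets (executing tasks such as cooking and cleaning in parallel under precedence constraints), the following is a minimal greedy list-scheduling sketch, not the paper's method; the function name, task names, and durations are hypothetical.

```python
import heapq

def parallel_makespan(durations, deps, workers=2):
    """Event-driven greedy scheduler for a task DAG.

    durations: {task: duration}; deps: {task: set of prerequisite tasks}.
    Returns the makespan when `workers` agents execute tasks in parallel,
    dispatching any ready task to any idle worker.
    """
    remaining = {t: len(deps.get(t, ())) for t in durations}
    children = {t: [] for t in durations}
    for t, prereqs in deps.items():
        for p in prereqs:
            children[p].append(t)

    ready = [t for t in durations if remaining[t] == 0]
    running = []            # min-heap of (finish_time, task)
    idle = workers
    now = 0.0
    done = 0
    while done < len(durations):
        # dispatch ready tasks to idle workers
        while ready and idle:
            t = ready.pop()
            heapq.heappush(running, (now + durations[t], t))
            idle -= 1
        # advance the clock to the next task completion
        now, t = heapq.heappop(running)
        idle += 1
        done += 1
        for c in children[t]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    return now

# With two workers, cooking and cleaning overlap: makespan 4, not 6.
tasks = {"cook": 3, "clean": 2, "serve": 1}
deps = {"serve": {"cook"}}
print(parallel_makespan(tasks, deps, workers=2))   # 4.0
print(parallel_makespan(tasks, deps, workers=1))   # 6.0
```

This greedy policy is not guaranteed optimal in general (multiprocessor DAG scheduling is NP-hard), which is part of why learned schedulers are of interest.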
Published
2026-03-14
How to Cite
Liang, D., Zhang, C., Xu, X., Ju, J., Luo, Z., & Bai, X. (2026). Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18415-18424. https://doi.org/10.1609/aaai.v40i22.38906
Section
AAAI Technical Track on Intelligent Robotics