Object-Centric Latent Action Learning

Albina Klepach; Alexander Nikulin; Ilya Zisman; Denis Tarasov; Alexander Derevyagin; Andrei Polubarov; Nikita Lyubaykin; Igor Kiselev; Vladislav Kurenkov

doi:10.1609/aaai.v40i27.39423

Authors

Albina Klepach (dunnolab.ai)
Alexander Nikulin (dunnolab.ai; Moscow State University)
Ilya Zisman (dunnolab.ai)
Denis Tarasov (dunnolab.ai)
Alexander Derevyagin (dunnolab.ai; Higher School of Economics)
Andrei Polubarov (dunnolab.ai)
Nikita Lyubaykin (dunnolab.ai; Innopolis University)
Igor Kiselev (Accenture)
Vladislav Kurenkov (dunnolab.ai; Innopolis University)

Abstract

Leveraging vast amounts of unlabeled internet video data for embodied AI is currently bottlenecked by the lack of action labels and the presence of action-correlated visual distractors. Although recent latent action policy optimization (LAPO) has shown promise in inferring proxy action labels from visual observations, its performance degrades significantly when distractors are present. To address this limitation, we propose a novel object-centric latent action learning framework that operates on objects rather than pixels. We leverage self-supervised object-centric pretraining to disentangle the agent's movement from distracting background dynamics. This allows LAPO to focus on task-relevant interactions, yielding more robust proxy action labels, better imitation learning, and efficient adaptation of the agent with just a few action-labeled trajectories. We evaluate our method on eight visually complex tasks across the Distracting Control Suite (DCS) and Distracting MetaWorld (DMW). Our results show that object-centric pretraining mitigates the negative effects of distractors by 50%, as measured by downstream task performance: average return (DCS) and success rate (DMW).
