TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances

Wenting Xu; Viorela Ila; Luping Zhou; Craig T. Jin

doi:10.1609/aaai.v39i9.32969

Authors

Wenting Xu, School of Electrical and Computer Engineering, The University of Sydney
Viorela Ila, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney
Luping Zhou, School of Electrical and Computer Engineering, The University of Sydney
Craig T. Jin, School of Electrical and Computer Engineering, The University of Sydney

DOI:

https://doi.org/10.1609/aaai.v39i9.32969

Abstract

Function and affordance are critical aspects of 3D scene understanding and support task-oriented objectives. In this work, we develop a model that learns to structure and vary functional affordance across a 3D hierarchical scene graph (3DHSG) representing the spatial organization of a scene, so that the varying functional affordance integrates with the varying spatial context of the graph. More specifically, we develop an algorithm that learns to construct a 3DHSG from segmented object point clouds and object semantic labels. The resulting graph has a top node that identifies the room label, child nodes that define local spatial regions inside the room with region-specific affordances, and grandchild nodes that indicate object locations and object-specific affordances. To support this work, we create a custom 3DHSG dataset that provides ground truth for local spatial regions with region-specific affordances and for the object-specific affordances of each object. We employ a Transformer Based Hierarchical Scene Understanding (TB-HSU) model to learn the 3DHSG, using a multi-task learning framework that jointly learns room classification and the definition of spatial regions within the room with region-specific affordances. Our work improves on the performance of state-of-the-art baseline models and shows one approach for applying transformer models to 3D scene understanding and to the generation of 3DHSGs that capture the spatial organization of a room. The code and dataset are publicly available.
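
To make the three-level hierarchy described in the abstract concrete, the following is a minimal sketch of how such a graph could be represented in memory. It is not the authors' implementation: the class names (RoomNode, RegionNode, ObjectNode), the helper objects_affording, the use of a single 3D point as the object location, and all labels and affordance strings in build_example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectNode:
    """Grandchild node: one segmented object (hypothetical schema)."""
    label: str                               # object semantic label
    location: Tuple[float, float, float]     # assumed: a single 3D point, e.g. a centroid
    affordances: List[str] = field(default_factory=list)  # object-specific affordances


@dataclass
class RegionNode:
    """Child node: a local spatial region inside the room."""
    affordances: List[str]                   # region-specific affordances
    objects: List[ObjectNode] = field(default_factory=list)


@dataclass
class RoomNode:
    """Top node: identifies the room label."""
    label: str
    regions: List[RegionNode] = field(default_factory=list)

    def objects_affording(self, affordance: str) -> List[str]:
        """Return labels of all objects, in any region, with the given affordance."""
        return [
            obj.label
            for region in self.regions
            for obj in region.objects
            if affordance in obj.affordances
        ]


def build_example() -> RoomNode:
    """Build a toy graph; every label and affordance here is made up."""
    sleep_region = RegionNode(
        affordances=["sleeping"],
        objects=[
            ObjectNode("bed", (1.0, 2.0, 0.4), ["lie on"]),
            ObjectNode("pillow", (1.0, 2.6, 0.6), ["rest head"]),
        ],
    )
    work_region = RegionNode(
        affordances=["working"],
        objects=[
            ObjectNode("desk", (3.5, 0.5, 0.7), ["place items"]),
            ObjectNode("chair", (3.5, 1.1, 0.5), ["sit on"]),
        ],
    )
    return RoomNode(label="bedroom", regions=[sleep_region, work_region])


if __name__ == "__main__":
    room = build_example()
    print(room.label, [r.affordances for r in room.regions])
    print(room.objects_affording("sit on"))
```

In this sketch the affordance of an object is stored on the object node while the affordance of its surroundings is stored one level up, which mirrors the abstract's distinction between object-specific and region-specific affordances.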
