PUPS: Point Cloud Unified Panoptic Segmentation

Authors

  • Shihao Su — Zhejiang University
  • Jianyun Xu — DAMO Academy, Alibaba Group
  • Huanyu Wang — Zhejiang University
  • Zhenwei Miao — DAMO Academy, Alibaba Group
  • Xin Zhan — DAMO Academy, Alibaba Group
  • Dayang Hao — DAMO Academy, Alibaba Group
  • Xi Li — Zhejiang University; Shanghai Institute for Advanced Study, Zhejiang University; Shanghai AI Laboratory

DOI:

https://doi.org/10.1609/aaai.v37i2.25329

Keywords:

CV: Vision for Robotics & Autonomous Driving, CV: Segmentation

Abstract

Point cloud panoptic segmentation is a challenging task that seeks a holistic solution for both semantic and instance segmentation to predict groupings of coherent points. Previous approaches treat semantic and instance segmentation as surrogate tasks, and they either use clustering methods or bounding boxes to gather instance groupings, incurring costly computation and hand-crafted designs in the instance segmentation task. In this paper, we propose a simple but effective point cloud unified panoptic segmentation (PUPS) framework, which uses a set of point-level classifiers to directly predict semantic and instance groupings in an end-to-end manner. To realize PUPS, we introduce bipartite matching into our training pipeline so that our classifiers are able to exclusively predict groupings of instances, getting rid of hand-crafted designs, e.g., anchors and Non-Maximum Suppression (NMS). In order to achieve better grouping results, we utilize a transformer decoder to iteratively refine the point classifiers and develop a context-aware CutMix augmentation to overcome the class imbalance problem. As a result, PUPS achieves 1st place on the leaderboard of the SemanticKITTI panoptic segmentation task and state-of-the-art results on nuScenes.
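The bipartite matching mentioned in the abstract is the standard set-prediction device: each predicted point grouping is assigned to at most one ground-truth instance via the Hungarian algorithm, so losses are computed on matched pairs without anchors or NMS. Below is a minimal illustrative sketch using `scipy.optimize.linear_sum_assignment`; the cost here is a negative soft IoU between point masks, which is an assumption for illustration (the paper's actual matching cost combines classification and grouping terms).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions_to_targets(pred_masks, gt_masks):
    """Hungarian matching between predicted and ground-truth point groupings.

    pred_masks: (P, N) array of soft masks over N points for P predictions
    gt_masks:   (G, N) array of binary masks for G ground-truth instances
    Returns (pred_idx, gt_idx), the matched index pairs.
    """
    # Negative soft IoU as the matching cost (illustrative choice; not
    # necessarily the cost used in the PUPS paper).
    inter = pred_masks @ gt_masks.T                              # (P, G)
    union = pred_masks.sum(1, keepdims=True) + gt_masks.sum(1) - inter
    cost = -inter / np.maximum(union, 1e-6)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx
```

Because the assignment is one-to-one, each classifier is trained to claim a distinct instance, which is what lets the framework drop hand-crafted post-processing.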

Published

2023-06-26

How to Cite

Su, S., Xu, J., Wang, H., Miao, Z., Zhan, X., Hao, D., & Li, X. (2023). PUPS: Point Cloud Unified Panoptic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2339-2347. https://doi.org/10.1609/aaai.v37i2.25329

Section

AAAI Technical Track on Computer Vision II