SOIT: Segmenting Objects with Instance-Aware Transformers

Xiaodong Yu; Dahu Shi; Xing Wei; Ye Ren; Tingqun Ye; Wenming Tan

doi:10.1609/aaai.v36i3.20227

Authors

Xiaodong Yu Hikvision Research Institute
Dahu Shi Hikvision Research Institute
Xing Wei Xi'an Jiaotong University
Ye Ren Hikvision Research Institute
Tingqun Ye Hikvision Research Institute
Wenming Tan Hikvision Research Institute

DOI:

https://doi.org/10.1609/aaai.v36i3.20227

Keywords:

Computer Vision (CV)

Abstract

This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers. Inspired by DETR, our method treats instance segmentation as a direct set prediction problem and thereby removes the need for many hand-crafted components such as RoI cropping, one-to-many label assignment, and non-maximum suppression (NMS). In SOIT, multiple learned queries directly infer a set of object embeddings, covering semantic category, bounding-box location, and pixel-wise mask, in parallel under the global image context. The class and the bounding box can each be embedded in a fixed-length vector. The pixel-wise mask, in particular, is embedded as a group of parameters that construct a lightweight instance-aware transformer. This instance-aware transformer then produces a full-resolution mask without any RoI-based operation. Overall, SOIT is a simple single-stage instance segmentation framework that is both RoI-free and NMS-free. Experimental results on the MS COCO dataset demonstrate that SOIT significantly outperforms state-of-the-art instance segmentation approaches. Moreover, jointly learning multiple tasks in a unified query embedding also substantially improves detection performance. Code is available at https://github.com/yuxiaodongHRI/SOIT.
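The direct set prediction formulation that SOIT borrows from DETR rests on one idea. During training, each ground-truth object is assigned to exactly one query by a minimum-cost bipartite matching. Because duplicate predictions of the same object are then penalized, NMS becomes unnecessary. The sketch that follows illustrates this one-to-one assignment on a toy example. It is not SOIT's implementation. The cost (class probability plus box L1 only, with hypothetical weights `w_cls` and `w_box`) is an assumption, and SOIT's actual matching cost may include terms, such as mask terms, that are not described here. The brute-force search stands in for the Hungarian algorithm used in practice, and all function names are hypothetical.

```python
from itertools import permutations


def l1(a, b):
    """L1 distance between two boxes in normalized (cx, cy, w, h) form."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, gt, w_cls=1.0, w_box=5.0):
    """Illustrative pairwise cost (assumption, not SOIT's): reward a high
    probability for the ground-truth class, penalize box L1 error.
    The weights w_cls and w_box are hypothetical."""
    return -w_cls * pred["probs"][gt["label"]] + w_box * l1(pred["box"], gt["box"])


def bipartite_match(preds, gts):
    """One-to-one assignment of queries to ground-truth objects that minimizes
    the total cost. Returns (query_idx, gt_idx) pairs sorted by query index.

    Brute force over permutations, so only usable for tiny sets; practical
    implementations use the Hungarian algorithm instead.
    """
    assert len(preds) >= len(gts), "need at least one query per object"
    cost = [[match_cost(p, g) for g in gts] for p in preds]
    best, best_total = (), float("inf")
    # perm[g] is the query assigned to ground-truth object g.
    for perm in permutations(range(len(preds)), len(gts)):
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return sorted((q, g) for g, q in enumerate(best))


# Three queries (class probabilities over [class 0, class 1, no-object]) and two objects.
preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.3, 0.3, 0.4], "box": (0.8, 0.8, 0.3, 0.3)},
]
gts = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

matches = bipartite_match(preds, gts)
# Queries left unmatched are supervised toward the "no object" class.
unmatched = sorted(set(range(len(preds))) - {q for q, _ in matches})
print(matches)    # [(0, 1), (1, 0)]
print(unmatched)  # [2]
```

Each object is claimed by exactly one query, and query 2 is left to predict "no object", so no duplicate survives to be suppressed. Replacing the brute-force loop with `scipy.optimize.linear_sum_assignment` on the same cost matrix should give the same assignment in polynomial time.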
