MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition

Authors

  • Zhanghan Ke, Department of Computer Science, City University of Hong Kong; SenseTime Research
  • Jiayu Sun, Department of Computer Science, City University of Hong Kong
  • Kaican Li, SenseTime Research
  • Qiong Yan, SenseTime Research
  • Rynson W.H. Lau, Department of Computer Science, City University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v36i1.19999

Keywords:

Computer Vision (CV)

Abstract

Existing portrait matting methods either require auxiliary inputs that are costly to obtain or involve multiple stages that are computationally expensive, making them less suitable for real-time applications. In this work, we present a lightweight matting objective decomposition network (MODNet) for real-time portrait matting from a single input image. The key idea behind our efficient design is to optimize a series of sub-objectives simultaneously via explicit constraints. In addition, MODNet includes two novel techniques for improving model efficiency and robustness. First, an Efficient Atrous Spatial Pyramid Pooling (e-ASPP) module is introduced to fuse multi-scale features for semantic estimation. Second, a self-supervised sub-objectives consistency (SOC) strategy is proposed to adapt MODNet to real-world data, addressing the domain shift problem common to trimap-free methods. MODNet is easy to train end-to-end. It is much faster than contemporaneous methods and runs at 67 frames per second on a 1080Ti GPU. Experiments show that MODNet outperforms prior trimap-free methods by a large margin on both the Adobe Matting Dataset and a carefully designed photographic portrait matting benchmark (PPM-100) that we propose. Further, MODNet achieves remarkable results on daily photos and videos.
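The idea of optimizing sub-objectives via explicit constraints, and of enforcing their consistency on unlabeled data, can be illustrated with a minimal NumPy sketch. It assumes three per-branch outputs (a low-resolution semantic mask, a boundary detail map, and a fused alpha matte), following the semantic estimation / detail prediction / fusion split described in the full paper; the exact loss forms are assumptions, not the authors' implementation. All names (`downsample_blur`, `boundary_mask`, `supervised_loss`, `soc_loss`), the box-filter stand-in for the thumbnail operator, the dilation-minus-erosion boundary mask, and the loss weights are hypothetical choices for illustration, and the compositional loss is omitted.

```python
import numpy as np

def downsample_blur(alpha: np.ndarray, factor: int = 4) -> np.ndarray:
    """Hypothetical thumbnail operator: average-pool by `factor`.
    (Assumption: a box filter stands in for downsampling + Gaussian blur.)"""
    h2, w2 = alpha.shape[0] // factor, alpha.shape[1] // factor
    cropped = alpha[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def _window_filter(mask: np.ndarray, radius: int, reducer) -> np.ndarray:
    # Square max/min filter: reduce over shifted copies of an edge-padded mask.
    padded = np.pad(mask, radius, mode="edge")
    h, w = mask.shape
    shifts = [padded[dy:dy + h, dx:dx + w]
              for dy in range(2 * radius + 1)
              for dx in range(2 * radius + 1)]
    return reducer(np.stack(shifts), axis=0)

def boundary_mask(alpha: np.ndarray, radius: int = 2) -> np.ndarray:
    """Hypothetical transition region: dilated foreground minus eroded foreground."""
    fg = (alpha > 0.5).astype(np.uint8)
    dilated = _window_filter(fg, radius, np.max)
    eroded = _window_filter(fg, radius, np.min)
    return (dilated == 1) & (eroded == 0)

def masked_l1(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    # Mean absolute error restricted to the masked pixels.
    m = mask.astype(float)
    return float((m * np.abs(pred - target)).sum() / max(m.sum(), 1.0))

def supervised_loss(s_p, d_p, alpha_p, alpha_g, factor=4,
                    lam_s=1.0, lam_d=10.0, lam_a=1.0) -> float:
    """Weighted sum of the three sub-objective losses (weights are assumptions)."""
    # Semantic branch: match a coarse thumbnail of the ground-truth matte.
    l_s = 0.5 * float(np.mean((s_p - downsample_blur(alpha_g, factor)) ** 2))
    # Detail branch: supervised only inside the boundary region.
    l_d = masked_l1(d_p, alpha_g, boundary_mask(alpha_g))
    # Fusion branch: plain L1 on the full matte (compositional term omitted).
    l_a = float(np.mean(np.abs(alpha_p - alpha_g)))
    return lam_s * l_s + lam_d * l_d + lam_a * l_a

def soc_loss(s_u, d_u, alpha_u, factor=4) -> float:
    """SOC-style consistency on an unlabeled image: with no ground truth,
    the fused alpha serves as the reference for the other two branches."""
    l_sem = 0.5 * float(np.mean((s_u - downsample_blur(alpha_u, factor)) ** 2))
    l_det = masked_l1(d_u, alpha_u, boundary_mask(alpha_u))
    return l_sem + l_det

# Usage: a toy 8x8 matte whose right half is foreground.
alpha = np.zeros((8, 8))
alpha[:, 4:] = 1.0
print(supervised_loss(downsample_blur(alpha), alpha, alpha, alpha))
print(soc_loss(downsample_blur(alpha), np.zeros_like(alpha), alpha))
```

The design point the sketch tries to convey is that each branch receives its own explicit target (a coarse mask, a boundary-restricted matte, a full matte), so the branches can be trained together end-to-end; on unlabeled images the same decomposition lets the branches supervise one another, which is the intuition behind SOC.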


Published

2022-06-28

How to Cite

Ke, Z., Sun, J., Li, K., Yan, Q., & Lau, R. W. (2022). MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 1140-1147. https://doi.org/10.1609/aaai.v36i1.19999

Issue

Vol. 36 No. 1

Section

AAAI Technical Track on Computer Vision I