Residual Encoder Decoder Network and Adaptive Prior for Face Parsing

Authors

  • Tianchu Guo Beijing Samsung Telecommunication
  • Youngsung Kim Samsung Advanced Institute of Technology
  • Hui Zhang Beijing Samsung Telecommunication
  • Deheng Qian Beijing Samsung Telecommunication
  • ByungIn Yoo Samsung Advanced Institute of Technology
  • Jingtao Xu Beijing Samsung Telecommunication
  • Dongqing Zou Beijing Samsung Telecommunication
  • Jae-Joon Han Samsung Advanced Institute of Technology
  • Changkyu Choi Samsung Advanced Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v32i1.12268

Keywords:

face parsing, encoder decoder, residual network, adaptive prior

Abstract

Face parsing assigns every pixel in a facial image a semantic label, which can be applied in various applications including face recognition, facial beautification, affective computing, and animation. While much progress has been made in this field, current state-of-the-art methods still fail to extract truly effective features and restore accurate score maps, especially for facial parts with large deformation variations and fairly similar appearance, e.g., the mouth, eyes, and thin eyebrows. In this paper, we propose a novel pixel-wise face parsing method called Residual Encoder Decoder Network (RED-Net), which combines a feature-rich encoder-decoder framework with an adaptive prior mechanism. Our encoder-decoder framework extracts features with ResNet and decodes them by elaborately fusing residual architectures into the deconvolution. This framework learns more effective features than those learned by decoding with interpolation or classic deconvolution operations. To overcome the appearance ambiguity between facial parts, an adaptive prior mechanism is proposed in terms of the decoder prediction confidence, allowing the final result to be refined. Experimental results on two public datasets demonstrate that our method significantly outperforms the state of the art, improving F-measure from 0.854 to 0.905 on the Helen dataset and pixel accuracy from 95.12% to 97.59% on the LFW dataset. In particular, convincing qualitative examples show that our method parses eye, eyebrow, and lip regions more accurately.

Published

2018-04-27

How to Cite

Guo, T., Kim, Y., Zhang, H., Qian, D., Yoo, B., Xu, J., Zou, D., Han, J.-J., & Choi, C. (2018). Residual Encoder Decoder Network and Adaptive Prior for Face Parsing. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12268