UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning

Authors

  • Meiqi Sun Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University
  • Zhonghan Zhao College of Computer Science and Technology, Zhejiang University
  • Wenhao Chai Electrical and Computer Engineering Department, University of Washington
  • Hanjun Luo Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University
  • Shidong Cao Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University
  • Yanting Zhang Department of Computer Science and Technology, Donghua University
  • Jenq-Neng Hwang Electrical and Computer Engineering Department, University of Washington
  • Gaoang Wang Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University; College of Computer Science and Technology, Zhejiang University; Shanghai Artificial Intelligence Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i5.28305

Keywords:

CV: Other Foundations of Computer Vision, CV: Biometrics, Face, Gesture & Pose, CV: Segmentation

Abstract

Animal visual perception is an important technique for automatically monitoring animal health, understanding animal behaviors, and assisting animal-related research. However, it is challenging to design a deep learning-based perception model that can freely adapt to different animals across various perception tasks, due to the varying poses of a large diversity of animals, the lack of data on rare species, and the semantic inconsistency of different tasks. We introduce UniAP, a novel Universal Animal Perception model that leverages few-shot learning to enable cross-species perception among various visual tasks. Our proposed model takes support images and labels as prompt guidance for a query image. Images and labels are processed through a Transformer-based encoder and a lightweight label encoder, respectively. A matching module then aggregates information between the prompt guidance and the query image, and a multi-head label decoder generates outputs for the various tasks. By capitalizing on the shared visual characteristics among different animals and tasks, UniAP enables the transfer of knowledge from well-studied species to those with limited labeled data or even unseen species. We demonstrate the effectiveness of UniAP through comprehensive experiments in pose estimation, segmentation, and classification tasks on diverse animal species, showcasing its ability to generalize and adapt to new classes with minimal labeled examples.
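
As a rough illustration of the pipeline the abstract describes, the following PyTorch-style sketch shows how support images and labels might be encoded, fused with the query image through cross-attention, and decoded by task-specific heads. All module names, dimensions, shot layouts, and task heads here are illustrative assumptions for a minimal sketch, not the authors' released implementation.

# Minimal sketch of a UniAP-style few-shot forward pass.
# Every module shape and name below is an illustrative assumption.
import torch
import torch.nn as nn

class LabelEncoder(nn.Module):
    """Lightweight encoder embedding dense labels (e.g. masks or keypoint
    heatmaps) into the same token space as the image features."""
    def __init__(self, in_ch: int, dim: int, patch: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, N, dim) patch tokens
        return self.proj(y).flatten(2).transpose(1, 2)

class MatchingModule(nn.Module):
    """Aggregates support (image + label) tokens into the query tokens via
    cross-attention, so the support set acts as a prompt."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tok, support_tok):
        out, _ = self.attn(query_tok, support_tok, support_tok)
        return self.norm(query_tok + out)

class UniAPSketch(nn.Module):
    def __init__(self, dim: int = 256, label_ch: int = 17, patch: int = 16):
        super().__init__()
        # Transformer-based image encoder (a real system would use a ViT;
        # a shallow stand-in is used here for brevity).
        self.img_proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.img_enc = nn.TransformerEncoder(enc_layer, num_layers=4)
        # A single fixed label channel count; a full model would need a
        # task-specific label projection.
        self.lbl_enc = LabelEncoder(label_ch, dim, patch)
        self.match = MatchingModule(dim)
        # Multi-head label decoder: one lightweight head per task.
        self.heads = nn.ModuleDict({
            "pose": nn.Linear(dim, 17),  # per-patch keypoint heatmap logits
            "seg": nn.Linear(dim, 1),    # per-patch binary mask logits
        })

    def encode_image(self, x: torch.Tensor) -> torch.Tensor:
        tok = self.img_proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.img_enc(tok)

    def forward(self, query_img, support_imgs, support_lbls, task: str):
        # query_img: (B, 3, H, W); support_imgs: (B, S, 3, H, W);
        # support_lbls: (B, S, C, H, W) for an S-shot prompt.
        B, S = support_imgs.shape[:2]
        q = self.encode_image(query_img)                        # (B, N, D)
        s_img = self.encode_image(support_imgs.flatten(0, 1))   # (B*S, N, D)
        s_lbl = self.lbl_enc(support_lbls.flatten(0, 1))        # (B*S, N, D)
        prompt = (s_img + s_lbl).view(B, -1, q.shape[-1])       # (B, S*N, D)
        fused = self.match(q, prompt)
        return self.heads[task](fused)  # per-patch predictions for the task

One design point this sketch tries to capture: because the matching module attends over the concatenated support tokens, it is agnostic to the number of shots S, which is what lets a single set of weights serve few-shot prompts across different species and tasks.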

Published

2024-03-24

How to Cite

Sun, M., Zhao, Z., Chai, W., Luo, H., Cao, S., Zhang, Y., Hwang, J.-N., & Wang, G. (2024). UniAP: Towards Universal Animal Perception in Vision via Few-Shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 5008-5016. https://doi.org/10.1609/aaai.v38i5.28305

Section

AAAI Technical Track on Computer Vision IV