Cross-Modal Object Tracking: Modality-Aware Representations and a Unified Benchmark

Chenglong Li; Tianhao Zhu; Lei Liu; Xiaonan Si; Zilin Fan; Sulan Zhai

doi:10.1609/aaai.v36i2.20016

Authors

Chenglong Li Anhui University
Tianhao Zhu Anhui University
Lei Liu Anhui University
Xiaonan Si Anhui University
Zilin Fan Anhui University
Sulan Zhai Anhui University

DOI:

https://doi.org/10.1609/aaai.v36i2.20016

Keywords:

Computer Vision (CV)

Abstract

In many visual systems, tracking is often based on RGB image sequences, in which some targets are not discernible in low-light conditions, and tracking performance degrades significantly as a result. Introducing other modalities such as depth and infrared data is an effective way to handle the imaging limitations of individual sources, but multi-modal imaging platforms usually require elaborate designs and cannot yet be applied in many real-world applications. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, whose imaging switches between RGB and NIR based on the light intensity. These two modalities are heterogeneous, with very different visual properties, and thus pose major challenges for visual tracking. However, existing works have not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset comprising 654 cross-modal image sequences with over 481K frames in total and an average video length of more than 735 frames. To promote the research and development of cross-modal object tracking, we propose a new algorithm that learns a modality-aware target representation to mitigate the appearance gap between the RGB and NIR modalities during tracking. It is plug-and-play and can thus be flexibly embedded into different tracking frameworks. We conduct extensive experiments on the dataset and demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 19 state-of-the-art tracking methods. The dataset, code, model, and results are available at https://github.com/mmic-lcl/source-code.
