Learning to LEAP: Efficient Dense Point Tracking by Focusing Where It Matters
DOI: https://doi.org/10.1609/aaai.v40i15.38311

Abstract
Tracking Any Point (TAP) is a foundational task in computer vision with broad applicability. The state-of-the-art self-supervised TAP method leverages a global matching transformer and contrastive random walks to learn point correspondences. However, its dense all-pairs attention and correlation volume computation tend to introduce irrelevant features and produce less informative training signals, degrading both learning efficiency and tracking accuracy. To address these limitations, we introduce LEAP-Track, a self-supervised TAP approach that computes the attention matrices and correlation volume over adaptively selected sparse pairs. It consists of two core designs: (1) Curriculum-based Sparse Attention (CSA), which dynamically focuses on the most relevant keys, promoting the learning of discriminative features; and (2) Progressive k-NN Transition (PkT), which reformulates the contrastive random walk to operate on an increasingly sparse k-NN affinity graph to reinforce the learning of the most informative correspondences. By integrating the above two designs into a two-stage training paradigm, LEAP-Track is shown both theoretically and empirically to effectively boost learning efficiency, achieving superior tracking accuracy over existing self-supervised TAP methods.

Published: 2026-03-14
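The Progressive k-NN Transition described above can be illustrated with a minimal sketch: sparsify a pairwise affinity matrix to its top-k entries per row, then row-normalize the result into a random-walk transition matrix. This is an assumed simplification for illustration (the function name `knn_transition` and the NumPy formulation are ours, not the authors'); the paper's actual method operates on learned features with a curriculum schedule over k.

```python
import numpy as np

def knn_transition(affinity: np.ndarray, k: int) -> np.ndarray:
    """Illustrative sketch (not the authors' implementation):
    keep only the k largest affinities in each row, zero out the rest,
    and row-normalize into a random-walk transition matrix."""
    n = affinity.shape[0]
    sparse = np.zeros_like(affinity)
    # Column indices of the k largest entries in each row.
    topk = np.argpartition(affinity, -k, axis=1)[:, -k:]
    rows = np.arange(n)[:, None]
    sparse[rows, topk] = affinity[rows, topk]
    # Each row now sums to 1, i.e. valid transition probabilities.
    return sparse / sparse.sum(axis=1, keepdims=True)

# Positive affinities, e.g. exponentiated similarity scores.
A = np.exp(np.random.default_rng(0).standard_normal((6, 6)))
P = knn_transition(A, k=3)
```

A "progressive" schedule would then shrink k over training (e.g. 8 → 4 → 2), so the random walk concentrates on ever fewer, more informative correspondences.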
How to Cite
Zhao, C., Wang, W., Zhang, B., & Wang, W. (2026). Learning to LEAP: Efficient Dense Point Tracking by Focusing Where It Matters. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 13108–13116. https://doi.org/10.1609/aaai.v40i15.38311
Section: AAAI Technical Track on Computer Vision XII