RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis

Authors

  • Linfeng Dong (Zhejiang University; Shanghai AI Laboratory)
  • Yuchen Yang (Fudan University; Shanghai AI Laboratory)
  • Hao Wu (University of Science and Technology of China; Shanghai AI Laboratory)
  • Wei Wang (Shanghai AI Laboratory)
  • Yuenan Hou (Shanghai AI Laboratory)
  • Zhihang Zhong (Shanghai AI Laboratory)
  • Xiao Sun (Shanghai AI Laboratory)

DOI:

https://doi.org/10.1609/aaai.v40i5.37362

Abstract

We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide large-scale, fine-grained annotations for racket pose alongside traditional ball positions, enabling research into complex human-object interactions. It is designed to tackle three interconnected tasks: fine-grained ball tracking, articulated racket pose estimation, and predictive ball trajectory forecasting. Our evaluation of established baselines reveals a critical insight for multi-modal fusion: while naively concatenating racket pose features degrades performance, a Cross-Attention mechanism is essential to unlock their value, leading to trajectory prediction results that surpass strong unimodal baselines. RacketVision provides a versatile resource and a strong starting point for future research in dynamic object tracking, conditional motion forecasting, and multi-modal analysis in sports.
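The contrast the abstract draws between naive concatenation and cross-attention fusion can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, feature sizes, random weights, and the residual connection are all assumptions chosen for illustration. Ball features act as queries and racket pose features as keys and values, so each ball time step weights the racket frames by relevance instead of having them appended unconditionally.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(ball, racket):
    """Naive fusion: append the racket features to the ball features per time step."""
    return [b + r for b, r in zip(ball, racket)]


def cross_attention_fusion(ball, racket, w_q, w_k, w_v):
    """Ball features query the racket features (single head, hypothetical weights)."""
    q = matmul(ball, w_q)    # queries from the ball stream
    k = matmul(racket, w_k)  # keys from the racket stream
    v = matmul(racket, w_v)  # values from the racket stream
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / scale for s in row]) for row in scores]
    context = matmul(attn, v)
    # Residual connection (an assumption): the ball stream stays dominant.
    fused = [[b + c for b, c in zip(b_row, c_row)]
             for b_row, c_row in zip(ball, context)]
    return fused, attn


def random_matrix(rows, cols, rng):
    """Gaussian stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D = 6, 4  # hypothetical: 6 time steps, 4-dimensional features
ball = random_matrix(T, D, rng)
racket = random_matrix(T, D, rng)
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))

fused_concat = concat_fusion(ball, racket)
fused_attn, attn = cross_attention_fusion(ball, racket, w_q, w_k, w_v)
print(len(fused_concat[0]), len(fused_attn[0]))  # prints: 8 4
```

In this sketch, concatenation doubles the feature width and forwards every racket feature regardless of relevance, while cross-attention keeps the ball feature width and mixes in a per-step weighted summary of the racket stream. Whether this matches the paper's fusion module in detail is not stated in the abstract.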

Published

2026-03-14

How to Cite

Dong, L., Yang, Y., Wu, H., Wang, W., Hou, Y., Zhong, Z., & Sun, X. (2026). RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3632–3640. https://doi.org/10.1609/aaai.v40i5.37362

Section

AAAI Technical Track on Computer Vision II