Advancing Video Synchronization with Fractional Frame Analysis: Introducing a Novel Dataset and Model

Authors

  • Yuxuan Liu, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Haizhou Ai, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Junliang Xing, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Xuri Li, Beijing University of Technology, Beijing 100124, China
  • Xiaoyi Wang, Independent Researcher, Haidian District, Beijing, China
  • Pin Tao, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

DOI:

https://doi.org/10.1609/aaai.v38i4.28174

Keywords:

CV: Video Understanding & Activity Analysis, CV: 3D Computer Vision

Abstract

Multiple views play a vital role in 3D pose estimation. Ideally, multi-view 3D pose estimation should operate directly on naturally collected videos. However, because of video synchronization constraints, existing methods typically rely on expensive hardware to synchronize the initiation of cameras, which restricts most 3D pose collection to indoor settings. Some recent works train deep neural networks to align desynchronized datasets derived from synchronized cameras, but they achieve only frame-level accuracy. For fractional-frame video synchronization, this work proposes the Inter-Frame and Intra-Frame Desynchronized Dataset (IFID), which labels fractional time intervals between two video clips. IFID is the first dataset to annotate both inter-frame and intra-frame intervals; with 382,500 annotated video clips in total, it is also the largest such dataset to date. We further develop a novel Transformer-based model, named InSynFormer, for inter-frame and intra-frame synchronization. Extensive experimental evaluations demonstrate its promising performance. The dataset and the model's source code are available at https://github.com/yuxuan-cser/InSynFormer.

Published

2024-03-24

How to Cite

Liu, Y., Ai, H., Xing, J., Li, X., Wang, X., & Tao, P. (2024). Advancing Video Synchronization with Fractional Frame Analysis: Introducing a Novel Dataset and Model. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3828–3836. https://doi.org/10.1609/aaai.v38i4.28174

Section

AAAI Technical Track on Computer Vision III