Advancing Video Synchronization with Fractional Frame Analysis: Introducing a Novel Dataset and Model

Authors

  • Yuxuan Liu, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Haizhou Ai, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Junliang Xing, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  • Xuri Li, Beijing University of Technology, Beijing 100124, China
  • Xiaoyi Wang, Independent Researcher, Haidian District, Beijing, China
  • Pin Tao, Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

DOI:

https://doi.org/10.1609/aaai.v38i4.28174

Keywords:

CV: Video Understanding & Activity Analysis, CV: 3D Computer Vision

Abstract

Multiple views play a vital role in 3D pose estimation. Ideally, multi-view 3D pose estimation should operate directly on naturally collected videos. However, because of video synchronization constraints, existing methods typically rely on expensive hardware to synchronize the initiation of cameras, which restricts most 3D pose collection to indoor settings. Some recent works train deep neural networks to align desynchronized datasets derived from synchronized cameras, but they achieve only frame-level accuracy. For fractional-frame video synchronization, this work proposes the Inter-Frame and Intra-Frame Desynchronized Dataset (IFID), which labels fractional time intervals between two video clips. IFID is the first dataset to annotate both inter-frame and intra-frame intervals; with 382,500 annotated video clips in total, it is also the largest such dataset to date. We further develop a novel Transformer-based model, named InSynFormer, for inter-frame and intra-frame synchronization. Extensive experimental evaluations demonstrate its promising performance. The dataset and the model's source code are available at https://github.com/yuxuan-cser/InSynFormer.

Published

2024-03-24

How to Cite

Liu, Y., Ai, H., Xing, J., Li, X., Wang, X., & Tao, P. (2024). Advancing Video Synchronization with Fractional Frame Analysis: Introducing a Novel Dataset and Model. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3828–3836. https://doi.org/10.1609/aaai.v38i4.28174

Section

AAAI Technical Track on Computer Vision III