Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

Authors

  • Yongliang Wu (Southeast University; Opus AI Research)
  • Wenbo Zhu (Opus AI Research)
  • Jiawang Cao (Opus AI Research)
  • Yi Lu (University of Toronto; Opus AI Research)
  • Bozheng Li (Brown University; Opus AI Research)
  • Weiheng Chi (National University of Singapore; Opus AI Research)
  • Zihan Qiu (Opus AI Research)
  • Lirian Su (Opus AI Research)
  • Haolin Zheng (Opus AI Research)
  • Jay Wu (Opus AI Research)
  • Xu Yang (Southeast University)

DOI:

https://doi.org/10.1609/aaai.v39i8.32916

Abstract

The demand for producing short-form videos for sharing on social media platforms has grown significantly in recent years. Despite notable advances in video summarization and highlight detection, which can produce partially usable short clips from raw videos, these approaches are often domain-specific and require a deep understanding of real-world video content. To address this, we propose Repurpose-10K, an extensive dataset comprising over 10,000 videos with more than 120,000 annotated clips, targeting the video long-to-short task. Recognizing that untrained human annotators can produce inaccurate annotations for repurposed videos, we propose a two-stage solution for obtaining annotations from real-world user-generated content. Furthermore, we offer a baseline model that addresses this challenging task by integrating audio, visual, and caption cues through a cross-modal fusion and alignment framework. We hope our work will spark further research in the underexplored area of video repurposing.
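To make the long-to-short task concrete, the sketch below illustrates one simple way to combine per-second segment scores from audio, visual, and caption models and then select the highest-scoring non-overlapping clips. This is a hypothetical illustration only: the function names, the weighted late-fusion scheme, and the greedy clip selection are assumptions for exposition, not the paper's actual cross-modal fusion and alignment framework.

```python
# Hypothetical sketch of the video long-to-short pipeline: late-fuse
# per-second scores from three modalities, then greedily pick the best
# non-overlapping fixed-length clips. Weights and names are illustrative.

def fuse_scores(audio, visual, caption, weights=(0.3, 0.4, 0.3)):
    """Weighted late fusion of per-second scores from three modalities."""
    wa, wv, wc = weights
    return [wa * a + wv * v + wc * c for a, v, c in zip(audio, visual, caption)]

def select_clips(scores, clip_len=2, top_k=2):
    """Greedily select top_k non-overlapping windows of clip_len seconds."""
    windows = [
        (sum(scores[s:s + clip_len]), s)
        for s in range(len(scores) - clip_len + 1)
    ]
    windows.sort(reverse=True)  # highest-scoring windows first
    chosen, used = [], set()
    for _total, start in windows:
        span = range(start, start + clip_len)
        if used.isdisjoint(span):  # keep clips non-overlapping
            chosen.append((start, start + clip_len))
            used.update(span)
        if len(chosen) == top_k:
            break
    return sorted(chosen)

# Toy per-second relevance scores for a 6-second video.
audio   = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7]
visual  = [0.2, 0.8, 0.9, 0.1, 0.2, 0.8]
caption = [0.0, 0.7, 0.9, 0.3, 0.1, 0.6]

fused = fuse_scores(audio, visual, caption)
print(select_clips(fused))  # → [(1, 3), (4, 6)]
```

In this toy example the fused scores peak around seconds 1-2 and 5, so the greedy selector returns the two 2-second clips covering those spans. A real system would replace the toy scores with learned per-modality predictions and a trained fusion module.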

Published

2025-04-11

How to Cite

Wu, Y., Zhu, W., Cao, J., Lu, Y., Li, B., Chi, W., … Yang, X. (2025). Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8487–8495. https://doi.org/10.1609/aaai.v39i8.32916

Section

AAAI Technical Track on Computer Vision VII