DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Authors

  • Tsu-Jui Fu, UC Santa Barbara
  • William Yang Wang, UC Santa Barbara
  • Daniel McDuff, Microsoft Research
  • Yale Song, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v36i1.19943

Keywords:

Computer Vision (CV), Speech & Natural Language Processing (SNLP)

Abstract

Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner. Can machines learn to emulate this laborious process? We present a novel task and approach for document-to-slide generation. Solving this involves document summarization, image and text retrieval, and slide structure and layout prediction to arrange key elements in a form suitable for presentation. We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner. Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides. To help accelerate research in this domain, we release a dataset of about 6K paired documents and slide decks used in our experiments. We show that our approach outperforms strong baselines and produces slides with rich content and aligned imagery.
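To make the hierarchical sequence-to-sequence idea concrete, the sketch below (PyTorch) shows one way such a model could be organized: a document encoder over sentence embeddings, a slide-level decoder that decides when to open a new slide, and an object-level decoder with paraphrasing and layout (bounding-box) heads. This is a minimal illustration under assumed module names and dimensions, not the authors' implementation.

# Minimal, illustrative sketch of a hierarchical document-to-slides model.
# All class names, dimensions, and heads below are hypothetical.
import torch
import torch.nn as nn

class HierarchicalDoc2Slides(nn.Module):
    def __init__(self, d_model=256, vocab_size=10000, n_actions=3):
        super().__init__()
        # Encode document sentences (already embedded) into contextual states.
        self.sentence_encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Slide-level decoder: one step per slide, conditioned on document context.
        self.slide_decoder = nn.GRUCell(d_model, d_model)
        # Object-level decoder: one step per slide element (text line or figure).
        self.object_decoder = nn.GRUCell(d_model, d_model)
        # Heads: choose an action (new slide / add object / stop), paraphrase the
        # selected sentence, and predict a normalized layout box (x, y, w, h).
        self.action_head = nn.Linear(d_model, n_actions)
        self.paraphrase_head = nn.Linear(d_model, vocab_size)
        self.layout_head = nn.Linear(d_model, 4)

    def forward(self, sentence_embs):
        # sentence_embs: (batch, num_sentences, d_model)
        doc_states, _ = self.sentence_encoder(sentence_embs)
        doc_summary = doc_states.mean(dim=1)           # coarse document context
        slide_state = self.slide_decoder(doc_summary)  # first slide state
        obj_state = self.object_decoder(slide_state)   # first object state
        return {
            "action_logits": self.action_head(obj_state),
            "token_logits": self.paraphrase_head(obj_state),
            "layout": self.layout_head(obj_state).sigmoid(),
        }

# Usage: one forward pass over a toy "document" of 12 embedded sentences.
model = HierarchicalDoc2Slides()
out = model(torch.randn(2, 12, 256))

In the full task, the slide- and object-level decoders would run autoregressively, emitting elements until a stop action, which is what makes the hierarchy useful for variable-length decks.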

Published

2022-06-28

How to Cite

Fu, T.-J., Wang, W. Y., McDuff, D., & Song, Y. (2022). DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 634-642. https://doi.org/10.1609/aaai.v36i1.19943

Section

AAAI Technical Track on Computer Vision I