Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Goran Glavaš; Swapna Somasundaran

doi:10.1609/aaai.v34i05.6284

Authors

Goran Glavaš University of Mannheim
Swapna Somasundaran Educational Testing Service

DOI:

https://doi.org/10.1609/aaai.v34i05.6284

Abstract

Breaking down the structure of long texts into semantically coherent segments makes the texts more readable and supports downstream applications like summarization and retrieval. Starting from an apparent link between text coherence and segmentation, we introduce a novel supervised model for text segmentation with simple but explicit coherence modeling. Our model – a neural architecture consisting of two hierarchically connected Transformer networks – is a multi-task learning model that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones. The proposed model, dubbed Coherence-Aware Text Segmentation (CATS), yields state-of-the-art segmentation performance on a collection of benchmark datasets. Furthermore, by coupling CATS with cross-lingual word embeddings, we demonstrate its effectiveness in zero-shot language transfer: it can successfully segment texts in languages unseen in training.
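
The multi-task coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss functions, so the binary cross-entropy for segmentation, the hinge-style margin loss for coherence, the `margin` and `weight` parameters, and all function names below are assumptions chosen for illustration. The inputs stand in for the outputs of the two-level Transformer, which is not modeled here.

```python
import math

def segmentation_loss(boundary_probs, gold_boundaries):
    """Mean binary cross-entropy over sentences.

    boundary_probs[i] is the predicted probability that sentence i starts
    a new segment; gold_boundaries[i] is 1 or 0. (Assumed formulation.)
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(boundary_probs, gold_boundaries):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(gold_boundaries)

def coherence_loss(score_correct, score_corrupt, margin=1.0):
    """Hinge loss: the correct sentence sequence should outscore the
    corrupt one by at least `margin`. (Assumed formulation.)"""
    return max(0.0, margin - (score_correct - score_corrupt))

def joint_loss(seg_loss, coh_loss, weight=0.5):
    """Combine both objectives; `weight` is a hypothetical trade-off
    hyperparameter for the auxiliary coherence term."""
    return seg_loss + weight * coh_loss

# Toy values standing in for model outputs on a four-sentence snippet.
seg = segmentation_loss([0.9, 0.1, 0.2, 0.8], [1, 0, 0, 1])
coh = coherence_loss(score_correct=1.4, score_corrupt=0.9)
print(joint_loss(seg, coh))
```

In this sketch the coherence term drops to zero once the correct sequence outscores the corrupt one by the margin, so it acts only as an auxiliary signal alongside the segmentation objective.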

Published

2020-04-03

How to Cite

Glavaš, G., & Somasundaran, S. (2020). Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7797–7804. https://doi.org/10.1609/aaai.v34i05.6284

Issue

Vol. 34 No. 05: AAAI-20 Technical Tracks 5

Section

AAAI Technical Track: Natural Language Processing
