Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

Authors

  • Wenda Xu University of California, Santa Barbara
  • Michael Saxon University of California, Santa Barbara
  • Misha Sra University of California, Santa Barbara
  • William Yang Wang University of California, Santa Barbara

DOI:

https://doi.org/10.1609/aaai.v36i10.21410

Keywords:

Speech & Natural Language Processing (SNLP)

Abstract

Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon that laypeople struggle to understand. This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online. At present, two bottlenecks interfere with the goal of building high-quality medical expert-layman style transfer systems: a dearth of pretrained medical-domain language models spanning both expert and layman terminologies, and a lack of parallel corpora for training the transfer task itself. To mitigate the first issue, we propose a novel language model (LM) pretraining task, Knowledge Base Assimilation, which synthesizes pretraining data from the edges of a graph linking expert- and layman-style medical terminology, assimilating this lexical knowledge into the LM during self-supervised learning. To mitigate the second issue, we build a large-scale parallel corpus in the medical expert-layman domain using a margin-based criterion. Our experiments show that transformer-based models pretrained with knowledge base assimilation and other well-established pretraining tasks, then fine-tuned on our new parallel corpus, achieve considerable improvement on expert-layman transfer benchmarks, with an average relative gain of 106% in our human evaluation metric, the Overall Success Rate (OSR).
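To make the two ideas in the abstract concrete, here is a minimal, hypothetical Python sketch of how pretraining examples might be synthesized from the edges of an expert-layman term graph. The edge format, prompt templates, and masking scheme below are illustrative assumptions for exposition, not the paper's exact Knowledge Base Assimilation recipe.

```python
# Hypothetical sketch: turning expert/layman term-graph edges into
# masked-LM pretraining examples. Templates and masking scheme are
# assumptions, not the paper's exact procedure.
MASK = "[MASK]"

# Each edge links an expert-style term to a layman-style equivalent.
term_edges = [
    ("myocardial infarction", "heart attack"),
    ("epistaxis", "nosebleed"),
]

def synthesize_examples(edges):
    """Yield (input_text, target_term) pairs that ask the LM to
    recover one side of an expert-layman term edge."""
    for expert, layman in edges:
        # Mask each side in turn, keeping the other as context, so the
        # LM learns that the two style variants co-refer.
        yield (f"{MASK} is commonly known as {layman}.", expert)
        yield (f"{expert} is commonly known as {MASK}.", layman)

for text, target in synthesize_examples(term_edges):
    print(text, "->", target)
```

The abstract also mentions mining the parallel corpus with a margin-based criterion. Below is a minimal NumPy sketch of the standard ratio-margin score for parallel sentence mining (in the style of Artetxe and Schwenk, 2019); the sentence embeddings, the choice of k, and the selection threshold are assumptions, and the paper's exact scoring function may differ.

```python
import numpy as np

def margin_scores(src_emb, tgt_emb, k=4):
    """Ratio-margin score: cosine(x, y) divided by the average cosine
    of each side's k nearest neighbors, so pairs must stand out from
    their local neighborhood rather than just being similar."""
    # Normalize rows so dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # (n_src, n_tgt) cosine similarity matrix

    # Mean similarity to the k nearest neighbors in each direction.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target

    denom = (knn_src[:, None] + knn_tgt[None, :]) / 2.0
    return sim / denom  # higher score = more likely a true pair

# Candidate pairs would then be kept when their score exceeds a
# tuned threshold (threshold value assumed, not from the paper).
```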

Published

2022-06-28

How to Cite

Xu, W., Saxon, M., Sra, M., & Wang, W. Y. (2022). Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11566-11574. https://doi.org/10.1609/aaai.v36i10.21410

Section

AAAI Technical Track on Speech and Natural Language Processing