An Extraction and Representation Pipeline for Literary Characters


  • Funing Yang, Wellesley College



Natural Language Processing, Narrative Understanding, Information Extraction, Information Retrieval, Machine Learning


Readers of novels need to identify and learn about the characters as they develop an understanding of the plot. This paper presents an end-to-end automated pipeline for literary character identification, along with ongoing work on extracting and comparing character representations for full-length English novels. The character identification pipeline consists of a named entity recognition (NER) module with an F1 score of 0.85, a coreference resolution module with an F1 score of 0.76, and a disambiguation module that uses both heuristic and algorithmic approaches. Ongoing work compares event extraction and speech extraction pipelines for literary character representations through case studies. To my knowledge, this paper is the first to combine automated character identification, representation extraction, and representation comparison in a modular pipeline for full-length English novels.
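The abstract does not describe how the disambiguation module works, so the following is only a minimal sketch of one kind of heuristic such a module might use, not the paper's method. It assumes that name mentions have already been produced by an upstream NER step, and it groups a shorter mention with a longer name when its tokens form a subset of exactly one longer name. The function names, the honorific list, and the example mentions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical honorific list; a real system would need a fuller one.
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "lady", "lord"}


def name_tokens(mention):
    """Lowercase a mention and drop honorifics and stray punctuation."""
    tokens = (t.strip(".,").lower() for t in mention.split())
    return tuple(t for t in tokens if t and t not in HONORIFICS)


def group_aliases(mentions):
    """Group mentions under the single longer name that contains them.

    A short mention matching more than one longer name is ambiguous,
    so it is kept as its own group rather than guessed.
    """
    # Visit longer names first so they become the canonical forms.
    keys = sorted({name_tokens(m) for m in mentions}, key=lambda k: (-len(k), k))
    canonical = {}  # token tuple -> canonical token tuple
    for key in keys:
        if not key:
            continue  # the mention consisted only of honorifics
        matches = [c for c in set(canonical.values()) if set(key) < set(c)]
        canonical[key] = matches[0] if len(matches) == 1 else key

    groups = defaultdict(list)
    for mention in mentions:
        key = name_tokens(mention)
        if key:
            groups[" ".join(canonical[key])].append(mention)
    return dict(groups)


mentions = ["Elizabeth Bennet", "Miss Elizabeth", "Mr. Darcy",
            "Darcy", "Jane Bennet", "Bennet"]
for name, aliases in group_aliases(mentions).items():
    print(name, "<-", aliases)
```

In this sketch the bare surname "Bennet" stays unmerged because it matches two longer names. Cases like that, along with honorifics that separate distinct characters (for example "Mr." and "Mrs." with the same surname), are where a purely string-based heuristic falls short and further evidence, such as coreference chains, would be needed.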




How to Cite

Yang, F. (2022). An Extraction and Representation Pipeline for Literary Characters. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 13146-13147.