Corpus Annotation in Service of Intelligent Narrative Technologies
Keywords: Corpus Construction, Corpus Annotation, Knowledge Representation, Collection Methods, Interactive Narrative, Narrative Generation
Annotated corpora have stimulated great advances in the language sciences. The time is ripe to bring that same stimulation, and its consequent benefits, to computational approaches to narrative. I describe an effort to construct a corpus of semantically annotated stories. I outline the structure of the corpus, which can be described colloquially as a "handful of handfuls." One handful has already been constructed, viz., 18k words of Russian folktales. Two handfuls are under construction: legal cases focused on the area of probable cause, and stories from Islamist extremist jihadists. Four more handfuls are being planned: Chinese and English folktales, folktales from a West Asian culture, and stories of international conventional and cyber conflicts. Numerous additional handfuls are under discussion. The main focus of the corpus so far has been on textual materials annotated for their surface semantics using conventional annotation tools and techniques; nonetheless, there are numerous novel dimensions along which the corpus might grow and become useful to different communities. In particular, I propose for discussion the outlines of a few novel sources, annotation schemes, and collection methodologies that could make the corpus of great use to the interactive narrative and narrative generation communities.