The Story Workbench: An Extensible Semi-Automatic Text Annotation Tool
Keywords: Semi-Automatic Corpus Annotation, Knowledge Representation, Annotation Tools
Text annotations are of great use to researchers in the language sciences, and much effort has been invested in creating annotated corpora for a wide variety of purposes. Unfortunately, software support for these corpora tends to be quite limited: it is usually ad-hoc, poorly designed and documented, or not released for public use. I describe an annotation tool, the Story Workbench, which provides a generic platform for text annotation. It is free, open-source, cross-platform, and user-friendly. It provides a number of common text annotation operations, including representations (e.g., tokens, sentences, parts of speech), functions (e.g., generation of initial annotations by algorithm, checking annotation validity by rule, fully manual manipulation of annotations), and tools (e.g., distributing texts to annotators via version control, merging doubly-annotated texts into a single file). The tool is extensible at many different levels, admitting new representations, algorithms, and tools. I enumerate ten important features and illustrate how they support the annotation process at three levels: (1) annotation of individual texts by a single annotator, (2) double-annotation of texts by two annotators and an adjudicator, and (3) annotation scheme development. The Story Workbench is scheduled for public release in March 2012.