Detecting Errors in Discontinuous Structural Annotation

Markus Dickinson and Walt Detmar Meurers

Proceedings of ACL'05.

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., part-of-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation. This is significant since the annotation of potentially discontinuous stretches of material is increasingly relevant, from treebanks for free word order languages to semantic and discourse annotation.

In this paper we discuss how the variation n-gram error detection approach (Dickinson and Meurers, 2003) can be extended to discontinuous structural annotation. We exemplify the approach by showing how it successfully detects errors in the syntactic annotation of the German TIGER corpus (Brants et al., 2002).


Bibtex entry:

@InProceedings{dickinson:meurers:05,
  author =       {Markus Dickinson and W. Detmar Meurers},
  title =        {Detecting Errors in Discontinuous Structural Annotation},
  booktitle =    {Proceedings of the 43rd Annual Meeting of the Association 
                  for Computational Linguistics (ACL-05)},
  pages =        {322--329},
  address =      {Ann Arbor, MI, USA},
  url =          {\url{http://cl.indiana.edu/~md7/papers/dickinson-meurers-05.html}},
  year =         {2005}
}


The code used for this paper is freely available from the DECCA software page.