Detecting Errors in Part-of-Speech Annotation

Markus Dickinson and Walt Detmar Meurers

Proceedings of EACL'03.

We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams that occur in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are also discussed. The success of the three approaches is illustrated on the Wall Street Journal corpus, part of the Penn Treebank.
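The core idea of n-grams with multiple taggings can be illustrated with a minimal sketch. It assumes the corpus is a flat list of (word, tag) pairs. The sketch collects every word n-gram and the set of tags that each position received across its occurrences, then keeps the n-grams in which some position was tagged differently. The function name `variation_ngrams` and the toy corpus are hypothetical. This is not the DECCA code and not necessarily the paper's exact algorithm. It is a brute-force enumeration that omits refinements such as building longer n-grams incrementally from shorter ones.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Assumption: a corpus is a flat list of (word, tag) pairs.
Token = Tuple[str, str]


def variation_ngrams(corpus: List[Token], n: int) -> Dict[Tuple[str, ...], List[Set[str]]]:
    """Hypothetical helper: return every word n-gram in which at least one
    position was tagged differently across its occurrences, mapped to the
    set of tags seen at each position."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # For each distinct word n-gram, keep one tag set per position.
    tags_at: DefaultDict[Tuple[str, ...], List[Set[str]]] = defaultdict(
        lambda: [set() for _ in range(n)]
    )
    for start in range(len(corpus) - n + 1):
        window = corpus[start:start + n]
        words = tuple(word for word, _ in window)
        for pos, (_, tag) in enumerate(window):
            tags_at[words][pos].add(tag)
    # Keep only n-grams with variation: some position received more than one tag.
    return {
        words: tag_sets
        for words, tag_sets in tags_at.items()
        if any(len(tags) > 1 for tags in tag_sets)
    }


if __name__ == "__main__":
    # Hypothetical toy corpus: "off" is tagged RP once and IN once in the same context.
    toy = [("ward", "VB"), ("off", "RP"), ("a", "DT"),
           ("ward", "VB"), ("off", "IN"), ("a", "DT")]
    for n in (1, 2, 3):
        for words, tag_sets in variation_ngrams(toy, n).items():
            print(n, " ".join(words), tag_sets)
```

On the toy corpus, the only trigram reported is "ward off a". It qualifies because "off" is tagged RP in one occurrence and IN in the other, while the surrounding words are identical. Running the function for increasing n yields progressively more context around each varying word. The intuition is that the longer the identical context, the less likely it is that the differing tags reflect genuine ambiguity rather than an annotation error.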



BibTeX entry:

@InProceedings{dickinson:meurers:03,
  author =       {Markus Dickinson and W. Detmar Meurers},
  title =        {Detecting Errors in Part-of-Speech Annotation},
  booktitle =    {Proceedings of the 10th Conference of the European 
                  Chapter of the Association for Computational Linguistics 
                  (EACL-03)},
  pages =        {107--114},
  address =      {Budapest, Hungary},
  year =         {2003},
  url =  {http://cl.indiana.edu/~md7/papers/dickinson-meurers-03.html}
}


The variation n-gram code used in the paper is freely available from the DECCA software page.