On Detecting Errors in Dependency Treebanks

Adriane Boyd, Markus Dickinson, and Detmar Meurers

Research on Language and Computation. 6(2).

Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and in the training and evaluation of tools based on dependency treebanks. Gold standard dependency treebanks have been created for some languages, most notably Czech, and annotation efforts for other languages are under way. At the same time, general techniques for detecting errors in dependency annotation have not yet been developed.

We address this gap by exploring how a technique proposed for detecting errors in constituency-based syntactic annotation can be adapted to systematically detect errors in dependency annotation. Building on an analysis of key properties and differences between constituency and dependency annotation, we discuss results for dependency treebanks for Swedish, Czech, and German. Complementing the focus on detecting errors in dependency treebanks to improve these gold standard resources, the discussion of dependency error detection for different languages and annotation schemes also raises questions of standardization for some aspects of dependency annotation, in particular regarding the locality of annotation, the assumption of a single head for each dependency relation, and phenomena such as coordination.
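The abstract does not spell the technique out. Assuming the constituency-based technique in question is variation-based detection (the variation n-gram idea: a string that recurs in the corpus but is annotated differently is a candidate error), a simplified dependency version can be sketched as follows. Each pair of words, together with the words between them, is treated as a recurring string. Its "label" is the dependency relation and direction linking the two words, or NIL if they are not directly linked. The token format, the function names `pair_label` and `find_variation`, the `max_span` limit and the `-L`/`-R` suffixes are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical token format: (form, head, label); head is 1-based, 0 = root.
Token = Tuple[str, int, str]


def pair_label(sent: List[Token], i: int, j: int) -> str:
    """Label the word pair at 0-based positions i < j."""
    _, head_i, lab_i = sent[i]
    _, head_j, lab_j = sent[j]
    if head_j == i + 1:
        return f"{lab_j}-R"  # word i heads word j (head on the left)
    if head_i == j + 1:
        return f"{lab_i}-L"  # word j heads word i (head on the right)
    return "NIL"             # no direct dependency between the two words


def find_variation(corpus: List[List[Token]], max_span: int = 3) -> Dict[Tuple[str, ...], Set[str]]:
    """Return word strings (length 2..max_span) whose endpoint pair gets >1 label."""
    seen: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + max_span, len(sent))):
                # The whole string from i to j acts as the recurring context.
                key = tuple(tok[0] for tok in sent[i:j + 1])
                seen[key].add(pair_label(sent, i, j))
    # Strings annotated in more than one way are flagged for inspection.
    return {k: v for k, v in seen.items() if len(v) > 1}


corpus = [
    [("the", 2, "DET"), ("dog", 3, "SUBJ"), ("barks", 0, "ROOT")],
    [("the", 2, "DET"), ("dog", 3, "OBJ"), ("barks", 0, "ROOT")],
]
flagged = find_variation(corpus)
print(sorted(flagged[("dog", "barks")]))  # ['OBJ-L', 'SUBJ-L']
```

Here only "dog barks" is flagged, since the same string carries SUBJ in one sentence and OBJ in the other; such flagged strings are candidates for manual review rather than confirmed errors.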

Note: The electronic version of the publication linked on this page is the last version I had the copyright for. Where a publisher copyedited and/or typeset the papers, the electronic copies linked here are NOT identical to the officially published version, which should be used for any quotes, references to page numbers, etc.

Bibtex entry:

   author  = {Adriane Boyd and Markus Dickinson and Detmar Meurers},
   title   = {On Detecting Errors in Dependency Treebanks},
   journal = {Research on Language and Computation},
   volume  = {6},
   number  = {2},
   pages   = {113--137},
   year    = {2008},
   url     = {http://cl.indiana.edu/~md7/papers/boyd-et-al-08.html}