On Detecting Errors in Dependency Treebanks

Adriane Boyd, Markus Dickinson, and Detmar Meurers

Research on Language and Computation. 6(2).

Dependency relations between words are increasingly recognized as an important level of linguistic representation and as a target of syntactic analysis and processing, since they are close to the data and, at the same time, close to the semantic functor-argument structure. Correspondingly, dependency structures play an important role in parser evaluation and in the training and evaluation of tools based on dependency treebanks. Gold standard dependency treebanks have been created for some languages, most notably Czech, and annotation efforts for other languages are under way. At the same time, general techniques for detecting errors in dependency annotation have not yet been developed.

We address this gap by exploring how a technique proposed for detecting errors in constituency-based syntactic annotation can be adapted to systematically detect errors in dependency annotation. Building on an analysis of the key properties of, and differences between, constituency and dependency annotation, we discuss results for dependency treebanks for Swedish, Czech, and German. Complementing the focus on detecting errors in dependency treebanks to improve these gold standard resources, the discussion of dependency error detection for different languages and annotation schemes also raises questions of standardization for some aspects of dependency annotation, in particular regarding the locality of annotation, the assumption of a single head for each dependency relation, and phenomena such as coordination.

