Similarity and Dissimilarity in Treebank Grammars

Markus Dickinson

Proceedings of CIL 18 (International Congress of Linguists).

To uncover rules of dubious quality in a treebank grammar, we investigate two methods for detecting problematic structures, both built on the same notion of similarity. The first assumes that similar rules should receive the same annotation. The second assumes that rules which are dissimilar to all other rules are likely to be problematic. We show that both methods are effective in detecting erroneous rules, rules used for ungrammatical or otherwise non-standard constructions, and rules which reveal non-uniform decisions made in the annotation scheme.
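
As a rough illustration of the two ideas, here is a minimal Python sketch. The abstract does not define the similarity measure, so the sketch assumes one: two rules count as similar when their daughters lists are identical after collapsing runs of repeated adjacent labels. The toy rules and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical toy grammar as (mother, daughters) pairs; not data from the paper.
RULES = [
    ("NP", ("DT", "JJ", "NN")),
    ("NP", ("DT", "JJ", "JJ", "NN")),
    ("NX", ("DT", "JJ", "NN")),
    ("VP", ("VB", "NP")),
    ("VP", ("VB", "NP", "NP")),
    ("VP", ("NN", "UH", "DT")),
]


def reduce_daughters(daughters):
    """Assumed similarity key: collapse runs of identical adjacent labels."""
    return tuple(label for label, _ in groupby(daughters))


def inconsistent_groups(rules):
    """First idea: similar daughters lists that carry different mother labels."""
    mothers = defaultdict(set)
    for mother, daughters in rules:
        mothers[reduce_daughters(daughters)].add(mother)
    return {key: sorted(ms) for key, ms in mothers.items() if len(ms) > 1}


def dissimilar_rules(rules):
    """Second idea: rules whose reduced daughters list no other rule shares."""
    counts = defaultdict(int)
    for _, daughters in rules:
        counts[reduce_daughters(daughters)] += 1
    return [(m, d) for m, d in rules if counts[reduce_daughters(d)] == 1]


if __name__ == "__main__":
    print(inconsistent_groups(RULES))
    print(dissimilar_rules(RULES))
```

Under this assumed measure, the first function flags the `DT JJ NN` pattern because it appears under both `NP` and `NX`, and the second flags `VP -> NN UH DT` because nothing else in the toy grammar resembles it. A flagged rule is only a candidate for manual inspection, not a confirmed error.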

