Determining Ambiguity Classes for Part-of-Speech Tagging

Markus Dickinson

Proceedings of RANLP-07.

We examine how words group together in the lexicon in terms of ambiguity classes, and we use this information in a redefined tagset to improve POS tagging. Because the training data contains errors and annotated data is limited, we investigate ways of defining ambiguity classes that consider the lexicon as a whole and predict unknown uses of words. We show that fitting words to typical ambiguity classes yields more accurate ambiguity classes for words and significantly improves tagging performance.
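
To make the notion of an ambiguity class concrete, the following minimal sketch reads a word's ambiguity class directly off a tagged corpus as the set of tags observed for that word, then counts how many words share each class. It is only an illustration of the basic concept, not the method of the paper: the toy corpus, the Penn Treebank-style tag names, and the function names `ambiguity_classes` and `class_frequencies` are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def ambiguity_classes(tagged_tokens):
    """Map each word to the set of tags observed for it (its ambiguity class)."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word.lower()].add(tag)
    # frozenset makes each class hashable, so classes can be counted below.
    return {word: frozenset(tags) for word, tags in tags_by_word.items()}


def class_frequencies(word_classes):
    """Count how many distinct words belong to each ambiguity class."""
    return Counter(word_classes.values())


# Hypothetical toy corpus of (word, tag) pairs; the tags are illustrative only.
toy_corpus = [
    ("the", "DT"), ("can", "MD"), ("can", "NN"), ("run", "VB"),
    ("run", "NN"), ("walk", "VB"), ("walk", "NN"), ("dog", "NN"),
]

classes = ambiguity_classes(toy_corpus)
freqs = class_frequencies(classes)

for tag_set, n_words in freqs.most_common():
    print("/".join(sorted(tag_set)), n_words)
```

Classes shared by many words can be regarded as typical for the lexicon, while a class observed for only one word may reflect an annotation error or a gap in the training data. How words are actually fitted to typical classes is not specified in this abstract, so the sketch stops at the counts.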

