Evaluating Distributional Properties of Tagsets

Markus Dickinson and Charles Jochim

Proceedings of the 7th Language Resources and Evaluation Conference (LREC 2010). Valletta, Malta.

We investigate which distributional properties should be present in a tagset by examining different mappings of current part-of-speech tagsets. Given the importance of distributional information, we present a simple model for evaluating how well a tagset mapping captures distribution. In addition to an accuracy metric capturing the internal quality of a tagset, we introduce a way to evaluate the external quality of tagset mappings, so that we can ensure that a mapping retains linguistically important information from the original tagset.

