Annotating Errors in a Hungarian Learner Corpus

Markus Dickinson and Scott Ledbetter

Proceedings of the 8th Language Resources and Evaluation Conference (LREC 2012). Istanbul, Turkey.

We are developing and annotating a learner corpus of Hungarian, composed of journals written by Indiana University students at three different proficiency levels. Our annotation marks learner errors from different linguistic categories, including phonology, morphology, and syntax, but defining the annotation for an agglutinative language presents several issues. First, we must adapt an analysis that is centered on the morpheme rather than the word. Second, and more importantly, we see a need to distinguish errors from secondary corrections. We argue that although certain learner errors require a series of corrections to reach a target form, these secondary corrections, conditioned on those that come before, are our own adjustments that link the learner's productions to the target form and are not representative of the learner's internal grammar. In this paper, we present the annotation scheme and the principles that guide it, as well as examples illustrating its functionality and directions for expansion.
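The distinction between learner errors and secondary corrections can be pictured as an ordered chain of morpheme-level edits, where each secondary edit points back to the edit that conditions it. The Python sketch below rests on our own assumptions and is not the authors' annotation format. The `Correction` class, its fields, and the `apply_corrections` and `learner_errors` helpers are hypothetical. The kert/ház example was constructed for illustration and does not come from the corpus. In the example, the suffix -ben correctly harmonizes with the learner's stem kert, so changing it to -ban is recorded as depending on the stem correction. Only the stem change therefore counts as a learner error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Correction:
    """One hypothetical morpheme-level edit from the learner form toward the target."""
    morpheme: int                     # position of the morpheme within the word
    source: str                       # form of that morpheme before this edit
    target: str                       # form of that morpheme after this edit
    depends_on: Optional[int] = None  # index of an earlier correction that conditions this one


def apply_corrections(morphemes: List[str], corrections: List[Correction]) -> List[str]:
    """Apply corrections in order, checking each against the current intermediate form."""
    current = list(morphemes)
    for i, c in enumerate(corrections):
        # A secondary correction may only be conditioned on a correction applied before it.
        if c.depends_on is not None and not 0 <= c.depends_on < i:
            raise ValueError(f"correction {i} must depend on an earlier correction")
        if current[c.morpheme] != c.source:
            raise ValueError(
                f"correction {i} expects {c.source!r}, found {current[c.morpheme]!r}"
            )
        current[c.morpheme] = c.target
    return current


def learner_errors(corrections: List[Correction]) -> List[Correction]:
    """Keep only primary corrections; conditioned ones are annotator adjustments."""
    return [c for c in corrections if c.depends_on is None]


# Constructed example: the learner wrote kert+ben ('in the garden') where
# ház+ban ('in the house') was intended. The suffix change follows from vowel
# harmony with the corrected stem, so it is marked as secondary.
learner = ["kert", "ben"]
chain = [
    Correction(morpheme=0, source="kert", target="ház"),
    Correction(morpheme=1, source="ben", target="ban", depends_on=0),
]

print("".join(apply_corrections(learner, chain)))  # házban
print(len(learner_errors(chain)))                  # 1
```

Under this representation, the full chain still yields the target form. Filtering on `depends_on` recovers only the edits that plausibly reflect the learner's grammar.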

BibTeX entry:

@InProceedings{dickinson:ledbetter:12,
  author =       {Markus Dickinson and Scott Ledbetter},
  title =        {Annotating Errors in a Hungarian Learner Corpus},
  booktitle =    {Proceedings of the 8th Language Resources and 
                  Evaluation Conference (LREC 2012)},
  address =      {Istanbul, Turkey},
  pages =        {},
  url =          {http://cl.indiana.edu/~md7/papers/dickinson-ledbetter12.html},
  year =         {2012}
}