Building a Korean Web Corpus for Analyzing Learner Language

Markus Dickinson, Ross Israel, and Sun-Hee Lee

Proceedings of the 6th Workshop on the Web as Corpus (WAC-6) at NAACL-10

Post-positional particles are a significant source of errors for learners of Korean. Following methodology that has proven effective in handling English preposition errors, we are beginning the process of building a machine learner for particle error detection in L2 Korean writing. As a first step, however, we must acquire data, and thus we present a methodology for constructing large-scale corpora of Korean from the Web, exploring the feasibility of building corpora appropriate for a given topic and grammatical construction.


Electronically available file formats:


BibTeX entry:

@InProceedings{dickinson:israel:lee:10, 
  author =       {Markus Dickinson and Ross Israel and Sun-Hee Lee},
  title =        {Building a Korean Web Corpus for Analyzing Learner Language},
  booktitle =    {Proceedings of the 6th Workshop on
                  the Web as Corpus (WAC-6)},
  address =      {Los Angeles}, 
  pages =        {},
  year =         {2010},
  url =  {http://cl.indiana.edu/~md7/papers/dickinson-israel-lee10.html}
}