Syntactically Annotating Learner Language of English

Welcome to the website of the project for Syntactically Annotating Learner Language of English, or SALLE! This is a dissertation project in the Linguistics Department at Indiana University, Bloomington.

This project concerns syntactically annotating texts written by learners of English as a second language. Our goal is to annotate the linguistic properties present in a given sentence, without over-interpreting what the learner meant to say or what the correct form should have been. To achieve this, our annotation scheme adds several pieces of linguistic information about each word, based on its context in the sentence and on the rules of English (the target language). We annotate dependency relations to mark syntactic relations between words in a sentence, e.g., that one word is the subject of another word.
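As a rough illustration of what a word-level dependency annotation can look like, here is a minimal Python sketch. It is not the project's actual format: the `Token` class, the `dependents_of` helper, the label names (`SUBJ`, `OBJ`, `ROOT`), and the example sentence are all hypothetical and chosen only to show one word being marked as the subject of another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One word plus its dependency annotation (hypothetical layout)."""
    index: int            # 1-based position in the sentence
    form: str             # the word exactly as the learner wrote it
    head: Optional[int]   # index of the governing word, None for the root
    relation: str         # hypothetical dependency label, e.g. "SUBJ"


def dependents_of(sentence: List[Token], head_index: int, relation: str) -> List[Token]:
    """Return the tokens attached to head_index with the given relation."""
    return [t for t in sentence if t.head == head_index and t.relation == relation]


# Illustrative sentence, not taken from the project's data.
sentence = [
    Token(1, "She", 2, "SUBJ"),    # "She" is the subject of "reads"
    Token(2, "reads", None, "ROOT"),
    Token(3, "books", 2, "OBJ"),   # "books" is the object of "reads"
]

subjects = dependents_of(sentence, 2, "SUBJ")
print([t.form for t in subjects])
```

Storing the head as an index rather than as a word form keeps the annotation tied to the learner's actual text, even when the same form occurs more than once in a sentence.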

This site is under development, so please be patient.



A beta version of the guidelines we are using is available here. Note that these guidelines are still in progress, and we welcome feedback. We are releasing this version of the guidelines before any data is released because we feel that they will be useful to other researchers. The decisions we have made (certainly needing refinement in some cases) point out many of the essential questions that need to be addressed when linguistically annotating learner data, and we hope they can stimulate discussion.

BibTeX information for the guidelines:

author =  {Markus Dickinson and Marwa Ragheb},
title =  {Annotation for Learner {E}nglish
                  Guidelines, v. 0.1},
institution =  {Indiana University},
year =  {2013},
address =  {Bloomington, IN},
month =  {June},
note =  {June 9, 2013},

The best paper to cite for the overall project is probably either our COLING 2012 paper or one of our TLT 2014 papers:

author    = {Ragheb, Marwa  and  Dickinson, Markus},
title     = {Defining Syntax for Learner Language Annotation},
booktitle = {Proceedings of COLING 2012: Posters},
month     = {December},
year      = {2012},
address   = {Mumbai, India},
pages     = {965--974},
url       = {}

  author    = {Ragheb, Marwa and Dickinson, Markus},
  title     = {Developing a Corpus of Syntactically-Annotated Learner
               Language for English},
  booktitle = {Proceedings of the 13th International Workshop on 
               Treebanks and Linguistic Theories (TLT13)},
  year      = {2014},
  address   = {T\"ubingen, Germany},
  pages     = {292--300},
  url       = {}


Markus Dickinson and Marwa Ragheb (2015). On Grammaticality in the Syntactic Annotation of Learner Language. Proceedings of The 9th Linguistic Annotation Workshop. Denver, CO. pp. 158-167.

Marwa Ragheb and Markus Dickinson (2014). Developing a Corpus of Syntactically-Annotated Learner Language for English. Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13). Tübingen, Germany. pp. 292-300.

Marwa Ragheb and Markus Dickinson (2014). The Effect of Annotation Scheme Decisions on Parsing Learner Data. Proceedings of the 13th International Workshop on Treebanks and Linguistic Theories (TLT13). Tübingen, Germany. pp. 137-148.

Marwa Ragheb and Markus Dickinson (2013). Inter-annotator Agreement for Dependency Annotation of Learner Language. Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA.

Marwa Ragheb and Markus Dickinson (2012). Defining Syntax for Learner Language Annotation. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Poster Session. Mumbai, India. pp. 965-974.

Marwa Ragheb and Markus Dickinson (2011). Avoiding the Comparative Fallacy in the Annotation of Learner Corpora. Selected Proceedings of the 2010 Second Language Research Forum: Reconsidering SLA Research, Dimensions, and Directions. Cascadilla Proceedings Project: Somerville, MA. pp. 114-124.

Markus Dickinson and Marwa Ragheb (2011). Dependency Annotation of Coordination for Learner Language. International Conference on Dependency Linguistics. Barcelona, Spain.

Markus Dickinson and Marwa Ragheb (2009). Dependency Annotation for Learner Corpora. Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT-8). Milan, Italy.

Contact info: Markus Dickinson or Marwa Ragheb