Avoiding the Comparative Fallacy in the Annotation of Learner Corpora

Marwa Ragheb and Markus Dickinson

Selected Proceedings of the 2010 Second Language Research Forum: Reconsidering SLA Research, Dimensions, and Directions.

Annotated corpora of learner language can be useful for SLA researchers and FL teachers. Tagging phenomena with part of speech (POS) information and grammatical relations between words can make it feasible to search in a learner corpus for abstract grammatical properties not easily captured through a lexical search (e.g., headless relative clauses). One type of corpus annotation for learner language has focused on so-called errors (e.g., Granger, 2003), using specific error tags for phenomena that 'deviate' from the L2. Some of these schemes make use of target hypotheses, attempting to capture the learner's intention. These approaches risk falling into the comparative fallacy (Bley-Vroman, 1983), since they try to map specific phenomena in interlanguage to target categories in the L2. The task is even more challenging with ambiguous utterances. In the same vein, it is undesirable to bias any annotation in terms of the L1 (cf. Lakshmanan & Selinker, 2001). A recent approach is to annotate interlanguage as it appears, without focusing on errors (e.g., Diaz-Negrillo et al., 2010; Dickinson & Ragheb, 2009). In providing linguistic annotation such as POS tags or syntactic relations, one has to ensure that the annotation supports different topics of SLA research, while avoiding both the comparative fallacy and the inference of learner intention. This paper discusses the ramifications of annotating syntactic properties in learner language and pinpoints where annotation designers must be aware of the comparative fallacy. Using different layers of annotation to capture variability in learner language, the authors argue that one should annotate observable linguistic properties that are clearly defined. They show how, even if one defines the properties in terms of the L2, a systematic description of learner data can support L2 syntactic studies, provide insight into interlanguage, and avoid inferring intention, putting the final interpretation of the data in the hands of SLA researchers.


Bibtex entry:

@InProceedings{ragheb:dickinson:11,
  author =       {Marwa Ragheb and Markus Dickinson},
  title =        {Avoiding the Comparative Fallacy in the Annotation of Learner Corpora},
  booktitle =    {Selected Proceedings of the 2010 Second Language Research Forum: 
                  Reconsidering SLA Research, Dimensions, and Directions},
  publisher =    {Cascadilla Proceedings Project},
  address =      {Somerville, MA},
  pages =        {114--124},
  url =          {http://cl.indiana.edu/~md7/papers/ragheb-dickinson11.html},
  year =         {2011}
}