Advances in computer technology have revolutionized the ways linguists can approach their data. By using computers, we can access large bodies of text (corpora) and search for the phenomena we are interested in. In this way, we can uncover complexities in naturally occurring data and explore issues related to frequency of usage.
In this course, the following questions will be investigated: What is a corpus? What corpora exist? How are corpora developed? What is XML? How does one search for specific phenomena in corpora? What is a concordancer? Do we need syntactic annotation? Are there programs that do the annotation automatically? Are there tools that help me search in linguistically annotated corpora?
Office hours: (at least for the first week)
or by appointment
There will be approximately one assignment every two weeks. These assignments give you the opportunity to explore in practice the topics discussed in class.
We will use one main required textbook, plus a second from which we will occasionally select readings. Additional readings will sometimes be made available online.
Grades will be based on:
Final projects will allow you to explore a research topic of your own interest and to show how corpus-linguistic methods can enhance that research. More details will be given sometime in February or early March.
Approximately every week, we will have a short lesson (20-30 minutes) on the Perl programming language. This language is useful for writing quick programs that process text, change data formats, access web data, serve as a front end for language technology, etc. I assume no previous programming background.
Academic misconduct is not allowed in this course. The Indiana University Code of Student Rights, Responsibilities, and Conduct (http://dsa.indiana.edu/Code/) defines academic misconduct as "any activity that tends to undermine the academic integrity of the institution . . . Academic misconduct may involve human, hard-copy, or electronic resources . . . Academic misconduct includes, but is not limited to . . . cheating, fabrication, plagiarism, interference, violation of course rules, and facilitating academic misconduct" (II. G.1-6).
Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations.
I rely on Disability Services for Students for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted Disability Services are encouraged to do so (812-855-7578; http://www.indiana.edu/~iubdss/).
We will mix practical sessions with more lecture-based sessions.
| Date | Topic | Readings | Perl | Assignment |
|---|---|---|---|---|
| Jan. 13 | Intro to class | | | |
| Jan. 15 | Why corpus linguistics? (.pdf, -2x3.pdf) | A1, B2 | | |
| Jan. 20 | Corpora at IU (handout; unix) | | 1 (code) | |
| Jan. 22 | NO CLASS, I'M AWAY | | | |
| Jan. 27 | NO CLASS, I'M AWAY | | | |
| Jan. 29 | Basics (.pdf, -2x3.pdf) | A2, SM17 | 2 (code) | |
| Feb. 3 | Corpus annotation (.pdf, -2x3.pdf) | A3, A4 | | |
| Feb. 4 (W) | Available corpora (.pdf, -2x3.pdf) | A5, A7 | | |
| Feb. 5 | Corpus annotation | | 3 (code) | #1 due |
| Feb. 10 | Application #1: Language variation (.pdf, -2x3.pdf) | A10.4, B4 | | |
| Feb. 11 (W) | In-class practice (help.pl) | C2 | | |
| Feb. 12 | More on annotation: POS/Syntax (.pdf, -2x3.pdf) | SM21 | | |
| Feb. 17 | More on annotation: Reliability (.pdf, -2x3.pdf) | SM29, Dickinson and Meurers (2003) | | |
| Feb. 19 | Regular expressions (.pdf, -2x3.pdf) | | 4 (code, palindromes.txt) | |
| Feb. 24 | Regular expressions | | | #2 due |
| Mar. 3 | Application #2: Collocations (.pdf, -2x3.pdf) | B3, SM22 | 5 (code) | |
| Mar. 4 (W) | In-class practice (collocations.pl) | C1 | | |
| Mar. 5 | Annotation tools (.pdf, -2x3.pdf) | SM39 | | |
| Mar. 10 | NO CLASS, I'M AWAY | | | |
| Mar. 12 | NO CLASS, I'M AWAY | | | #3 due (hw3.pl) |
| Mar. 17 | NO CLASS, SPRING BREAK | | | |
| Mar. 19 | NO CLASS, SPRING BREAK | | | |
| Mar. 24 | Application #3: Language learning (.pdf, -2x3.pdf) | B6 | | |
| Mar. 25 (W) | In-class practice (onewordperline.pl, convert.pl, transform.pl) | C3 | | |
| Mar. 26 | Automatic annotation: POS & syntax (.pdf, -2x3.pdf) | | 6 (code) | |
| Mar. 31 | NO CLASS, I'M AWAY | | | |
| Apr. 2 | NO CLASS, I'M AWAY | | | |
| Apr. 7 | Syntactic annotation (.pdf, -2x3.pdf) | SM37, SM25 | | |
| Apr. 8 (W) | Syntactic searching (.pdf, -2x3.pdf, handout) | Meurers and Müller (2007); Meurers (2005) | | |
| Apr. 9 | Linguist's Search Engine | Resnik and Elkiss (2005); Resnik et al. (2005) | 7 (code) | #4 due |
| Apr. 14 | Web as Corpus 1 (.pdf, -2x3.pdf, web.pl) | SM42 | | |
| Apr. 16 | Web as Corpus 2 | Baroni and Kilgarriff (2006); Sharoff (2006); Baroni and Bernardini (2004) | 8 (code) | |
| Apr. 21 | Application #4: Translation (.pdf, -2x3.pdf) | B5 | | |
| Apr. 22 (W) | In-class practice (one.pl) | C6 | | |
| Apr. 23 | Semantic annotation (.pdf, -2x3.pdf) | Burchardt et al. (2006); Palmer et al. (2000) | | |
| Apr. 28 | Multidimensional analysis (.pdf, -2x3.pdf) | | | |
| Apr. 30 | Perl wrap-up | | 9 (code: a, b, c) | #5 due |
| May 5 | Final papers/presentations (10:15am) | | | |