Linguistics 445/515
The Computer and Natural Language
Autumn 2011

Course goals: Present-day computer systems work with human language in many different forms, whether as stored text, typed queries to a database or search engine, or spoken commands in a voice-driven computer system. We also increasingly expect computers to produce human language, such as user-friendly error messages and synthesized speech. Through selected readings, exercises, demonstrations and Python programming, this course will: a) survey a range of issues relating natural language to computers, covering real-world applications, b) provide practical experience with the representation and use of natural language on computers, and c) illustrate key principles of natural language processing through programming.

Topics include text encoding, search technology, tools for writing support, machine translation, dialogue systems, computer-aided language learning, and the social context of language technology.

Meeting time: MWF 10:10am-11:00am

Classroom: Lindley Hall (LH) 030

Course website: http://jones.ling.indiana.edu/~mdickinson/11/515/

Assignments, slides, etc. will be posted here.

Credits: 3

Course prerequisites: None. In particular, no prior programming experience is expected.

Instructor: Markus Dickinson

Office: Memorial Hall (MM) 317

Phone: 856-2535

E-mail: md7@indianawaffle.edu (remove the food)

Office hours: (at least for the first week)

M 11:30am-12:30pm
R 11:00am-12:00pm
or by appointment

Assistant Instructor: Amber Smith (smithamj@flapjackindiana.edu [remove the food])

Course requirements: There will be reading selections throughout the semester from a draft of a textbook. There will be approximately one exercise sheet, or homework, every two weeks. These assignments give you the opportunity to explore new aspects of the topics discussed in class and to check that you understand the material covered in class. Some assignments will also give you the opportunity to practice your programming skills. Additionally, there will be in-class exercises, which count toward your participation grade.

Readings: There is no textbook to purchase for this course, but there will be readings assigned throughout the course, including portions from a textbook-in-progress.

For each unit, slides will be available from the webpage before class. These slides are meant to aid classroom discussion and cannot replace actually being in class.

Grading: Grades will be based on classroom discussion/participation, homeworks, a midterm exam, and a final examination. For 515, there will be an additional final project.

L445 grading:
Participation  10%
Homeworks      40%  (8 @ 5% each)
Midterm        25%  Wednesday, October 12 @ 10:10am
Final          25%  Wednesday, December 14 @ 10:15am-12:15pm

L515 grading:
Participation   5%
Homeworks      40%  (8 @ 5% each)
Midterm        18%  Wednesday, October 12 @ 10:10am
Final          18%  Wednesday, December 14 @ 10:15am-12:15pm
Final project  19%  Friday, December 16 @ 9:00am

Grading scale: (Scores in percentages)

A+ 99-100   B+ 87-89   C+ 77-79   D+ 67-69   F  0-59
A  93-98    B  83-86   C  73-76   D  63-66
A- 90-92    B- 80-82   C- 70-72   D- 60-62
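To see how the component weights and the grading scale combine, here is a minimal Python sketch. It assumes each component is scored as a percentage (0-100) and that a score at or above the lower bound of a band earns that letter (e.g., 93 or above is an A); the names `WEIGHTS`, `CUTOFFS`, `weighted_score` and `letter_grade` are hypothetical, chosen for illustration only.

```python
# Sketch of the weighted course grade. Assumption: every component is a
# percentage from 0 to 100. All names here are hypothetical.

# Weights from the L445 and L515 grading tables, as fractions of the grade.
WEIGHTS = {
    "L445": {"participation": 0.10, "homeworks": 0.40,
             "midterm": 0.25, "final": 0.25},
    "L515": {"participation": 0.05, "homeworks": 0.40,
             "midterm": 0.18, "final": 0.18, "project": 0.19},
}

# Lower bound of each letter-grade band, from highest to lowest.
# Assumption: a score at or above a band's lower bound earns that letter.
CUTOFFS = [(99, "A+"), (93, "A"), (90, "A-"),
           (87, "B+"), (83, "B"), (80, "B-"),
           (77, "C+"), (73, "C"), (70, "C-"),
           (67, "D+"), (63, "D"), (60, "D-"),
           (0, "F")]


def weighted_score(level, scores):
    """Combine component percentages into one course percentage."""
    weights = WEIGHTS[level]
    return sum(weights[part] * scores[part] for part in weights)


def letter_grade(score):
    """Return the letter for the first band whose lower bound the score meets."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Example: an L445 student with 100 for participation, a 90 homework
# average, 80 on the midterm and 86 on the final.
scores = {"participation": 100, "homeworks": 90, "midterm": 80, "final": 86}
total = weighted_score("L445", scores)
print(total, letter_grade(total))  # prints 87.5 B+
```

Here 0.10 × 100 + 0.40 × 90 + 0.25 × 80 + 0.25 × 86 = 87.5, which falls in the B+ band. For L515, the same function applies with the extra "project" score included.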

Make-up Policy: If you plan on missing either the midterm or final, you will have to provide extensive documentation for your excuse. See me immediately if this is the case.

Final Project (L515): For those enrolled at the 515 level, there is a final project requirement, the topics of which will be discussed individually with the instructor (beginning in October). The projects will generally be papers extending discussion of specific topics touched on in class, although they may also be implementations of a specific natural language processing algorithm (documented and evaluated) or evaluation of existing algorithms and software systems. The projects will be due on Friday, December 16 at 9:00am.

Practical sessions: To help you learn to think logically and algorithmically, you will be taught some fundamentals of programming, using the Python programming language and other software tools. These will be included in various class sessions (not always listed on the syllabus), and I hope you will see some of the concepts discussed in class implemented in a program. I expect that most of you have absolutely no experience in programming and might be a little (or a lot) scared of it, so I want to be clear that this material is not the main course material. It is meant to help you think logically and to give you a better understanding of all the details that go into a program.

Academic Misconduct: Academic misconduct is not allowed in this course. The Indiana University Code of Student Rights, Responsibilities, and Conduct (http://dsa.indiana.edu/Code/) defines academic misconduct as “any activity that tends to undermine the academic integrity of the institution . . . Academic misconduct may involve human, hard-copy, or electronic resources . . . Academic misconduct includes, but is not limited to . . . cheating, fabrication, plagiarism, interference, violation of course rules, and facilitating academic misconduct” (II. G.1-6).

Students with Disabilities: Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations.

I rely on Disability Services for Students for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted Disability Services are encouraged to do so (812-855-7578; http://www.indiana.edu/~iubdss/).

Computational Linguistics: If you find yourself loving this material, I encourage you to come see me or Professor Sandra Kübler for more information about computational linguistics.

Disclaimer: This syllabus is subject to change. All important changes will be made in writing, with ample time for adjustment. (Midterm and final dates, however, will not change.)

Schedule: Links to notes and homeworks will be posted on the course website.

Month Date Topic                                             Assignments

Aug.  29   Intro to class
      31   Text & speech encoding: text (.pdf, 2x3.pdf)
Sep.  2    Text & speech encoding: speech

      5    No class, Labor Day
      7    Writers’ aids: spelling correctors (.pdf, 2x3.pdf)
      9    Writers’ aids: spelling correctors                HW1 due

      12   Writers’ aids: grammar correctors
      14   Guest lecture: Educational Testing Service (Ross Israel) (.pdf)
      16   No class (I’m away)

      19   Programming basics (.pdf, 2x3.pdf)
      21   Language Tutoring Systems (.pdf, 2x3.pdf)
      23   Language Tutoring Systems                         HW2 due

      26   Language Tutoring Systems
      28   Programming (2) (.pdf, 2x3.pdf)
      30   Searching (.pdf, 2x3.pdf) (handout)               HW3 due

Oct.  3    Searching: internals
      5    Searching: regular expressions (Handout: 1, 2)
      7    Programming (3) (.pdf, 2x3.pdf)                   HW4 due

      10   Midterm review (.pdf)
      12   MIDTERM
      14   Classifying documents (.pdf, 2x3.pdf)

      17   Classifying documents
      19   Classifying documents
      21   Playing with classifiers

      24   Practical session
      26   Cryptography                                      HW5 due
      28   Cryptography

      31   Cryptography in Python
Nov.  2    Machine Translation (MT) (.pdf, 2x3.pdf) (handout)
      4    Machine Translation (MT)

      7    Symbolic MT                                       HW6 due (code)
      9    Symbolic MT
      11   Symbolic MT

      14   Statistical MT
      16   Statistical MT
      18   Statistical MT

      21   N-grams in Python (.pdf, 2x3.pdf, files (.tgz))   HW7 due
      23   No class, Thanksgiving break
      25   No class, Thanksgiving break

      28   Dialogue systems: dialogue (.pdf, 2x3.pdf)
      30   Dialogue systems: chatterbots
Dec.  2    Dialogue systems: modern systems (.pdf)

      5    Dialogue systems: modern systems
      7    Impact of language technology use                 HW8 due
      9    Final review (.pdf)

      14   FINAL EXAM: Wednesday, December 14, 10:15am-12:15pm
      16   FINAL PROJECT (L515) due by 9:00am