Linguistics 645
Advanced Natural Language Processing (NLP)

Autumn 2012

Course goals: In recent years, statistical methods have become the standard in the field of natural language processing (NLP). This course gives an introduction to statistical models and machine learning paradigms in NLP. Such methods help with achieving wide coverage, resolving ambiguity, learning automatically from data, and increasing robustness, among other things.

In this course, we will cover basic notions in probability and information theory, focusing on the concepts needed for NLP. Then we will discuss (Hidden) Markov Models, exemplified by an approach to part-of-speech (POS) tagging. The following sessions will be dedicated to probabilistic approaches to parsing, focusing on probabilistic context-free grammars (PCFGs).

In the last third of the class, the plan is to cover semantics in NLP, examining word sense disambiguation before looking in detail at semantic role labeling. We will be focusing on statistical methods in the context of particular tasks, but all of the methods we will use are applicable to a range of tasks in NLP. Thus, this course provides an essential platform for further work in NLP.

Meeting time: MW, 2:30–3:45pm

Classroom: Memorial Hall (MM) 401

Course website: Assignments, slides, etc. will be posted here.

Credits: 3

Course prerequisites: Computation and Linguistic Analysis (L545) or permission of instructor. Some programming experience is expected.

Instructor: Markus Dickinson

Office: Memorial Hall (MM) 317

Phone: 856-2535

E-mail: (remove the animal name)

Office hours:

R 11:00am–12:00pm
or by appointment


Course requirements:

Academic Misconduct: Academic misconduct is not allowed in this course. The Indiana University Code of Student Rights, Responsibilities, and Conduct defines academic misconduct as “any activity that tends to undermine the academic integrity of the institution . . . Academic misconduct may involve human, hard-copy, or electronic resources . . . Academic misconduct includes, but is not limited to . . . cheating, fabrication, plagiarism, interference, violation of course rules, and facilitating academic misconduct” (II. G.1-6).

Students with Disabilities: Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations.

I rely on Disability Services for Students for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted Disability Services are encouraged to do so (812-855-7578).

(Tentative) Schedule:

Date      Topic                                   Reading                     Due
Aug. 20   Intro to class                          MS, ch. 1
Aug. 22   Probability Theory                      MS, 2.1
Aug. 27   Probability Theory                      KS, 1.1–1.4
                                                  MS, ch. 5                   HW1 due
Sep. 3    Labor Day, no classes
Sep. 5    Information Theory                      MS, 2.2; KS, 2.2
Sep. 10   Corpora and Linguistic Annotation       MS, ch. 3, 4                HW2 due
Sep. 12   Markov Chains, Markov Models            MS, 9.1; KS, 2.1.1–2.1.3
Sep. 17   N-gram POS tagging                                                  HW3 due
Sep. 19   Practical POS tagging
                                                  JM, 4.5; MS, 6.2
Sep. 26   Hidden Markov Models                    MS, 9.2–9.3; KS, 2.1.4      HW4 due
Oct. 1    Calculating P(O)                        MS, 9.3.1; KS, 2.1.5
Oct. 3    Finding the Optimal State Sequence      MS, 9.3.2; KS, 2.1.6        HW5 due
Oct. 8    Parameter Estimation                    MS, 9.3.3; KS, 2.1.7
Oct. 10   CYK parsing                             JM, 13–13.4.1               HW6 due
Oct. 15   Practical Parsing I
Oct. 17   Probabilistic Context-Free Grammars     MS, 11.1–11.3.3
                                                  JM, 14.2, 14.4–14.5         HW7 due
Oct. 24   Probabilistic Parsing                   MS, ch. 12
Oct. 29   Practical Parsing II
Oct. 31   Estimating PCFGs                        MS, 11.3.4–11.4             HW8 due
Nov. 5    Estimating PCFGs
Nov. 7    Word Sense Disambiguation               MS, ch. 7
Nov. 12   SRL: Semantic Roles                     PGX, ch. 1
Nov. 14   SRL: Lexical Resources                  PGX, ch. 2                  HW9 due
Nov. 19   Thanksgiving break, no classes
Nov. 21   Thanksgiving break, no classes
Nov. 26   SRL: Features                           PGX, ch. 3, p. 31–43
Nov. 28   SRL: Methods                            PGX, ch. 3, p. 44–52
Dec. 3    SRL: Cross-Linguality                   PGX, ch. 4                  HW10 due
Dec. 5    SRL: Wrap-up                            PGX, ch. 5
Dec. 14   Final HW/Project due                                                Final HW due

Disclaimer: This syllabus is subject to change and likely will change. All important changes will be announced in writing, with ample time for adjustment.