In recent years, statistical methods have become the standard in the field of natural language processing (NLP). This course gives an introduction to statistical models and machine learning paradigms in NLP. Such methods help achieve wide coverage, resolve ambiguity, learn automatically from data, and increase robustness.
In this course, we will cover basic notions in probability and information theory, focusing on the concepts needed for NLP. We will then discuss (Hidden) Markov Models, exemplified by an approach to POS tagging. The following sessions will be dedicated to probabilistic approaches to parsing, with an emphasis on probabilistic context-free grammars (PCFGs).
Additionally, we will cover semantic role labeling, word sense disambiguation, and, time permitting, statistical alignment methods and their use in machine translation. Although each method is presented in the context of a particular task, the techniques we discuss apply to a wide range of NLP problems. Thus, this course provides an essential foundation for further work in NLP.
Academic misconduct is not allowed in this course. The Indiana University Code of Student Rights, Responsibilities, and Conduct (http://dsa.indiana.edu/Code/) defines academic misconduct as "any activity that tends to undermine the academic integrity of the institution ... Academic misconduct may involve human, hard-copy, or electronic resources ... Academic misconduct includes, but is not limited to ... cheating, fabrication, plagiarism, interference, violation of course rules, and facilitating academic misconduct" (II. G.1-6).
Students who need an accommodation based on the impact of a disability should contact me to arrange an appointment as soon as possible to discuss the course format, to anticipate needs, and to explore potential accommodations.
I rely on Disability Services for Students for assistance in verifying the need for accommodations and developing accommodation strategies. Students who have not previously contacted Disability Services are encouraged to do so (812-855-7578; http://www.indiana.edu/~iubdss/).
|Date|Topic|Reading|Homework|
|Aug. 31|Intro to class (.pdf, .2x3pdf)|MS, ch. 1||
|Sep. 2|Probability Theory (.pdf, .2x3pdf) (handout)|MS, 2.1||
|Sep. 7|Probability Theory|KS, 1.1-1.4||
|Sep. 9|Collocations (.pdf, .2x3pdf)|MS, ch. 5|HW1 due|
|Sep. 14|Information Theory (.pdf, .2x3pdf)|MS, 2.2; KS, 2.2||
|Sep. 16|Corpora and Linguistic Annotation (.pdf, .2x3pdf)|MS, ch. 3-4|HW2 due|
|Sep. 21|FSAs (.pdf, .2x3pdf); Markov Chains & Models (.pdf, .2x3pdf)|MS, 9.1; KS, 2.1.1-2.1.3||
|Sep. 23|N-gram POS tagging (.pdf, .2x3pdf)||HW3 due|
|Sep. 28|Practical POS tagging (.pdf, .2x3pdf)|||
|Sep. 30|Smoothing (.pdf, .2x3pdf)|JM, 4.5; MS, 6.2|HW4 due|
|Oct. 5|Hidden Markov Models (.pdf, .2x3pdf)|MS, 9.2-9.3; KS, 2.1.4||
|Oct. 7|Calculating P(O) (.pdf, .2x3pdf)|MS, 9.3.1; KS, 2.1.5||
|Oct. 12|Finding the Optimal State Sequence|MS, 9.3.2; KS, 2.1.6||
|Oct. 14|Parameter Estimation (.pdf, .2x3pdf)|MS, 9.3.3; KS, 2.1.7|HW5 due|
|Oct. 19|CYK parsing (.pdf, .2x3pdf)|JM, 13-13.4.1||
|Oct. 21|Practical Parsing I||HW6 due|
|Oct. 26|Probabilistic Context-Free Grammars (.pdf, .2x3pdf)|MS, 11.1-11.3.3||
|Oct. 28|PCFGs (.pdf, .2x3pdf)|JM, 14.2, 14.4-14.5|HW7 due|
|Nov. 2|Probabilistic Parsing (.pdf, .2x3pdf) + Evaluation (.pdf, .2x3pdf)|MS, ch. 12||
|Nov. 4|Practical Parsing II (.pdf, .2x3pdf, files)|||
|Nov. 9|Estimating PCFGs (.pdf, .2x3pdf)|MS, 11.3.4-11.4||
|Nov. 11|Estimating PCFGs||HW8 due|
|Nov. 16|Beyond PCFGs (.pdf, .2x3pdf)|MS, ch. 8; Charniak and Johnson (2005); McClosky et al. (2006)||
|Nov. 18|Beyond PCFGs|Petrov et al. (2006); Klein and Manning (2003)|HW9 due|
|Nov. 23|PP attachment|Merlo and Ferrer (2006)||
|Nov. 25|NO CLASS: Thanksgiving|||
|Nov. 30|Semantic role labeling (SRL) (.pdf, .2x3pdf)|Márquez et al. (2008)||
|Dec. 2|SRL|Toutanova et al. (2008) or Pradhan et al. (2008)|HW10 due|
|Dec. 7|Word Sense Disambiguation|MS, ch. 7||
|Dec. 9|Statistical Machine Translation (SMT)|MS, ch. 13||
|Dec. 17|||Final HW/Project due|