Computational linguistics is an interdisciplinary field that addresses the use of computers to process or produce human language. Linguistics contributes to this field an understanding of the special properties of language data, and also provides theories and descriptions of language structure and use. Computational linguistics is largely an applied discipline concerned with practical problems. Typical applications include natural language processing, machine translation (translating from one language to another), speech synthesis, speech production, information retrieval (finding relevant documents or parts of documents in large collections of texts), cognitive modeling, and, in general, almost anything dealing with natural language interfaces.
If you started earlier, please check your year's requirements in the bulletins under "University Graduate School".
The Concentration in Computational Linguistics combines general linguistics coursework with computation-specific coursework.
A minimum of 90 credit hours, including the dissertation. Specific requirements include LING-L545, LING-L645, LING-L615, LING-L555, one graduate-level course each in phonetics, phonology, and syntax, plus at least two additional courses in linguistics at the 600-700 level.
LING-L555 may be waived if the student has previously completed equivalent coursework. In addition to the required core coursework, a student's advisory committee may assign other courses as appropriate and relevant to that student's particular program. These may include courses such as the following:
The choice of a minor field should be agreed to by the student’s advisory committee. The specific requirements for the minor are established by the department that grants the minor. The student is responsible for ascertaining what those requirements are and for meeting them.
Typical minors would include Cognitive Science, Computer Science, Informatics, Information and Library Science, or any of the language departments.
All students in the Ph.D. program will select an advisory committee consisting of at least three faculty members, one of whom should normally represent the student’s minor field. The committee must be selected no later than the end of the semester following the completion of the master’s degree at Indiana University, or, in the case of students entering the program with master’s degrees from other institutions, no later than two semesters after matriculation.
Students will plan their programs with the advisory committee, which will be responsible for counseling students with regard to the qualifying examination, setting the examination, and administering it.
Knowledge of the structure of a language other than English and outside the student’s general language family (choice to be determined in consultation with the student's advisory committee).
This requirement of knowledge of the structure of an "exotic" language can be fulfilled in several ways: (1) through a one-semester "structure course" (e.g., "Structure of Mongolian" or "Arabic Syntax"); (2) through a two-semester introductory language course (e.g., Beginning Swahili); or (3) through the field methods sequence (LING-L653 to LING-L654). "Outside the student’s general language family" has been interpreted to mean outside Indo-European for English speakers (although Hindi or Bengali may be acceptable, depending on the logic of the student’s program) and outside Semitic for Arabic speakers, to give just two examples.
The student must demonstrate proficiency (1) in the basics of discrete mathematics or mathematical linguistics, which can be met by courses such as COGS-Q520 Mathematics and Logic in Cognitive Science or L611 Models of Linguistic Structure; and (2) in programming techniques, with working knowledge of at least two programming languages.
Completion of LING-L555 satisfies the requirement of working knowledge of one programming language. Students then need a second programming language; the preferred languages are Java and C++. Students should consult their academic advisor about which course would be most appropriate to take.
The qualifying examination is comprehensive and covers two distinct areas of computational linguistics and/or linguistics. At least one of the qualifying examinations must entail a practical software artifact. The artifact may be a program, a computational grammar, an implemented scheme for corpus annotation, or some other approved artifact. The other examination may take the form of a written paper (of publishable quality) or a written exam. The specific focus and scheduling of the examination are determined by the student's advisory committee.
After nomination to candidacy, the student will select a research committee composed of no fewer than three members of the Department of Linguistics faculty and an outside representative. This committee must approve the proposed dissertation topic.
Oral defense of dissertation. This defense is normally open.