There is quite a bit of background knowledge needed for computational linguistics, both in linguistics and in computer science. Hardly anybody knows everything necessary when they arrive. Usually people are strong in one area and deficient in another.


  1. Learn Unix : Unix is far friendlier to programmers than Windows, and all our systems are Unix of one flavour or another. Most are Macs, which are an easy way to get started with Unix because they have an excellent user interface. If you would rather use Linux, which is basically open-source Unix, you can download a Linux virtual machine instead of installing Linux directly on your computer. Ubuntu is a very nice, user-friendly distro.

  2. Learn to program : Eric Raymond recommends learning Python, Perl, C and Lisp to make sure you really know how to program. Those four capture most of the ways of looking at programming. Of the four, Python is the best place to start because it's the easiest and because there is a lot of Python code written in IU comp ling. Once you can read at least one language, a good way to improve is to read the source of interesting programs. For example, I like computational OT, so I read the source of OTSoft and RUBOT/FRed.

  3. Learn a markup language : Once you know Unix and know programming, you will have acquired a distaste for binary formats (human-unreadable formats) because they just don't work well with the rest of the system. The way to escape Microsoft Word is to learn a markup language. The standard for publishing papers is LaTeX, and the standard for web pages is HTML. The basics of both are easy, although web browsers are far more forgiving than the LaTeX compiler. Still, learning LaTeX pays off by making it much easier to publish a paper. Learning XML is also a good idea for marking up content. Paired with XSLT, XML lets you make a simple database and format it for your web page!
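To make the XML idea a bit more concrete, here is a minimal sketch in Python of keeping a tiny "database" as XML and formatting it as HTML. It is only an illustration under some assumptions: the lexicon, its element names (`lexicon`, `entry`, `form`, `gloss`) and the `pos` attribute are made up for this example, and it uses Python's standard `xml.etree.ElementTree` module in place of XSLT, since the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical lexicon "database"; the element and attribute names are
# invented for this example and do not come from any real project.
LEXICON_XML = """\
<lexicon>
  <entry pos="noun"><form>cat</form><gloss>small domestic feline</gloss></entry>
  <entry pos="verb"><form>run</form><gloss>move quickly on foot</gloss></entry>
</lexicon>
"""


def lexicon_to_html(xml_text):
    """Render each <entry> element as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("entry"):
        # Escape the text so characters like < and & are safe in HTML.
        form = escape(entry.findtext("form", default=""))
        gloss = escape(entry.findtext("gloss", default=""))
        pos = escape(entry.get("pos", ""))
        rows.append(f"<tr><td>{form}</td><td>{pos}</td><td>{gloss}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(lexicon_to_html(LEXICON_XML))
```

The point is that the content lives in one human-readable file and the formatting lives somewhere else, so you can change how the web page looks without touching the data. An XSLT stylesheet would do the same job declaratively.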

See also: ["How to use SSH"]

See also: JonesServer


I am not an expert on the linguistics side, since I didn't have any linguistic training until I arrived at grad school. But linguistics is a lot easier to pick up in a linguistics department, because your classes are taught by linguists and filled with other linguistics students. In my experience, the historical background of linguistics is the hard part to pick up, because people mostly talk about current theories.

I also learned a lot of vocabulary by attending every talk I heard about during my first year. -- NathanSanders

GettingStarted (last edited 2008-08-04 20:45:11 by NathanSanders)