HG2051 – Language and the Computer

Michael Wayne Goodman <michaelgoodman@ntu.edu.sg>

Wednesdays, 09:30–12:30

LHS-TR+52 (Hive Level 2)

Traditionally, linguistic analysis was done largely by hand, but computer-based methods and tools are now widely used in contemporary research. This course introduces skills and resources that help the linguist perform fast, flexible, and accurate quantitative analyses. Students will learn a programming language (Python) along with techniques for processing human language data. No previous programming experience is required: we will teach you the basics of programming and computational linguistics, along with some good software engineering practices.

Schedule

Week  Date    Topic                                                          Notes
1     12 Aug  What is Computational Linguistics? Why do it? Why use Python?
2     19 Aug  Basic Types and Data Structures; Using Python to Count Things
3     26 Aug  Assignment, Expressions, and Control
4     02 Sep  Text Corpora and Conditional Frequencies                       notebook
5     09 Sep  Lexical Resources and WordNet                                  notebook
6     16 Sep  Processing Raw Text                                            notebook
7     23 Sep  Mid-review; Working with Software Projects                     group project
      30 Sep  Recess
8     07 Oct  Regular Expressions                                            notebook
9     14 Oct  N-Grams and Collocations                                       notebook
10    21 Oct  Part-of-speech Tagging                                         notebook
11    28 Oct  Classification                                                 notebook
12    04 Nov  Exploring Software Libraries, Language Models, Ethics in CL    notebook
13    11 Nov  Review                                                         notebook (completed)

Course Pages

Grading Criteria

This course is graded with continuous assessment as follows:

You may also get 1–5% extra credit (not exceeding 100% in the course) by submitting a contribution (e.g., code or documentation) to an open-source project. Contact me for details.

Resources

Acknowledgments

Previous years of this course were taught by: