Week 4

Learning Objectives

(color key: Python/Programming NLP/CL Software Engineering)

Reading

Constants

Python has several special constant values (“constant” meaning they have predefined, unchangeable values). For present purposes, we only care about True, False, and None. The Dive Into Python book has a good and concise description of these:

Another thing to add about None is that it is often used as a placeholder for optional arguments, and it is the return value of a function with no return statement or with an empty return statement. For example:

>>> def func(x=None):
...     print('this function prints', x, 'but returns None')
... 
>>> x = func()
this function prints None but returns None
>>> print(x)
None
>>> x = func(5)
this function prints 5 but returns None
>>> print(x)
None
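The placeholder use of None and the empty return statement can be sketched as follows. This is only an illustration; the function names greet and empty_return are made up for this example and do not come from the reading.

```python
def greet(name=None):
    # None marks "no argument supplied"; test for it with `is`, not `==`
    if name is None:
        name = 'world'
    return 'hello ' + name


def empty_return(x):
    if x < 0:
        return          # an empty return statement also yields None
    return x * 2


print(greet())            # hello world
print(greet('NLTK'))      # hello NLTK
print(empty_return(-1))   # None
print(empty_return(3))    # 6
```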

Data Types

The Python Tutorial has some good entries on tuple and dict:

In addition, bool is the type of the constants True and False. In practice, explicit calls to bool() are rarely necessary because the conversion is implicit in an if statement, but bool() can be useful in interactive sessions for determining the boolean value of objects:

>>> bool()           # bool of nothing is False
False
>>> bool(True)       # these are almost tautological...
True
>>> bool(False)
False
>>> bool(0)          # 0 is the only False-valued integer
False
>>> bool(-1)         # all other integers are True
True
>>> bool(99999)
True
>>> bool('')         # the empty string is the only False string
False
>>> bool('foo')      # all others are True
True
>>> bool('False')    # even deceptive ones
True
>>> bool([])         # empty containers (list, tuple, set, dict, etc.) are False
False
>>> bool([1, 2, 3])  # all others are True
True
>>> bool([[]])       # even if their contents would be False
True

Functions

For further topics on functions, see the Python Tutorial’s section on default and keyword arguments:
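As a quick illustration of default and keyword arguments, consider the sketch below. The function tokenize and its parameters sep and lowercase are invented for this example; they are not part of Python or the NLTK.

```python
def tokenize(text, sep=' ', lowercase=False):
    # sep and lowercase have default values, so callers may omit them
    if lowercase:
        text = text.lower()
    return text.split(sep)


print(tokenize('Time flies'))                   # ['Time', 'flies']
print(tokenize('Time flies', lowercase=True))   # ['time', 'flies']
print(tokenize('a,b,c', sep=','))               # ['a', 'b', 'c']
print(tokenize(sep='-', text='x-y'))            # ['x', 'y']
```

Keyword arguments can be given in any order, as the last call shows, and they make it possible to skip over an earlier default (here sep) while still setting a later one (lowercase).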

Text Corpora

The NLTK provides interfaces to a variety of common and freely available corpora. Read chapter 2, section 1 of the NLTK book to get an overview. You don’t need to follow all the code examples, but try to be able to answer questions like these:

You don’t have to read every subsection of 2.1. Focus on these for now:

Conditional Frequency Distribution

Earlier we discussed frequency distributions and used the NLTK’s FreqDist class. Now we will introduce conditional frequency distributions. Please read the following:
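To make the idea concrete, a conditional frequency distribution can be thought of as one frequency distribution per condition. The sketch below builds that structure using only the standard library (collections.Counter and defaultdict) rather than the NLTK's own class, so it is an analogy and not the NLTK implementation. The function name conditional_counts and the toy (condition, sample) pairs are made up for this illustration.

```python
from collections import Counter, defaultdict


def conditional_counts(pairs):
    # keep one Counter (a plain frequency distribution) per condition
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd


# toy data: (condition, sample) pairs, e.g. (genre, word)
pairs = [('news', 'the'), ('news', 'said'), ('news', 'the'),
         ('romance', 'the'), ('romance', 'love')]
cfd = conditional_counts(pairs)

print(cfd['news']['the'])       # 2
print(cfd['romance']['love'])   # 1
print(cfd['romance']['said'])   # 0
print(sorted(cfd))              # ['news', 'romance']
```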

Testing Your Knowledge

Dictionaries

Get a feel for Python’s dict type by creating and inspecting some dictionaries. Use help(dict) in Python to browse the available methods (ignore the ones that start with __ for now). Then try writing a function that takes a list of words and returns a dictionary mapping each letter to the set of words starting with that letter. Sets are unordered, so the words may be displayed in a different order than shown here. For example:

>>> def letter_lookup(words):
...     # your code here
...
>>> d = letter_lookup('python programming provides endless possibilities'.split())
>>> d['p']
{'python', 'provides', 'programming', 'possibilities'}
>>> d['e']
{'endless'}
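Before attempting the exercise, it may help to see a few basic dictionary operations. The sketch below is only a warm-up on unrelated toy data (the variable name counts and its contents are made up) and deliberately does not solve letter_lookup.

```python
counts = {'python': 2, 'endless': 1}   # create a dict from a literal
counts['provides'] = 1                 # add a new key
counts['python'] += 1                  # update an existing value

print(counts['python'])                # 3
print('java' in counts)                # False (membership tests the keys)
print(counts.get('java', 0))           # 0 (a default instead of a KeyError)
print(sorted(counts.keys()))           # ['endless', 'provides', 'python']

# iterate over key/value pairs in sorted key order
for word, n in sorted(counts.items()):
    print(word, n)
```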

Text Corpora and Conditional Frequencies