Learning Objectives
- Data types: int, float, str, list, set
- Concepts: assignment, functions, types-vs-tokens, tokenization, normalization, frequency distributions, unit tests
- Tools: notebooks, NLTK
(color key: Python/Programming, NLP/CL, Software Engineering)
Reading
The readings for this week come from the official Python tutorial. The topic is “Using Python as a Calculator”, but it is a good introduction to numbers, strings, and lists.
Additionally, please read the section on sets (only this section, not the rest of the chapter):
It helps to play with a Python interpreter while reading. Open up Visual Studio Code’s terminal and start Python (e.g., run python3 or py at the command prompt), then try out the examples for yourself.
Testing Your Knowledge
There are two methods not mentioned in the tutorial:
- str.split() – splits a string on whitespace and returns a list of substrings
- list.count(x) – returns the number of times that x occurs in a sequence (e.g., a list or a string)
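As a quick illustration of these two methods, here is a minimal sketch on a made-up string (the variable names text and tokens are only for this example and are not the exercise string). It also shows that counting in a string and counting in a list of tokens can give different results:

```python
# A short, made-up string used only for illustration
text = 'the cat saw the other cat'

# split() with no arguments splits on runs of whitespace
tokens = text.split()
print(tokens)

# list.count(x) counts list items that are equal to x
print(tokens.count('the'))

# str.count(sub) counts substrings, so the 'the' inside 'other' is counted too
print(text.count('the'))
```

The two counts differ because the list version compares whole tokens while the string version matches any run of characters.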
Given the following string:
s = ('There are seven days, there are seven days, '
     'there are seven days in a week. '
     'Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday')
Try to answer the following questions:
- How many times does the word “day” occur in the string?
- How many times do the tokens “day”, “days”, and “days,” (note the comma) occur in the list of tokens (use split())?
- How many tokens are there in total?
- Find the relative frequency of the token “are” (number of times it occurs over the count of all tokens)
- What is the set of unique words?
- What is the set of unique letters?
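If you are unsure how to start, the sketch below walks through the same kinds of operations on a different, made-up sentence, so the questions above are still left for you to answer. The variable names are only illustrative, and lowercasing is just one simple form of normalization.

```python
# A made-up sentence, not the exercise string
text = 'A rose is a rose is a rose.'

# Tokens: the items produced by splitting on whitespace
tokens = text.split()
total = len(tokens)
print(total)

# Relative frequency: occurrences of a token divided by the count of all tokens
rel_freq = tokens.count('is') / total
print(rel_freq)

# Types: the set of unique tokens ('A' and 'a' differ, as do 'rose' and 'rose.')
types = set(tokens)
print(types)

# A simple normalization: lowercase the text before splitting
lowered = text.lower().split()
print(set(lowered))

# A set built from a string holds its unique characters,
# which here includes the space and the period, not only letters
chars = set(text)
print(chars)
```

Note that sets are unordered, so the printed order of their items may vary from run to run.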