Learning Objectives
- Data types: int, float, str, list, set
- Concepts: assignment, functions, types vs. tokens, tokenization, normalization, frequency distributions, unit tests
- Tools: notebooks, NLTK
Reading
The readings for this week come from the official Python tutorial. The main section is titled “Using Python as a Calculator”, but despite the title it is a good introduction to numbers, strings, and lists.
Additionally, please read the section on sets in the tutorial’s “Data Structures” chapter (only that section, not the rest of the chapter).
It helps to play with a Python interpreter while reading. Open Visual Studio Code’s terminal and start Python (e.g., run python3 or py at the command prompt), then try out the examples for yourself.
Testing Your Knowledge
There are two methods not mentioned in the tutorial:
- str.split() – splits a string on whitespace and returns a list of substrings
- list.count(x) – returns the number of times that x occurs in the list (strings have a similar str.count() method, which counts the occurrences of a substring)
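Here is a minimal sketch of both methods on a short sentence that is made up for illustration (it is not the exercise string). Note that split() only breaks on whitespace, so punctuation stays attached to the neighbouring word, and list.count() only matches elements that are exactly equal.

```python
# A made-up example sentence (not the exercise string)
text = 'the cat sat on the mat, the end'

# str.split() with no arguments splits on runs of whitespace
tokens = text.split()
print(tokens)  # ['the', 'cat', 'sat', 'on', 'the', 'mat,', 'the', 'end']

# list.count(x) counts list elements that are exactly equal to x
print(tokens.count('the'))   # 3
print(tokens.count('mat'))   # 0, because the token is 'mat,' with the comma
print(tokens.count('mat,'))  # 1

# str.count(sub) counts non-overlapping occurrences of a substring
print(text.count('at'))      # 3: inside 'cat', 'sat' and 'mat'
```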
Given the following string:
s = ('There are seven days, there are seven days, '
     'there are seven days in a week. '
     'Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday')
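The parentheses in the assignment above let one string span several lines: Python automatically concatenates adjacent string literals into a single string. A small sketch with made-up pieces:

```python
# Adjacent string literals inside parentheses are joined into one string
greeting = ('Hello, '
            'world. '
            'Goodbye')
print(greeting)       # Hello, world. Goodbye
print(len(greeting))  # 21
```

The trailing space inside each piece matters: without it, the last word of one line and the first word of the next would run together.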
Try to answer the following questions:
- How many times does the word “day” occur in the string?
- How many times do the tokens “day”, “days”, and “days,” (note the comma) occur in the list of tokens (use split())?
- How many tokens are there in total?
- What is the relative frequency of the token “are” (the number of times it occurs divided by the total number of tokens)?
- What is the set of unique words?
- What is the set of unique letters?
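The techniques these questions call for can be tried out first on a different, made-up sentence, leaving the exercise string for you. This sketch counts tokens, computes a relative frequency, and uses sets to get the unique word types and unique characters:

```python
# A made-up example sentence (not the exercise string)
text = 'to be or not to be'

tokens = text.split()        # list of tokens
n_tokens = len(tokens)       # total number of tokens
print(n_tokens)              # 6

# Relative frequency: occurrences of a token divided by the total token count
rel_freq = tokens.count('to') / n_tokens
print(rel_freq)              # 0.3333333333333333

# A set keeps only unique items: these are the word types
types = set(tokens)
print(sorted(types))         # ['be', 'not', 'or', 'to']

# set() on a string gives its unique characters; the space counts too
letters = set(text)
print(sorted(letters))       # [' ', 'b', 'e', 'n', 'o', 'r', 't']

# A set comprehension that keeps only alphabetic characters
only_letters = {c for c in text if c.isalpha()}
print(sorted(only_letters))  # ['b', 'e', 'n', 'o', 'r', 't']
```

Sets are unordered, so sorted() is used only to make the printed output predictable. Keep in mind that capitalized and lowercase forms (e.g., “There” and “there”) are different tokens and different types unless you normalize the text first, for example with str.lower().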