Learning Objectives
- Concepts: packages, licensing, testing, documentation, distribution
Review and Quiz
For the review, please see the review of course topics, particularly those for the first quiz.
Working with Software Projects
As you develop your experiments into software projects that others might get value from, you might start to think about how to distribute your code. There are many practices and tools to support software distribution. Some of them, each covered below, are licensing, testing, documentation, and distribution.
But first, let’s look at how to use Python packages to organize code.
Python Packages
When you write a file such as hw6.py, it is a Python module that can be imported as follows:
import hw6
When working with larger projects, you may need multiple modules, and you may want to group them into packages. On the computer, these are just folders with a special __init__.py file.
Please read the Python Tutorial’s sections on modules and packages.
Organizing your code into modules and packages is one of the first steps to making your code into a project that others might be able to use.
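The module and package layout described above can be sketched in a few lines. This is only an illustration: the package name textutils, the module name counting, and the function count_tokens are hypothetical, and the files are written to a temporary folder so that the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
# The names "textutils" and "counting" are hypothetical.
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "textutils"
package_dir.mkdir()

# An __init__.py file (even an empty one) marks the folder as a package.
(package_dir / "__init__.py").write_text("")

# counting.py is an ordinary module that lives inside the package.
(package_dir / "counting.py").write_text(
    "def count_tokens(text):\n"
    "    return len(text.split())\n"
)

# Make the project folder importable, as if Python had been started there.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

# Import a function from a module inside the package.
from textutils.counting import count_tokens

print(count_tokens("Kim left and Sandy had lunch"))  # 6 whitespace-separated tokens
```

In a real project you would create the folder and files by hand rather than from code; the import line at the end is the part that carries over.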
Licensing
If you want others to use your code, simply making it available (e.g., on GitHub) is not sufficient. The software needs a license which allows others to use, modify, and/or redistribute the code. There are many kinds of open-source licenses, but they generally fall into two groups: permissive and copyleft (see the MIT and GPL licenses, respectively, linked below).
Licenses are written in legalese that is hard to understand, but the website https://choosealicense.com/ makes it easier. For instance, you can see simple overviews of the MIT or GPL licenses, alongside their legal text.
Please read what happens if you don’t choose a license.
Testing
As we’ve witnessed in this course already, code that works on one system might not work well on another. Software testing is the practice of writing tests to help ensure that code works as intended. There are many kinds of tests (skim the Wikipedia article to get an idea), and software testing is a whole career path, but I’d like you to focus on just a few:
- Unit testing – ensure each unit (function, module, etc.) is correct
- Regression testing – ensure behavior of a system is correct over time (i.e., across changes)
- Integration testing – ensure that parts of the system interact with each other without errors
Please see this StackOverflow answer.
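As a rough illustration of the three kinds of tests listed above, here is a minimal sketch using the standard library's unittest module. The functions count_tokens and average_token_count are hypothetical stand-ins for your own code, and the "bug" mentioned in the regression test is invented for the example.

```python
import unittest


def count_tokens(text):
    """Return the number of whitespace-separated tokens in text."""
    return len(text.split())


def average_token_count(sentences):
    """Return the mean token count over sentences (0.0 if there are none)."""
    if not sentences:
        return 0.0
    return sum(count_tokens(s) for s in sentences) / len(sentences)


class TestCountTokens(unittest.TestCase):
    # Unit tests: check one function in isolation.
    def test_simple_sentence(self):
        self.assertEqual(count_tokens("Kim left"), 2)

    def test_empty_string(self):
        self.assertEqual(count_tokens(""), 0)


class TestAverageTokenCount(unittest.TestCase):
    # Integration-style test: two functions working together.
    def test_average(self):
        sentences = ["Kim left", "Sandy had lunch today"]
        self.assertEqual(average_token_count(sentences), 3.0)

    # Regression-style test: pins down a (hypothetical) bug that was fixed,
    # so that a later change cannot silently reintroduce it.
    def test_empty_list_does_not_divide_by_zero(self):
        self.assertEqual(average_token_count([]), 0.0)


# Run the tests programmatically and keep the result for inspection.
loader = unittest.defaultTestLoader
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(TestCountTokens),
    loader.loadTestsFromTestCase(TestAverageTokenCount),
])
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

In a project, tests like these usually live in their own files (for example, a tests folder) and are run from the command line with python -m unittest.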
Documenting
Documentation of code helps others (sometimes the author of the code) understand how to use it, what its limitations are, or why it is implemented in a certain way. Please read the following article, which is a good introduction to the documentation of Python projects: https://docs.python-guide.org/writing/documentation/
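One common form of documentation in Python is the docstring. The sketch below documents a hypothetical function, count_tokens, covering how to use it and what its limitations are; the section layout inside the docstring is one of several conventions, not a requirement.

```python
import inspect


def count_tokens(text, lowercase=False):
    """Count whitespace-separated tokens in a string.

    Args:
        text: The string to tokenize.
        lowercase: If True, lowercase the text first so that "Kim" and
            "kim" are counted together.

    Returns:
        A dict mapping each token to the number of times it occurs.

    Limitations:
        Punctuation is not separated from words, so "lunch," and "lunch"
        are counted as different tokens.
    """
    if lowercase:
        text = text.lower()
    counts = {}
    for token in text.split():
        counts[token] = counts.get(token, 0) + 1
    return counts


# The docstring is available at runtime; help(count_tokens) shows the same text.
print(inspect.getdoc(count_tokens))
```

Because the docstring travels with the code, tools such as help() in the interpreter can display it without the reader opening the source file.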
Distributing
Getting your code into the hands of others is software distribution. One way to do this is to put it up on a public code host, such as GitHub. For Python projects, you might add them to the Python Package Index (PyPI), which allows them to be installed easily using pip (as we have done in this class; for instance, here’s the PyPI entry for the NLTK: https://pypi.org/project/nltk/). Adding projects to PyPI requires some metadata, such as the author, compatible Python versions, dependencies, etc. Traditionally, developers would create a setup.py file (see this tutorial, if you’re curious), but recently there have been some more convenient ways of packaging Python (such as Flit or Poetry), which use a new standard file called pyproject.toml.
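The metadata mentioned above ends up stored alongside a package when it is installed, and you can inspect it from Python. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 and later). The helper describe_distribution is hypothetical, and pip is queried only on the assumption that it is present in your environment.

```python
from importlib import metadata


def describe_distribution(name):
    """Return selected metadata for an installed distribution, or None."""
    try:
        info = metadata.metadata(name)
    except metadata.PackageNotFoundError:
        # Nothing with this name is installed in the current environment.
        return None
    return {
        "name": info.get("Name"),
        "version": info.get("Version"),
        "author": info.get("Author"),
        "requires_python": info.get("Requires-Python"),
        # requires() returns None when no dependencies are declared.
        "dependencies": metadata.requires(name) or [],
    }


# "pip" is used only because it is usually present in a virtual environment.
print(describe_distribution("pip"))

# A name that is not installed yields None rather than an error.
print(describe_distribution("hypothetical-package-that-is-not-installed"))
```

Fields that a project did not declare come back as None, which is itself a hint about how carefully the package metadata was written.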
Testing Your Knowledge
Go to the project page for PyDelphin: https://github.com/delph-in/pydelphin
Answer the following questions:
- What is the license of the project?
- Is it distributed on PyPI?
- Does it have documentation? If so, where is it?
- What is the minimum version of Python required?
- Did it ever support Python 2? If so, when (or what version) was this support removed? (hint: which file would you look in for a summary of changes?)
- Is the project actively developed? (e.g., when was the last commit?)
Install the project in a virtual environment with:
pip install pydelphin[web]
(the [web] part tells it to get some additional dependencies)
Now try to follow the documentation to get a parse of the sentence “Kim left and Sandy had lunch, today.” using the web client.
- How many readings (i.e., parses) for the sentence are there?
- How many results were returned?
- How can you increase the number of results returned?