Week 7

Learning Objectives

(color key: Python/Programming NLP/CL Software Engineering)

Review and Quiz

For the review, please see the review of course topics, particularly those for the first quiz.

Working with Software Projects

As you develop your experiments into software projects that others might find useful, you will need to think about how to distribute your code. There are many practices and tools to support software distribution. This page covers four of them: licensing, testing, documenting, and distributing.

But first, let’s look at how to use Python packages to organize code.

Python Packages

When you write a file such as hw6.py, it is a Python module that can be imported as follows:

import hw6

When working with larger projects, you may need multiple modules, and you may want to group them into packages. On disk, a package is just a folder containing a special __init__.py file.
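As a rough illustration of that folder layout, the sketch below builds a throwaway package in a temporary directory and imports a function from it. The names mytools, tokens, and count_tokens are hypothetical and chosen only for this example; in a real project you would create the folder and files by hand rather than from code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temporary directory. The names
# mytools, tokens, and count_tokens are hypothetical.
root = Path(tempfile.mkdtemp())
package_dir = root / "mytools"
package_dir.mkdir()

# The (possibly empty) __init__.py file marks the folder as a package.
(package_dir / "__init__.py").write_text("")

# A module inside the package.
(package_dir / "tokens.py").write_text(
    "def count_tokens(text):\n"
    "    return len(text.split())\n"
)

# Make the parent folder importable, then import the module via the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from mytools.tokens import count_tokens

print(count_tokens("the dog barks"))  # 3
```

The dotted name mytools.tokens mirrors the path mytools/tokens.py, which is how packages let you keep related modules together.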

Please read the Python Tutorial’s sections on modules and packages:

Organizing your code into modules and packages is one of the first steps to making your code into a project that others might be able to use.

Licensing

If you want others to use your code, simply making it available (e.g., on GitHub) is not sufficient. The software needs a license which allows others to use, modify, and/or redistribute the code. There are many kinds of open-source licenses, but they generally fall into two groups: permissive and copyleft (see the MIT and GPL licenses, respectively, linked below).

Licenses are written in legalese that is hard to understand, but the website https://choosealicense.com/ makes choosing one easier. For instance, you can see simple overviews of the MIT or GPL licenses, alongside their legal text.

Please read what happens if you don’t choose a license.

Testing

As we’ve witnessed in this course already, code that works on one system might not work well on another. Software testing is the practice of writing tests to help ensure that code works as intended. There are many kinds of tests (skim the Wikipedia article to get an idea), and software testing is a whole career path, but I’d like you to focus on just a few:

Please see this StackOverflow answer.
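To make the idea concrete, here is a minimal sketch of a unit test, one common kind of test, written with the standard library's unittest module. The normalize function is a hypothetical helper invented for this example, not something from the course code.

```python
import unittest


def normalize(word):
    """Lowercase a word and strip surrounding whitespace (hypothetical helper)."""
    return word.strip().lower()


class NormalizeTest(unittest.TestCase):
    # Each test method checks one small, specific behavior.
    def test_lowercases(self):
        self.assertEqual(normalize("Dog"), "dog")

    def test_strips_whitespace(self):
        self.assertEqual(normalize("  cat \n"), "cat")


# Run the tests programmatically so the example works in a script or notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later changes normalize in a way that breaks either behavior, the corresponding test fails and points to the problem.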

Documenting

Documentation of code helps others (sometimes the author of the code) understand how to use it, what its limitations are, or why it is implemented in a certain way. Please read the following article, which is a good introduction to the documentation of Python projects: https://docs.python-guide.org/writing/documentation/
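One small piece of this is the docstring, which documents a function right where it is defined. The following is only a sketch: type_token_ratio is a hypothetical function, and the Args/Returns layout is one common convention among several.

```python
import inspect


def type_token_ratio(tokens):
    """Return the number of distinct tokens divided by the total.

    Args:
        tokens: a list of strings.

    Returns:
        A float between 0 and 1, or 0.0 for an empty list.
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# The docstring is stored on the function itself; help() and
# documentation tools read it from there.
summary = inspect.getdoc(type_token_ratio).splitlines()[0]
print(summary)
```

Calling help(type_token_ratio) in an interactive session shows the same text, so users can read the documentation without opening the source file.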

Distributing

Getting your code into the hands of others is software distribution. One way to do this is to put it on a public code host, such as GitHub. Python projects can also be added to the Python Package Index (PyPI), which allows them to be installed easily using pip, as we have done in this class (for instance, here’s the PyPI entry for the NLTK: https://pypi.org/project/nltk/). Adding a project to PyPI requires some metadata, such as the author, compatible Python versions, and dependencies. Traditionally, developers would create a setup.py file (see this tutorial, if you’re curious), but more convenient packaging tools (such as Flit or Poetry) have since appeared, which use a newer standard file called pyproject.toml.

Testing Your Knowledge

Go to the project page for PyDelphin: https://github.com/delph-in/pydelphin

Answer the following questions: