Week 12

Reading

Ethics in NLP

Over the past decade or two, computational linguistics and natural language processing have been propelled from a somewhat niche academic interest to a huge industry with many consumer-facing products (e.g., Google Translate, Siri, autocomplete) and other applied systems (stock market prediction, risk assessment, hate speech detection, etc.). Unfortunately, this explosion in popularity has not been matched by an increased understanding of the ethical questions particular to the field.

Please read the following short paper as a broad overview of the problems:

Questions

Additional Reading

Language Models

A “language model” is a model that assigns a probability to a sequence of words and/or predicts missing (e.g., next) words in a sequence. Language models are used, for example, for judging the fluency of machine translation outputs and for text generation. The traditional method of creating language models uses n-grams. For this, read the following sections from Jurafsky and Martin’s Speech and Language Processing:
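As a small illustration of the n-gram idea described above, here is a minimal sketch of a bigram language model in plain Python. It is not taken from the readings: the toy corpus, the function names (`train_bigram_model`, `bigram_prob`, `sentence_prob`, `predict_next`), and the choice of add-one (Laplace) smoothing are all assumptions made for the example. It shows the two uses mentioned above: scoring a word sequence and predicting a next word.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers (a common convention)


def train_bigram_model(sentences):
    """Count bigrams over tokenized sentences padded with START and END."""
    context_counts = Counter()            # how often each word occurs as a context
    bigram_counts = defaultdict(Counter)  # bigram_counts[prev][curr]
    vocab = {END}                         # every token that can follow a context
    for tokens in sentences:
        vocab.update(tokens)
        padded = [START] + list(tokens) + [END]
        for prev, curr in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[prev][curr] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, curr, model):
    """P(curr | prev) with add-one smoothing (an assumption of this sketch)."""
    context_counts, bigram_counts, vocab = model
    seen = bigram_counts[prev][curr] if prev in bigram_counts else 0
    return (seen + 1) / (context_counts[prev] + len(vocab))


def sentence_prob(tokens, model):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = [START] + list(tokens) + [END]
    prob = 1.0
    for prev, curr in zip(padded, padded[1:]):
        prob *= bigram_prob(prev, curr, model)
    return prob


def predict_next(prev, model):
    """Most frequent word observed after `prev`, or None if `prev` is unseen."""
    _, bigram_counts, _ = model
    if prev not in bigram_counts:
        return None
    return bigram_counts[prev].most_common(1)[0][0]


# Toy corpus, invented for this example.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
model = train_bigram_model(corpus)

fluent = sentence_prob(["the", "cat", "sat"], model)
scrambled = sentence_prob(["sat", "the", "cat"], model)
print(f"P(the cat sat) = {fluent:.5f}")
print(f"P(sat the cat) = {scrambled:.5f}")
print("Most likely word after 'the':", predict_next("the", model))
```

On this toy corpus the fluent word order gets a higher probability than the scrambled one, which is the sense in which a language model can judge fluency. Real n-gram models are trained on far larger corpora, multiply many small probabilities in log space to avoid underflow, and typically use better smoothing than add-one.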

In recent years, “neural language models”, in particular transformer-based models like BERT and GPT-3, have completely altered the direction of academic research and industry applications of NLP. The text they generate is often indistinguishable from human-produced text, at least when the prompts or outputs are carefully selected. Advocates of these models can be mesmerized by the “magic” of their uninterpretable performance and may claim that the models “understand” or “comprehend” text. Critics point out that the models have only memorized and resynthesized linguistic form, without any other signals (e.g., vision, sound, joint attention, and social context), and so cannot possibly have any true understanding of the world. The Bender and Koller (2020) paper (see “additional reading”, below) takes an in-depth look at the claims of machine comprehension (and don’t miss the humorous example outputs of GPT-2 in the appendix).

Additional Reading

Software

The following are some additional libraries for Python that you may find useful for NLP:

There are tons more. E.g., here’s an “Awesome” list of NLP software for Python (and other languages): https://github.com/keon/awesome-nlp#user-content-python