Getting up to speed with the latest Machine Learning concepts doesn’t have to be complicated.

While there’s an abundance of introductory courses and tutorials on the basics, it can be harder to dive deeper and learn foundational concepts quickly. This post rounds up our top 20 AI and Machine Learning tutorials, along with recommended readings, on topics including bias and fairness in AI, few-shot learning and meta-learning, auxiliary tasks in deep reinforcement learning, variational autoencoders, and more.

Tutorial #1: bias and fairness in AI

This tutorial discusses how bias can be introduced into the machine learning pipeline, what it means for a decision to be fair, and methods to remove bias and ensure fairness. As machine learning algorithms are increasingly used to determine important real-world outcomes such as loan approval, pay rates, and parole decisions, it is incumbent on the AI community to minimize unintentional unfairness and discrimination.

📖 Further Reading: 

Tutorial #2: few-shot learning and meta-learning I

This tutorial describes few-shot and meta-learning problems and introduces a classification of methods. We also discuss methods that use a series of training tasks to learn prior knowledge about the similarity and dissimilarity of classes that can be exploited for future few-shot tasks. 

📖 Further Reading: 

Tutorial #3: few-shot learning and meta-learning II

In part II of our tutorial on few-shot and meta-learning, we discuss methods that incorporate prior knowledge about how to learn models and methods that incorporate prior knowledge about the data itself. These include three distinct approaches: “learning to initialize”, “learning to optimize”, and “sequence methods”.

📖 Further Reading: 

Tutorial #4: auxiliary tasks in deep reinforcement learning

This tutorial focuses on the use of auxiliary tasks to improve the speed of learning in the context of deep reinforcement learning (RL). Auxiliary tasks are additional tasks that are learned simultaneously with the main RL goal and that generate a more consistent learning signal. The system uses these signals to learn a shared representation and hence speed up progress on the main RL task. We also explore examples from a variety of domains.

📖 Further Reading: 

Tutorial #5: variational autoencoders

In this tutorial, we discuss latent variable models in general and then the specific case of the non-linear latent variable model. We’ll see that maximum likelihood learning of this model is not straightforward, but we can define a lower bound on the likelihood. We then show how the autoencoder architecture can approximate this bound using a Monte Carlo (sampling) method. To maximize the bound, we need to compute derivatives, but unfortunately, it’s not possible to compute the derivative of the sampling component. We’ll show how to side-step this problem using the reparameterization trick. Finally, we’ll discuss extensions of the VAE and some of its drawbacks.
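
To make the reparameterization trick concrete, here is a minimal sketch in plain Python. The function name, the use of a log-variance parameter, and the scalar latent variable are our own illustrative assumptions, not details taken from the tutorial. Rather than sampling z directly from a normal distribution with mean mu and standard deviation sigma, we sample noise eps from a standard normal and compute z = mu + sigma * eps, so that z is a deterministic, differentiable function of the parameters.

```python
import math
import random


def reparameterized_sample(mu, log_var, rng):
    """Draw z ~ N(mu, sigma^2) as a deterministic function of (mu, log_var) and noise.

    Hypothetical helper for illustration: a real VAE would do this on tensors
    inside an automatic differentiation framework.
    """
    eps = rng.gauss(0.0, 1.0)        # all randomness is isolated in eps
    sigma = math.exp(0.5 * log_var)  # log_var = log(sigma^2)
    z = mu + sigma * eps
    # Because z = mu + sigma * eps, the derivatives with respect to the
    # parameters are well defined for any fixed draw of eps:
    dz_dmu = 1.0
    dz_dlog_var = 0.5 * sigma * eps
    return z, dz_dmu, dz_dlog_var
```

The sampling step itself still has no derivative, but it no longer sits between the parameters and the output, which is what lets gradients flow.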

📖 Further Reading: 

Tutorial #6: neural natural language generation – decoding algorithms

Neural natural language generation (NNLG) refers to the problem of generating coherent and intelligible text using neural networks. Example applications include response generation in dialogue, summarization, image captioning, and question answering. In this tutorial, we assume that the generated text is conditioned on an input. For example, the system might take a structured input like a chart or table and generate a concise description. Alternatively, it might take an unstructured input like a question in text form and generate an output which is the answer to this question.

📖 Further Reading: 

Tutorial #7: neural natural language generation – sequence level training

In this tutorial, we consider alternative training approaches that compare the complete generated sequence to the ground truth at the sequence level. We’ll consider two families of methods. In the first, we take models that have been trained using the maximum likelihood criterion and fine-tune them with a sequence-level cost function, using both reinforcement learning and minimum risk training for this fine-tuning.

📖 Further Reading: 

Tutorial #8: Bayesian optimization

In this tutorial, we dive into Bayesian optimization, its key components, and its applications. Optimization is at the heart of machine learning, and Bayesian optimization is a framework that can deal with many of the optimization problems discussed in the tutorial. The core idea is to build a model of the entire function that we are optimizing. This model includes both our current estimate of that function and the uncertainty around that estimate. By considering this model, we can choose where next to sample the function. Then we update the model based on the observed sample. This process continues until we are sufficiently certain of where the best point on the function is.
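
The loop described above can be sketched in a few dozen lines of plain Python. Everything specific here is an assumption made for illustration: we model the function with a Gaussian process with an RBF kernel, choose the next sample with an upper confidence bound rule, search over a fixed list of candidate points, and stop after a fixed number of iterations rather than testing for certainty. The names `gp_posterior` and `bayes_opt` are hypothetical.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of the model at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, candidates, n_iter=10, kappa=2.0):
    """Maximize f over a list of candidates; n_iter must leave candidates unsampled."""
    xs = [candidates[0], candidates[-1]]  # two initial samples
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            # Current estimate plus a bonus for uncertainty.
            mean, std = gp_posterior(xs, ys, x)
            return mean + kappa * std
        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)      # sample the function at the chosen point ...
        ys.append(f(x_next))   # ... and update the model's data
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]
```

For example, `bayes_opt(lambda x: -(x - 0.3) ** 2, [i / 20 for i in range(21)])` searches a grid on the interval from 0 to 1. The `kappa` parameter trades off exploring uncertain regions against exploiting the current estimate.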

📖 Further Reading: 

Tutorial #9: SAT Solvers I: Introduction and Applications

This tutorial concerns the Boolean satisfiability or SAT problem. We are given a formula containing binary variables that are connected by logical relations such as OR and AND. We aim to establish whether there is any way to set these variables so that the formula evaluates to true. Algorithms that are applied to this problem are known as SAT solvers.
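
As a minimal illustration of the problem statement, the sketch below checks satisfiability by brute force. It assumes the formula is given in conjunctive normal form (an AND of OR-clauses) and encodes each literal as a signed integer, a common convention that we adopt here for illustration. The function name `brute_force_sat` is hypothetical.

```python
from itertools import product


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment as {variable: bool}, or None if none exists.

    clauses: list of clauses; each clause is a list of nonzero ints, where
    k means "variable k is true" and -k means "variable k is false".
    """
    for bits in product([False, True], repeat=num_vars):
        # The formula is true if every clause contains at least one true literal.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return dict(enumerate(bits, start=1))
    return None
```

This tries all 2^n assignments of n variables, so it is only practical for tiny formulas. Real SAT solvers exist precisely to avoid this exhaustive search.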

📖 Further Reading: 

Tutorial #10: SAT Solvers II: Algorithms

In this tutorial we focus exclusively on the SAT solver algorithms that are applied to this problem. We’ll start by introducing two ways to manipulate Boolean logic formulae. We’ll then exploit these manipulations to develop algorithms of increasing complexity. We’ll conclude with an introduction to conflict-driven clause learning which underpins most modern SAT solvers.
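
To give a flavour of what such an algorithm looks like, here is a minimal sketch of DPLL, a classic backtracking search with unit propagation. We are not claiming this matches the tutorial's presentation. The signed-integer encoding of literals and the function names are our own assumptions, and conflict-driven clause learning is not implemented.

```python
def simplify(clauses, literal):
    """Condition the formula on `literal` being true."""
    out = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is satisfied, so drop it
        # Remove the falsified opposite literal; an empty result is a conflict.
        out.append([lit for lit in clause if lit != -literal])
    return out


def dpll(clauses, assignment=None):
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable.

    clauses: list of lists of nonzero ints (k = variable k true, -k = false).
    Variables missing from the result may take either value.
    """
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal's value.
    while True:
        if any(len(clause) == 0 for clause in clauses):
            return None  # conflict: some clause can no longer be satisfied
        units = [clause[0] for clause in clauses if len(clause) == 1]
        if not units:
            break
        assignment[abs(units[0])] = units[0] > 0
        clauses = simplify(clauses, units[0])
    if not clauses:
        return assignment  # every clause is satisfied
    # Branch on the first literal of the first clause, then on its negation.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        extended = {**assignment, abs(choice): choice > 0}
        result = dpll(simplify(clauses, choice), extended)
        if result is not None:
            return result
    return None
```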

📖 Further Reading: 

Tutorial #11: SAT Solvers III: Factor graphs and SMT solvers

This third and final tutorial of our SAT solvers series is divided into two sections, each of which is self-contained. First, we consider a completely different approach to solving satisfiability problems that is based on factor graphs. Second, we discuss methods that allow us to apply the SAT machinery to problems with continuous variables.

📖 Further Reading: 

Tutorial #12: Differential Privacy I: Introduction

In this two-part tutorial, we explore the issue of privacy in machine learning. In part I, we discuss definitions of privacy in data analysis and cover the basics of differential privacy. Part II considers the problem of how to perform machine learning with differential privacy.
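
As a small taste of the basics, the sketch below answers a counting query using the Laplace mechanism, a standard building block of differential privacy. The function names and the toy query are illustrative assumptions. We rely on two known facts: adding or removing one record changes a count by at most 1 (its sensitivity), and the difference of two independent unit exponential variables follows a Laplace distribution.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of matching records (epsilon-DP under the stated sensitivity)."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` means more noise and stronger privacy. This is a teaching sketch only: naive floating-point noise sampling has known weaknesses, so production systems should use a vetted library.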

📖 Further Reading: 

Tutorial #13: Differential Privacy II: machine learning and data generation

In this tutorial, we present recent methods for making machine learning differentially private. We will also discuss differentially private methods for generative modelling, which provide an enticing solution to a seemingly intractable problem: how can we release data for general use while still protecting privacy?

📖 Further Reading: 

Tutorial #14: Transformers I: Introduction

This tutorial introduces self-attention, which is the core mechanism that underpins the transformer architecture. We then describe transformers themselves and how they can be used as encoders, decoders, or encoder-decoders using well-known examples such as BERT and GPT-3. This discussion will be suitable for someone who knows machine learning, but who is not familiar with the transformer.
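
For readers who want to see the mechanism before reading the tutorial, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. Matrices are lists of rows, one row per token. The weight matrices `Wq`, `Wk`, and `Wv` are hypothetical parameters that a real model would learn, and masking and multiple heads are left out.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, Wq, Wk, Wv):
    """Return (outputs, attention weights) for a sequence of token vectors X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    K_transposed = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_transposed)  # one row of scores per query token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output is a weighted average of the value vectors.
    return matmul(weights, V), weights
```

Each row of the returned weights sums to one, so every output token is a convex combination of the value vectors of all tokens in the sequence.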

📖 Further Reading: 

  • Attention Is All You Need by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

Tutorial #15: Parsing I: context-free grammars and the CYK algorithm

In this tutorial, we review earlier work that models grammatical structure. We introduce the CYK algorithm, which finds the underlying syntactic structure of sentences and forms the basis of many algorithms for linguistic analysis. The algorithms are elegant and interesting for their own sake. However, we also believe that this topic remains important in the age of large transformers. We hypothesize that the future of NLP will consist of merging flexible transformers with linguistically informed algorithms to achieve systematic and compositional generalization in language processing.

📖 Further Reading: 

Tutorial #16: Transformers II: Extensions

In this tutorial, we’ll focus on two families of modifications that address limitations of the basic architecture and draw connections between transformers and other models. This blog will be suitable for someone who knows how transformers work and wants to know more about subsequent developments.

📖 Further Reading: 

Tutorial #17: Transformers III: Training

This tutorial discusses the challenges with transformer training dynamics and introduces some of the tricks that practitioners use to get transformers to converge. This discussion will be suitable for researchers who already understand the transformer architecture and who are interested in training transformers and similar models from scratch.

📖 Further Reading: 

Tutorial #18: Parsing II: WCFGs, the inside algorithm, and weighted parsing

We introduce weighted context-free grammars or WCFGs. These assign a non-negative weight to each rule in the grammar. From here, we can assign a weight to any parse tree by multiplying the weights of its component rules together. We present two variations of the CYK algorithm that apply to WCFGs. (i) The inside algorithm computes the sum of the weights of all possible analyses (parse trees) for a sentence. (ii) The weighted parsing algorithm finds the parse tree with the highest weight.
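
The two variations differ only in how they combine alternative analyses of the same span, which the sketch below makes explicit. It assumes the grammar is in Chomsky normal form (rules are either A -> B C or A -> word), and the dictionary-based grammar encoding and function names are our own choices for illustration. The weighted parsing function returns only the weight of the best tree. Recovering the tree itself would need backpointers, which are omitted.

```python
def cyk_chart(words, lexical, binary, combine):
    """Fill a CYK chart mapping (i, j, A) to the weight of A spanning words[i:j].

    lexical: {(A, word): weight} for rules A -> word
    binary:  {(A, B, C): weight} for rules A -> B C
    combine: how to merge two analyses of the same span (sum or max)
    """
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        for (A, w), weight in lexical.items():
            if w == word:
                key = (i, i + 1, A)
                chart[key] = combine(chart.get(key, 0.0), weight)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (A, B, C), weight in binary.items():
                    left = chart.get((i, k, B), 0.0)
                    right = chart.get((k, j, C), 0.0)
                    if left > 0.0 and right > 0.0:
                        key = (i, j, A)
                        # A tree's weight is the product of its rule weights.
                        chart[key] = combine(chart.get(key, 0.0), weight * left * right)
    return chart


def inside_weight(words, lexical, binary, start="S"):
    """Inside algorithm: sum of the weights of all parse trees."""
    chart = cyk_chart(words, lexical, binary, lambda a, b: a + b)
    return chart.get((0, len(words), start), 0.0)


def best_parse_weight(words, lexical, binary, start="S"):
    """Weighted parsing: weight of the single highest-weight parse tree."""
    chart = cyk_chart(words, lexical, binary, max)
    return chart.get((0, len(words), start), 0.0)
```

With the toy grammar S -> S S (weight 0.5) and S -> a (weight 1.0), the sentence "a a a" has two parse trees of weight 0.25 each, so the inside weight is 0.5 and the best parse weight is 0.25.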

Tutorial #19: Parsing III: PCFGs and the inside-outside algorithm

This tutorial covers probabilistic context-free grammars or PCFGs, which are a special case of WCFGs. They feature more prominently than WCFGs in the earlier statistical NLP literature and in most teaching materials. As the name suggests, they replace the rule weights with probabilities. We will treat these probabilities as model parameters and describe algorithms to learn them for both the supervised and the unsupervised cases. The latter is tackled by expectation-maximization and leads us to develop the inside-outside algorithm, which computes the expected rule counts that are required for the EM updates.

📖 Further Reading: 

Tutorial #20: Understanding XLNet

This tutorial provides an overview of XLNet, an auto-regressive language model designed for natural language processing tasks. XLNet builds on Transformer-XL, which adds recurrence to the transformer architecture, and it learns bidirectional context through a permutation-based training objective. The article includes a technical overview of XLNet’s pre-training and fine-tuning procedures, as well as its performance on benchmark datasets.

📖 Further Reading: