Deep Learning with PyTorch: An Introduction

In this tutorial, you’ll get an introduction to deep learning using the PyTorch framework, and by its conclusion, you’ll be comfortable applying it to your deep learning models. Facebook launched PyTorch 1.0 early this year with integrations for Google Cloud, AWS, and Azure Machine Learning. In this tutorial, I assume that you’re already familiar with Scikit-learn, Pandas, NumPy, and SciPy. These packages are important prerequisites for this tutorial.

Plan of Attack

What is deep learning?
Introduction to PyTorch
Why you’d prefer PyTorch to other Python Deep Learning Libraries
PyTorch Tensors
PyTorch Autograd
PyTorch nn Module
PyTorch optim Package
Custom nn Modules in PyTorch
Putting it all Together and Further Reading

What is Deep Learning?

Deep learning is a subfield of machine learning with algorithms inspired by the working of the human brain. These algorithms are referred to as artificial neural networks. Examples of these neural networks include Convolutional Neural Networks that are used for image classification, Artificial Neural Networks and Recurrent Neural Networks.

Introduction to PyTorch

PyTorch is a Python machine learning package based on Torch, which is an open-source machine learning package based on the programming language Lua. PyTorch has two main features:

Tensor computation (like NumPy) with strong GPU acceleration
Automatic differentiation for building and training neural networks

Why you might prefer PyTorch to other Python deep learning libraries

There are a few reason you might prefer PyTorch to other deep learning libraries:

Unlike other libraries like TensorFlow, where you have to first define an entire computational graph before you can run your model, PyTorch allows you to define your graph dynamically.
PyTorch is also great for deep learning research and provides maximum flexibility and speed.

PyTorch Tensors

PyTorch Tensors are very similar to NumPy arrays with the addition that they can run on the GPU. This is important because it helps accelerate numerical computations, which can increase the speed of neural networks by 50 times or greater. In order to use PyTorch, you’ll need to head over to their website to install it. If you’re using Conda, you can install PyTorch by running this simple command:

conda install PyTorch torchvision -c PyTorch

In order to define a PyTorch tensor, start by importing the torch package. PyTorch allows you to define two types of tensors — a CPU and GPU tensor. For this tutorial, I’ll assume you’re running a CPU machine, but I’ll also show you how to define tensors in a GPU:

import torch

The default tensor type in PyTorch is a float tensor defined as torch.FloatTensor. As an example, you’ll create a tensor from a Python list:

torch.FloatTensor([[20, 30, 40], [90, 60, 70]])

If you’re using a GPU-enabled machine, you’ll define the tensor as shown below:

torch.cuda.FloatTensor([[20, 30, 40], [90, 60, 70]])

You can also perform mathematical computations such as addition and subtraction using PyTorch tensors:

x = torch.FloatTensor([25])
y = torch.FloatTensor([30])
x + y

You can also define matrices and perform matrix operations. Let’s see how you’d define a matrix and transpose it:

matrix = torch.randn(4, 5)
matrix
matrix.t()

PyTorch Autograd

PyTorch uses a technique called automatic differentiation that numerically evaluates the derivative of a function. Automatic differentiation computes backward passes in neural networks. In training neural networks weights are randomly initialized to numbers that are near zero but not zero. A backward pass is the process by which these weights are adjusted from right to left, and a forward pass is the inverse (left to right).

torch.autograd is the library that supports automatic differentiation in PyTorch. The central class of this package is torch.Tensor. To track all operations on it, set .requires_grad as True. To compute all gradients, call .backward(). The gradient for this tensor will be accumulated in the .grad attribute.

If you want to detach a tensor from computation history, call the .detach() function. This will also prevent future computations on the tensor from being tracked. Another way to prevent history tracking is by wrapping your code with torch.no_grad():

The Tensor and Function classes are interconnected to build an acyclic graph that encodes a complete history of the computation. The .grad_fn attribute of the tensor references the Function that created the tensor. To compute derivatives, call .backward() on a Tensor. If the Tensor contains one element, you don’t have to specify any parameters for the backward() function. If the Tensor contains more than one element, specify a gradient that’s a tensor of matching shape.

As an example, you’ll create two tensors, one with requires_grad as True and the other as False. You’ll then use these two tensors to perform addition and sum operations. Thereafter, you’ll compute the gradient of one of the tensors.

a = torch.tensor([3.0, 2.0], requires_grad=True)
b = torch.tensor([4.0, 7.0])
ab_sum = a + b
ab_sum
ab_res = (ab_sum*8).sum()
ab_res.backward()
ab_res
a.grad

Calling .grad on b will return nothing since you didn’t set requires_grad to True on it.

PyTorch nn Module

This is the module for building neural networks in PyTorch. nn depends on autograd to define models and differentiate them. Let’s start by defining the procedure for training a neural network:

Define the neural network with some learnable parameters, referred to as weights.
Iterate over a dataset of inputs.
Process input through the network.
Compare predicted results to actual values and measure the error.
Propagate gradients back into the network’s parameters.
Update the weights of the network using a simple update rule:

weight = weight — learning_rate * gradient

You’ll now use the nn package to create a two-layer neural network:

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss()
learning_rate = 1e-4

Let’s explain some of the parameters used above:

N is batch size. Batch size is the number of observations after which the weights will be updated.
D_in is the input dimension
H is the hidden dimension
D_out is the output dimension
torch.randn defines a matrix of the specified dimensions
torch.nn.Sequential initializes a linear stack of layers
torch.nn.Linear applies a linear transformation to the incoming data
torch.nn.ReLU applies the rectified linear unit function element-wise
torch.nn.MSELoss creates a criterion that measures the mean squared error between n elements in the input x and target y

PyTorch optim Package

Next you’ll use the optim package to define an optimizer that will update the weights for you. The optim package abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms such as AdaGrad, RMSProp and Adam. We’ll use the Adam optimizer, which is one of the more popular optimizers.

The first argument this optimizer takes is the tensors, which it should update. In the forward pass you’ll compute the predicted y by passing x to the model. After that, compute and print the loss. Before running the backward pass, zero all the gradients for the variables that will be updated using the optimizer. This is done because, by default, gradients are not overwritten when .backward() is called. Thereafter, call the step function on the optimizer, and this updates its parameters. How you’d implement this is shown below:

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
  y_pred = model(x)
  loss = loss_fn(y_pred, y)
  print(t, loss.item())
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

Custom nn Modules in PyTorch

Sometimes you’ll need to build your own custom modules. In these cases you’ll subclass the nn.Module. You’ll then need to define a forward that will receive input tensors and produce output tensors. How to implement a two-layer network using nn.Module is shown below. The model is very similar to the one above, but the difference is you’ll use torch.nn.Module to create the neural network. The other difference is the use of stochastic gradient descent optimizer instead of Adam. You can implement a custom nn module as shown below:

import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = TwoLayerNet(D_in, H, D_out)

criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)

    loss = criterion(y_pred, y)
    print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Putting it all Together and Further Reading

PyTorch allows you to implement different types of layers such as convolutional layers, recurrent layers, and linear layers, among others. You can learn more about PyTorch from its official documentation.

Discuss this post on Hacker News and Reddit.