In this article series, we’ll aim to build a deep learning library in the mold of TensorFlow and PyTorch, focusing on implementing the same core components. Here’s a preview of some of the concepts and tasks we’ll cover:
- Tensors
- Automatic gradient computation for basic math operations
- Linear layer, ReLU, and softmax activation functions (part 2)
- Sequential to pass in a list of layers (part 3)
- Cross-entropy loss function and stochastic optimization (part 3)
- Visualizing tensors (part 4)
- Saving and loading models
- A hands-on project creating a word2vec embedding (part 5)
- Visualization of word embeddings
The main aim of this project is to push us out of the comfort zone of Python, which will help us understand the core principles behind popular deep learning libraries. Building it in JavaScript should also make it easier to implement the same ideas in other programming languages.
Prerequisites:
- Knowledge of backpropagation
- A basic understanding of automatic gradient computation
- Familiarity with one or two deep learning libraries (e.g., PyTorch or TensorFlow)
Goal: Implement automatic gradient computation for a simple computational graph.
Let’s Get Started
If you need a refresher on the basics of backpropagation and autograd computation, you can check out Stanford’s cs231n course on deep learning with convolutional neural networks (CNNs). For a quicker review, check out this slide from the course.
Quick Summary of Key Concepts
Backpropagation: A technique for training neural networks that computes the gradient of the loss with respect to the weights and biases, which are then updated using that gradient. It relies on the chain rule from calculus.
Chain rule: A rule from calculus for computing the derivative of a composite function with respect to its variables, including variables that affect the output only indirectly through intermediate functions. It shows how much each variable contributes to the final output.
Automatic gradient calculation: The process of applying the chain rule automatically across the operations in a computation. It makes computing gradients for complex operations much less tedious and error-prone.
Implementation of Automatic Gradient Computation in JavaScript
With these key concepts defined, let’s dive right in and implement them in JavaScript.
For this, we’ll again turn to the Stanford course for a guiding hand, using one of the examples from the cs231n slides, shown below:
The example is a simple computation, and we’ll use it to build intuition about how to implement backpropagation and automatic gradient calculation for simple math operations in JavaScript.
The graph above shows the computational flow of the function f(x,y,z)=(x+y)z.
In the image above, the green values show the input and output of each node (the circles), and the red values show the gradient flowing back from the final output to the inputs.
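The red values come from the chain rule. The derivation below introduces the intermediate node q = x + y and assumes the inputs x = -2, y = 5, z = -4, which are the same values used in the code later in this article:

$$
\begin{aligned}
q &= x + y = 3, \qquad f = qz = -12 \\
\frac{\partial f}{\partial z} &= q = 3, \qquad \frac{\partial f}{\partial q} = z = -4 \\
\frac{\partial f}{\partial x} &= \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial x} = -4 \cdot 1 = -4 \\
\frac{\partial f}{\partial y} &= \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial y} = -4 \cdot 1 = -4
\end{aligned}
$$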
If we compute the example by hand in code, it looks like this in JavaScript:
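Here is a minimal sketch of that computation. It assumes the input values x = -2, y = 5, z = -4 from the cs231n example. The forward pass is computed directly, and the backward pass applies the chain rule by hand:

```javascript
// Input values from the cs231n example
const x = -2;
const y = 5;
const z = -4;

// Forward pass: f(x, y, z) = (x + y) * z
const q = x + y; // intermediate node, 3
const f = q * z; // final output, -12

// Backward pass: apply the chain rule by hand
const dfdq = z;        // d(q * z)/dq = z
const dfdz = q;        // d(q * z)/dz = q
const dfdx = dfdq * 1; // dq/dx = 1, so df/dx = df/dq * 1
const dfdy = dfdq * 1; // dq/dy = 1, so df/dy = df/dq * 1

console.log(f, dfdx, dfdy, dfdz); // -12 -4 -4 3
```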
The above snippet shows how the values in the computational graph are computed. Doing this by hand is error-prone, and mistakes become more likely as the computational graph grows more complex.
Fortunately, deep learning libraries automate this work, so we don’t have to do it every time. Next, let’s automate the gradient calculation for the above code in JavaScript.
First, let’s create a tensor that accepts only a single value (for simplicity of demonstration):
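```javascript
// A minimal tensor that holds a single scalar value
function Tensor(arr, require_grad) {
  this.item = arr;                  // the stored value
  this.require_grad = require_grad; // whether gradients may flow into this tensor
  this.gradv = 0;                   // the gradient received during backpropagation
}

Tensor.prototype = {
  // Store (update) the incoming gradient
  grad: function (g) {
    this.gradv = g;
  }
};
```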
In the code above, the tensor takes in a single value, arr, and we specify whether gradients are allowed to flow into the tensor with require_grad, a boolean flag.
The incoming gradient is stored in this.gradv, and the input value is stored in this.item (a name PyTorch users should recognize from tensor.item()).
Now that the tensor class is created, we can go ahead and create classes for addition and multiplication.
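A quick usage sketch of the Tensor constructor follows. The constructor is repeated so the snippet runs on its own, and the values 7 and 2.5 are arbitrary. Note that grad overwrites gradv rather than adding to it:

```javascript
// Tensor as defined above: holds a single value and its gradient
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
}
Tensor.prototype = {
  grad: function (g) {
    this.gradv = g;
  }
};

const t = new Tensor(7, true);
console.log(t.item, t.require_grad, t.gradv); // 7 true 0

// Simulate a gradient arriving during backpropagation
t.grad(2.5);
console.log(t.gradv); // 2.5

// A second call replaces the stored gradient; it does not accumulate
t.grad(1);
console.log(t.gradv); // 1
```

Overwriting works for the example in this article because each value feeds into only one operation. If a value fed into several operations, the gradients arriving from each of them would need to be summed.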
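```javascript
// Addition node: computes x + y in the forward pass (the constructor)
function add(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = x.item + y.item;
  this.gradv = 0;
}

add.prototype = {
  // Backward pass: d(x + y)/dx = d(x + y)/dy = 1,
  // so each input receives 1 * the incoming gradient
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(1 * this.gradv);
      // If the input is itself an operator, keep propagating
      if ("backward" in this.x) {
        this.x.backward();
      }
    }
    if (this.y.require_grad) {
      this.y.grad(1 * this.gradv);
      if ("backward" in this.y) {
        this.y.backward();
      }
    }
  },
  // Store the gradient flowing into this node
  grad: function (g) {
    this.gradv = g;
  }
};
```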
The add class has a backward method that propagates the gradient to each of its inputs. This is what sets it apart from the tensor class: a tensor is just a value, whereas add is a function that we differentiate.
The add function takes two inputs, and its local gradient with respect to each of them is 1 (check out the cs231n material for more on this). So each input receives the add node’s incoming gradient, gradv, multiplied by 1.
For each of the two inputs, we first check whether require_grad is true and, if so, set that input’s gradient. We then check whether the input has a backward method. If it does, the input is itself an operator, so we call its backward method to keep propagating the gradient through the graph.
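The sketch below shows the add node’s backward pass in isolation. It assumes an upstream gradient of 2, an arbitrary value standing in for whatever the next node would send back. The local gradient of addition is 1, so both inputs simply receive that upstream gradient:

```javascript
// Tensor and add as defined above, repeated so this snippet is self-contained
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
}
Tensor.prototype = {
  grad: function (g) { this.gradv = g; }
};

function add(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = x.item + y.item;
  this.gradv = 0;
}
add.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(1 * this.gradv);
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      this.y.grad(1 * this.gradv);
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }
};

const a = new Tensor(2, true);
const b = new Tensor(3, true);
const s = new add(a, b);
console.log(s.item); // 5

// Pretend an upstream node sent a gradient of 2 into the add node
s.grad(2);
s.backward();
console.log(a.gradv, b.gradv); // 2 2
```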
The same structure used to create the addition object will be used to create all other math operator classes. Hence, let’s create an object for multiplication:
```javascript
// Multiplication node: computes x * y in the forward pass (the constructor)
function multi(x, y) {
  this.item = x.item * y.item;
  this.x = x;
  this.y = y;
  this.gradv = 0;
  this.require_grad = true;
}

multi.prototype = {
  // Backward pass: d(x * y)/dx = y and d(x * y)/dy = x
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(this.y.item * this.gradv);
      if ("backward" in this.x) {
        this.x.backward();
      }
    }
    if (this.y.require_grad) {
      this.y.grad(this.x.item * this.gradv);
      if ("backward" in this.y) {
        this.y.backward();
      }
    }
  },
  // Store the gradient flowing into this node
  grad: function (g) {
    this.gradv = g;
  }
};
```
The multiplication class has the same structure, but its backward method is different. The local gradient with respect to one input is the value of the other input, so x receives y.item * gradv and y receives x.item * gradv.
Now that we’ve created these basic operators, let’s run the forward and backward passes for the function f(x,y,z) = (x+y)z. By doing this, we’ll be creating a static computational graph, much like you might see while working with TensorFlow.
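As a quick check of the multiplication rule, here is a sketch that runs multi on its own. It assumes inputs 3 and 4 and an arbitrary upstream gradient of 2. Each input should receive the other input’s value times 2:

```javascript
// Tensor and multi as defined above, repeated so this snippet is self-contained
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
}
Tensor.prototype = {
  grad: function (g) { this.gradv = g; }
};

function multi(x, y) {
  this.item = x.item * y.item;
  this.x = x;
  this.y = y;
  this.gradv = 0;
  this.require_grad = true;
}
multi.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(this.y.item * this.gradv); // d(x * y)/dx = y
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      this.y.grad(this.x.item * this.gradv); // d(x * y)/dy = x
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }
};

const a = new Tensor(3, true);
const b = new Tensor(4, true);
const m = new multi(a, b);
console.log(m.item); // 12

// Assume an upstream gradient of 2 flows into the multiplication node
m.grad(2);
m.backward();
console.log(a.gradv, b.gradv); // 8 6
```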
```javascript
var x = new Tensor(-2, true);
var y = new Tensor(5, true);
var z = new Tensor(-4, true);
var q = new add(x, y);   // q = x + y = 3
var f = new multi(q, z); // f = q * z = -12
console.log(f.item);
// output: -12
```
Since f is an object, let’s log it with console.log(f) to see what it’s made of. This gives us a better look at how the graph is stored.
```
{ item: -12,
  x:
   { x: { item: -2, require_grad: true, gradv: 0 },
     y: { item: 5, require_grad: true, gradv: 0 },
     require_grad: true,
     item: 3,
     gradv: 0 },
  y: { item: -4, require_grad: true, gradv: 0 },
  gradv: 0,
  require_grad: true }
```
You can see that f has its own item property, -12 (the computed value), along with references to the nodes that produced it.
The inputs to f are the add operation and the tensor z, stored in f’s x and y properties respectively.
Looking at the add operation (f.x), it also holds its own inputs, the tensors x and y, again stored under the property names x and y.
To make the type of each node visible in the output, let’s add a name property to each operator and to the tensor object. With a good visualization tool, we could then plot the graph from these nodes, just as we would with something like TensorBoard.
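Here is a minimal sketch of that change. It assumes the name is assigned as the last line of each constructor, which matches the property order in the output that follows. Only the Tensor constructor is shown; add and multi would get the same line with "<Add>" and "<Multi>":

```javascript
// Tensor with an extra name property identifying the node type
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
  this.name = "<Tensor>"; // label used when printing or visualizing the graph
}
Tensor.prototype = {
  grad: function (g) {
    this.gradv = g;
  }
};

// In add, the constructor would end with:   this.name = "<Add>";
// In multi, the constructor would end with: this.name = "<Multi>";

const t = new Tensor(-2, true);
console.log(t.name); // <Tensor>
```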
With a name property added to each object, we can now see what x and y are:
```
{ item: -12,
  x:
   { x: { item: -2, require_grad: true, gradv: 0, name: '<Tensor>' },
     y: { item: 5, require_grad: true, gradv: 0, name: '<Tensor>' },
     require_grad: true,
     item: 3,
     gradv: 0,
     name: '<Add>' },
  y: { item: -4, require_grad: true, gradv: 0, name: '<Tensor>' },
  gradv: 0,
  require_grad: true,
  name: '<Multi>' }
```
Because we haven’t performed backpropagation yet, gradv is still 0 for all operations and values.
Now let’s see what happens when we backpropagate the object f:
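```javascript
f.grad(1);    // seed the output gradient: df/df = 1
f.backward(); // propagate gradients back through the graph
console.log(f);
```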
The gradient of f is first set to 1, because the derivative of a function with respect to itself is 1. When training a neural network, we wouldn’t normally set this by hand. In PyTorch, for example, calling backward() on a scalar output such as the loss seeds this gradient of 1 automatically.
The output of the previous code block gives us the following:
```
{ item: -12,
  x:
   { x:
      { item: -2, require_grad: true, gradv: -4, name: '<Tensor>' },
     y: { item: 5, require_grad: true, gradv: -4, name: '<Tensor>' },
     require_grad: true,
     item: 3,
     gradv: -4,
     name: '<Add>' },
  y: { item: -4, require_grad: true, gradv: 3, name: '<Tensor>' },
  gradv: 1,
  require_grad: true,
  name: '<Multi>' }
```
Now compare these values with the computational graph image above: the gradients match (-4 for x and y, 3 for z). The <Add> node has its own gradient, -4, and its inputs x and y also have their own gradients.
It’s now possible to access the gradient of each individual operator and input directly, for example x.gradv (-4), q.gradv (-4), or z.gradv (3).
Try checking gradv for each of the variables we created.
Let’s also rebuild the graph with require_grad set to false for the input x and run the backward pass again. If we then print x.gradv, we get 0.
And if we check the f graph, we’ll see that one of the input gradients of addition is zero:
```
{ item: -12,
  x:
   { x:
      { item: -2, require_grad: false, gradv: 0, name: '<Tensor>' },
     y: { item: 5, require_grad: true, gradv: -4, name: '<Tensor>' },
     require_grad: true,
     item: 3,
     gradv: -4,
     name: '<Add>' },
  y: { item: -4, require_grad: true, gradv: 3, name: '<Tensor>' },
  gradv: 1,
  require_grad: true,
  name: '<Multi>' }
```
The gradient coming out of the add node did not flow into x.
I hope you’ve now got the gist of autograd computation in JavaScript. Having solved this simpler, single-value problem first, we can move on to implementing full tensors instead of the scalar version we built here.
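For reference, the sketch below collects the whole example into one self-contained file. It uses the Tensor, add, and multi definitions from this article and omits the name property. A hypothetical helper, buildGraph, avoids repeating the graph construction. The backward pass runs once with every input requiring gradients, and once with require_grad set to false for x:

```javascript
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
}
Tensor.prototype = {
  grad: function (g) { this.gradv = g; }
};

function add(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = x.item + y.item;
  this.gradv = 0;
}
add.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(1 * this.gradv);
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      this.y.grad(1 * this.gradv);
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }
};

function multi(x, y) {
  this.item = x.item * y.item;
  this.x = x;
  this.y = y;
  this.gradv = 0;
  this.require_grad = true;
}
multi.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(this.y.item * this.gradv);
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      this.y.grad(this.x.item * this.gradv);
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }
};

// Hypothetical helper: builds f = (x + y) * z and runs backpropagation
function buildGraph(xRequiresGrad) {
  const x = new Tensor(-2, xRequiresGrad);
  const y = new Tensor(5, true);
  const z = new Tensor(-4, true);
  const q = new add(x, y);
  const f = new multi(q, z);
  f.grad(1); // df/df = 1
  f.backward();
  return { x, y, z, q, f };
}

const full = buildGraph(true);
console.log(full.f.item);                              // -12
console.log(full.x.gradv, full.y.gradv, full.z.gradv); // -4 -4 3
console.log(full.q.gradv);                             // -4

const frozen = buildGraph(false);
console.log(frozen.x.gradv, frozen.y.gradv);           // 0 -4
```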
The Concept
To summarize the key concepts we’ve covered here:
- Autograd represents each mathematical operation as an object (i.e., a class instance).
- Every tensor has a method that receives the incoming gradient (grad in our code) and a property that stores it (gradv in our code; PyTorch’s counterpart is the .grad attribute). require_grad determines whether a gradient is allowed to flow into the tensor.
- Every mathematical operator has a forward pass, which is the actual computation the operator performs (here, done in the constructor). For example, if the operator is addition, the forward pass adds two numbers together.
- Each operator also has a backward pass, which computes backpropagation and assigns gradients to the operator’s inputs. For example, an add object passes 1 * the incoming gradient to each input.
- Operators also have a grad method, which collects the incoming gradient and stores it in the operator’s gradv property.
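Following the same pattern, a new operator needs only a forward computation in its constructor and its local gradients in backward. As an illustration, here is a hypothetical sub operator for x - y. It is not part of this article’s code. Its local gradients are 1 for x and -1 for y:

```javascript
function Tensor(arr, require_grad) {
  this.item = arr;
  this.require_grad = require_grad;
  this.gradv = 0;
}
Tensor.prototype = {
  grad: function (g) { this.gradv = g; }
};

// Hypothetical subtraction operator following the add/multi structure
function sub(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = x.item - y.item; // forward pass
  this.gradv = 0;
}
sub.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(1 * this.gradv); // d(x - y)/dx = 1
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      this.y.grad(-1 * this.gradv); // d(x - y)/dy = -1
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }
};

const a = new Tensor(7, true);
const b = new Tensor(3, true);
const d = new sub(a, b);
d.grad(1);
d.backward();
console.log(d.item, a.gradv, b.gradv); // 4 1 -1
```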
In the next part of this series, we’ll be discussing how to create real tensors and also how to implement all other operations needed to create a simple deep learning library, similar to TensorFlow and PyTorch.
The code for this part can be obtained here: