Heads up! This is an end-to-end series. In its three parts, I’m going to show you how to train, save, and deploy a recommender model. Specifically, you will understand how to get and process your data, build and train a neural network, package it in an application, and finally serve it over the internet for everyone to see and use.
At the end of this tutorial, you’ll have a book recommender application that can suggest books to users based on their history and preferences. We’ll get into the details of how this works shortly, but before that, below is the result of what you’ll be building:
In this first part of the series, you will learn how to build and train the recommender model. In part 2, you’ll learn how to convert and embed the model in a web application, as well as make recommendations. And finally, in part 3, you’ll learn how to deploy your application using Firebase.
Table of Contents
- Introduction to recommender systems
- Downloading and pre-processing the book dataset
- Building the recommendation engine using TensorFlow / Keras
- Training and saving the model
- Visualizing the embedding layer with TensorFlow embedding projector
- Making recommendations for users
- Conclusion
Introduction to Recommender Systems
A recommender system, in simple terms, seeks to model a user’s behavior regarding targeted items and/or products. That is, a recommender system leverages user data to better understand how they interact with items. Items here could be books in a book store, movies on a streaming platform, clothes in an online marketplace, or even friends on Facebook.
Types of Recommender Systems
There are two primary types of recommender systems:
- Collaborative Filtering Systems: These types of recommender systems are based on the user’s direct behavior. That is, this system builds a model of the user based on past choices, activities, and preferences. It then uses this knowledge to predict what the user will like based on their similarity to other user profiles.
- Content-Based Filtering Systems: Content-based recommender systems, on the other hand, are based on the items, and not necessarily the users. This method builds an understanding of similarity between items. That is, it recommends items that are similar to each other in terms of properties.
There is a third type of recommender system, known as a hybrid approach. As you can guess, this approach combines the collaborative and content-based approaches to build better and more generalized systems. That is, it basically combines the strength of both approaches.
In this article, we’re going to be using a variant of collaborative filtering. That is, we’ll be using a neural network approach to building a collaborative filtering recommender system.
We’ll use something called an embedding to build a profile/understanding of the interactions between users and books. This technique falls neither in the collaborative nor content-based approach—I’d say it’s more of a hybrid approach.
To do this, we’re going to leverage existing data of books, users, and ratings given by users. A special kind of neural network layer called an embedding is then trained on this interaction, learning the similarity between books in something called an embedding space.
This embedding space helps the neural network better understand the interaction between books and users, and we can leverage this knowledge, combined with the user ratings of each book, to train a neural network. This is a classic regression approach, where the input is the learned embedding of book-user interaction, and the target/labels are book ratings given by the users.
Now that you have a basic understanding of the kind of system we’re building, let’s get our data and start writing some code.
Downloading and Pre-processing the Book Dataset
The data used for this tutorial can be downloaded from Kaggle by following this link. The dataset contains about ten thousand books and close to one million ratings given by users. This is a rich dataset and can serve us well for this project.
On the data page, you can download all the files as a zip folder, or download the specific ones we’ll be using in this article, which are books.csv (contains all metadata about each book), and ratings.csv (maps each book and user to a rating).
After downloading the dataset, move the files to the folder where you want your project to live, and then fire up your Jupyter Notebook/Lab server.
Next, let’s import our libraries:
We’ll be using TensorFlow’s official Keras API here, so if you don’t have TensorFlow installed, be sure to install version 2+ before you proceed. You can also use cloud-based platforms like Colab to run this part, as well.
Next, we read in both datasets:
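As a minimal sketch of this step, assuming both files sit in one folder under the names used above (ratings.csv and books.csv). The helper name load_datasets is hypothetical and only used for illustration:

```python
from pathlib import Path

import pandas as pd


def load_datasets(data_dir="."):
    """Read ratings.csv and books.csv from data_dir into DataFrames.

    load_datasets is a hypothetical helper; the file names follow the
    ones mentioned in this article.
    """
    data_dir = Path(data_dir)
    ratings_df = pd.read_csv(data_dir / "ratings.csv")
    books_df = pd.read_csv(data_dir / "books.csv")
    return ratings_df, books_df
```

In a notebook you would call ratings_df, books_df = load_datasets() and then ratings_df.head() to inspect the first rows.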
We can see that the ratings dataset contains just three columns: book_id, user_id, and the corresponding rating given by the user.
Next, let’s take a peek at the books dataset:
The book dataset has 23 columns and contains different metadata about the books. We can see information like a book title, book author, ISBN number, book image, and so on. We’ll use this data when making predictions, and also when we’re displaying the books to users in our application.
As far as this tutorial is concerned, we’re mainly concerned with the ratings dataset. This is what we’ll feed into our embedding layer, so as to learn an efficient mapping of users to books.
Next, let’s print out some statistics about the ratings dataset:
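One way to gather these statistics is sketched below. It assumes the three columns described earlier (book_id, user_id, rating); the helper summarize_ratings and the toy values are made up for illustration:

```python
import pandas as pd


def summarize_ratings(ratings_df):
    """Return basic counts for a ratings table with the columns
    book_id, user_id and rating."""
    return {
        "n_ratings": len(ratings_df),
        "n_users": ratings_df["user_id"].nunique(),
        "n_books": ratings_df["book_id"].nunique(),
        "n_missing": int(ratings_df.isnull().sum().sum()),
    }


# Toy data standing in for ratings_df (values are made up).
toy = pd.DataFrame({
    "book_id": [1, 1, 2, 3],
    "user_id": [10, 11, 10, 12],
    "rating": [5, 4, 3, 4],
})
print(summarize_ratings(toy))
```

Calling summarize_ratings(ratings_df) on the real data should give the counts discussed next.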
As we can see from the output above, there are over 900,000 ratings given by 53,424 users to about 10,000 books. That means different users have rated multiple books, and each book has been rated by more than one user.
We can also observe that there are no missing values in the dataset, and each column is already in numerical format; as such, we won’t be doing any further data processing. Keep in mind that with different datasets, more of this processing might be required.
Next, we’ll split the data into train and test sets so we can effectively evaluate the model performance. Remember we’re treating this as a regression problem.
from sklearn.model_selection import train_test_split
Xtrain, Xtest = train_test_split(ratings_df, test_size=0.2, random_state=1)
print(f"Shape of train data: {Xtrain.shape}")
print(f"Shape of test data: {Xtest.shape}")
We use a test size of 0.2 (20%) when splitting the dataset. This seems quite large (as the dataset is large) but you can definitely choose a smaller percentage if you’d like.
Now that we have our data ready, let’s build our model.
Building the Recommendation Engine Using TensorFlow / Keras
The neural network we’re going to create will have two input embedding layers. The first embedding layer accepts the books, and the second the users. Each embedding has its own weights, and the two outputs are concatenated before being passed to a dense layer.
It’s pretty easy to code this architecture in Keras using the functional API. If you aren’t familiar with the Keras Functional API, not to worry, you can easily read and understand the flow. Also, you can learn about it at a high-level here before you proceed.
First, let’s get the unique users and books in the dataset—this forms the vocabulary for our embeddings.
The embedding can be thought of as simply the mapping of an entity (book, user) to a vector of real numbers in a smaller dimension.
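To make that mapping concrete, here is a small NumPy sketch of an embedding as a lookup table. It is not the Keras layer itself. It assumes IDs run from 1 up to the number of books, and the sizes are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_books = 5      # toy vocabulary size (the real dataset has about 10,000)
embed_dim = 3    # toy embedding width

# One row per possible ID. IDs here run from 1 to n_books, so row 0 is
# unused and the table needs n_books + 1 rows to make ID n_books valid.
embedding_table = rng.normal(size=(n_books + 1, embed_dim))

book_ids = np.array([1, 3, 5])
book_vectors = embedding_table[book_ids]   # a lookup is just row indexing

print(book_vectors.shape)  # (3, 3)
```

During training, the rows of this table are the weights that get updated, so books that receive similar ratings from similar users end up with similar vectors.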
Let’s see the code in action:
# Assumes TensorFlow was imported earlier as: import tensorflow as tf
# Book input network
input_books = tf.keras.layers.Input(shape=[1])
embed_books = tf.keras.layers.Embedding(nbook_id + 1, 15)(input_books)
books_out = tf.keras.layers.Flatten()(embed_books)

# User input network
input_users = tf.keras.layers.Input(shape=[1])
embed_users = tf.keras.layers.Embedding(nuser_id + 1, 15)(input_users)
users_out = tf.keras.layers.Flatten()(embed_users)

# Join both embeddings and regress to a single rating
conc_layer = tf.keras.layers.Concatenate()([books_out, users_out])
x = tf.keras.layers.Dense(128, activation='relu')(conc_layer)
x_out = tf.keras.layers.Dense(1, activation='relu')(x)
model = tf.keras.Model([input_books, input_users], x_out)
Note that we’re using the Keras API in TensorFlow. This is the official TensorFlow implementation of Keras.
In the first three lines, we create an input layer to accept a 1D array of book IDs, then we create an embedding layer with a shape of (number of unique books + 1, 15). We add 1 to the number of unique books because the embedding layer needs an extra row for books that do not appear in the training dataset. These can be called out-of-vocabulary entities. The extra row also matters because the embedding is indexed by ID: if IDs start at 1, the largest ID equals the number of books and would otherwise fall outside the table.
The second dimension (15), is an arbitrary dimension we chose. This can be any number depending on how large we want the embedding layer to be.
Notice that we append the input layer to the end of the book embedding layer. This is the functional API in action. What we are basically saying here is that we want to pass the output of the input layer to the embedding layer.
In the next three lines of code, we do the same thing we did for books, but this time for the users. That is, we create an input that accepts the users as a 1D vector, and then we create the user embeddings, as well.
In the concatenate line, we simply concatenate or join both the books and the user embedding layer together, and then add a single dense layer with 128 nodes on top of it. For the final layer of the network, we use a single node, because we’re predicting the ratings given to each book, and that requires just a single node.
In the last line of code, we use the tf.keras.Model class to create a single model from our defined architecture. This model is expecting two input arrays (books and users).
Now that we have defined the network, we’ll compile it in the next section by choosing an optimizer and a loss function:
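A minimal sketch of the compile step follows. It rebuilds a small version of the same architecture so that it runs by itself; build_model and the toy vocabulary sizes are hypothetical and only there for illustration:

```python
import tensorflow as tf


def build_model(n_books, n_users, embed_dim=15):
    """Two-embedding rating regressor mirroring the architecture above."""
    input_books = tf.keras.layers.Input(shape=(1,))
    books_out = tf.keras.layers.Flatten()(
        tf.keras.layers.Embedding(n_books + 1, embed_dim)(input_books))

    input_users = tf.keras.layers.Input(shape=(1,))
    users_out = tf.keras.layers.Flatten()(
        tf.keras.layers.Embedding(n_users + 1, embed_dim)(input_users))

    x = tf.keras.layers.Concatenate()([books_out, users_out])
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    x_out = tf.keras.layers.Dense(1, activation="relu")(x)
    return tf.keras.Model([input_books, input_users], x_out)


model = build_model(n_books=100, n_users=50)   # toy sizes for illustration
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="mean_squared_error",
)
model.summary()
```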
I decided to use an Adam optimizer here with a learning rate of 0.001, and mean squared error as the loss function. You can try out other optimizers and compare the results.
Looking at the model summary, we can see the connection between defined layers, as well as the number of trainable parameters.
Training and Saving the Model
Next, we’ll fit our model, evaluate it, and plot the loss curves to see how well it is doing:
hist = model.fit([Xtrain.book_id, Xtrain.user_id], Xtrain.rating,
                 batch_size=64,
                 epochs=5,
                 verbose=1,
                 validation_data=([Xtest.book_id, Xtest.user_id], Xtest.rating))
The fit method expects two arrays as input, based on our predefined architecture. So we pass a list of book and user IDs, and also the ratings as the target. I chose a batch size of 64 because the dataset is quite large, and I wanted faster training. You can play around with the batch size as well; 64 and 128 are common starting points. I also trained for just five epochs and recorded a relatively low MSE (~0.55). This can definitely be lower with a fine-tuned network. I’ll leave that to you to discover.
Notice also that we pass our test set to the validation parameter. This tells Keras to calculate performance on previously unseen data at the end of every epoch. We’ll plot these metrics below to understand how well our model is doing:
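One possible way to draw these curves with Matplotlib is sketched below. It assumes a Keras-style history dictionary (hist.history) with 'loss' and 'val_loss' entries; plot_loss is a hypothetical helper name:

```python
import matplotlib.pyplot as plt


def plot_loss(history):
    """Plot training and validation loss from a Keras-style history dict.

    history is expected to look like hist.history, i.e. a dict with
    'loss' and 'val_loss' lists holding one value per epoch.
    """
    epochs = range(1, len(history["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history["loss"], label="train loss")
    ax.plot(epochs, history["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("mean squared error")
    ax.legend()
    return fig, ax
```

After training, plot_loss(hist.history) followed by plt.show() displays both curves on one chart.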
Here, we notice a steady decrease in the training loss, but little or no improvement in the validation loss. This is a classic sign of overfitting, and we can address it with hyperparameter tuning and regularization, for example dropout, a smaller embedding size, or fewer epochs. You can go ahead and experiment with this and see if you can improve it further.
After fine-tuning your network, you can save it by calling the save function on the trained model object, as shown below:
This saves the model as a TensorFlow / Keras model. Note this format, as you’ll be referencing it during model conversion in the next tutorial.
In the next section, we’ll take an inside look at the book embedding layer to better understand how books are represented.
Visualizing the Embedding Layer with TensorFlow Embedding Projector
To better understand the purpose of the embedding layer, we’re going to extract it and visualize it using the TensorFlow Embedding Projector. This tool uses dimensionality reduction algorithms (t-SNE, PCA) to reduce our embedding vectors to 3 dimensions and visualizes them in the embedding space. This can give us a visual clue as to how books are clustered together in the embedding space.
To extract the embedding, copy the book embedding layer’s name from the model.summary() output, and pass it to the get_layer function, as shown below:
The shape of the book embedding layer is (10001, 15). This means that the network has mapped each book to a 15-dimensional vector. We will save this embedding vector, as well as the corresponding book’s title, and upload them to the TensorFlow Embedding Projector.
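A minimal sketch of the extraction, run on a toy model so that it stands alone. The layer name "book_embedding" and the toy size of 100 books are assumptions; in your notebook, use the name that model.summary() prints for the book embedding layer:

```python
import tensorflow as tf

# Toy stand-in for the trained model: a single named book embedding.
# The layer name "book_embedding" is an assumption made for this sketch.
n_books, embed_dim = 100, 15
inputs = tf.keras.layers.Input(shape=(1,))
embedded = tf.keras.layers.Embedding(
    n_books + 1, embed_dim, name="book_embedding")(inputs)
model = tf.keras.Model(inputs, tf.keras.layers.Flatten()(embedded))

# get_layer looks a layer up by name; get_weights returns a list of arrays,
# and an Embedding layer has exactly one: the lookup table.
book_em = model.get_layer("book_embedding")
book_em_weights = book_em.get_weights()[0]
print(book_em_weights.shape)  # (101, 15)
```

With the toy size of 100 books the table has 101 rows; on the full model the same two calls return the full-size table.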
First, let’s get the book titles from the books.csv dataset:
In the code cell above, we first make a copy of the book DataFrame, and then set the column book_id as the index so we can easily access it.
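The copy-and-reindex step described here could look like the sketch below. The toy DataFrame stands in for books_df, and the titles are placeholders:

```python
import pandas as pd

# Toy stand-in for books_df (titles are placeholders).
books_df = pd.DataFrame({
    "book_id": [1, 2, 3],
    "title": ["Book A", "Book B", "Book C"],
})

# Work on a copy so the original DataFrame keeps its default index.
books_df_copy = books_df.copy()
books_df_copy = books_df_copy.set_index("book_id")

print(books_df_copy.loc[2, "title"])   # label-based lookup by book_id
```

Keep in mind that .loc looks rows up by index label while .iloc is positional, so after set_index the two can return different rows for the same number.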
Next, we’ll get all the unique book IDs, and then write them to a tsv file:
b_id = list(ratings_df.book_id.unique())
b_id.remove(10000)

# Map each book ID to its title (iloc is positional, not label-based)
dict_map = {}
for i in b_id:
    dict_map[i] = books_df_copy.iloc[i]['title']

# Write one embedding vector per line to vecs.tsv and the matching
# title to the same line of meta.tsv
with open('vecs.tsv', 'w') as out_v, open('meta.tsv', 'w') as out_m:
    for i in b_id:
        book = dict_map[i]
        embeddings = book_em_weights[i]
        out_m.write(book + "\n")
        out_v.write('\t'.join([str(x) for x in embeddings]) + "\n")
In the code block above, we’re simply looping over all the unique book IDs, retrieving their titles, and then writing them to the corresponding tsv file. In the end, you’ll have two tsv files: one containing the embedding weights, and the other containing the corresponding book titles.
Confirm you have the two tsv files in your directory. If so, go to the TensorFlow Embedding Projector page, wait for the default embedding to load, and then click Load to upload your tsv files.
The first upload button is for the vecs.tsv file. Click and add it. The second button is for the meta.tsv file. You can upload that, as well. When you’re done uploading, click outside the modal to view the resulting visualization.
You can click on a point (book) to see the closest books in the embedding space. This trained embedding can be effectively used to recommend similar books, because books closer in the embedding space tend to be similar.
If we were creating a recommendation engine based on similar books (content-based filtering), we could use the trained embedding to simply extract the closest books to a given input.
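As a rough sketch of that idea, the helper below ranks rows of an embedding matrix by cosine similarity to one book. Cosine similarity is an assumption here (Euclidean distance would be another reasonable choice), and closest_books is a hypothetical name:

```python
import numpy as np


def closest_books(weights, book_id, k=5):
    """Return the k row indices most similar to book_id by cosine similarity.

    weights is an (n_rows, dim) embedding matrix such as book_em_weights.
    """
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.clip(norms, 1e-12, None)   # avoid division by zero
    sims = unit @ unit[book_id]
    sims[book_id] = -np.inf                        # exclude the query itself
    return np.argsort(-sims)[:k]


# Toy 2D embeddings: row 1 points almost the same way as row 0.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(closest_books(toy, 0, k=2))  # [1 2]
```

The returned values are row indices of the embedding matrix, so they still need to be mapped back to titles.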
But remember, we’re using a slightly different approach in this tutorial, where we’re determining recommendations based on user ratings of other books (collaborative filtering).
Now that we have a better understanding of how our model is trained, we’re ready to make some recommendations for users.
Making Recommendations for Users
In order to make recommendations, we need to pass in the list of books and a particular user to the model. That is, the model will make a prediction of a rating it thinks the user will give to books based on its understanding of the user.
These ratings are then sorted in descending order. Therefore, if we want to, say, recommend 10 books to a user, we’ll pass in a list of books to the model to predict the ratings it expects the user to give those books. Then we pick the 10 highest ratings and recommend those books to the user.
Let’s see the code in action:
In the code cell above, first, we get all book IDs and save them in an array. Then we create another array with the same length as the book array, but with the same user ID all through. Next, we pass it to the model, which is expecting two inputs (Books and User). The returned array is a list of predicted ratings for each book.
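The steps just described can be sketched as follows. To keep the example runnable without a trained network, predict_fn stands in for model.predict, and fake_predict is a made-up scoring function used only for illustration:

```python
import numpy as np


def predict_for_user(predict_fn, book_ids, user_id):
    """Score every book for one user.

    predict_fn stands in for model.predict: it takes [book_array, user_array]
    and returns one predicted rating per row.
    """
    book_arr = np.asarray(book_ids)
    user_arr = np.full(len(book_arr), user_id)    # same user repeated
    return np.asarray(predict_fn([book_arr, user_arr])).reshape(-1)


# Hypothetical stand-in for a trained model, for illustration only.
def fake_predict(inputs):
    books, users = inputs
    return ((books * 7 + users) % 5 + 1).reshape(-1, 1).astype(float)


preds = predict_for_user(fake_predict, book_ids=[1, 2, 3, 4], user_id=100)
print(preds.shape)  # (4,)
```

With the trained network, you would pass model.predict as predict_fn and the full list of book IDs.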
Next, we’ll sort the array, and retrieve the index of the highest 5. With this index, we can retrieve the corresponding books from the dataset:
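A small sketch of the sorting step, using made-up scores in place of real model output:

```python
import numpy as np

predictions = np.array([3.2, 4.8, 2.1, 4.5, 3.9, 4.9, 1.0])  # made-up scores

# argsort sorts ascending, so negate the scores to rank highest first,
# then keep the first five positions.
pred_ids = (-predictions).argsort()[:5]
print(pred_ids)               # [5 1 3 4 0]
print(predictions[pred_ids])  # [4.9 4.8 4.5 3.9 3.2]
```

Note that pred_ids holds positions in the prediction array, not book IDs, so it only lines up with the books table if both are in the same order.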
Finally, we’ll use the index (pred_ids) to retrieve the corresponding books from the books.csv DataFrame:
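A sketch of this lookup with toy data. Both books_df and pred_ids below are stand-ins with made-up values:

```python
import numpy as np
import pandas as pd

# Toy stand-ins: books_df for the books table, pred_ids for the positions
# of the top predictions (both made up for illustration).
books_df = pd.DataFrame({
    "book_id": [1, 2, 3, 4],
    "title": ["Book A", "Book B", "Book C", "Book D"],
})
pred_ids = np.array([3, 0])

# iloc selects rows by position, which matches pred_ids only if the
# prediction array and books_df are in the same row order.
recommended = books_df.iloc[pred_ids]
print(recommended["title"].tolist())   # ['Book D', 'Book A']
```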
And voila! You can see the recommended books for the user based on the index of the highest predicted ratings. Go ahead and try other user numbers as well. There are 53,424 unique users, so you can change the number and watch the recommendations change as well.
One last but important thing we need to do before we end this part of the tutorial is to save some features of the book data in JSON format. This will be used when creating our web app. It helps us to easily display various book properties such as the titles, authors, and images.
To save this, we first slice a subset of the book data:
Notice that we retrieve only the book_id, title, image_url and authors. This is enough for our simple web application.
Next, we can export this to JSON format by using the to_json function in Pandas.
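The slicing and export steps might look like the sketch below. The column names and output file name come from this article; the helper export_web_books and the choice of orient="records" are assumptions:

```python
import pandas as pd


def export_web_books(books_df, path="web_book_data.json"):
    """Write the four columns used by the web app to a JSON file.

    orient="records" (a list with one object per book) is an assumption;
    pick whichever layout your front end expects.
    """
    web_book_data = books_df[["book_id", "title", "image_url", "authors"]]
    web_book_data.to_json(path, orient="records")
    return web_book_data
```

Calling export_web_books(books_df) writes the file to the current working directory.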
Once you run this cell, you should see the web_book_data.json file in your directory.
What’s Next?
Congratulations! You have just created your very own recommendation engine based on collaborative filtering, but using neural network embeddings.
In the next part of this tutorial series, you’ll convert your saved model to JavaScript and serve it in a website. How fun is that!? I’m sure you’re looking forward to it.
In the meantime, try to improve your network’s loss so it can provide even better recommendations!
Connect with me on Twitter.
Connect with me on LinkedIn.