Using ONNX to Transfer Machine Learning Models from PyTorch to Caffe2 and Mobile

In this tutorial, we’ll cover how to convert a PyTorch model to the ONNX format and then load it into Caffe2. We’ll then use Caffe2’s mobile exporter to run it on a mobile device.

Plan of Attack

What are Caffe2 and ONNX?
Creating a Super-Resolution Model in PyTorch.
Exporting Models in PyTorch.
Using ONNX representation in Caffe2.
Running the Model on Mobile Devices.
Conclusion and Further Reading.

What are Caffe2 and ONNX?

Caffe2 is a scalable, modular deep learning framework that builds on the original Caffe (Convolutional Architecture for Fast Feature Embedding) framework. ONNX (Open Neural Network Exchange) is a format for deep learning models that allows interoperability between different open source AI frameworks. ONNX is supported by Caffe2, PyTorch, MXNet, and Microsoft’s CNTK deep learning framework.

For this tutorial, you need to install onnx, onnx-caffe2 and Caffe2. onnx and onnx-caffe2 can be installed via conda using the following command:

conda install -c ezyang onnx onnx-caffe2
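
Before moving on, it can help to confirm that the packages are importable. The snippet below is a minimal sketch that uses only the standard library; the helper name missing_packages is our own, and the import names (onnx, onnx_caffe2, caffe2) are the ones used later in this tutorial.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used later in this tutorial.
required = ["onnx", "onnx_caffe2", "caffe2"]
missing = missing_packages(required)

if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```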

First we need to import a couple of packages:

io for working with different types of input and output.
numpy for scientific computations.
nn for initializing the neural network.
torch.utils.model_zoo, which will load the Torch serialized object at the given URL.
torch.onnx contains functions to export models in the ONNX format.

import io
import numpy as np

from torch import nn
import torch.utils.model_zoo as model_zoo
import torch.onnx

Creating a Super-Resolution Model in PyTorch

Super-resolution is a technique for increasing the resolution of images and videos, and it is widely used in image and video processing. We’ll create a super-resolution model based on the official example in the PyTorch documentation.

import torch.nn as nn
import torch.nn.init as init


class SuperResolutionNet(nn.Module):
    def __init__(self, upscale_factor, inplace=False):
        super(SuperResolutionNet, self).__init__()

        self.relu = nn.ReLU(inplace=inplace)
        self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2))
        self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1))
        self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1))
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1))
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

        self._initialize_weights()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.pixel_shuffle(self.conv4(x))
        return x

    def _initialize_weights(self):
        init.orthogonal_(self.conv1.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv2.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv3.weight, init.calculate_gain('relu'))
        init.orthogonal_(self.conv4.weight)

torch_model = SuperResolutionNet(upscale_factor=3)
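
The last convolution emits upscale_factor ** 2 channels because the pixel-shuffle step rearranges those channels into a larger spatial grid. The following NumPy sketch illustrates that rearrangement for the shapes used here. It is a stand-alone illustration of the idea, not PyTorch's implementation, and the function name pixel_shuffle is our own.

```python
import numpy as np


def pixel_shuffle(x, upscale_factor):
    """Rearrange an (N, C*r*r, H, W) array into (N, C, H*r, W*r)."""
    n, c_in, h, w = x.shape
    r = upscale_factor
    c_out = c_in // (r * r)
    # Split the channel axis into (C, r, r), then interleave the two
    # r-sized axes with the height and width axes.
    x = x.reshape(n, c_out, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (N, C, H, r, W, r)
    return x.reshape(n, c_out, h * r, w * r)


# conv4 produces 3 ** 2 = 9 channels for a 224x224 input.
features = np.zeros((1, 9, 224, 224), dtype=np.float32)
upscaled = pixel_shuffle(features, upscale_factor=3)
print(upscaled.shape)  # (1, 1, 672, 672)
```

With upscale_factor=3, a 224×224 input therefore yields a 672×672 single-channel output.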

Instead of training this model, we’ll download pre-trained weights. We set the batch size to 1, load the pre-trained weights into the model, and then switch the model to inference mode.

model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
batch_size = 1
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
torch_model.load_state_dict(model_zoo.load_url(model_url, map_location=map_location))

torch_model.train(False)

Exporting Models in PyTorch

Exporting models in PyTorch is done via tracing, with the aid of the torch.onnx._export() function. This function executes the model and records a trace of the operators used to compute the outputs. Since _export runs the model, we need to provide an input tensor x.

x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)

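torch_out = torch.onnx._export(torch_model,              # model being run
                               x,                        # example model input
                               "super_resolution.onnx",  # where to save the model
                               export_params=True)       # store the trained weights in the model file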

torch_out contains the output that we’ll use to confirm the model we exported computes the same values when run in Caffe2.
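
To make the idea of tracing more concrete, here is a toy tracer in plain Python. It is only an illustration of the concept and is not how torch.onnx works internally; TracedValue, toy_model and trace are hypothetical names invented for this sketch. It also shows a known consequence of tracing: only the operations executed for the example input are recorded.

```python
def unwrap(x):
    """Return the plain number behind a TracedValue, or x itself."""
    return x.value if isinstance(x, TracedValue) else x


class TracedValue:
    """A number that logs each arithmetic operator applied to it."""

    def __init__(self, value, ops):
        self.value = value
        self.ops = ops  # shared list that collects operator names

    def __add__(self, other):
        self.ops.append("Add")
        return TracedValue(self.value + unwrap(other), self.ops)

    def __mul__(self, other):
        self.ops.append("Mul")
        return TracedValue(self.value * unwrap(other), self.ops)


def toy_model(x):
    y = x * 2.0
    if unwrap(y) > 0:  # data-dependent branch
        y = y + 1.0
    return y


def trace(fn, example_input):
    """Run fn once on example_input and return (output, recorded operators)."""
    ops = []
    out = fn(TracedValue(example_input, ops))
    return out.value, ops


out_pos, ops_pos = trace(toy_model, 3.0)
out_neg, ops_neg = trace(toy_model, -3.0)
print(out_pos, ops_pos)  # 7.0 ['Mul', 'Add']
print(out_neg, ops_neg)  # -6.0 ['Mul']
```

The two traces differ because the branch depends on the input value, which is why the example input passed to an exporter should be representative of real inputs.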

Using ONNX representation in Caffe2

This is the point where we verify that Caffe2 and PyTorch compute the same values for the network. This involves a couple of steps:

Importing onnx and onnx_caffe2.backend.
Loading the ONNX ModelProto object.
Preparing the Caffe2 backend for executing the model, which converts the ONNX model into a Caffe2 NetDef that can execute it.
Running the model in Caffe2.
Constructing a map from input names to Tensor data.
Running the Caffe2 net and verifying the numerical correctness.

import onnx
import onnx_caffe2.backend

model = onnx.load("super_resolution.onnx")

prepared_backend = onnx_caffe2.backend.prepare(model)

W = {model.graph.input[0].name: x.data.numpy()}


c2_out = prepared_backend.run(W)[0]

np.testing.assert_almost_equal(torch_out.data.cpu().numpy(), c2_out, decimal=3)

print("Exported model executed on Caffe2 backend, result looks good")
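
The comparison above passes when the two outputs agree to roughly three decimal places. The sketch below shows the same kind of check on stand-in arrays, so it runs without PyTorch or Caffe2. The names reference and candidate are hypothetical substitutes for torch_out and c2_out, and the noise level is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the PyTorch and Caffe2 outputs: identical values
# apart from a small amount of simulated numerical noise.
reference = rng.standard_normal((1, 1, 6, 6)).astype(np.float32)
noise = rng.uniform(-1e-5, 1e-5, reference.shape).astype(np.float32)
candidate = reference + noise

# Report the worst-case element-wise disagreement.
max_abs_diff = float(np.max(np.abs(reference - candidate)))
print("largest absolute difference:", max_abs_diff)

# Same style of check as in the tutorial.
np.testing.assert_almost_equal(reference, candidate, decimal=3)

# An alternative with an explicit absolute tolerance.
np.testing.assert_allclose(candidate, reference, rtol=0, atol=1e-3)
```

Printing the largest difference before asserting makes it easier to judge how close the two backends are when a check fails.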

Running the Model on Mobile Devices

Now that the model is in Caffe2, we can convert it to a format suitable for running on mobile devices. This can be achieved using Caffe2’s mobile_exporter. We generate two model protobufs: one that initializes the model with the correct weights, and one that runs the model. There are a couple of steps to this process:

Extracting the workspace and the model proto from the internal representation.
Importing the Caffe2 mobile exporter.
Calling Export to get the init_net and predict_net, both of which are needed to run the model on mobile.
Saving the init_net and predict_net to a file we’ll use for running them on mobile.

init_net has the model parameters and the model input in it, while the predict_net will guide the init_net execution at run-time.

c2_workspace = prepared_backend.workspace
c2_model = prepared_backend.predict_net

from caffe2.python.predictor import mobile_exporter

init_net, predict_net = mobile_exporter.Export(c2_workspace, c2_model, c2_model.external_input)

with open('init_net.pb', "wb") as fopen:
    fopen.write(init_net.SerializeToString())
with open('predict_net.pb', "wb") as fopen:
    fopen.write(predict_net.SerializeToString())

We run the generated init_net and predict_net in Caffe2 using a cat image to verify that the output (high resolution cat image) is the same in both runs. We start by doing some standard imports:

from caffe2.proto import caffe2_pb2
from caffe2.python import core, net_drawer, net_printer, visualize, workspace, utils

import numpy as np
import os
import subprocess
from PIL import Image
from matplotlib import pyplot
from skimage import io, transform

We then use scikit-image (skimage) to process the cat image, as we would when pre-processing data for a neural network. After loading the image, we resize it to 224×224 and save the resized image.

img_in = io.imread("catimage.jpg")

img = transform.resize(img_in, [224, 224])

io.imsave("cat_224x224.jpg", img)

The next step is to take the resized cat image and run the super-resolution model in a Caffe2 backend and save the output image. The following steps are involved in this:

Loading the resized image and converting it to YCbCr format.
Running the mobile nets that we generated so that the Caffe2 workspace is initialized correctly.
Using net_printer to inspect what the nets look like and identifying what the input and output blob names are.

img = Image.open("cat_224x224.jpg")
img_ycbcr = img.convert('YCbCr')
img_y, img_cb, img_cr = img_ycbcr.split()

workspace.RunNetOnce(init_net)
workspace.RunNetOnce(predict_net)

print(net_printer.to_string(predict_net))
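
The model operates on the luminance (Y) channel only, which is why the image is split into Y, Cb and Cr planes above. As a rough illustration of what that conversion computes, the sketch below applies the standard JPEG (JFIF) full-range formulas in NumPy. We assume these match the convention Pillow follows; results may differ from Pillow's by rounding, and the function name rgb_to_ycbcr is our own.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) uint8 RGB array to float Y, Cb and Cr planes."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luminance
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference chroma
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference chroma
    return y, cb, cr


# One row of three pixels: white, black and pure red.
pixels = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
y, cb, cr = rgb_to_ycbcr(pixels)
print(np.round(y, 2))
print(np.round(cb, 2))
print(np.round(cr, 2))
```

White and black pixels land on neutral chroma (128), so all of their information sits in the Y plane that the model upscales.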

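Next, we feed the Y channel of the resized cat image into the model's input blob (named "9" in this export) and run predict_net to get the model output. The blob names depend on the export, so replace the placeholder passed to FetchBlob with the output blob name that net_printer printed above.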

workspace.FeedBlob("9", np.array(img_y)[np.newaxis, np.newaxis, :, :].astype(np.float32))

workspace.RunNetOnce(predict_net)

img_out = workspace.FetchBlob("Insert number that was printed above")

Next we construct the final image and save it.

img_out_y = Image.fromarray(np.uint8((img_out[0, 0]).clip(0, 255)), mode='L')

final_img = Image.merge(
    "YCbCr", [
        img_out_y,
        img_cb.resize(img_out_y.size, Image.BICUBIC),
        img_cr.resize(img_out_y.size, Image.BICUBIC),
    ]).convert("RGB")

final_img.save("cat_.jpg")
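
The array handling around the model call is easy to get wrong, so here is a small NumPy-only sketch of the two conversions: adding batch and channel axes before feeding the blob, and clipping before casting back to 8-bit. The arrays y_plane and model_output are hypothetical stand-ins for np.array(img_y) and the fetched output blob.

```python
import numpy as np

# Stand-in for np.array(img_y): a 224x224 uint8 luminance plane.
y_plane = np.full((224, 224), 128, dtype=np.uint8)

# Add batch and channel axes so the blob has NCHW layout.
model_input = y_plane[np.newaxis, np.newaxis, :, :].astype(np.float32)
print(model_input.shape)  # (1, 1, 224, 224)

# Stand-in for the fetched output, with values outside [0, 255].
model_output = np.array([[[[-12.5, 100.2], [255.9, 300.0]]]], dtype=np.float32)

# Clip to the displayable range first; casting out-of-range floats
# straight to uint8 does not give well-defined results.
y_out = model_output[0, 0].clip(0, 255).astype(np.uint8)
print(y_out.tolist())  # [[0, 100], [255, 255]]
```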

Let’s now execute the model on a mobile device and obtain the model output. The following steps are involved in doing this:

Specifying a binary that will be used to execute the model on mobile and export the model output to be retrieved later.
Pushing the binary and the init_net and predict_net we saved earlier.
Serializing the input image blob to a blob proto so that it can be sent to the device for execution.
Pushing the input image blob to the device with adb.
Running the net on mobile.
Getting the model output from the device with adb and saving it to a file.
Recovering the output content and post-processing it using the same steps followed earlier.
Saving the image.

CAFFE2_MOBILE_BINARY = 'specifiedbinary'

os.system('adb push ' + CAFFE2_MOBILE_BINARY + ' /data/local/tmp/')
os.system('adb push init_net.pb /data/local/tmp')
os.system('adb push predict_net.pb /data/local/tmp')

with open("input.blobproto", "wb") as fid:
    fid.write(workspace.SerializeBlob("9"))

os.system('adb push input.blobproto /data/local/tmp/')

os.system(
    'adb shell /data/local/tmp/specifiedbinary '
    '--init_net=/data/local/tmp/init_net.pb '
    '--net=/data/local/tmp/predict_net.pb '
    '--input=9 '
    '--input_file=/data/local/tmp/input.blobproto '
    '--output_folder=/data/local/tmp '
    '--output=27,9 '
    '--iter=1 '
    '--caffe2_log_level=0 '
)


os.system('adb pull /data/local/tmp/27 ./output.blobproto')


blob_proto = caffe2_pb2.BlobProto()
with open('./output.blobproto', 'rb') as fid:
    blob_proto.ParseFromString(fid.read())
img_out = utils.Caffe2TensorToNumpyArray(blob_proto.tensor)
img_out_y = Image.fromarray(np.uint8((img_out[0,0]).clip(0, 255)), mode='L')
final_img = Image.merge(
    "YCbCr", [
        img_out_y,
        img_cb.resize(img_out_y.size, Image.BICUBIC),
        img_cr.resize(img_out_y.size, Image.BICUBIC),
    ]).convert("RGB")
final_img.save("cat_mobile.jpg")
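
The adb invocation above is built by concatenating strings, which makes mismatched file names easy to miss. As one possible alternative, the sketch below assembles the same flags as an argument list from a few parameters. The helper build_adb_run_command is a hypothetical name, the blob names "9" and "27" are the ones used in this tutorial's export, and the actual call is left commented out because it needs adb and a connected device.

```python
def build_adb_run_command(binary_name, init_net, predict_net,
                          input_blob, output_blobs,
                          device_dir="/data/local/tmp"):
    """Build the adb argument list for running the nets on the device."""
    return [
        "adb", "shell", f"{device_dir}/{binary_name}",
        f"--init_net={device_dir}/{init_net}",
        f"--net={device_dir}/{predict_net}",
        f"--input={input_blob}",
        f"--input_file={device_dir}/input.blobproto",
        f"--output_folder={device_dir}",
        f"--output={','.join(output_blobs)}",
        "--iter=1",
        "--caffe2_log_level=0",
    ]


command = build_adb_run_command(
    "specifiedbinary", "init_net.pb", "predict_net.pb",
    input_blob="9", output_blobs=["27", "9"],
)
print(" ".join(command))

# To execute on a connected device (requires adb on the PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing check=True to subprocess.run raises an error when the command fails, whereas os.system only returns an exit status that is easy to ignore.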

Conclusion and Further Reading

You can compare the cat_.jpg from the pure Caffe2 execution and the cat_mobile.jpg from the mobile execution. If the two images don’t look the same, it means that something went wrong during the mobile execution. For further reading on Caffe2 mobile, check out this AI Camera Demo and Tutorial.
