TensorFlow Estimators — TFLite and Model Generation

A Comprehensive Overview with an Audio Classification Model

Building deep learning models for audio classification is a common task, and you will find numerous blogs and articles that describe how to build standard audio classification models using Keras.

There are numerous use cases for audio processing with deep learning, but the one that amazed me most is the audio separation library Spleeter, which splits a given audio file into tracks such as vocals, piano, drums, bass, and accompaniment. I was genuinely impressed by the accuracy with which the library separates the tracks, and I would give full credit to the authors for building such an amazing library.

One thing I noticed while going through the library's source code is that the authors used TensorFlow's Estimator approach to build the model rather than the Keras-based approach. That intrigued me to learn more about what Estimators are and what benefits they offer.

This blog briefly introduces Estimators, builds an audio classification model using them, and shows how to generate TFLite models.

TensorFlow Estimators

TensorFlow Estimators are a high-level API (similar to the Keras API) that separates the data processing layer from the modeling layer, and they are widely used for processing huge volumes of data at production scale.

With TF 2.0, Keras was officially adopted into TensorFlow and has become the way forward. Still, some TensorFlow features are not yet supported by the Keras API, such as full TFX integration and parameter server-based training.

Estimators support the above-mentioned features, and we can also convert Keras models into Estimators. So, with Estimators, we get the best of both worlds and can support training at a large scale.

The full source code for this article is available here for your reference. Please feel free to use it.

Urban sound classification using Estimators and TFLite model generation

As part of this article, we are going to build an Urban Sound Classification model using Estimators (with a Keras model) and export the model as TFLite for mobile integration.

We are going to repurpose the UrbanSoundClassification model built in Keras here and build its Estimator version.

There, the author processed the urban sound data into mel-spectrograms and saved the processed data in npz format for building the model. We will take these npz files and build our Estimator model from them.

Estimator-based model creation

Estimators are built around a clean separation of concerns: an input function that handles data processing and a model function that defines the network. We will read the dataset from the npz files mentioned above and convert it into a tf.data.Dataset, as below:

import numpy as np
import tensorflow as tf

def getTrainingDataSet():
    # `load_dir` and `folds` are defined elsewhere in the full source:
    # `folds` lists the UrbanSound8K fold names saved as npz files.
    train_index = range(1)
    x_train, y_train = [], []
    for ind in train_index:
        # Read the pre-computed features (segments) of each audio file.
        train_data = np.load("{0}/{1}.npz".format(load_dir, folds[ind]),
                             allow_pickle=True)
        # For training, stack all the segments so that each segment is
        # treated as an independent example/instance.
        features = np.concatenate(train_data["features"], axis=0)
        labels = np.concatenate(train_data["labels"], axis=0)
        x_train.append(features)
        y_train.append(labels)
    # Stack the (x, y) pairs of all training folds.
    x_train = np.concatenate(x_train, axis=0).astype(np.float32)
    y_train = np.concatenate(y_train, axis=0).astype(np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.map(map_features)
    dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(32)
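The dataset's map call refers to a map_features function that isn't shown in this snippet. Here is a minimal sketch of it, assuming (based on how the prediction code later indexes batches by the 'features' key) that it simply wraps each example in a dict keyed by the model's input layer name; this is an assumption, not necessarily the original implementation:

def map_features(x, y):
    # Wrap the tensors in a dict keyed by the input layer name, which is
    # the form model_to_estimator expects from an input_fn.
    return {'features': x}, y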

We will also need a similar function that returns an 'eval' dataset. The only difference is that the eval function won't have the shuffle or repeat operations.
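Under the same assumptions, a minimal sketch of the eval counterpart could look like this; the fold index used for evaluation is a placeholder, not taken from the original post:

def getEvalDataSet():
    # Load one held-out fold (the index here is illustrative).
    eval_data = np.load("{0}/{1}.npz".format(load_dir, folds[-2]),
                        allow_pickle=True)
    x_eval = np.concatenate(eval_data["features"], axis=0).astype(np.float32)
    y_eval = np.concatenate(eval_data["labels"], axis=0).astype(np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((x_eval, y_eval))
    dataset = dataset.map(map_features)
    # No shuffle or repeat for evaluation.
    return dataset.batch(32)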

Once the dataset functions are created, we need to create the estimator. Here, we build a network in Keras and convert the Keras model into an Estimator using the tf.keras.estimator.model_to_estimator method, as below:

from tensorflow import keras

def get_network():
    num_filters = [24, 32, 64, 128]
    pool_size = (2, 2)
    kernel_size = (3, 3)
    input_shape = (60, 41, 2)  # 60 mel bands x 41 frames x 2 channels
    num_classes = 10
    keras.backend.clear_session()

    model = keras.models.Sequential()
    # Name the input layer 'features' so the input_fn and serving_input_fn
    # can address the input tensor by that key.
    model.add(keras.layers.InputLayer(input_shape=input_shape, name="features"))

    # Three Conv -> BatchNorm -> ReLU -> MaxPool blocks with growing filter counts.
    for filters in num_filters[:-1]:
        model.add(keras.layers.Conv2D(filters, kernel_size, padding="same"))
        model.add(keras.layers.BatchNormalization())
        model.add(keras.layers.Activation("relu"))
        model.add(keras.layers.MaxPooling2D(pool_size=pool_size))

    # Final convolutional block, without pooling.
    model.add(keras.layers.Conv2D(num_filters[-1], kernel_size, padding="same"))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation("relu"))

    model.add(keras.layers.GlobalMaxPooling2D())
    model.add(keras.layers.Dense(128, activation="relu"))
    # Name the output layer 'label'; predictions are fetched under this key.
    model.add(keras.layers.Dense(num_classes, activation="softmax", name="label"))

    model.compile(optimizer=keras.optimizers.Adam(1e-4),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model


def _create_estimator():
    # Cap per-process GPU memory so the training job shares the GPU.
    session_config = tf.compat.v1.ConfigProto()
    session_config.gpu_options.per_process_gpu_memory_fraction = 0.45
    model = get_network()
    run_config = tf.estimator.RunConfig(
        save_checkpoints_steps=300,
        tf_random_seed=3,
        save_summary_steps=5,
        session_config=session_config,
        log_step_count_steps=10,
        keep_checkpoint_max=2)
    # Convert the compiled Keras model into an Estimator; checkpoints and
    # summaries are written to model_dir.
    est_model = tf.keras.estimator.model_to_estimator(
        keras_model=model,
        model_dir="urban_est_model_ckpt_dir",
        config=run_config)
    return est_model

Once that is done, we can kick off the train and evaluate operations by calling the tf.estimator.train_and_evaluate function.
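For completeness, here is a minimal sketch of wiring everything together; the step counts and throttle interval are illustrative values, not taken from the original training run:

estimator = _create_estimator()

train_spec = tf.estimator.TrainSpec(input_fn=getTrainingDataSet,
                                    max_steps=5000)   # illustrative
eval_spec = tf.estimator.EvalSpec(input_fn=getEvalDataSet,
                                  steps=100,          # illustrative
                                  throttle_secs=60)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)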

Export the Estimator model into SavedModel format

Once the model is trained and evaluated, we need to export it into the SavedModel format. This operation is similar to saving sklearn models in pickle format: we can reload the model later and start using it without retraining.

Once the model is exported into SavedModel format, it can be used in various serving environments such as TensorFlow Lite, TF Serving, TensorFlow.js, etc.

As the objective of this article is to generate a TFLite model, let's export the trained model into SavedModel format.

def serving_input_fn():
    # Declare the shape of the input the exported model will receive at
    # serving time: a single segment of shape (60, 41, 2), keyed by the
    # input layer name.
    inputs = {'features': tf.compat.v1.placeholder(dtype=tf.float32,
                                                   shape=[1, 60, 41, 2],
                                                   name='features')}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

export_dir = './urban_export_savedmodel_dir/'

estimator.export_saved_model(export_dir_base=export_dir,
                             serving_input_receiver_fn=serving_input_fn)

Here are the key things to observe:

→ In serving_input_fn(), you need to specify the shape of the features that will be fed to the model.

→ The method tf.estimator.export.ServingInputReceiver accepts features and receiver_tensors. In this case, both are the same, as we are not doing any processing of the data in serving_input_fn. Otherwise, features corresponds to the input that is fed to the model, and receiver_tensors corresponds to the raw input that serving_input_fn receives, as illustrated in the sketch below.
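To make that distinction concrete, here is a hypothetical variant in which serving_input_fn does preprocess the data, so features and receiver_tensors differ; the normalization step is made up purely for illustration:

def serving_input_fn_with_preprocessing():
    # What the client sends: raw, unnormalized values.
    raw = tf.compat.v1.placeholder(dtype=tf.float32, shape=[1, 60, 41, 2],
                                   name='raw_features')
    # What the model sees: the preprocessed tensor (hypothetical scaling).
    features = {'features': raw / 255.0}
    return tf.estimator.export.ServingInputReceiver(
        features, receiver_tensors={'raw_features': raw})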

Reload SavedModel and perform prediction

Now that we have exported the model into SavedModel format, let's see how to load it back and make predictions.

We need to write a getTestDataSet function, which reads from the npz files again and processes the test data.
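Here is a minimal sketch of it, assuming the test fold is stored the same way as the training folds; the fold index is a placeholder:

def getTestDataSet():
    # Load the held-out test fold (the index here is illustrative).
    test_data = np.load("{0}/{1}.npz".format(load_dir, folds[-1]),
                        allow_pickle=True)
    x_test = np.concatenate(test_data["features"], axis=0).astype(np.float32)
    # Yield dicts keyed by the input layer name, one segment per batch to
    # match the exported serving signature's fixed batch size of 1.
    dataset = tf.data.Dataset.from_tensor_slices({'features': x_test})
    return dataset.batch(1)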

Once that is done, we can load the model back using the tf.saved_model.load() method and make predictions, as below:

from pathlib import Path

# export_saved_model writes each export into a timestamped subdirectory;
# pick the most recent one.
export_dir = './urban_export_savedmodel_dir/'
subdirs = [x for x in Path(export_dir).iterdir()
           if x.is_dir() and 'temp' not in str(x)]
latest = str(sorted(subdirs)[-1])

# Load the SavedModel and grab its default serving signature.
loaded_model = tf.saved_model.load(latest)
inference_func = loaded_model.signatures["serving_default"]

for batch in getTestDataSet().take(1):
    # The signature returns a dict keyed by the output layer name ('label').
    preds = inference_func(batch['features'])['label']
    topPred = tf.argmax(preds, 1).numpy()
    print(topPred)

TFLite generation from SavedModel

Once the model is exported into SavedModel format, generating a TFLite model is a straightforward process and can be done as below:

subdirs = [x for x in Path(export_dir).iterdir()
           if x.is_dir() and 'temp' not in str(x)]
latest = str(sorted(subdirs)[-1])

converter = tf.lite.TFLiteConverter.from_saved_model(
    latest, signature_keys=['serving_default'])
converter.allow_custom_ops = True
# Note: the op set is configured via target_spec.supported_ops (the older
# converter.target_ops attribute is no longer honored). SELECT_TF_OPS lets
# the converter fall back to TensorFlow ops that have no TFLite builtin.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.target_spec.supported_types = [tf.uint8, tf.float32]
tflite_model = converter.convert()
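From here, a minimal sketch of writing the flatbuffer to disk and sanity-checking it with the TFLite interpreter could look like this; the file name and dummy input are placeholders:

# Write the flatbuffer to disk for mobile integration (file name is illustrative).
with open('urban_sound.tflite', 'wb') as f:
    f.write(tflite_model)

interpreter = tf.lite.Interpreter(model_path='urban_sound.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one dummy segment shaped like the serving signature: (1, 60, 41, 2).
dummy = np.zeros((1, 60, 41, 2), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))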

Conclusion

We are at the end of the article, where we have discussed the following:

→ TensorFlow Estimators and their usage

→ How to port Keras models into Estimators

→ How to export Estimator models as SavedModel

→ How to convert a SavedModel into TFLite format

Personally, I learned a lot about TensorFlow while writing this article, and I now feel that the models we build using Keras can be made production-ready for enterprise-grade requirements. There are a lot of advanced concepts in TF, such as distributed training, hooks, scaffolding, etc. I will be writing more articles on these concepts with working code; please follow me to keep track of them.

Happy Learning!

