Deploying Keras Model in Production with TensorFlow 2.0

Upasana | October 24, 2019


In this article, we discuss how to build a REST API on top of a saved Keras model in TensorFlow 2.0 and deploy it to production using Flask and Gunicorn/WSGI.

If you are looking for TensorFlow 1.x support, refer to this article.

Introduction

We will take the example of a mood detection model built with NLTK and Keras in Python. After training a deep learning model in Keras, we still need something around it to test its results. For a demo, we cannot show raw probabilities (the model's output); we have to present the results interactively so that someone without a deep learning background can also understand them.

Keras

Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML. It is designed to enable fast experimentation with deep neural networks, and focuses on being user-friendly, modular, and extensible.

NLTK

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.

NLTK now supports Indian languages as well.

Flask

Flask is a micro web framework written in Python that developers frequently use to create simple REST endpoints.

Flask homepage

http://flask.pocoo.org

We will create one Python script that exposes the REST endpoints through a Flask application, and keep the supporting classes in the service folder.

Mood detection model

This model was built on 182,689 observations labelled with seven emotion categories: anger, disgust, joy, sadness, shame, guilt and fear. The model is based on a bidirectional LSTM and was trained for only 50 epochs. Since the data was not normalized beforehand (to retain its pattern), a BatchNormalization layer was also used in the model. Below are the recall scores of the model on the test data:

  • Anger : 0.72

  • Disgust : 0.68

  • Fear : 0.96

  • Guilt : 0.63

  • Joy : 0.92

  • Sad : 0.94

  • Shame : 0.81
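For readers less familiar with the metric, recall for a class is the share of its true samples that the model labelled correctly. The sketch below shows how such per-class scores could be computed; the labels in it are toy values made up for illustration and are not the article's test data.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per label: correctly predicted samples / all true samples of that label."""
    true_counts = Counter(y_true)
    hit_counts = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hit_counts[label] / total for label, total in true_counts.items()}


# Toy labels, invented for this example only
y_true = ["joy", "joy", "fear", "fear", "fear", "guilt"]
y_pred = ["joy", "sad", "fear", "fear", "joy", "guilt"]

recalls = per_class_recall(y_true, y_pred)
print(recalls)
```

A high recall for a class (such as fear above) means that few true samples of that class were missed.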

Directory structure

Our directory structure looks like this:

[Figure: directory structure]

The src folder contains two directories and main.py, which starts the Flask app.

  1. The mood-saved-models directory contains the saved Keras models and the saved tokenizer in pickle format.

  2. The service directory contains the service scripts (.py files).

Text pre-processing

Before training deep learning models on textual data, we usually perform a few transformations on the data to clean it and convert it into vector format. This process is generally known as text pre-processing.

Since we perform these tasks on the training data, we must do the same on the testing data as well.
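To make "vector format" concrete, here is a pure-Python sketch of what the Keras tokenizer and pad_sequences do with their default settings: words become integer ids, unknown words are skipped, and each sequence is left-padded with zeros (or cut from the front) to a fixed length. The function names and the tiny vocabulary are hypothetical stand-ins for illustration, not the Keras API itself.

```python
def texts_to_sequences(texts, word_index):
    """Map each whitespace-separated word to its integer id, skipping unknown words."""
    return [[word_index[w] for w in t.split() if w in word_index] for t in texts]


def pad_pre(seq, maxlen, value=0):
    """Keep the last maxlen ids and left-pad with `value` up to maxlen."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq


# Hypothetical vocabulary; a real one is learned from the training data
word_index = {"i": 1, "am": 2, "happy": 3, "sad": 4}

seqs = texts_to_sequences(["i am happy", "i am very sad"], word_index)
padded = [pad_pre(s, maxlen=5) for s in seqs]
print(padded)
```

The same vocabulary and the same length must be used at training and at prediction time, which is why the tokenizer is saved alongside the model.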

Now we are going to build a service for this, which will pre-process the text before sending it to the model for prediction.

TextPreprocessing Method
def tweet_preprocessing(self, text):
        eyes = r"[8:=;]"
        nose = r"['`-]?"

        def re_sub(pattern, repl):
            return re.sub(pattern, repl, text, flags=self.FLAGS)

        text = re_sub(r"https?:\/\/\S+\b|www\.(\w+\.)+\S*", " ")
        text = re_sub(r"@\w+", "user")
        text = re_sub(r"{}{}[)dD]+|[)dD]+{}{}".format(eyes, nose, nose, eyes), "smile")
        text = re_sub(r"{}{}p+".format(eyes, nose), "laugh")
        text = re_sub(r"{}{}\(+|\)+{}{}".format(eyes, nose, nose, eyes), "sad")
        text = re_sub(r"{}{}[\/|l*]".format(eyes, nose), "neutral")
        text = re_sub(r"/"," / ")
        text = re_sub(r"<3","love")
        text = re_sub(r"[-+]?[.\d]*[\d]+[:,.\d]*", " ")
        text = re_sub(r"#\S+", self.hashtag)
        text = re_sub(r"([!?.]){2,}", r"\1 repeat")
        text = re_sub(r"\b(\S*?)(.)\2{2,}\b", r"\1\2 <elong>")
        text = re_sub(r"([A-Z]){2,}", self.allcaps)

        return text.lower()

We will use this method to clean the text. It involves:

  • Marking repeated punctuation and elongated words

  • Converting smileys to text

  • Extracting text from hashtags
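To see the effect of these substitutions on a concrete input, the sketch below applies a small subset of the patterns from the method above, in the same relative order. The function name normalise is made up for this example, and the full method may produce a different result because it applies more rules.

```python
import re

EYES, NOSE = r"[8:=;]", r"['`-]?"


def normalise(text):
    """Apply a few of the substitutions used in the pre-processing method."""
    text = re.sub(r"@\w+", "user", text)                       # mentions
    text = re.sub(r"{}{}[)dD]+|[)dD]+{}{}".format(EYES, NOSE, NOSE, EYES),
                  "smile", text)                               # smileys
    text = re.sub(r"<3", "love", text)                         # hearts
    text = re.sub(r"([!?.]){2,}", r"\1 repeat", text)          # repeated punctuation
    return text.lower()


print(normalise("@Bob I <3 this :) !!!"))
# prints: user i love this smile ! repeat
```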

We can also add a spell corrector to take care of typos. A library named enchant can be used to correct the spelling of words. Try installing it with pip install pyenchant. This should work on macOS and Ubuntu; we are not sure about Windows.
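To illustrate the idea without an extra dependency, the sketch below uses difflib from the Python standard library as a stand-in; it is not the enchant API, and the word list is a hypothetical placeholder for a real dictionary.

```python
import difflib


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if none is close enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Hypothetical word list; a real corrector would use a full dictionary
vocabulary = ["happy", "angry", "guilty", "ashamed"]

corrected = [correct_word(w, vocabulary) for w in "i am hapy".split()]
print(corrected)
```

Such a step would run before tokenization, so that a typo like "hapy" maps to a word the tokenizer already knows.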

The whole class now looks like this:

TextPreprocessing.py
import re


class TextPreprocessing(object):


    def __init__(self):
        self.FLAGS = re.MULTILINE | re.DOTALL

    def hashtag(self,text):
        text = text.group()
        hashtag_body = text[1:]
        if hashtag_body.isupper():
            result = " {} ".format(hashtag_body.lower())
        else:
            result = " ".join([""] + [re.sub(r"([A-Z])",r" \1", hashtag_body, flags=self.FLAGS)])
        return result

    def allcaps(self,text):
        text = text.group()
        return text.lower() + " "

    def re_sub(self, pattern, repl, text):
        return re.sub(pattern, repl, text, flags=self.FLAGS)

    def tweet_preprocessing(self,text):
        eyes = r"[8:=;]"
        nose = r"['`-]?"

        def re_sub(pattern, repl):
            return re.sub(pattern, repl, text, flags=self.FLAGS)

        text = re_sub(r"https?:\/\/\S+\b|www\.(\w+\.)+\S*", " ")
        text = re_sub(r"@\w+", "user")
        text = re_sub(r"{}{}[)dD]+|[)dD]+{}{}".format(eyes, nose, nose, eyes), "smile")
        text = re_sub(r"{}{}p+".format(eyes, nose), "laugh")
        text = re_sub(r"{}{}\(+|\)+{}{}".format(eyes, nose, nose, eyes), "sad")
        text = re_sub(r"{}{}[\/|l*]".format(eyes, nose), "neutral")
        text = re_sub(r"/"," / ")
        text = re_sub(r"<3","love")
        text = re_sub(r"[-+]?[.\d]*[\d]+[:,.\d]*", " ")
        text = re_sub(r"#\S+", self.hashtag)
        text = re_sub(r"([!?.]){2,}", r"\1 repeat")
        text = re_sub(r"\b(\S*?)(.)\2{2,}\b", r"\1\2 <elong>")
        text = re_sub(r"([A-Z]){2,}", self.allcaps)

        return text.lower()

Now we need a service that loads the saved Keras model and exposes it for prediction. Saved deep learning models are usually large, and some of them take a while to load, so we shall implement the service in such a way that the model is not loaded on every call to the endpoint.

To avoid this problem, we will use the singleton design pattern.

SentimentService.py
import pickle

import tensorflow as tf


class SentimentService(object):
    # Class-level caches: each resource is loaded once and reused by every request
    model1 = None
    tokenizer = None

    @classmethod
    def load_deep_model(cls, model):
        # Load a saved Keras model (HDF5 file) by its file name without extension
        loaded_model = tf.keras.models.load_model("./src/mood-saved-models/" + model + ".h5")
        return loaded_model

    @classmethod
    def get_model1(cls):
        # Load the model on first use only
        if cls.model1 is None:
            cls.model1 = cls.load_deep_model('model5_ver1')
        return cls.model1

    @classmethod
    def load_tokenizer(cls):
        # Load the pickled tokenizer on first use only
        if cls.tokenizer is None:
            with open('./src/mood-saved-models/tokenizer.pickle', 'rb') as handle:
                cls.tokenizer = pickle.load(handle)
        return cls.tokenizer

load_tokenizer loads the saved tokenizer. Like get_model1, it keeps the loaded object on the class so the file is read only once.
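The lazy-loading idea behind this service can be shown in isolation. The sketch below replaces the expensive model load with a dummy loader and counts how often it runs; the class and attribute names are hypothetical and only serve the illustration.

```python
class LazyResource(object):
    """Holds one shared instance that is created on first access only."""
    _instance = None
    load_count = 0  # only here to show how often the loader runs

    @classmethod
    def _load(cls):
        # Stand-in for an expensive call such as loading a saved model from disk
        cls.load_count += 1
        return {"name": "dummy-model"}

    @classmethod
    def get(cls):
        if cls._instance is None:
            cls._instance = cls._load()
        return cls._instance


first = LazyResource.get()
second = LazyResource.get()
print(first is second, LazyResource.load_count)
```

Both calls return the same object and the loader runs once. If the server handles requests on several threads, the first-access check could additionally be guarded with a threading.Lock.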

Now we need to build the endpoints that use these services. We will build three endpoints:

  • Health check, to verify that the Flask service is running

  • Get the structure and parameters of a saved model

  • Get a prediction from the model

Health Check
@app.route("/health", methods=["GET"])
def health():
    return Response(json.dumps({"status":"UP"}), status=200, mimetype='application/json')
Get structure of model
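@app.route("/show_model", methods=["GET"])
def show_model():
    model = request.args.get("model", default=None, type=str)
    if model is None:
        abort(400)  # the "model" query parameter is required
    # Read the saved architecture (JSON file) of the requested model
    with open('mood-saved-models/' + model + '.json') as f:
        model_format = json.load(f)
    return Response(json.dumps(model_format), status=200, mimetype='application/json')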
Detect mood from text
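@app.route('/mood-detect', methods=['POST'])
def model_predict():

    # Reject requests without a JSON body or without a "text" field
    if not request.json or 'text' not in request.json:
        abort(400)

    tp = TextPreprocessing()

    # Clean the input with the same steps that were applied to the training data
    sent = pd.Series(request.json['text'])
    new_sent = [tp.tweet_preprocessing(i) for i in sent]

    # Convert the text to an integer sequence padded to length 256
    seq = SentimentService.load_tokenizer().texts_to_sequences(pd.Series(''.join(new_sent)))
    test = pad_sequences(seq, maxlen=256)

    another_strategy = tf.distribute.MirroredStrategy()
    with another_strategy.scope():
        model = SentimentService.get_model1()

    # predict() returns one row of scores per input;
    # predict_proba() is no longer available in newer TensorFlow 2.x releases
    res = model.predict(test, batch_size=32, verbose=0)

    lab_list = ['anger', 'disgust', 'fear', 'guilt', 'joy', 'sadness', 'shame']
    moods = {}
    for label, probability in zip(lab_list, res[0]):
        # Convert the NumPy scalar to a plain float so json.dumps can serialize it
        moods[label] = 100 * float(probability)

    return Response(json.dumps(moods), status=200, mimetype='application/json')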

Now we are ready to use this service to detect mood from a text.

Run main.py and call the endpoints to get results.

$ python src/main.py

To get structure of model [GET]

GET http://0.0.0.0:5000/show_model?model=model5_ver1
Output
{
    "class_name": "Sequential",
    "config": [
        {
            "class_name": "Embedding",
            "config": {
                "name": "embedding_2",
                "trainable": false,
                "batch_input_shape": [
                    null,
                    256
                ],
                "dtype": "float32",
                "input_dim": 57888,
                "output_dim": 100,
                "embeddings_initializer": {
                    "class_name": "RandomUniform",
                    "config": {
                        "minval": -0.05,
                        "maxval": 0.05,
                        "seed": null
                    }
                },
                "embeddings_regularizer": null,
                "activity_regularizer": null,
                "embeddings_constraint": null,
                "mask_zero": false,
                "input_length": 256
            }
        },
        {
            "class_name": "SpatialDropout1D",
            "config": {
                "name": "spatial_dropout1d_4",
                "trainable": true,
                "rate": 0.2,
                "noise_shape": null,
                "seed": null
            }
        },
        {
            "class_name": "Bidirectional",
            "config": {
                "name": "bidirectional_7",
                "trainable": true,
                "layer": {
                    "class_name": "LSTM",
                    "config": {
                        "name": "lstm_13",
                        "trainable": true,
                        "return_sequences": true,
                        "return_state": false,
                        "go_backwards": false,
                        "stateful": false,
                        "unroll": false,
                        "units": 128,
                        "activation": "tanh",
                        "recurrent_activation": "hard_sigmoid",
                        "use_bias": true,
                        "kernel_initializer": {
                            "class_name": "VarianceScaling",
                            "config": {
                                "scale": 1,
                                "mode": "fan_avg",
                                "distribution": "uniform",
                                "seed": null
                            }
                        },
                        "recurrent_initializer": {
                            "class_name": "Orthogonal",
                            "config": {
                                "gain": 1,
                                "seed": null
                            }
                        },
                        "bias_initializer": {
                            "class_name": "Zeros",
                            "config": {}
                        },
                        "unit_forget_bias": true,
                        "kernel_regularizer": null,
                        "recurrent_regularizer": null,
                        "bias_regularizer": null,
                        "activity_regularizer": null,
                        "kernel_constraint": null,
                        "recurrent_constraint": null,
                        "bias_constraint": null,
                        "dropout": 0.2,
                        "recurrent_dropout": 0.2,
                        "implementation": 1
                    }
                },
                "merge_mode": "concat"
            }
        },
        {
            "class_name": "BatchNormalization",
            "config": {
                "name": "batch_normalization_10",
                "trainable": true,
                "axis": -1,
                "momentum": 0.99,
                "epsilon": 0.001,
                "center": true,
                "scale": true,
                "beta_initializer": {
                    "class_name": "Zeros",
                    "config": {}
                },
                "gamma_initializer": {
                    "class_name": "Ones",
                    "config": {}
                },
                "moving_mean_initializer": {
                    "class_name": "Zeros",
                    "config": {}
                },
                "moving_variance_initializer": {
                    "class_name": "Ones",
                    "config": {}
                },
                "beta_regularizer": null,
                "gamma_regularizer": null,
                "beta_constraint": null,
                "gamma_constraint": null
            }
        },
        {
            "class_name": "Bidirectional",
            "config": {
                "name": "bidirectional_8",
                "trainable": true,
                "layer": {
                    "class_name": "LSTM",
                    "config": {
                        "name": "lstm_14",
                        "trainable": true,
                        "return_sequences": false,
                        "return_state": false,
                        "go_backwards": false,
                        "stateful": false,
                        "unroll": false,
                        "units": 128,
                        "activation": "tanh",
                        "recurrent_activation": "hard_sigmoid",
                        "use_bias": true,
                        "kernel_initializer": {
                            "class_name": "VarianceScaling",
                            "config": {
                                "scale": 1,
                                "mode": "fan_avg",
                                "distribution": "uniform",
                                "seed": null
                            }
                        },
                        "recurrent_initializer": {
                            "class_name": "Orthogonal",
                            "config": {
                                "gain": 1,
                                "seed": null
                            }
                        },
                        "bias_initializer": {
                            "class_name": "Zeros",
                            "config": {}
                        },
                        "unit_forget_bias": true,
                        "kernel_regularizer": null,
                        "recurrent_regularizer": null,
                        "bias_regularizer": null,
                        "activity_regularizer": null,
                        "kernel_constraint": null,
                        "recurrent_constraint": null,
                        "bias_constraint": null,
                        "dropout": 0.2,
                        "recurrent_dropout": 0.2,
                        "implementation": 1
                    }
                },
                "merge_mode": "concat"
            }
        },
        {
            "class_name": "Dense",
            "config": {
                "name": "dense_10",
                "trainable": true,
                "units": 7,
                "activation": "sigmoid",
                "use_bias": true,
                "kernel_initializer": {
                    "class_name": "VarianceScaling",
                    "config": {
                        "scale": 1,
                        "mode": "fan_avg",
                        "distribution": "uniform",
                        "seed": null
                    }
                },
                "bias_initializer": {
                    "class_name": "Zeros",
                    "config": {}
                },
                "kernel_regularizer": null,
                "bias_regularizer": null,
                "activity_regularizer": null,
                "kernel_constraint": null,
                "bias_constraint": null
            }
        }
    ],
    "keras_version": "2.2.2",
    "backend": "tensorflow"
}

To get prediction [POST]

POST http://0.0.0.0:5000/mood-detect
Request body
{
	"text": "great i am liking it"
}
Response
{
    "anger": 7.112710922956467,
    "disgust": 3.1775277107954025,
    "fear": 12.434638291597366,
    "guilt": 2.8116755187511444,
    "joy": 56.977683305740356,
    "sadness": 13.96680623292923,
    "shame": 3.2702498137950897
}
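The same call can be made from Python. The sketch below, which uses only the standard library, prepares the POST request and picks the strongest mood from a response; the helper names are hypothetical, the scores are the response above rounded to two decimals, and the actual network call is left commented out because it needs the Flask app to be running.

```python
import json
import urllib.request


def build_mood_request(text, base_url="http://0.0.0.0:5000"):
    """Prepare (but do not send) a POST request for the /mood-detect endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/mood-detect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_mood(moods):
    """Return the (label, percentage) pair with the highest score."""
    label = max(moods, key=moods.get)
    return label, moods[label]


request = build_mood_request("great i am liking it")
# To send it while the Flask app is running:
# with urllib.request.urlopen(request) as resp:
#     moods = json.loads(resp.read().decode("utf-8"))

# Scores copied from the example response, rounded
moods = {"anger": 7.11, "disgust": 3.18, "fear": 12.43, "guilt": 2.81,
         "joy": 56.98, "sadness": 13.97, "shame": 3.27}
print(top_mood(moods))
```

Showing only the top label (here joy) is one way to present the result to someone who should not have to read raw scores.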

Github repository

The source code is available in the GitHub tensorflow 2.0 repository. You can clone the project from GitHub and run it on your system.

Production deployment using WSGI

You can check out the three-part series of articles on production deployment of Flask endpoints.
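As a rough starting point for serving the app with Gunicorn, a configuration file might look like the sketch below. The file name, the values and the main:app module path are assumptions to adapt to your own project layout; bind, workers and timeout are standard Gunicorn settings.

```python
# gunicorn.conf.py: a hypothetical starting point, all values are assumptions to tune
import multiprocessing

bind = "0.0.0.0:5000"   # same host and port as in the examples above

# Every worker process loads its own copy of the Keras model,
# so keep the count small if the model is large.
workers = min(2, multiprocessing.cpu_count())

timeout = 120           # seconds; the first request also pays for loading the model

# Launch, assuming the Flask object is named `app` inside main.py:
#   gunicorn -c gunicorn.conf.py main:app
```

Because the model is cached per process, memory use grows with the number of workers, which is the main trade-off to weigh when raising that setting.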

Thanks for reading this article.

