Serverless APIs for MAX models

IBM’s Model Asset eXchange provides a curated list of free Machine Learning models for developers. Published models cover tasks such as detecting emotions or ages in faces from images, forecasting the weather, converting speech to text and more. Models are pre-trained and ready for use in the cloud.

Models are published as a series of public Docker images. Images automatically expose an HTTP API for model predictions. Documentation in the model repositories explains how to run images locally (using Docker) or deploy to the cloud (using Kubernetes). This got me thinking…

Could MAX models be used from serverless functions? 🤔

Running machine learning models on serverless platforms can take advantage of the horizontal scalability to process large numbers of computationally intensive classification tasks in parallel. Coupled with the serverless pricing structure ("no charge for idle"), this can be an extremely cheap and effective way to perform model classifications in the cloud.

CHALLENGE ACCEPTED! 🦸‍♂️🦸‍♀️

After a couple of days of experimentation, I had worked out an easy way to automatically expose MAX models as Serverless APIs on IBM Cloud Functions. 🎉🎉🎉

I’ve given instructions below on how to create those APIs from the models using a simple script. If you just want to use the models, follow those instructions. If you are interested in understanding how this works, keep reading as I explain afterwards what I did…

Running MAX models on IBM Cloud Functions

This repository contains a bash script which builds custom Docker runtimes with MAX models for usage on IBM Cloud Functions. Pushing these images to Docker Hub allows IBM Cloud Functions to use them as custom runtimes. Web Actions created from these custom runtime images expose the same Prediction API described in the model documentation. They can be used with no further changes or custom code needed.

prerequisites

Please follow the links below to set up the following tools before proceeding.

Check out the “Serverless MAX Models” repository. Run all the following commands from that folder.

git clone https://github.com/jthomas/serverless-max-models 
cd serverless-max-models 

build custom runtime images

  • Set the following environment variables and run the build script.
    • MODELS: space-separated MAX model names, e.g. max-facial-emotion-classifier
    • USERNAME: Docker Hub username.
MODELS="..." USERNAME="..." ./build.sh

This will create Docker images locally with the MAX model names and push them to Docker Hub for usage in IBM Cloud Functions. IBM Cloud Functions only supports public Docker images as custom runtimes.

create actions using custom runtimes

  • Create a Web Action using the custom runtime image.
ibmcloud wsk action create <MODEL_IMAGE> --docker <DOCKERHUB_NAME>/<MODEL_IMAGE> --web true -m 512
  • Retrieve the Web Action URL (https://<REGION>.functions.cloud.ibm.com/api/v1/web/<NS>/default/<ACTION>)
ibmcloud wsk action get <MODEL_IMAGE> --url

invoke web action url with prediction api parameters

Use the same API request parameters as defined in the Prediction API specification with the Web Action URL. This will invoke model predictions and return the result as the HTTP response, e.g.

curl -F "image=@assets/happy-baby.jpeg" -XPOST <WEB_ACTION_URL>

NOTE: The first invocation after creating an action may incur long cold-start delays due to the platform pulling the remote image into the local registry. Once the image is available in the platform, both further cold and warm invocations will be much faster.
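
If you'd rather call the Web Action from Python than curl, here is a minimal sketch using only the standard library. It assumes the model expects a single file field called `image`, as in the curl example above. The helper names `build_multipart` and `predict` are my own and are not part of the repository.

```python
import json
import urllib.request
import uuid


def build_multipart(field, filename, payload, content_type="image/jpeg"):
    """Encode a single file field as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return head + payload + tail, headers


def predict(web_action_url, image_path):
    """POST an image to the Web Action and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        body, headers = build_multipart("image", "image.jpeg", f.read())
    req = urllib.request.Request(web_action_url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (replace the placeholder with the URL returned by `action get --url`):
# print(predict("<WEB_ACTION_URL>", "assets/happy-baby.jpeg"))
```

Other models may expect different field names or content types, so check the Prediction API specification for the model you are using.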

Example

Here is an example of creating a serverless API using the max-facial-emotion-classifier MAX model. Further examples of models which have been tested are available here. If you encounter problems, please open an issue on GitHub.

max-facial-emotion-classifier

Start by creating the action using the custom runtime and then retrieve the Web Action URL.

$ ibmcloud wsk action create max-facial-emotion-classifier --docker <DOCKERHUB_NAME>/max-facial-emotion-classifier --web true -m 512
ok: created action max-facial-emotion-classifier
$ ibmcloud wsk action get max-facial-emotion-classifier --url
ok: got action max-facial-emotion-classifier
https://<REGION>.functions.cloud.ibm.com/api/v1/web/<NS>/default/max-facial-emotion-classifier

According to the API definition for this model, the prediction API expects a form submission with an image file to classify. Using a sample image from the model repo, the model can be tested using curl.

$ curl -F "image=@happy-baby.jpeg" -XPOST https://<REGION>.functions.cloud.ibm.com/api/v1/web/<NS>/default/max-facial-emotion-classifier
{
  "status": "ok",
  "predictions": [
    {
      "detection_box": [
        0.15102639296187684,
        0.3828125,
        0.5293255131964809,
        0.5830078125
      ],
      "emotion_predictions": [
        {
          "label_id": "1",
          "label": "happiness",
          "probability": 0.9860254526138306
        },
        ...
      ]
    }
  ]
}
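
Once you have the JSON back, pulling out the most likely emotion for each detected face is straightforward. This is a small sketch based only on the fields visible in the response above. The sample is trimmed to the single emotion shown, and `top_emotions` is a hypothetical helper name.

```python
import json

# Sample response trimmed to the single emotion shown above.
response_text = """
{
  "status": "ok",
  "predictions": [
    {
      "detection_box": [0.15102639296187684, 0.3828125, 0.5293255131964809, 0.5830078125],
      "emotion_predictions": [
        {"label_id": "1", "label": "happiness", "probability": 0.9860254526138306}
      ]
    }
  ]
}
"""


def top_emotions(response):
    """Return (label, probability) of the most likely emotion for each detected face."""
    results = []
    for face in response.get("predictions", []):
        best = max(face["emotion_predictions"], key=lambda e: e["probability"])
        results.append((best["label"], best["probability"]))
    return results


print(top_emotions(json.loads(response_text)))
```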

performance

Example Invocation Duration (Cold): ~4.8 seconds

Example Invocation Duration (Warm): ~800 ms

How does this work?

background

Running machine learning classifications using pre-trained models from serverless functions has historically been challenging due to the following reason…

Developers do not control runtime environments in (most) serverless cloud platforms. Libraries and dependencies needed by the functions must be provided in the deployment package. Most platforms limit deployment package sizes (~50MB compressed & ~250MB uncompressed).

Machine Learning libraries and models can be much larger than those deployment size limits. This stops them being included in deployment packages. Loading files dynamically during invocations may be possible but incurs extremely long cold-start delays and additional costs.

Fortunately, IBM Cloud Functions is based on the open-source serverless project, Apache OpenWhisk. This platform supports bespoke function runtimes using custom Docker images. Machine learning libraries and models can therefore be provided in custom runtimes. This removes the need to include them in deployment packages or load them at runtime.

Interested in reading other blog posts about using machine learning libraries and toolkits with IBM Cloud Functions? See these posts for more details.

MAX model images

IBM’s Model Asset eXchange publishes Docker images for each model, alongside the pre-trained model files. Images expose an HTTP API for predictions using the model on port 5000, built using Python and Flask. Swagger files for the APIs describe the available operations, input parameters and response bodies.

These images use a custom application framework (maxfw), based on Flask, to standardise exposing MAX models as HTTP APIs. This framework handles input parameter validation, response marshalling, CORS support, etc. This allows model runtimes to just implement the prediction API handlers, rather than the entire HTTP application.

Since the framework already handles exposing the model as an HTTP API, I started looking for a way to simulate an external HTTP request coming into the framework. If this was possible, I could trigger this fake request from a Python Web Action to perform the model classification from input parameters. The Web Action would then convert the HTTP response returned into valid Web Action response parameters.

flask test client

Reading through the Flask documentation, I came across the perfect solution! 👏👏👏

Flask provides a way to test your application by exposing the Werkzeug test Client and handling the context locals for you. You can then use that with your favourite testing solution.

This allows application routes to be executed with the test client, without actually running the HTTP server.

max_app = MAXApp(API_TITLE, API_DESC, API_VERSION)
max_app.add_api(ModelPredictAPI, '/predict')
test_client = max_app.app.test_client()
r = test_client.post('/model/predict', data=content, headers=headers)

Using this code within a serverless Python function allows function invocations to trigger the prediction API. The serverless function only has to convert input parameters to the fake HTTP request and then serialise the response back to JSON.
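
If you're wondering how a request can be handled without a server, the rough idea is that the test client builds a WSGI request environment and calls the application object directly, in-process. The sketch below illustrates that idea using only the standard library. It is not Flask's or Werkzeug's actual implementation, and `app` and `fake_post` are hypothetical stand-ins rather than anything from maxfw.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Toy WSGI app standing in for the model API (hypothetical, not maxfw)."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = environ["wsgi.input"].read(length)
    body = json.dumps(
        {"path": environ["PATH_INFO"], "bytes_received": len(payload)}
    ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def fake_post(wsgi_app, path, data, content_type):
    """Call a WSGI app in-process with a synthetic POST request."""
    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": path,
        "CONTENT_TYPE": content_type,
        "CONTENT_LENGTH": str(len(data)),
        "wsgi.input": io.BytesIO(data),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


# No socket is opened and no server is started: the app is just a callable.
status, headers, body = fake_post(app, "/model/predict", b"fake-image", "application/octet-stream")
print(status, json.loads(body))
```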

python docker action

The custom MAX model runtime image needs to implement the HTTP API expected by Apache OpenWhisk. This API is used to instantiate the runtime environment and then pass in invocation parameters on each request. Since the runtime image contains all files and code needed to process requests, the /init handler becomes a no-op. The /run handler converts Web Action HTTP parameters into the fake HTTP request.

Here is the Python script used to proxy incoming Web Action requests to the framework model service.

from maxfw.core import MAXApp
from api import ModelPredictAPI
from config import API_TITLE, API_DESC, API_VERSION
import json
import base64
from flask import Flask, request, Response

max_app = MAXApp(API_TITLE, API_DESC, API_VERSION)
max_app.add_api(ModelPredictAPI, '/predict')

# Use flask test client to simulate HTTP requests for the prediction APIs
# HTTP request data will come from action invocation parameters, neat huh? :)
test_client = max_app.app.test_client()
app = Flask(__name__)

# This implements the Docker runtime API used by Apache OpenWhisk
# https://github.com/apache/incubator-openwhisk/blob/master/docs/actions-docker.md
# /init is a no-op as everything is provided in the image.
@app.route("/init", methods=['POST'])
def init():
    return ''

# Action invocation requests will be received as the `value` parameter in request body.
# Web Actions provide HTTP request parameters as `__ow_headers` & `__ow_body` parameters.
@app.route("/run", methods=['POST'])
def run():
    body = request.json
    form_body = body['value']['__ow_body']
    headers = body['value']['__ow_headers']

    # binary image content provided as base64 strings
    content = base64.b64decode(form_body)

    # send fake HTTP request to prediction API with invocation data
    r = test_client.post('/model/predict', data=content, headers=headers)
    r_headers = dict((x, y) for x, y in r.headers)

    # binary data must be encoded as base64 strings to return in JSON response
    is_image = r_headers['Content-Type'].startswith('image')
    r_data = base64.b64encode(r.data) if is_image else r.data
    body = r_data.decode("utf-8")

    # Web Actions read the HTTP status from the `statusCode` field of the result
    response = {'headers': r_headers, 'statusCode': r.status_code, 'body': body}
    print(r.status)
    return Response(json.dumps(response), status=200, mimetype='application/json')

app.run(host='0.0.0.0', port=8080)
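
To make the /run handler easier to follow, here is a sketch of what an incoming payload might look like for a form upload and how the original HTTP body is recovered from it. The boundary, file bytes and the `unpack_web_action` helper are made up for illustration. Only the `value`, `__ow_headers` and `__ow_body` keys are taken from the script above.

```python
import base64
import json

# Hypothetical multipart form body, as a browser or curl might send it.
# The boundary and the "jpeg" bytes are invented for this example.
raw_form = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="image"; filename="happy-baby.jpeg"\r\n'
    b"Content-Type: image/jpeg\r\n\r\n"
    b"\xff\xd8\xff\xe0 fake jpeg bytes\r\n"
    b"--XYZ--\r\n"
)

# Assumed shape of the JSON document POSTed to /run for a Web Action call:
# the HTTP body arrives base64-encoded alongside the original request headers.
run_request = json.dumps({
    "value": {
        "__ow_method": "post",
        "__ow_headers": {"content-type": "multipart/form-data; boundary=XYZ"},
        "__ow_body": base64.b64encode(raw_form).decode("ascii"),
    }
})


def unpack_web_action(request_json):
    """Recover the original HTTP body and headers from a /run payload."""
    value = json.loads(request_json)["value"]
    return base64.b64decode(value["__ow_body"]), value["__ow_headers"]


content, headers = unpack_web_action(run_request)
print(headers["content-type"])
print(len(content), "bytes of form data")
```

Passing the original headers through matters here, because the multipart boundary lives in the content-type header and the framework needs it to parse the form.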

building into an image

Since the MAX models already exist as public Docker images, those images can be used as base images when building custom runtimes. Those base images handle adding model files and all dependencies needed to execute them into the image.

This is the Dockerfile used by the build script to create the custom model image. The model parameter refers to the build argument containing the model name.

ARG model
FROM codait/${model}:latest

ADD openwhisk.py .

EXPOSE 8080

CMD python openwhisk.py

This is then used from the following build script to create a custom runtime image for the model.

#!/bin/bash

set -e -u

for model in $MODELS; do
  echo "Building $model runtime image"
  docker build -t "$model" --build-arg model="$model" .
  echo "Pushing $model to Docker Hub"
  docker tag "$model" "$USERNAME/$model"
  docker push "$USERNAME/$model"
done 

Once the image is published to Docker Hub, it can be referenced when creating new Web Actions (using the --docker parameter). 😎

ibmcloud wsk action create <MODEL_IMAGE> --docker <DOCKERHUB_NAME>/<MODEL_IMAGE> --web true -m 512

Conclusion

IBM’s Model Asset eXchange is a curated collection of Machine Learning models, ready to deploy to the cloud for a variety of tasks. All models are available as a series of public Docker images. Model images automatically expose HTTP APIs for classifications.

Documentation in the model repositories explains how to run them locally and deploy using Kubernetes, but what about using them on serverless cloud platforms? Serverless platforms are becoming a popular option for deploying Machine Learning models, due to horizontal scalability and cost advantages.

Looking through the source code for the model images, I discovered a mechanism to hook into the custom model framework used to export the model files as HTTP APIs. This allowed me to write a simple wrapper script to proxy serverless function invocations to the model prediction APIs. API responses would be serialised back into the Web Action response format.

Building this script into a new Docker image, using the existing model image as the base image, created a new runtime which could be used on the platform. Web Actions created from this runtime image would automatically expose the same HTTP APIs as the existing image!