Once I had this working with a local Node.js script, my next idea was to convert it into a serverless function. Running this function on IBM Cloud Functions (Apache OpenWhisk) would turn the script into my own visual recognition microservice.
Converting the image classification script to run in a serverless environment had the following challenges…
- TensorFlow.js libraries need to be available in the runtime.
- Native bindings for the library must be compiled against the platform architecture.
- Model files need to be loaded from the filesystem.
Some of these issues were more challenging than others to fix! Let’s start by looking at the details of each issue, before explaining how Docker support in Apache OpenWhisk can be used to resolve them all.
TensorFlow.js libraries are not included in the Node.js runtimes provided by Apache OpenWhisk.
External libraries can be imported into the runtime by deploying applications from a zip file. Custom
node_modules folders included in the zip file will be extracted in the runtime. Zip files are limited to a maximum size of 48MB.
Running npm install for the TensorFlow.js libraries revealed the first problem… the resulting node_modules directory was 175MB. 😱
Looking at the contents of this folder, the tfjs-node module compiles a native shared library (libtensorflow.so).
The libtensorflow.so native shared library must be compiled using the platform runtime. Running npm install locally automatically compiles native dependencies against the host platform. Local environments may use a different operating system or CPU architecture (e.g. Mac vs Linux) or link against shared libraries not available in the serverless runtime.
MobileNet Model Files
TensorFlow model files need to be loaded from the filesystem in Node.js. Serverless runtimes do provide a temporary filesystem inside the runtime environment. Files from deployment zip files are automatically extracted into this environment before invocations. There is no external access to this filesystem outside the lifecycle of the serverless function.
Model files for the MobileNet model were 16MB. If these files are included in the deployment package, it leaves 32MB for the rest of the application source code. Although the model files are small enough to include in the zip file, what about the TensorFlow.js libraries? Is this the end of the blog post? Not so fast….
Apache OpenWhisk’s support for custom runtimes provides a simple solution to all these issues!
Apache OpenWhisk uses Docker containers as the runtime environments for serverless functions (actions). All platform runtime images are published on Docker Hub, allowing developers to start these environments locally.
Developers can also specify custom runtime images when creating actions. These images must be publicly available on Docker Hub. Custom runtimes have to expose the same HTTP API used by the platform for invoking actions.
Using platform runtime images as parent images makes it simple to build custom runtimes. Users can run commands during the Docker build to install additional libraries and other dependencies. The parent image already contains source files with the HTTP API service handling platform requests.
Here is the Docker build file for the Node.js action runtime with additional TensorFlow.js dependencies.
openwhisk/action-nodejs-v8:latest is the Node.js action runtime image published by OpenWhisk.
TensorFlow libraries and other dependencies are installed using npm install in the build process. Native dependencies for the @tensorflow/tfjs-node library are automatically compiled for the correct platform by installing during the build process.
Since I’m building a new runtime, I’ve also added the MobileNet model files to the image. Whilst not strictly necessary, removing them from the action zip file reduces deployment times.
Want to skip the next step? Use this image
jamesthomas/action-nodejs-v8:tfjs rather than building your own.
Building The Runtime
In the previous blog post, I showed how to download model files from the public storage bucket.
- Download a version of the MobileNet model and place all files in the
- Copy the Docker build file from above to a local file named
- Run the Docker build command to generate a local image.
- Tag the local image with a remote username and repository.
Replace <USERNAME> with your Docker Hub username.
- Push the local image to Docker Hub
Once the image is available on Docker Hub, actions can be created using that runtime image. 😎
This source code implements image classification as an OpenWhisk action. Image files are provided as a Base64 encoded string using the image property on the event parameters. Classification results are returned as the results property in the response.
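To make the input and output contract concrete, here is a minimal sketch of the action's shape. It assumes the usual single-file convention of a top-level main function; classifyImage is a hypothetical stand-in for the real decoding and MobileNet classification steps, so the returned values are placeholders only.

```javascript
// Hypothetical stand-in for the real classifier: the actual action would
// decode the JPEG and run the MobileNet model at this point.
async function classifyImage(imageBytes) {
  return [{ className: 'placeholder', probability: 1, bytes: imageBytes.length }];
}

// The action entry point receives the event parameters and returns
// (a promise for) a JSON-serialisable object.
async function main(params) {
  if (typeof params.image !== 'string') {
    return { error: 'Missing Base64 encoded "image" parameter.' };
  }
  // Decode the Base64 string back into raw image bytes.
  const imageBytes = Buffer.from(params.image, 'base64');
  const results = await classifyImage(imageBytes);
  return { results };
}
```

Calling main({ image: "<base64 string>" }) resolves to an object with a results array, which the platform serialises as the JSON response.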
Caching Loaded Models
Serverless platforms initialise runtime environments on-demand to handle invocations. Once a runtime environment has been created, it will be re-used for further invocations with some limits. This improves performance by removing the initialisation delay (“cold start”) from request processing.
Applications can exploit this behaviour by using global variables to maintain state across requests. This is often used to cache opened database connections or store initialisation data loaded from external systems.
I have used this pattern to cache the MobileNet model used for classification. During cold invocations, the model is loaded from the filesystem and stored in a global variable. Warm invocations then use the existence of that global variable to skip the model loading process with further requests.
Caching the model reduces the time (and therefore cost) for classifications on warm invocations.
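The caching pattern can be sketched as follows. This is an illustration rather than the action's real code: loadModel is a hypothetical stand-in for loading MobileNet from the filesystem, and loadCount exists only to show how often loading happens.

```javascript
// Module-level variable: it survives between invocations for as long as
// the runtime container stays warm.
let cachedModel = null;
let loadCount = 0;

// Hypothetical stand-in for the expensive model load from the filesystem.
async function loadModel() {
  loadCount += 1;
  return { classify: async () => [] };
}

async function getModel() {
  if (!cachedModel) {
    // Cold invocation: load the model once and keep the reference.
    cachedModel = await loadModel();
  }
  // Warm invocations fall straight through to here.
  return cachedModel;
}

async function main(params) {
  const model = await getModel();
  const results = await model.classify(params.image);
  return { results };
}
```

Invoking main twice in the same process triggers loadModel only once; the second call reuses the cached reference.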
Running the Node.js script from the previous blog post on IBM Cloud Functions was possible with minimal modifications. Unfortunately, performance testing revealed a memory leak in the handler function. 😢
Reading more about how TensorFlow.js works on Node.js uncovered the issue…
TensorFlow.js’s Node.js extensions use a native C++ library to execute Tensors on a CPU or GPU engine. Memory allocated for Tensor objects in the native library is retained until the application explicitly releases it or the process exits. TensorFlow.js provides a dispose method on individual objects to free allocated memory. There is also a tf.tidy method to automatically clean up tensors allocated inside a callback function, apart from any it returns.
Reviewing the code, tensors were being created as model input from images on each request. These objects were not disposed before returning from the request handler. This meant native memory grew unbounded. Adding an explicit dispose call to free these objects before returning fixed the issue.
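One way to guarantee the release happens on every code path is a try/finally block. The sketch below runs without TensorFlow.js installed: FakeTensor is a stand-in that only mimics a tensor's dispose method, and imageToInput and classify are hypothetical helpers, not the action's real functions.

```javascript
// Stand-in for a tensor backed by native memory. A real tensor exposes a
// dispose() method in the same way; `live` counts unreleased instances.
class FakeTensor {
  constructor() { FakeTensor.live += 1; }
  dispose() { FakeTensor.live -= 1; }
}
FakeTensor.live = 0;

// Hypothetical helpers standing in for image decoding and model inference.
const imageToInput = (imageBytes) => new FakeTensor();
const classify = async (input) => [{ className: 'placeholder', probability: 1 }];

async function handler(imageBytes) {
  const input = imageToInput(imageBytes);
  try {
    return await classify(input);
  } finally {
    // Runs on success and on error, so the input is always released.
    input.dispose();
  }
}
```

After any number of handler calls, FakeTensor.live returns to zero, which is the property the real handler needs for native memory to stay flat.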
Profiling & Performance
The action code records memory usage and elapsed time at different stages in the classification process.
Recording memory usage allows me to modify the maximum memory allocated to the function for optimal performance and cost. Node.js provides a standard library API to retrieve memory usage for the current process. Logging these values allows me to inspect memory usage at different stages.
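As a rough sketch of this kind of logging, process.memoryUsage() reports figures such as rss, heapUsed and external in bytes. The logMemory helper and its output format are my own choices for illustration.

```javascript
// Log the current process memory usage, in megabytes, under a stage label.
function logMemory(stage) {
  const usage = process.memoryUsage();
  const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  const line = `${stage}: rss=${toMB(usage.rss)}MB ` +
    `heapUsed=${toMB(usage.heapUsed)}MB external=${toMB(usage.external)}MB`;
  console.log(line);
  return line;
}

logMemory('start');
```

Calling logMemory before and after model loading and classification gives a per-stage picture to compare against the action's memory limit.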
Timing different tasks in the classification process, i.e. model loading, image classification, gives me an insight into how efficient classification is compared to other methods. Node.js has a standard library API for timers to record and print elapsed time to the console.
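A minimal sketch of this style of timing uses console.time and console.timeEnd, which print a label followed by the elapsed time. Here slowStep is a hypothetical placeholder for a task such as model loading or classification.

```javascript
// Hypothetical stand-in for a slow task such as model inference.
const slowStep = () =>
  new Promise((resolve) => setTimeout(() => resolve('done'), 20));

async function timedRun() {
  console.time('main');        // start the timer for the whole handler
  console.time('slowStep');    // start a nested timer for one task
  const result = await slowStep();
  console.timeEnd('slowStep'); // prints the label and elapsed time
  console.timeEnd('main');
  return result;
}
```

Each label needs a matching console.timeEnd call; nesting timers like this gives both a per-task and a total figure in the logs.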
- Run the following command with the IBM Cloud CLI to create the action.
Replace <IMAGE_NAME> with the public Docker Hub image identifier for the custom runtime. Use jamesthomas/action-nodejs-v8:tfjs if you haven’t built this manually.
Testing It Out
- Download this image of a Panda from Wikipedia.
- Invoke the action with the Base64 encoded image as an input parameter.
- The returned JSON message contains classification probabilities. 🐼🐼🐼
- Retrieve logging output for the last activation to show performance data.
Profiling and memory usage details are logged to stdout
main is the total elapsed time for the action handler.
mn_model.classify is the elapsed time for the image classification. Cold start requests print an extra log message with the model loading time.
Invoking the classify action 1000 times for both cold and warm activations (using 256MB memory) generated the following performance results.
Classifications took an average of 316 milliseconds to process when using warm environments. Looking at the timing data, converting the Base64 encoded JPEG into the input tensor took around 100 milliseconds. Running the model classification task was in the 200 - 250 milliseconds range.
Classifications took an average of 1260 milliseconds to process when using cold environments. These requests incur penalties for initialising new runtime containers and loading models from the filesystem. Both of these tasks took around 400 milliseconds each.
One disadvantage of using custom runtime images in Apache OpenWhisk is the lack of pre-warmed containers. Pre-warming is used to reduce cold start times by starting runtime containers before they are needed. This is not supported for non-standard runtime images.
IBM Cloud Functions provides a free tier of 400,000 GB-seconds per month. Each further second of execution is charged at $0.000017 per GB of memory allocated. Execution time is rounded up to the nearest 100ms.
If all activations were warm, a user could execute more than 4,000,000 classifications per month in the free tier using an action with 256MB. Once outside the free tier, around 600,000 further invocations would cost just over $1.
If all activations were cold, a user could execute more than 1,200,000 classifications per month in the free tier using an action with 256MB. Once outside the free tier, around 180,000 further invocations would cost about $1.
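The arithmetic behind these estimates can be checked with a short sketch. It uses the figures quoted above (400,000 GB-seconds free, $0.000017 per GB-second, rounding up to 100ms) and assumes that 256MB counts as 0.25GB; the function names are mine.

```javascript
const FREE_TIER_GB_SECONDS = 400000;  // free allowance per month
const PRICE_PER_GB_SECOND = 0.000017; // USD, beyond the free tier
const MB_PER_GB = 1024;               // assumption: 256MB is billed as 0.25GB

// Billed duration: elapsed time rounded up to the next 100ms.
const billedMs = (elapsedMs) => Math.ceil(elapsedMs / 100) * 100;

// Invocations that fit in the free tier. Working in integer MB-milliseconds
// avoids floating point rounding.
function freeInvocations(elapsedMs, memoryMB) {
  const freeMbMs = FREE_TIER_GB_SECONDS * MB_PER_GB * 1000;
  return Math.floor(freeMbMs / (billedMs(elapsedMs) * memoryMB));
}

// Cost in USD of a number of invocations beyond the free tier.
function paidCost(invocations, elapsedMs, memoryMB) {
  const gbSeconds =
    (invocations * billedMs(elapsedMs) * memoryMB) / (MB_PER_GB * 1000);
  return gbSeconds * PRICE_PER_GB_SECOND;
}

console.log(freeInvocations(316, 256));             // warm: 4000000
console.log(freeInvocations(1260, 256));            // cold: 1230769
console.log(paidCost(600000, 316, 256).toFixed(2)); // 1.02
```

The rounding rule matters here: a 316ms warm classification is billed as 400ms and a 1260ms cold one as 1300ms.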
Getting a local script to run image classification was relatively simple, but converting it to a serverless function came with more challenges! Apache OpenWhisk restricts the maximum application size to 48MB and the native library dependencies were much larger than this limit.
Fortunately, Apache OpenWhisk’s custom runtime support allowed us to resolve all these issues. By building a custom runtime with native dependencies and model files, those libraries can be used on the platform without including them in the deployment package.