Large Applications on OpenWhisk

OpenWhisk supports creating actions from archive files containing source files and project dependencies.

The maximum code size for an action is 48MB, as documented in the OpenWhisk system details: https://github.com/apache/incubator-openwhisk/blob/master/docs/reference.md#per-action-artifact-mb-fixed-48mb

Applications with lots of third-party modules, native libraries or external tools may soon find themselves running into this limit. Node.js libraries are notorious for having large numbers of dependencies.

What if you need to deploy an application larger than this limit to OpenWhisk?

Previous solutions used Docker support in OpenWhisk to build a custom Docker image per action. Source files and dependencies are built into a public image hosted on Docker Hub.

This approach overcomes the limit on deployment size but means application source files will be accessible on Docker Hub. This is fine for samples or open-source projects but not realistic for most applications.

So, using an application larger than this limit requires me to make my source files public? 🤔

There’s now a better solution! 👏👏👏

OpenWhisk supports creating actions from an archive file AND a custom Docker image.

If we build a custom Docker runtime which includes shared libraries, those dependencies don’t need to be included in the archive file. Private source files will still be bundled in the archive and injected at runtime.

Reducing archive file sizes also improves deployment times.

Let’s look at an example…

Using Machine Learning Libraries on OpenWhisk

Python is a popular language for machine learning and data science. Libraries like pandas, scikit-learn and numpy provide all the tools. Serverless computing is becoming a good choice for machine learning microservices.
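As a flavour of what such a microservice might look like, here’s a minimal sketch of an OpenWhisk Python action using pandas. The `main` signature is the standard OpenWhisk entry point; the `values` parameter is a hypothetical input chosen for illustration.

```python
# Hypothetical OpenWhisk action computing simple statistics with pandas.
import pandas as pd

def main(params):
    # params is the JSON dictionary passed on invocation.
    values = params.get("values", [1, 2, 3, 4])
    series = pd.Series(values)
    return {
        "mean": float(series.mean()),
        "count": int(series.count())
    }
```

It could then be invoked with something like `wsk action invoke stats --param values '[2, 4, 6]' --result`.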

OpenWhisk supports Python 2 and 3 runtimes.

Popular libraries like flask, requests and beautifulsoup are available as global packages. Additional packages can be imported using virtualenv during invocations.

Python Machine Learning Libraries

Python packages can be used in OpenWhisk using virtualenv. Developers install the packages locally and include the virtualenv folder in the archive for deployment.

Machine Learning libraries often use numerous shared libraries and compile native dependencies for performance. This can lead to hundreds of megabytes of dependencies.

Setting up a new virtualenv folder and installing pandas leads to an environment with over 80MB of dependencies.

$ virtualenv env
$ source env/bin/activate
$ pip install pandas
...
Installing collected packages: numpy, six, python-dateutil, pytz, pandas
Successfully installed numpy-1.13.1 pandas-0.20.3 python-dateutil-2.6.1 pytz-2017.2 six-1.10.0
$ du -h
...
84M	. <-- FOLDER SIZE 😱

Bundling these libraries within an archive file will not be possible due to the file size limit.

Custom OpenWhisk Runtime Images

This limit can be overcome using a custom runtime image, which pre-installs additional libraries during the build process and makes them available during invocations.

OpenWhisk uses Docker for the runtime containers. Source files for the images are available on GitHub under the core folder. Here’s the Dockerfile for the Python runtime: https://github.com/apache/incubator-openwhisk/blob/master/core/pythonAction/Dockerfile.

Images for OpenWhisk runtimes are also available on Docker Hub under the OpenWhisk organisation.

Docker supports building new images from a parent image using the FROM directive. Inheriting from the existing runtime images means the Dockerfile for the new runtime only has to contain commands for installing extra dependencies.

Let’s build a new Python runtime which includes those libraries as shared packages.

Building Runtimes

Let’s create a new Dockerfile which installs additional packages into the OpenWhisk Python runtime.

FROM openwhisk/python3action

# lapack-dev is available in community repo.
RUN echo "http://dl-4.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories

# add package build dependencies
RUN apk add --no-cache \
        g++ \
        lapack-dev \
        gfortran

# add python packages
RUN pip install \
    numpy \
    pandas \
    scipy \
    sklearn

Running the Docker build command will create a new image with these extra dependencies.

$ docker build -t python_ml_runtime .
Sending build context to Docker daemon  83.01MB
Step 1/4 : FROM openwhisk/python3action
 ---> 46388e726fae
...
Successfully built cfc14a93863e
Successfully tagged python_ml_runtime:latest

Hosting images on Docker Hub requires registering a (free) account at https://hub.docker.com/.

Create a new tag from the python_ml_runtime image containing the Docker Hub username.

$ docker tag python_ml_runtime <YOUR_USERNAME>/python_ml_test

Push the image to Docker Hub to make it available to OpenWhisk.

$ docker push <YOUR_USERNAME>/python_ml_test

Testing It Out

Create a new Python file (main.py) with the following contents:

import numpy 
import pandas 
import sklearn
import scipy

def main(params):
    return {
        "numpy": numpy.__version__,
        "pandas": pandas.__version__,
        "sklearn": sklearn.__version__,
        "scipy": scipy.__version__
    }

Create a new OpenWhisk action using the Docker image from above and source file.

$ wsk action create lib-versions --docker <YOUR_USERNAME>/python_ml_test main.py
ok: created action lib-versions

Invoke the action to verify the modules are available and return the versions.

$ wsk action invoke lib-versions --result
{
    "numpy": "1.13.1",
    "pandas": "0.20.3",
    "scipy": "0.19.1",
    "sklearn": "0.18.2"
}

Yass. It works. 💃🕺

Serverless Machine Learning here we come…. 😉

Conclusions

Using custom runtimes with private source files is an amazing feature of OpenWhisk. It not only lets developers run larger applications on the platform but also unlocks lots of other use cases. Almost any runtime, library or tool can now be used from the platform.

Here are some examples of where this approach could be used…

  • Installing global libraries to reduce the archive file size below 48MB and speed up deployments.
  • Upgrading language runtimes, i.e. using Node.js 8 instead of 6.
  • Adding native dependencies or command-line tools to the environment, e.g. ffmpeg.
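As a sketch of that last use case, a runtime adding ffmpeg might look like the following, assuming the Alpine-based Node.js runtime image and that ffmpeg is available from the configured package repositories:

```dockerfile
# Hypothetical runtime adding the ffmpeg CLI to the Node.js action image.
FROM openwhisk/nodejs6action

RUN apk add --no-cache ffmpeg
```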

Building new runtimes is really simple using pre-existing base images published on Docker Hub.

The possibilities are endless!