If serverless platforms provide Node.js v12 on multi-core environments, functions can use this feature to reduce execution time and, therefore, lower costs. Depending on the workload, functions can utilise all available CPU cores to parallelise work, rather than executing more functions concurrently. 💰💰💰
In this blog post, I’ll explain how to use Worker Threads from a serverless function. I’ll be using IBM Cloud Functions (Apache OpenWhisk) as the example platform but this approach is applicable for any serverless platform with Node.js v12 support and a multi-core CPU runtime environment.
Node.js v12 in IBM Cloud Functions (Apache OpenWhisk)
This section of the blog post is specifically about using the new Node.js v12 runtime on IBM Cloud Functions (powered by Apache OpenWhisk). If you are using a different serverless platform, feel free to skip ahead to the next section…
I’ve recently been working on adding the Node.js v12 runtime to Apache OpenWhisk.
Apache OpenWhisk uses Docker containers as runtime environments for serverless functions. All runtime images are maintained in separate repositories for each supported language, e.g. Node.js, Java, Python, etc. Runtime images are automatically built and pushed to Docker Hub when the repository is updated.
Node.js v12 runtime image
Having this image available as a native runtime in Apache OpenWhisk requires upstream changes to the project’s runtime manifest. After this happens, developers will be able to use the `--kind` CLI flag to select this runtime version.
IBM Cloud Functions is powered by Apache OpenWhisk. It will eventually pick up the upstream project changes to include this new runtime version. Until that happens, Docker support allows usage of this new runtime before it is built into the platform.
This Apache OpenWhisk action returns the version of Node.js used in the runtime environment.
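Such an action might look like this (a minimal sketch, not the post’s original source; OpenWhisk invokes the exported `main` function):

```javascript
// Minimal OpenWhisk action that returns the Node.js version of the
// runtime environment via the built-in process.version property.
function main () {
  return { version: process.version }
}

exports.main = main
```

Invoking the action returns a JSON result such as `{"version": "v12.x.y"}`.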
Running this code on IBM Cloud Functions, using the Node.js v12 runtime image, allows us to confirm the new Node.js version is available.
Worker Threads in Serverless Functions
This is a great introduction blog post to Worker Threads. It uses generating prime numbers as the CPU-intensive task to benchmark. Comparing the performance of the single-threaded version to multiple threads, performance improves by a factor of the number of threads used (up to the number of CPU cores available).
This code can be ported to run in a serverless function. Running with different input values and thread counts will allow benchmarking of the performance improvement.
Here is the sample code for a serverless function to generate prime numbers. It does not use Worker Threads. It runs on the main event loop of the Node.js process, which means it only utilises a single thread (and therefore a single CPU core).
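A sketch of such a function, assuming a simple trial-division algorithm (the exact algorithm from the original sample is not reproduced here):

```javascript
'use strict'

// Returns true when n is a prime number (simple trial division).
function isPrime (n) {
  if (n < 2) return false
  for (let i = 2; i * i <= n; i++) {
    if (n % i === 0) return false
  }
  return true
}

// Generate all primes from 2 up to max on the main event loop,
// so only a single thread (and CPU core) is used.
function findPrimes (max) {
  const primes = []
  for (let i = 2; i <= max; i++) {
    if (isPrime(i)) primes.push(i)
  }
  return primes
}

// OpenWhisk action handler: `max` arrives as an action parameter.
function main (params) {
  const max = params.max || 1000
  return { primes: findPrimes(max) }
}

exports.main = main
```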
Porting the Code to Use Worker Threads
Here is the prime number calculation code which uses Worker Threads. Dividing the total input range by the number of Worker Threads generates individual thread input values. Worker Threads are spawned and passed chunked input ranges. Threads calculate primes and then send the result back to the parent thread.
Reviewing the code to start converting it to a serverless function, I realised there were two issues with running this code in a serverless environment: worker thread initialisation and optimal worker thread counts.
How to initialise Worker Threads?
This is how the existing source code initialises the Worker Threads.
__filename is a special global variable in Node.js which contains the currently executing script file path.
This means the Worker Thread will be initialised with a copy of the currently executing script. Node.js provides a special variable (`isMainThread`) to indicate whether the script is executing in the parent or child thread. This can be used to branch the script logic.
So, what’s the issue with this?
In the Apache OpenWhisk Node.js runtime, action source files are dynamically imported into the runtime environment. The script used to start the Node.js runtime process belongs to the platform handler, not the action source files. This means the `__filename` variable does not point to the action source file.
This issue is fixed by separating the serverless function handler and worker thread code into separate files. Worker Threads can be started with a reference to the worker thread script source file, rather than the currently executing script name.
How Many Worker Threads?
The next issue to resolve is how many Worker Threads to use. In order to maximise parallel processing capacity, there should be a Worker Thread for each CPU core. This is the maximum number of threads that can run concurrently.
Node.js provides CPU information for the runtime environment through the `os.cpus()` function. The result is an array of objects (one per logical CPU core) with model information, processing speed and elapsed processing times. The length of this array determines the number of Worker Threads used. This ensures the number of Worker Threads always matches the available CPU cores.
worker threads version
Here is the serverless version of the prime number generation algorithm which uses Worker Threads.
The code is split over two files.

The first file contains the serverless function handler used by the platform. Input ranges (based on the `max` action parameter) are divided into chunks, one per Worker Thread. The handler function creates a Worker Thread for each chunk and waits for the message containing the result. Once all the results have been retrieved, it returns those prime numbers as the invocation result.
The second file contains the script used in the Worker Thread. The `workerData` value is used to receive the number range to search for prime numbers. Prime numbers are sent back to the parent thread using the `parentPort.postMessage()` function. Since this script is only used in the Worker Thread, it does not need to use the `isMainThread` value to check whether it is running in the child or parent thread.
Source files deployed from a zip file also need to include a `package.json` file in the archive. The `main` property determines which script to import as the exported package module.
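For example (the package and file names here are assumptions):

```json
{
  "name": "primes-with-workers",
  "version": "1.0.0",
  "main": "index.js"
}
```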
Running both functions with the same input parameters allows execution time comparison. The Worker Threads version should improve performance by a factor proportional to the number of available CPU cores. Reducing execution time also means reduced costs on a serverless platform.
Creating a new serverless function (`primes`) from the non-Worker Threads source code, using the Node.js v12 runtime, I can test with small input values to check correctness.
Playing with sample input values, 10,000,000 seems like a useful benchmark value. This takes long enough with the single-threaded version to benefit from parallelism.
Using the simple single-threaded algorithm, it takes the serverless function around 35 seconds to calculate all primes up to ten million.
worker threads performance
Creating a new serverless function, from the worker threads-based source code using the Node.js v12 runtime, allows me to verify it works as expected for small input values.
Hurrah, it works.
Invoking the function with a `max` parameter of 10,000,000 allows us to benchmark it against the non-workers version of the code.
The workers version takes only ~25% of the time of the single-threaded version!
This is because IBM Cloud Functions’ runtime environments provide access to four CPU cores. Unlike other platforms, CPU cores are not tied to memory allocations. Utilising all available CPU cores concurrently allows the algorithm to run four times as fast. Since serverless platforms charge based on execution time, reducing execution time also means reducing costs.
The worker threads version also costs 75% less than the single-threaded version!
Node.js v12 was released in April 2019. This version includes support for Worker Threads enabled by default (rather than behind an optional runtime flag). Using multiple CPU cores in Node.js applications has never been easier!
Node.js applications with CPU-intensive workloads can utilise this feature to reduce execution time. Since serverless platforms charge based upon execution time, this is especially useful for Node.js serverless functions. Utilising multiple CPU cores leads not only to improved performance, but also to lower bills.
PRs have been opened to enable Node.js v12 as a built-in runtime in the Apache OpenWhisk project. The Docker image for the new runtime version is already available on Docker Hub, which means it can be used with any Apache OpenWhisk instance straight away!
Playing with Worker Threads on IBM Cloud Functions allowed me to demonstrate how to speed up performance for CPU-intensive workloads by utilising multiple cores concurrently. Using an example of prime number generation, calculating all primes up to ten million took ~35 seconds with a single thread and ~8 seconds with four threads. This represents a reduction in execution time and cost of 75%!