James Thomas

Notes on software.

Saving Money and Time With Node.js Worker Threads in Serverless Functions

Node.js v12 was released last month. This new version includes support for Worker Threads, that are enabled by default. Node.js Worker Threads make it simple to execute JavaScript code in parallel using threads. πŸ‘πŸ‘πŸ‘

This is useful for Node.js applications with CPU-intensive workloads. Using Worker Threads, JavaScript code can be executed code concurrently using multiple CPU cores. This reduces execution time compared to a non-Worker Threads version.

If serverless platforms provide Node.js v12 on multi-core environments, functions can use this feature to reduce execution time and, therefore, lower costs. Depending on the workload, functions can utilise all available CPU cores to parallelise work, rather than executing more functions concurrently. πŸ’°πŸ’°πŸ’°

In this blog post, I’ll explain how to use Worker Threads from a serverless function. I’ll be using IBM Cloud Functions (Apache OpenWhisk) as the example platform but this approach is applicable for any serverless platform with Node.js v12 support and a multi-core CPU runtime environment.

Node.js v12 in IBM Cloud Functions (Apache OpenWhisk)

This section of the blog post is specifically about using the new Node.js v12 runtime on IBM Cloud Functions (powered by Apache OpenWhisk). If you are using a different serverless platform, feel free to skip ahead to the next section…

I’ve recently been working on adding the Node.js v12 runtime to Apache OpenWhisk.

Apache OpenWhisk uses Docker containers as runtime environments for serverless functions. All runtime images are maintained in separate repositories for each supported language, e.g. Node.js, Java, Python, etc. Runtime images are automatically built and pushed to Docker Hub when the repository is updated.

node.js v12 runtime image

Here is the PR used to add the new Node.js v12 runtime image to Apache OpenWhisk. This led to the following runtime image being exported to Docker Hub: openwhisk/action-nodejs-v12.

Having this image available as a native runtime in Apache OpenWhisk requires upstream changes to the project’s runtime manifest. After this happens, developers will be able to use the --kind CLI flag to select this runtime version.

1
ibmcloud wsk action create action_name action.js --kind nodejs:12

IBM Cloud Functions is powered by Apache OpenWhisk. It will eventually pick up the upstream project changes to include this new runtime version. Until that happens, Docker support allows usage of this new runtime before it is built-in the platform.

1
ibmcloud wsk action create action_name action.js --docker openwhisk/action-nodejs-v12

example

This Apache OpenWhisk action returns the version of Node.js used in the runtime environment.

1
2
3
4
5
function main () {
  return {
    version: process.version
  }
}

Running this code on IBM Cloud Functions, using the Node.js v12 runtime image, allows us to confirm the new Node.js version is available.

1
2
3
4
5
6
$ ibmcloud wsk action create nodejs-v12 action.js --docker openwhisk/action-nodejs-v12
ok: created action nodejs-v12
$ ibmcloud wsk action invoke nodejs-v12 --result
{
    "version": "v12.1.0"
}

Worker Threads in Serverless Functions

This is a great introdution blog post to Workers Threads. It uses an example of generating prime numbers as the CPU intensive task to benchmark. Comparing the performance of the single-threaded version to multiple-threads - the performance is improved as a factor of the threads used (up to the number of CPU cores available).

This code can be ported to run in a serverless function. Running with different input values and thread counts will allow benchmarking of the performance improvement.

non-workers version

Here is the sample code for a serverless function to generate prime numbers. It does not use Worker Threads. It will run on the main event loop for the Node.js process. This means it will only utilise a single thread (and therefore single CPU core).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
'use strict';

const min = 2

function main(params) {
  const { start, end } = params
  console.log(params)
  const primes = []
  let isPrime = true;
  for (let i = start; i < end; i++) {
    for (let j = min; j < Math.sqrt(end); j++) {
      if (i !== j && i%j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) {
      primes.push(i);
    }
    isPrime = true;
  }

  return { primes }
}

porting the code to use worker threads

Here is the prime number calculation code which uses Worker Threads. Dividing the total input range by the number of Worker Threads generates individual thread input values. Worker Threads are spawned and passed chunked input ranges. Threads calculate primes and then send the result back to the parent thread.

Reviewing the code to start converting it to a serverless function, I realised there were two issues running this code in serverless environment: worker thread initialisation and optimal worker thread counts.

How to initialise Worker Threads?

This is how the existing source code initialises the Worker Threads.

1
 threads.add(new Worker(__filename, { workerData: { start: myStart, range }}));

__filename is a special global variable in Node.js which contains the currently executing script file path.

This means the Worker Thread will be initialised with a copy of the currently executing script. Node.js provides a special variable to indicate whether the script is executing in the parent or child thread. This can be used to branch script logic.

So, what’s the issue with this?

In the Apache OpenWhisk Node.js runtime, action source files are dynamically imported into the runtime environment. The script used to start the Node.js runtime process is for the platform handler, not the action source files. This means the __filename variable does not point to the action source file.

This issue is fixed by separating the serverless function handler and worker thread code into separate files. Worker Threads can be started with a reference to the worker thread script source file, rather than the currently executing script name.

1
 threads.add(new Worker("./worker.js", { workerData: { start: myStart, range }}));

How Many Worker Threads?

The next issue to resolve is how many Worker Threads to use. In order to maximise parallel processing capacity, there should be a Worker Thread for each CPU core. This is the maximum number of threads that can run concurrently.

Node.js provides CPU information for the runtime environment using the os.cpus() function. The result is an array of objects (one per logical CPU core), with model information, processing speed and elapsed processing times. The length of this array will determine number of Worker Threads used. This ensures the number of Worker Threads will always match the CPU cores available.

1
const threadCount = os.cpus().length

workers threads version

Here is the serverless version of the prime number generation algorithm which uses Worker Threads.

The code is split over two files - primes-with-workers.js and worker.js.

primes-with-workers.js

This file contains the serverless function handler used by the platform. Input ranges (based on the min and max action parameters) are divided into chunks, based upon the number of Worker Threads. The handler function creates a Worker Thread for each chunk and waits for the message with the result. Once all the results have been retrieved, it returns all those primes numbers as the invocation result.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
'use strict';

const { Worker } = require('worker_threads');
const os = require('os')
const threadCount = os.cpus().length

const compute_primes = async (start, range) => {
  return new Promise((resolve, reject) => {
    let primes = []
    console.log(`adding worker (${start} => ${start + range})`)
    const worker = new Worker('./worker.js', { workerData: { start, range }})

    worker.on('error', reject)
    worker.on('exit', () => resolve(primes))
    worker.on('message', msg => {
      primes = primes.concat(msg)
    })
  })
}

async function main(params) {
  const { min, max } = params
  const range = Math.ceil((max - min) / threadCount)
  let start = min < 2 ? 2 : min
  const workers = []

  console.log(`Calculating primes with ${threadCount} threads...`);

  for (let i = 0; i < threadCount - 1; i++) {
    const myStart = start
    workers.push(compute_primes(myStart, range))
    start += range
  }

  workers.push(compute_primes(start, max - start))

  const primes = await Promise.all(workers)
  return { primes: primes.flat() }
}

exports.main = main

workers.js

This is the script used in the Worker Thread. The workerData value is used to receive number ranges to search for prime numbers. Primes numbers are sent back to the parent thread using the postMessage function. Since this script is only used in the Worker Thread, it does need to use the isMainThread value to check if it is a child or parent process.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
'use strict';
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

const min = 2

function generatePrimes(start, range) {
  const primes = []
  let isPrime = true;
  let end = start + range;
  for (let i = start; i < end; i++) {
    for (let j = min; j < Math.sqrt(end); j++) {
      if (i !== j && i%j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) {
      primes.push(i);
    }
    isPrime = true;
  }

  return primes
}

const primes = generatePrimes(workerData.start, workerData.range);
parentPort.postMessage(primes)

package.json

Source files deployed from a zip file also need to include a package.json file in the archive. The main property is used to determine the script to import as the exported package module.

1
2
3
4
5
{
  "name": "worker_threads",
  "version": "1.0.0",
  "main": "primes-with-workers.js",
}

Performance Comparison

Running both functions with the same input parameters allows execution time comparison. The Worker Threads version should improve performance by a factor proportional to available CPU cores. Reducing execution time also means reduced costs in a serverless platform.

non-workers performance

Creating a new serverless function (primes) from the non-worker threads source code, using the Node.js v12 runtime, I can test with small values to check correctness.

1
2
3
4
5
6
$ ibmcloud wsk action create primes primes.js --docker openwhisk/action-nodejs-v12
ok: created action primes
$ ibmcloud wsk action invoke primes --result -p start 2 -p end 10
{
    "primes": [ 2, 3, 5, 7 ]
}

Playing with sample input values, 10,000,000 seems like a useful benchmark value. This takes long enough with the single-threaded version to benefit from parallelism.

1
2
3
4
5
$ time ibmcloud wsk action invoke primes --result -p start 2 -p end 10000000 > /dev/null

real  0m35.151s
user  0m0.840s
sys   0m0.315s

Using the simple single-threaded algorithm it takes the serverless function around ~35 seconds to calculate primes up to ten million.

workers threads performance

Creating a new serverless function, from the worker threads-based source code using the Node.js v12 runtime, allows me to verify it works as expected for small input values.

1
2
3
4
5
6
$ ibmcloud wsk action create primes-workers action.zip --docker openwhisk/action-nodejs-v12
ok: created action primes-workers
$ ibmcloud wsk action invoke primes-workers --result -p min 2 -p max 10
{
    "primes": [ 2, 3, 5, 7 ]
}

Hurrah, it works.

Invoking the function with an max parameter of 10,000,000 allows us to benchmark against the non-workers version of the code.

1
2
3
4
5
$ time ibmcloud wsk action invoke primes-workers --result -p min 2 -p max 10000000 --result > /dev/null

real  0m8.863s
user  0m0.804s
sys   0m0.302s

The workers versions only takes ~25% of the time of the single-threaded version!

This is because IBM Cloud Functions’ runtime environments provide access to four CPU cores. Unlike other platforms, CPU cores are not tied to memory allocations. Utilising all available CPU cores concurrently allows the algorithm to run 4x times as fast. Since serverless platforms charge based on execution time, reducing execution time also means reducing costs.

The worker threads version also costs 75% less than the single-threaded version!

Conclusion

Node.js v12 was released in April 2019. This version included support for Worker Threads, that were enabled by default (rather than needing an optional runtime flag). Using multiple CPU cores in Node.js applications has never been easier!

Node.js applications with CPU-intensive workloads can utilise this feature to reduce execution time. Since serverless platforms charge based upon execution time, this is especially useful for Node.js serverless functions. Utilising multiple CPU cores leads, not only to improved performance, but also lower bills.

PRs have been opened to enable Node.js v12 as a built-in runtime to the Apache OpenWhisk project. This Docker image for the new runtime version is already available on Docker Hub. This means it can be used with any Apache OpenWhisk instance straight away!

Playing with Worker Threads on IBM Cloud Functions allowed me to demonstrate how to speed up performance for CPU-intensive workloads by utilising multiple cores concurrently. Using an example of prime number generation, calculating all primes up to ten million took ~35 seconds with a single thread and ~8 seconds with four threads. This represents a reduction in execution time and cost of 75%!

Comments