James Thomas

Notes on software.

Accessing Long-Running Apache OpenWhisk Actions Results

Apache OpenWhisk actions are invoked by sending HTTP POST requests to the platform API. Invocation requests have two different modes: blocking and non-blocking.

Blocking invocations mean the platform won't send the HTTP response until the action finishes. This allows it to include the action result in the response. Blocking invocations are used when you want to invoke an action and wait for the result.

$ wsk action invoke my_action --blocking
ok: invoked /_/my_action with id db70ef682fae4f8fb0ef682fae2f8fd5
{
    "activationId": "db70ef682fae4f8fb0ef682fae2f8fd5",
    ...
    "response": {
        "result": { ... },
        "status": "success",
        "success": true
    },
    ...
}
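
The same blocking invocation can also be made from the JavaScript Client SDK. Here is a minimal sketch (the <API_HOST> and <API_KEY> values are placeholders for your own platform credentials):

// minimal sketch: blocking invocation using the openwhisk JavaScript client SDK
const openwhisk = require('openwhisk')
const ow = openwhisk({ apihost: '<API_HOST>', api_key: '<API_KEY>' })

// blocking: true waits for the action to finish; result: true returns just the action result
const result = await ow.actions.invoke({ name: 'my_action', blocking: true, result: true })
console.log(result)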

Non-blocking invocations return as soon as the platform processes the invocation request. This is before the action has finished executing. HTTP responses from non-blocking invocations only include activation identifiers, as the action result is not available.

$ wsk action invoke my_action
ok: invoked /_/my_action with id d2728aaa75394411b28aaa7539341195

HTTP responses from a blocking invocation will only wait for a limited amount of time before returning. This defaults to 65 seconds in the platform configuration file. If an action invocation has not finished before this timeout limit, a HTTP 5xx status response is returned.

Hmmm… 🤔

"So, how can you invoke an action and wait for the result when actions take longer than this limit?"

This question comes up regularly from developers building applications using the platform. I've decided to turn my answer into a blog post to help others struggling with this issue (after answering this question again this week 😎).

solution

  • Invoke the action using a non-blocking invocation.
  • Use the returned activation identifier to poll the activation result API.
  • The HTTP response for the activation result will return a HTTP 404 response until the action finishes.

When polling for activation results from non-blocking invocations, you should enforce a limit on the maximum polling time allowed. This is because HTTP 404s can be returned due to other scenarios (e.g. invalid activation identifiers). Enforcing a time limit ensures that, in the event of issues in the application code or the platform, the polling loop will eventually stop!

Setting the maximum polling time to the action timeout limit (plus a small offset) is a good approach.

An action cannot run for longer than its timeout limit. If the activation record is not available after this duration has elapsed (plus a small offset to handle internal platform delays), something has gone wrong. Continuing to poll after this point runs the risk of turning the polling operation into an infinite loop…

example code

This example provides an implementation of this approach for Node.js using the JavaScript Client SDK.

"use strict";

const openwhisk = require('openwhisk')

const options = { apihost: '<API_HOST>', api_key: '<API_KEY>' }
const ow = openwhisk(options)

// action duration limit (+ small offset)
const timeout_ms = 85000
// delay between polling requests
const polling_delay = 1000
// action to invoke
const action = 'delay'

const now = () => (new Date().getTime())
const max_polling_time = now() + timeout_ms

const delay = async ms => new Promise(resolve => setTimeout(resolve, ms))

const activation = await ow.actions.invoke({name: action})
console.log(`new activation id: ${activation.activationId}`)

let result = null

do {
  try {
    result = await ow.activations.get({ name: activation.activationId })
    console.log(`activation result (${activation.activationId}) now available!`)
  } catch (err) {
    if (err.statusCode !== 404) {
      throw err
    }
    console.log(`activation result (${activation.activationId}) not available yet`)
  }

  await delay(polling_delay)
} while (!result && now() < max_polling_time)

console.log(`activation result (${activation.activationId})`, result)

testing it out

Here is the source code for an action which will not return until 70 seconds have passed. Blocking invocations firing this action will result in a HTTP timeout before the response is returned.

const delay = async ms => new Promise(resolve => setTimeout(resolve, ms))

function main() {
  return delay(70*1000)
}

Using the script above, the action result will be retrieved from a non-blocking invocation.

  • Create an action from the source file in the example above.
wsk action create delay delay.js --timeout 80000 --kind nodejs:10
  • Run the Node.js script to invoke this action and poll for the activation result.
node script.js

If the script runs correctly, log messages will display the polling status and then the activation result.

$ node script.js
new activation id: d4efc4641b544320afc4641b54132066
activation result (d4efc4641b544320afc4641b54132066) not available yet
activation result (d4efc4641b544320afc4641b54132066) not available yet
activation result (d4efc4641b544320afc4641b54132066) not available yet
...
activation result (d4efc4641b544320afc4641b54132066) now available!
activation result (d4efc4641b544320afc4641b54132066) { ... }

Saving Money and Time With Node.js Worker Threads in Serverless Functions

Node.js v12 was released last month. This new version includes support for Worker Threads, which are enabled by default. Node.js Worker Threads make it simple to execute JavaScript code in parallel using threads. 👏👏👏

This is useful for Node.js applications with CPU-intensive workloads. Using Worker Threads, JavaScript code can be executed concurrently using multiple CPU cores. This reduces execution time compared to a non-Worker Threads version.

If serverless platforms provide Node.js v12 on multi-core environments, functions can use this feature to reduce execution time and, therefore, lower costs. Depending on the workload, functions can utilise all available CPU cores to parallelise work, rather than executing more functions concurrently. 💰💰💰

In this blog post, I'll explain how to use Worker Threads from a serverless function. I'll be using IBM Cloud Functions (Apache OpenWhisk) as the example platform but this approach is applicable for any serverless platform with Node.js v12 support and a multi-core CPU runtime environment.

Node.js v12 in IBM Cloud Functions (Apache OpenWhisk)

This section of the blog post is specifically about using the new Node.js v12 runtime on IBM Cloud Functions (powered by Apache OpenWhisk). If you are using a different serverless platform, feel free to skip ahead to the next section…

I've recently been working on adding the Node.js v12 runtime to Apache OpenWhisk.

Apache OpenWhisk uses Docker containers as runtime environments for serverless functions. All runtime images are maintained in separate repositories for each supported language, e.g. Node.js, Java, Python, etc. Runtime images are automatically built and pushed to Docker Hub when the repository is updated.

node.js v12 runtime image

Here is the PR used to add the new Node.js v12 runtime image to Apache OpenWhisk. This led to the following runtime image being published to Docker Hub: openwhisk/action-nodejs-v12.

Having this image available as a native runtime in Apache OpenWhisk requires upstream changes to the project's runtime manifest. After this happens, developers will be able to use the --kind CLI flag to select this runtime version.

ibmcloud wsk action create action_name action.js --kind nodejs:12

IBM Cloud Functions is powered by Apache OpenWhisk. It will eventually pick up the upstream project changes to include this new runtime version. Until that happens, Docker support allows usage of this new runtime before it is built into the platform.

ibmcloud wsk action create action_name action.js --docker openwhisk/action-nodejs-v12

example

This Apache OpenWhisk action returns the version of Node.js used in the runtime environment.

function main () {
  return {
    version: process.version
  }
}

Running this code on IBM Cloud Functions, using the Node.js v12 runtime image, allows us to confirm the new Node.js version is available.

$ ibmcloud wsk action create nodejs-v12 action.js --docker openwhisk/action-nodejs-v12
ok: created action nodejs-v12
$ ibmcloud wsk action invoke nodejs-v12 --result
{
    "version": "v12.1.0"
}

Worker Threads in Serverless Functions

This is a great introduction blog post to Worker Threads. It uses prime number generation as the CPU-intensive task to benchmark. Comparing the single-threaded version to multiple threads, performance improves in proportion to the number of threads used (up to the number of CPU cores available).

This code can be ported to run in a serverless function. Running with different input values and thread counts will allow benchmarking of the performance improvement.

non-workers version

Here is the sample code for a serverless function to generate prime numbers. It does not use Worker Threads. It will run on the main event loop for the Node.js process. This means it will only utilise a single thread (and therefore a single CPU core).

'use strict';

const min = 2

function main(params) {
  const { start, end } = params
  console.log(params)
  const primes = []
  let isPrime = true;
  for (let i = start; i < end; i++) {
    for (let j = min; j < Math.sqrt(end); j++) {
      if (i !== j && i%j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) {
      primes.push(i);
    }
    isPrime = true;
  }

  return { primes }
}

porting the code to use worker threads

Here is the prime number calculation code which uses Worker Threads. Dividing the total input range by the number of Worker Threads generates individual thread input values. Worker Threads are spawned and passed chunked input ranges. Threads calculate primes and then send the result back to the parent thread.

Reviewing the code to start converting it to a serverless function, I realised there were two issues running this code in a serverless environment: worker thread initialisation and optimal worker thread counts.

How to initialise Worker Threads?

This is how the existing source code initialises the Worker Threads.

 threads.add(new Worker(__filename, { workerData: { start: myStart, range }}));

__filename is a special global variable in Node.js which contains the currently executing script file path.

This means the Worker Thread will be initialised with a copy of the currently executing script. Node.js provides a special variable to indicate whether the script is executing in the parent or child thread. This can be used to branch script logic.
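
For illustration, here is a minimal sketch (not taken from the original code) of that single-file pattern, branching on the isMainThread value:

'use strict';
// minimal sketch: a script that re-runs itself as the Worker Thread and branches on isMainThread
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads')

if (isMainThread) {
  // parent thread: spawn a copy of this script as a Worker Thread
  const worker = new Worker(__filename, { workerData: { start: 2, range: 1000 } })
  worker.on('message', result => console.log('worker result:', result))
} else {
  // worker thread: do the work and post the result back to the parent
  parentPort.postMessage(`processed range ${workerData.start} => ${workerData.start + workerData.range}`)
}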

So, what's the issue with this?

In the Apache OpenWhisk Node.js runtime, action source files are dynamically imported into the runtime environment. The script used to start the Node.js runtime process is for the platform handler, not the action source files. This means the __filename variable does not point to the action source file.

This issue is fixed by separating the serverless function handler and worker thread code into separate files. Worker Threads can be started with a reference to the worker thread script source file, rather than the currently executing script name.

 threads.add(new Worker("./worker.js", { workerData: { start: myStart, range }}));

How Many Worker Threads?

The next issue to resolve is how many Worker Threads to use. In order to maximise parallel processing capacity, there should be a Worker Thread for each CPU core. This is the maximum number of threads that can run concurrently.

Node.js provides CPU information for the runtime environment using the os.cpus() function. The result is an array of objects (one per logical CPU core), with model information, processing speed and elapsed processing times. The length of this array determines the number of Worker Threads used. This ensures the number of Worker Threads will always match the CPU cores available.

const threadCount = os.cpus().length

worker threads version

Here is the serverless version of the prime number generation algorithm which uses Worker Threads.

The code is split over two files - primes-with-workers.js and worker.js.

primes-with-workers.js

This file contains the serverless function handler used by the platform. Input ranges (based on the min and max action parameters) are divided into chunks, based upon the number of Worker Threads. The handler function creates a Worker Thread for each chunk and waits for the message with the result. Once all the results have been retrieved, it returns those prime numbers as the invocation result.

'use strict';

const { Worker } = require('worker_threads');
const os = require('os')
const threadCount = os.cpus().length

const compute_primes = async (start, range) => {
  return new Promise((resolve, reject) => {
    let primes = []
    console.log(`adding worker (${start} => ${start + range})`)
    const worker = new Worker('./worker.js', { workerData: { start, range }})

    worker.on('error', reject)
    worker.on('exit', () => resolve(primes))
    worker.on('message', msg => {
      primes = primes.concat(msg)
    })
  })
}

async function main(params) {
  const { min, max } = params
  const range = Math.ceil((max - min) / threadCount)
  let start = min < 2 ? 2 : min
  const workers = []

  console.log(`Calculating primes with ${threadCount} threads...`);

  for (let i = 0; i < threadCount - 1; i++) {
    const myStart = start
    workers.push(compute_primes(myStart, range))
    start += range
  }

  workers.push(compute_primes(start, max - start))

  const primes = await Promise.all(workers)
  return { primes: primes.flat() }
}

exports.main = main

worker.js

This is the script used in the Worker Thread. The workerData value is used to receive number ranges to search for prime numbers. Prime numbers are sent back to the parent thread using the postMessage function. Since this script is only used in the Worker Thread, it does not need to use the isMainThread value to check whether it is running in the child or parent thread.

'use strict';
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

const min = 2

function generatePrimes(start, range) {
  const primes = []
  let isPrime = true;
  let end = start + range;
  for (let i = start; i < end; i++) {
    for (let j = min; j < Math.sqrt(end); j++) {
      if (i !== j && i%j === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) {
      primes.push(i);
    }
    isPrime = true;
  }

  return primes
}

const primes = generatePrimes(workerData.start, workerData.range);
parentPort.postMessage(primes)

package.json

Source files deployed from a zip file also need to include a package.json file in the archive. The main property is used to determine the script to import as the exported package module.

{
  "name": "worker_threads",
  "version": "1.0.0",
  "main": "primes-with-workers.js"
}

Performance Comparison

Running both functions with the same input parameters allows execution time comparison. The Worker Threads version should improve performance by a factor proportional to available CPU cores. Reducing execution time also means reduced costs in a serverless platform.

non-workers performance

Creating a new serverless function (primes) from the non-worker threads source code, using the Node.js v12 runtime, I can test with small values to check correctness.

$ ibmcloud wsk action create primes primes.js --docker openwhisk/action-nodejs-v12
ok: created action primes
$ ibmcloud wsk action invoke primes --result -p start 2 -p end 10
{
    "primes": [ 2, 3, 5, 7 ]
}

Playing with sample input values, 10,000,000 seems like a useful benchmark value. This takes long enough with the single-threaded version to benefit from parallelism.

$ time ibmcloud wsk action invoke primes --result -p start 2 -p end 10000000 > /dev/null

real  0m35.151s
user  0m0.840s
sys   0m0.315s

Using the simple single-threaded algorithm, it takes the serverless function around 35 seconds to calculate primes up to ten million.

worker threads performance

Creating a new serverless function, from the worker threads-based source code using the Node.js v12 runtime, allows me to verify it works as expected for small input values.

$ ibmcloud wsk action create primes-workers action.zip --docker openwhisk/action-nodejs-v12
ok: created action primes-workers
$ ibmcloud wsk action invoke primes-workers --result -p min 2 -p max 10
{
    "primes": [ 2, 3, 5, 7 ]
}

Hurrah, it works.

Invoking the function with a max parameter of 10,000,000 allows us to benchmark against the non-workers version of the code.

$ time ibmcloud wsk action invoke primes-workers --result -p min 2 -p max 10000000 > /dev/null

real  0m8.863s
user  0m0.804s
sys   0m0.302s

The workers version takes only ~25% of the time of the single-threaded version!

This is because IBM Cloud Functions' runtime environments provide access to four CPU cores. Unlike other platforms, CPU cores are not tied to memory allocations. Utilising all available CPU cores concurrently allows the algorithm to run 4x as fast. Since serverless platforms charge based on execution time, reducing execution time also means reducing costs.

The worker threads version also costs 75% less than the single-threaded version!

Conclusion

Node.js v12 was released in April 2019. This version included support for Worker Threads, which are enabled by default (rather than needing an optional runtime flag). Using multiple CPU cores in Node.js applications has never been easier!

Node.js applications with CPU-intensive workloads can utilise this feature to reduce execution time. Since serverless platforms charge based upon execution time, this is especially useful for Node.js serverless functions. Utilising multiple CPU cores leads, not only to improved performance, but also lower bills.

PRs have been opened to enable Node.js v12 as a built-in runtime for the Apache OpenWhisk project. The Docker image for the new runtime version is already available on Docker Hub. This means it can be used with any Apache OpenWhisk instance straight away!

Playing with Worker Threads on IBM Cloud Functions allowed me to demonstrate how to speed up performance for CPU-intensive workloads by utilising multiple cores concurrently. Using an example of prime number generation, calculating all primes up to ten million took ~35 seconds with a single thread and ~8 seconds with four threads. This represents a reduction in execution time and cost of 75%!

Apache OpenWhisk Web Action HTTP Proxy

What if you could take an existing web application and run it on a serverless platform with no changes? 🤔

Lots of existing (simple) stateless web applications are perfect candidates for serverless, but use web frameworks that don't know how to integrate with those platforms. People have started to develop a number of custom plugins for those frameworks to try and bridge this gap.

These plugins can provide an easier learning curve for developers new to serverless. They can still use familiar web application frameworks whilst learning about the platforms. It also provides a path to "lift and shift" existing (simple) web applications to serverless platforms.

This approach relies on custom framework plugins being available for every web app framework and serverless platform, which is not currently the case. Is there a better solution?

Recently, I've been experimenting with Apache OpenWhisk's Docker support to prototype a different approach. This solution allows any web application to run on the platform, without needing bespoke framework plugins, with minimal changes. Sounds interesting? Read about how I did this below… 👍

Apache OpenWhisk Web Action HTTP Proxy

This project provides a static binary which proxies HTTP traffic from Apache OpenWhisk Web Actions to existing web applications. HTTP events received by the Web Action Proxy are forwarded as HTTP requests to the web application. HTTP responses from the web application are returned as Web Action responses.

Apache OpenWhisk Web Action HTTP Proxy

Both the proxy and web application need to be started inside the serverless runtime environment. The proxy uses port 8080 and the web application can use any other port. An environment variable or action parameter can be used to configure the local port to proxy.

Running both HTTP processes on the platform is possible due to custom runtime support in Apache OpenWhisk. This allows using custom Docker images as the runtime environment. Custom runtime images can be built which include the proxy binary and (optionally) the web application source files.

Two different options are available for getting web application source files into the runtime environment.

  • Build source files directly into the container image alongside the proxy binary.
  • Dynamically inject source files into container runtime during initialisation.

Building source files into the container is simpler and incurs lower cold-start delays, but means source code will be publicly available on Docker Hub. Injecting source files through action zips means the public container image can exclude all private source files and secrets. The extra initialisation time for dynamic injection does increase cold-start delays.

Please note: This is an alpha-stage experiment! Don't expect everything to work. This project is designed to run small simple stateless web applications on Apache OpenWhisk. Please don't attempt to "lift 'n' shift" a huge stateful enterprise app server onto the platform!

Node.js + Express Example

This is an example Node.js web application, built using the Express web application framework:

https://camo.githubusercontent.com/2aa43809d8d8a9f9ccb906c1028d81f1ba1913d9/687474703a2f2f7368617065736865642e636f6d2f696d616765732f61727469636c65732f657870726573735f6578616d706c652e6a7067

The web application renders static HTML content for three routes (/, /about and /contact). CSS files and fonts are also served by the backend.

Use these steps to run this web application on Apache OpenWhisk using the Web Action Proxy…

  • Clone project repo.
git clone https://github.com/jthomas/express_example
  • Install project dependencies in the express_example directory.
npm install
  • Bundle web application and libraries into zip file.
zip -r action.zip *
  • Create the Web Action (using a custom runtime image) with the following command.
wsk action create --docker jamesthomas/generic_node_proxy --web true --main "npm start" -p "__ow_proxy_port" 3000 web_app action.zip
  • Retrieve the Web Action URL for the new action.
wsk action get web_app --url
  • Open the Web Action URL in a web browser. (Note: Web Action URLs must end with a forward-slash to work correctly, e.g. https://<OW_HOST>/api/v1/web/<NAMESPACE>/default/web_app/).

Web Action Proxy Express JS

If this works, the web application should load as above. Clicking links in the menu will navigate to different pages in the application.

custom runtime image

This example Web Action uses my own pre-built custom runtime image for Node.js web applications (jamesthomas/generic_node_proxy). This was created from the following Dockerfile to support dynamic runtime injection of web application source files.

FROM node:10

ADD proxy /app/
WORKDIR /app
EXPOSE 8080

CMD ./proxy

More Examples

See the examples directory in the project repository for sample applications, with build instructions, covering a number of different runtimes.

Usage & Configuration

Web application source files can either be dynamically injected (as in the example above) or built into the custom runtime image.

Dynamic injection uses a custom runtime image with just the proxy binary and runtime dependencies. Web application source files are provided in the action zip file and extracted into the runtime upon initialisation. The proxy will start the app server during cold-starts.

Alternatively, source files for the web application can be included directly in the runtime image. The container start command will start both processes concurrently. No additional files are provided when creating the web action.

Configuration values, such as the proxy port, can be provided using environment variables or default action parameters.

Please see the project documentation for more details on both these approaches, how to use them and configuration parameters.

Challenges

This experiment is still in the alpha-stage and comes with many restrictions at the moment…

  • HTTP request and response sizes are limited to the maximum sizes allowed by Apache OpenWhisk for input parameters and activation results. This defaults to 1MB in the open-source project and 5MB on IBM Cloud Functions.
  • Page links must use URLs with relative paths to the Web Action URL rather than the host root, e.g. href="home" rather than href="/home". This is due to the Web Actions being served from a sub-path of the platform (/api/v1/web/<NAMESPACE>/default/<ACTION>) rather than the host root.
  • Docker images will be pulled from the public registry on the first invocation. This will lead to long cold-start times for the first request after the action has been created. Large image sizes = longer delays. This only occurs on the first invocation.
  • Web app startup times affect cold start times. The proxy blocks waiting for the web application to start before responding. This delay is included in each cold start. Concurrent HTTP requests from a web browser for static page assets will (initially) result in multiple cold starts.
  • Web Sockets and other complex HTTP features, e.g. server-side events, cannot be supported.
  • Web applications will run in ephemeral container environments that are paused between requests and destroyed without warning. This is not a traditional web application environment, e.g. running background tasks will not work.

Lots of things haven't been tested. Don't expect complex stateful web applications to work.

Conclusion

Being able to run existing web applications on serverless platforms opens up a huge opportunity for moving simple (and stateless) web applications over to those platforms. These applications can then benefit from the scaling, cost and operational benefits serverless platforms provide.

Previous attempts to support traditional web applications on serverless platforms relied on custom framework plugins. This approach was limited by the availability of custom plugins for each web application framework and serverless platform.

Playing around with Apache OpenWhisk's custom runtime support, I had an idea… could a generic HTTP proxy be used to support any framework without needing any plugins? This led to the Apache OpenWhisk Web Action HTTP Proxy project.

By building a custom runtime, the HTTP proxy and web application can both be started within the same serverless environment. HTTP events received by the Web Action Proxy are forwarded as HTTP requests to the web application. HTTP responses from the web application are returned as Web Action responses.

Web application source files can be injected into the runtime environment during initialisation or built straight into the custom runtime image. No significant changes are required in the web application and it does not need custom framework plugins.

Apache OpenWhisk's support for custom Docker runtimes opens up a huge range of opportunities for running more varied workloads on serverless platforms - and this is a great example of that!

Serverless CI/CD With Travis CI, Serverless Framework and IBM Cloud Functions

How do you set up a CI/CD pipeline for serverless applications?

This blog post will explain how to use Travis CI, The Serverless Framework and the AVA testing framework to set up a fully-automated build, deploy and test pipeline for a serverless application. It will use a real example of a production serverless application, built using Apache OpenWhisk and running on IBM Cloud Functions. The CI/CD pipeline will execute the following tasks…

  • Run project unit tests.
  • Deploy application to test environment.
  • Run acceptance tests against test environment.
  • Deploy application to production environment.
  • Run smoke tests against production environment.

Before diving into the details of the CI/CD pipeline setup, let's start by showing the example serverless application being used for this project…

Serverless Project - http://apache.jamesthom.as/

The "Apache OpenWhisk Release Verification" project is a serverless web application to help committers verify release candidates for the open-source project. It automates running the verification steps from the ASF release checklist using serverless functions. Automating release candidate validation makes it easier for committers to participate in release voting.

Apache OpenWhisk Release Verification Tool

The project consists of static web assets (HTML, JS, CSS files) and HTTP APIs. Static web assets are hosted by Github Pages from the project repository. HTTP APIs are implemented as Apache OpenWhisk actions and exposed using the API Gateway service. IBM Cloud Functions is used to host the Apache OpenWhisk application.

No other cloud services, like databases, are needed by the backend. Release candidate information is retrieved in real-time by parsing the HTML page from the ASF website.

Serverless Architecture

Configuration

The Serverless Framework (with the Apache OpenWhisk provider plugin) is used to define the serverless functions used in the application. HTTP endpoints are also defined in the YAML configuration file.

service: release-verfication

provider:
  name: openwhisk
  runtime: nodejs:10

functions:
  versions:
    handler: index.versions
    events:
      - http: GET /api/versions
  version_files:
    handler: index.version_files
    events:
      - http:
          method: GET
          path: /api/versions/{version}
          resp: http
...

plugins:
  - serverless-openwhisk

The framework handles all deployment and configuration tasks for the application. Setting up the application in a new environment is as simple as running the serverless deploy command.

Environments

Apache OpenWhisk uses namespaces to group individual packages, actions, triggers and rules. Different namespaces can be used to provide isolated environments for applications.

IBM Cloud Functions automatically creates user-based namespaces in platform instances. These auto-generated namespaces mirror the IBM Cloud organisation and space used to access the instance. Creating new spaces within an organisation will provision extra namespaces.

I'm using a custom organisation for the application with three different spaces: dev, test and prod.

dev is used as a test environment to deploy functions during development. test is used by the CI/CD pipeline to deploy a temporary instance of the application during acceptance tests. prod is the production environment hosting the external application actions.

Credentials

The IBM Cloud CLI is used to handle IBM Cloud Functions credentials. Platform API keys will be used to log in to the CLI from the CI/CD system.

When Cloud Functions CLI commands are issued (after targeting a new region, organisation or space), API keys for that Cloud Functions instance are automatically retrieved and stored locally. The Serverless Framework knows how to use these local credentials when interacting with the platform.

High Availability?

The Apache OpenWhisk Release Verifier is not a critical cloud application which needs "five nines" of availability. The application is idle most of the time. It does not need a highly available serverless architecture. This means the build pipeline does not have to…

New deployments will simply overwrite resources in the production namespace in a single region. If the production site is broken after a deployment, the smoke tests should catch this and email me to fix it!

Testing

Given this tool will be used to check release candidates for the open-source project, I wanted to ensure it worked properly! Incorrect validation results could lead to invalid source archives being published.

I've chosen to rely heavily on unit tests to check the core business logic. These tests ensure all validation tasks work correctly, including PGP signature verification, cryptographic hash matching, LICENSE file contents and other ASF requirements for project releases.

Additionally, I've used end-to-end acceptance tests to validate the HTTP APIs work as expected. HTTP requests are sent to the API GW endpoints, with responses compared against expected values. All available release candidates are run through the validation process to check no errors are returned.

Unit Tests

Unit tests are implemented with the AVA testing framework. Unit tests live in the test/unit/ folder.

The npm test command alias runs the ava test/unit/ command to execute all unit tests. This command can be executed locally, during development, or from the CI/CD pipeline.

$ npm test

> release-verification@1.0.0 test ~/code/release-verification
> ava test/unit/

 27 tests passed

Acceptance Tests

Acceptance tests check API endpoints return the expected responses for valid (and invalid) requests. Acceptance tests are executed against the API Gateway endpoints for an application instance.

The hostname used for HTTP requests is controlled using an environment variable (HOST). Since the same test suite is used for acceptance and smoke tests, setting this environment variable is the only configuration needed to run tests against different environments.

API endpoints in the test and production environments are exposed using different custom sub-domains (apache-api.jamesthom.as and apache-api-test.jamesthom.as). NPM scripts are used to provide commands (acceptance-test & acceptance-prod) which set the environment hostname before running the test suite.

"scripts": {
    "acceptance-test": "HOST=apache-api-test.jamesthom.as ava -v --fail-fast test/acceptance/",
    "acceptance-prod": "HOST=apache-api.jamesthom.as ava -v --fail-fast test/acceptance/"
  },
$ npm run acceptance-prod

> release-verification@1.0.0 acceptance-prod ~/code/release-verification
> HOST=apache-api.jamesthom.as ava -v --fail-fast  test/acceptance/

  ✔ should return list of release candidates (3.7s)
    ℹ running api testing against https://apache-api.jamesthom.as/api/versions
  ✔ should return 404 for file list when release candidate is invalid (2.1s)
    ℹ running api testing against https://apache-api.jamesthom.as/api/versions/unknown
  ...

  6 tests passed

Acceptance tests are also implemented with the AVA testing framework. All acceptance tests live in a single test file (test/acceptance/api.js).

CI/CD Pipeline

When new commits are pushed to the master branch on the project repository, the following steps need to be kicked off by the build pipeline…

  • Run project unit tests.
  • Deploy application to test environment.
  • Run acceptance tests against test environment.
  • Deploy application to production environment.
  • Run smoke tests against production environment.

If any of the steps fail, the build pipeline should stop and send me a notification email.

Travis

Travis CI is used to implement the CI/CD build pipeline. Travis CI uses a custom file (.travis.yml) in the project repository to configure the build pipeline. This YAML file defines commands to execute during each phase of build pipeline. If any of the commands fail, the build will stop at that phase without proceeding.

Here is the completed .travis.yml file for this project: https://github.com/jthomas/openwhisk-release-verification/blob/master/.travis.yml

I'm using the following Travis CI build phases to implement the pipeline: install, before_script, script, before_deploy and deploy. Commands will run in the Node.js 10 build environment, which pre-installs the language runtime and package manager.

language: node_js
node_js:
  - "10"

install

In the install phase, I need to set up the build environment to deploy the application and run tests.

This means installing the IBM Cloud CLI, Cloud Functions CLI plugin, The Serverless Framework (with Apache OpenWhisk plugin), application test framework (AvaJS) and other project dependencies.

The IBM Cloud CLI is installed using a shell script. Running a CLI sub-command installs the Cloud Functions plugin.

The Serverless Framework is installed as a global NPM package (using npm install -g). The Apache OpenWhisk provider plugin is handled as a normal project dependency, along with the test framework. Both those dependencies are installed using NPM.

install:
  - curl -fsSL https://clis.cloud.ibm.com/install/linux | sh
  - ibmcloud plugin install cloud-functions
  - npm install serverless -g
  - npm install

before_script

This phase is used to run unit tests, catching errors in core business logic, before setting up credentials (used in the script phase) for the acceptance test environment. Unit test failures will halt the build immediately, skipping test and production deployments.

Custom variables provide the API key, platform endpoint, organisation and space identifiers which are used for the test environment. The CLI is authenticated using these values, before running the ibmcloud fn api list command. This ensures Cloud Functions credentials are available locally, as used by The Serverless Framework.

before_script:
  - npm test
  - ibmcloud login --apikey $IBMCLOUD_API_KEY -a $IBMCLOUD_API_ENDPOINT
  - ibmcloud target -o $IBMCLOUD_ORG -s $IBMCLOUD_TEST_SPACE
  - ibmcloud fn api list > /dev/null
  - ibmcloud target

script

With the build system configured, the application can be deployed to the test environment, followed by running acceptance tests. If either deployment or acceptance tests fail, the build will stop, skipping the production deployment.

Acceptance tests use an environment variable to configure the hostname test cases are executed against. The npm run acceptance-test alias command sets this value to the test environment hostname (apache-api-test.jamesthom.as) before running the test suite.

script:
  - sls deploy
  - npm run acceptance-test

before_deploy

Before deploying to production, Cloud Functions credentials need to be updated. The IBM Cloud CLI is used to target the production environment, before running a Cloud Functions CLI command. This updates local credentials with the production environment credentials.

before_deploy:
  - ibmcloud target -s $IBMCLOUD_PROD_SPACE
  - ibmcloud fn api list > /dev/null
  - ibmcloud target

deploy

If all the preceding stages have finished successfully, the application can be deployed to production. Following this final deployment, smoke tests are used to check production APIs still work as expected.

Smoke tests are just the same acceptance tests executed against the production environment. The npm run acceptance-prod alias command sets the hostname configuration value to the production environment (apache-api.jamesthom.as) before running the test suite.

deploy:
  provider: script
  script: sls deploy && npm run acceptance-prod
  skip_cleanup: true

Using the skip_cleanup parameter leaves installed artifacts from previous phases in the build environment. This means we don't have to re-install the IBM Cloud CLI, The Serverless Framework or NPM dependencies needed to run the production deployment and smoke tests.

success?

If all of the build phases are successful, the latest project code should have been deployed to the production environment. 💯💯💯

Build Screenshot

If the build failed due to unit test failures, the test suite can be run locally to fix any errors. Deployment failures can be investigated using the console output logs from Travis CI. Acceptance test issues, against test or production environments, can be debugged by logging into those environments locally and running the test suite from my development machine.

Conclusion

Using Travis CI with The Serverless Framework and a JavaScript testing framework, I was able to set up a fully-automated CI/CD deployment pipeline for the Apache OpenWhisk release candidate verification tool.

Using a CI/CD pipeline, rather than a manual approach, for deployments has the following advantages…

  • No more manual and error-prone deploys relying on a human 👨‍💻 :)
  • Automatic unit & acceptance test execution catches errors before deployments.
  • Production environment only accessed by the CI/CD system, reducing accidental breakages.
  • All cloud resources must be configured in code. No "snowflake" environments allowed.

Having finished code for new project features or bug fixes, all I have to do is push changes to the GitHub repository. This fires the Travis CI build pipeline which will automatically deploy the updated application to the production environment. If there are any issues, due to failed tests or deployments, I'll be notified by email.

This allows me to get back to adding new features to the tool (and fixing bugs) rather than wrestling with deployments, managing credentials for multiple environments and then trying to remember to run tests against the correct instances!

Automating Apache OpenWhisk Releases With Serverless

This blog post explains how I used serverless functions to automate release candidate verification for the Apache OpenWhisk project.

Apache OpenWhisk Release Verification Tool

Automating this process has the following benefits…

  • Removes the chance of human errors compared to the previous manual validation process.
  • Allows me to validate new releases without access to my dev machine.
  • Usable by all committers by hosting as an external serverless web app.

Automating release candidate validation makes it easier for project committers to participate in release voting. This should make it faster to get necessary release votes, allowing us to ship new versions sooner!

background

apache software foundation

The Apache Software Foundation has a well-established release process for delivering new product releases from projects belonging to the foundation. According to their documentation

An Apache release is a set of valid & signed artifacts, voted on by the appropriate PMC and distributed on the ASF's official release infrastructure.

https://www.apache.org/dev/release-publishing.html

Releasing a new software version requires the release manager to create a release candidate from the project source files. Source archives must be cryptographically signed by the release manager. All source archives for the release must comply with strict criteria to be considered valid release candidates. This includes (but is not limited to) the following requirements:

  • Checksums and PGP signatures for source archives are valid.
  • LICENSE, NOTICE and DISCLAIMER files included and correct.
  • All source files have license headers.
  • No compiled archives bundled in source archives.

Release candidates can then be proposed on the project mailing list for review by members of the Project Management Committee (PMC). PMC members are eligible to vote on all release candidates. Before casting their votes, PMC members are required to check the release candidate meets the requirements above.

If a minimum of three positive votes is cast (with more positive than negative votes), the release passes! The release manager can then move the release candidate archives to the release directory.

apache openwhisk releases

As a committer and PMC member on the Apache OpenWhisk project, I'm eligible to vote on new releases.

Apache OpenWhisk (currently) has 52 separate source repositories under the project on GitHub. With a fast-moving open-source project, new release candidates are constantly being proposed, which all require the necessary number of binding PMC votes to pass.

Manually validating release candidates can be a time-consuming process. This can make it challenging to get a quorum of binding votes from PMC members for the release to pass. I started thinking about how I could improve my productivity around the validation process, enabling me to participate in more votes.

Would it be possible to automate some (or all) of the steps in release candidate verification? Could we even use a serverless application to do this?

apache openwhisk release verifier

Spoiler Alert: YES! I ended up building a serverless application to do this for me.

It is available at https://apache.jamesthom.as/

Apache OpenWhisk Release Verifier

Source code for this project is available here.

IBM Cloud Functions is used to run the serverless backend for the web application. This means Apache OpenWhisk is being used to validate future releases of itself… which is awesome.

architecture

Project Architecture

HTML, JS and CSS files are served by Github Pages from the project repository.

Backend APIs are Apache OpenWhisk actions running on IBM Cloud Functions.

Both the front-page and API are served from custom sub-domains of my personal domain.

available release candidates

When the user loads the page, the drop-down list needs to contain the current list of release candidates from the ASF development distribution site.

This information is available to the web page via the https://apache-api.jamesthom.as/api/versions endpoint. The serverless function powering this API parses that live HTML page (extracting the current list of release candidates) each time it is invoked.

$ http get https://apache-api.jamesthom.as/api/versions
HTTP/1.1 200 OK
...
{
    "versions": [
        "apache-openwhisk-0.11.0-incubating-rc1",
        "apache-openwhisk-0.11.0-incubating-rc2",
        "apache-openwhisk-1.13.0-incubating-rc1",
        "apache-openwhisk-1.13.0-incubating-rc2",
        "apache-openwhisk-2.0.0-incubating-rc2",
        "apache-openwhisk-3.19.0-incubating-rc1"
    ]
}
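
A minimal sketch of how this parsing step might be implemented is shown below. It is not the deployed action's code; the listing URL and regular expression are assumptions for illustration.

'use strict';
// hedged sketch: fetch the ASF dev distribution listing and extract release candidate names
const https = require('https')

// assumed location of the development distribution listing page
const DIST_URL = 'https://dist.apache.org/repos/dist/dev/openwhisk/'

const fetch_html = url => new Promise((resolve, reject) => {
  https.get(url, res => {
    let body = ''
    res.on('data', chunk => body += chunk)
    res.on('end', () => resolve(body))
  }).on('error', reject)
})

const versions = async () => {
  const html = await fetch_html(DIST_URL)
  // directory links look like <a href="apache-openwhisk-x.y.z-incubating-rcN/">
  const matches = html.match(/apache-openwhisk-[^"\/]+-rc\d+/g) || []
  return { versions: [...new Set(matches)].sort() }
}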

release candidate version info

Release candidates may have multiple source archives being distributed in that release. Validation steps need to be executed for each of those archives within the release candidate.

Once a user has selected a release candidate version, source archives to validate are shown in the table. This data is available from the https://apache-api.jamesthom.as/api/versions/VERSION endpoint. This information is parsed from the HTML page on the ASF site.

$ http get https://apache-api.jamesthom.as/api/versions/apache-openwhisk-2.0.0-incubating-rc2
HTTP/1.1 200 OK
...

{
    "files": [
        "openwhisk-package-alarms-2.0.0-incubating-sources.tar.gz",
        "openwhisk-package-cloudant-2.0.0-incubating-sources.tar.gz",
        "openwhisk-package-kafka-2.0.0-incubating-sources.tar.gz"
    ]
}

release verification

Having selected a release candidate version, clicking the "Validate" button will start the validation process. Triggering the https://apache-api.jamesthom.as/api/versions/VERSION/validate endpoint will run the serverless function used to execute the validation steps.

This serverless function will carry out the following verification steps…

checking download links

All the source archives for a release candidate are downloaded to temporary storage in the runtime environment. The function also downloads the associated SHA512 and PGP signature files for comparison. Multiple readable streams can be created from the same file path to allow the verification steps to happen in parallel, rather than having to re-download the archive for each task.
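
As a rough sketch of that idea (not the project's actual code), independent read streams can be created over the same downloaded file, so each verification task gets its own stream without a second download:

'use strict';
const fs = require('fs')

// hypothetical location the archive was downloaded to
const archive_path = '/tmp/openwhisk-sources.tar.gz'

// each verification task gets its own stream over the same file
const hash_stream = fs.createReadStream(archive_path)
const signature_stream = fs.createReadStream(archive_path)

// the checks can then run in parallel, e.g. (hash and signature are shown later in the post):
// const results = await Promise.all([
//   hash(hash_stream, hash_file, name),
//   signature(signature_stream, sig_file, public_keys, name)
// ])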

checking SHA512 hash values

SHA512 sums are distributed in a text file containing hex strings with the hash value.

openwhisk-package-alarms-2.0.0-incubating-sources.tar.gz:
3BF87306 D424955B B1B2813C 204CC086 6D27FA11 075F0B30 75F67782 5A0198F8 091E7D07
 B7357A54 A72B2552 E9F8D097 50090E9F A0C7DBD1 D4424B05 B59EE44E

The serverless function needs to dynamically compute the hash for the source archive and compare the hex bytes against the text file contents. Node.js comes with a built-in crypto library making it easy to create hash values from input streams.

This is the function used to compute and compare the hash values.

const hash = async (file_stream, hash_file, name) => {
  return new Promise((resolve, reject) => {
    const sha512 = parse_hash_from_file(hash_file)

    const hmac = crypto.createHash('sha512')
    file_stream.pipe(hmac)

    hmac.on('readable', () => {
      const stream_hash = hmac.read().toString('hex')
      const valid = stream_hash === sha512.signature
      logger.log(`file (${name}) calculated hash: ${stream_hash}`)
      logger.log(`file (${name}) hash from file:  ${sha512.signature}`)
      resolve({valid})
    })

    hmac.on('error', err => reject(err))
  })
}

validating PGP signatures

Node.js' crypto library does not support validating PGP signatures.

I've used the OpenPGP.js library to handle this task. This is a JavaScript implementation of the OpenPGP protocol (and the most popular PGP library for Node.js). Three input values are needed to validate PGP messages.

  • Message contents to check.
  • PGP signature for the message.
  • Public key for the private key used to sign the release.

The "message" to check is the source archive. PGP signatures come from the .asc files located in the release candidate directory.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAABAgAGBQJcpO0FAAoJEHKvDMIsTPMgf0kP+wbtJ1ONZJQKjyDVx8uASMDQ
...
-----END PGP SIGNATURE-----

Public keys used to sign releases are stored in the root folder of the release directory for that project.

This function is used to implement the signature checking process.

const signature = async (file_stream, signature, public_keys, name) => {
  const options = {
    message: openpgp.message.fromBinary(file_stream),
    signature: await openpgp.signature.readArmored(signature),
    publicKeys: (await openpgp.key.readArmored(public_keys)).keys
  }

  const verified = await openpgp.verify(options)
  await openpgp.stream.readToEnd(verified.data)
  const valid = await verified.signatures[0].verified

  return { valid }
}

scanning archive files

Using the node-tar library, downloaded source archives are extracted into the local runtime to allow scanning of individual files.

LICENSE.txt, DISCLAIMER.txt and NOTICE.txt files are checked to ensure correctness. An external NPM library is used to check all files in the archive for binary contents. The code also scans for directory names that might contain third party libraries (node_modules or .gradle).
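
Here is a hedged sketch of that extract-and-scan step. It uses node-tar's extract function as described, but the directory walk and the list of suspicious directory names are illustrative assumptions; the real project uses an external NPM library for the binary-content check.

'use strict';
const fs = require('fs')
const path = require('path')
const tar = require('tar') // node-tar

const extract_and_scan = async (archive, dir) => {
  // extract the downloaded source archive into a local directory
  await tar.x({ file: archive, cwd: dir })

  const issues = []
  const walk = current => {
    for (const entry of fs.readdirSync(current, { withFileTypes: true })) {
      const full = path.join(current, entry.name)
      if (entry.isDirectory()) {
        // directories that suggest bundled third party libraries
        if (['node_modules', '.gradle'].includes(entry.name)) issues.push(full)
        walk(full)
      }
      // a binary-content check on each extracted file would go here
    }
  }
  walk(dir)
  return issues
}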

capturing validation logs

It is important to provide PMC members with verifiable logs on the validation steps performed. This allows them to sanity check the steps performed (including manual validation). This verification text can also be provided in the voting emails as evidence of release candidate validity.

Using a custom logging library, all debug logs sent to the console are recorded in the action result (and therefore returned in the API response).
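
The idea behind that logging library can be sketched in a few lines (this is an illustrative sketch, not the project's actual implementation):

'use strict';

// record every message as well as writing it to the console,
// so the captured logs can be returned in the action result
const create_logger = () => {
  const messages = []
  return {
    log: msg => {
      console.log(msg)      // still visible in the activation logs
      messages.push(msg)    // captured for the API response
    },
    logs: () => messages
  }
}

// usage inside the action:
// const logger = create_logger()
// logger.log('checking sha512 hash...')
// return { result, logs: logger.logs() }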

showing results

Once all the validation tasks have been executed - the results are returned to the front-end as a JSON response. The client-side JS parses these results and updates the validation table. Validation logs are shown in a collapsible window.

Verification Results

Using visual emojis for pass and failure indicators for each step - the user can easily verify whether a release passes the validation checks. If any of the steps have failed, the validation logs provide an opportunity to understand why.

Verification Logs
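
For illustration, the client-side update logic described above might look something like this hedged sketch. The endpoint path matches the one described earlier, but the response field names (checks, valid, logs) are hypothetical:

// hedged client-side sketch: call the validate endpoint and update the results table
const validate = async (version) => {
  const res = await fetch(`https://apache-api.jamesthom.as/api/versions/${version}/validate`)
  const report = await res.json()

  for (const check of report.checks || []) {
    // pass/fail emoji indicator for each validation step (hypothetical response shape)
    const cell = document.querySelector(`#${check.id} .status`)
    if (cell) cell.textContent = check.valid ? '✅' : '❌'
  }

  // validation logs shown in the collapsible window
  document.querySelector('#logs').textContent = (report.logs || []).join('\n')
}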

other tools

This is not the only tool that can automate checks needed to validate Apache Software Foundation releases.

Another community member has also built a bash script (rcverify.sh) that can verify releases on your local machine. This script will automatically download the release candidate files and run many of the same validation tasks as the remote tool locally.

There is also an existing tool (Apache Rat) from another project that provides a Java-based application for auditing license headers in source files.

conclusion

Getting new product releases published for an open-source project under the ASF is not a simple task for developers used to pushing a button on Github! The ASF has a series of strict guidelines on what constitutes a release and the ratification process from PMC members. PMC members need to run a series of manual verification tasks before casting binding votes on proposed release candidates.

This can be a time-consuming task for PMC members on a project like Apache OpenWhisk, with 52 different project repositories all being released at different intervals. In an effort to improve my own productivity around this process, I started looking for ways to automate the verification tasks. This would enable me to participate in more votes and be a "better" PMC member.

This led to building a serverless web application to run all the verification tasks remotely, which is now hosted at https://apache.jamesthom.as. This tool uses Apache OpenWhisk (provided by IBM Cloud Functions), which means the project is being used to verify future releases of itself! I've also open-sourced the code to provide an example of how to use the platform for automating tasks like this.

With this tool and others listed above, verifying new Apache OpenWhisk releases has never been easier!

OpenWhisk Web Action Errors With Sequences

This week, I came across an interesting problem when building HTTP APIs on IBM Cloud Functions.

How can Apache OpenWhisk Web Actions, implemented using action sequences, handle application errors that need the sequence to stop processing and a custom HTTP response to be returned?

This came from wanting to add custom HTTP authentication to existing Web Actions. I had decided to enhance existing Web Actions with authentication using action sequences. This would combine a new action for authentication validation with the existing API route handlers.

When the HTTP authentication is valid, the authentication action becomes a "no-op", which passes along the HTTP request to the route handler action to process as normal.

But what happens when authentication fails?

The authentication action needs to stop request processing and return a HTTP 401 response immediately.

Does Apache OpenWhisk even support this?

Fortunately, it does (phew) and I eventually worked out how to do this (based on a combination of re-reading documentation, the platform source code and just trying stuff out!).

Before explaining how to return custom HTTP responses using web action errors in sequences, let's review web actions, action sequences and why developers often use them together…

Web Actions

Web Actions are OpenWhisk actions that can be invoked using external HTTP requests.

Incoming HTTP requests are provided as event parameters. HTTP responses are controlled using attributes (statusCode, body, headers) in the action result.

Web Actions can be invoked directly, using the platform API, or connected to API Gateway endpoints.

example

Here is an example Web Action that returns a static HTML page.

function main() {
  return {
    headers: {
      'Content-Type': 'text/html'
    },
    statusCode: 200,
    body: '<html><body><h3>hello</h3></body></html>'
  }
}

exposing web actions

Web actions can be exposed from any existing action by setting an annotation.

This is handled automatically by the CLI using the --web configuration flag when creating or updating actions.

wsk action create ACTION_NAME ACTION_CODE --web true

Action Sequences

Multiple actions can be composed together into a "meta-action" using sequences.

Sequence configuration defines a series of existing actions to be called sequentially upon invocation. Actions connected in sequences can use different runtimes and even be sequences themselves.

wsk action create mySequence --sequence action_a,action_b,action_c

Input events are passed to the first action in the sequence. Action results from each action in the sequence are passed to the next action in the sequence. The response from the last action in the sequence is returned as the action result.

example

Here is a sequence (mySequence) composed of three actions (action_a, action_b, action_c).

wsk action create mySequence --sequence action_a,action_b,action_c

Invoking mySequence will invoke action_a with the input parameters. action_b will be invoked with the result from action_a. action_c will be invoked with the result from action_b. The result returned by action_c will be returned as the sequence result.

Web Actions from Action Sequences

Using Action Sequences as Web Actions is a useful pattern for externalising common HTTP request and response processing tasks into separate serverless functions.

These common actions can be included in multiple Web Actions, rather than manually duplicating the same boilerplate code in each HTTP route action. This is similar to the “middleware” pattern used by lots of common web application frameworks.

Web Actions using this approach are easier to test and maintain, and allow API handlers to implement core business logic rather than lots of duplicate boilerplate code.

authentication example

In my application, new authenticated web actions were composed of two actions (check_auth and the API route handler, e.g. route_handler).
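Composing them into a single web action uses the same --sequence and --web flags shown above. The sequence name used here (auth_api) is just illustrative.

wsk action create auth_api --sequence check_auth,route_handler --web true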

Here is an outline of the check_auth function in Node.js.

const check_auth = (params) => {
  const headers = params.__ow_headers
  const auth = headers['authorization']

  if (!is_auth_valid(auth)) {
    // stop sequence processing and return HTTP 401?
  }

  // ...else pass along request to next sequence action
  return params
}

The check_auth function will inspect the HTTP request and validate the authorisation token. If the token is valid, the function returns the input parameters untouched, which leads the platform to invoke the route_handler to generate the HTTP response for the API route.

But what happens if the authentication is invalid?

The check_auth action needs to return a HTTP 401 response immediately, rather than proceeding to the route_handler action.

handling errors - synchronous results

Sequence actions can stop sequence processing by returning an error. Action errors are indicated by returning a result containing an error property, or by returning a rejected promise (for asynchronous results). Upon detecting an error, the platform will return the error result as the sequence action response.

If check_auth returns an error upon authentication failures, sequence processing can be halted, but how do we control the HTTP response?

Error responses can also control the HTTP response, using the same properties (statusCode, headers and body) as a successful invocation result, with one difference: those properties must be the children of the error property rather than top-level properties.

This example shows the error result needed to generate an immediate HTTP 401 response.

{
    "error": {
        "statusCode": 401,
        "body": "Authentication credentials are invalid."
    }
}

In Node.js, this can be returned using a synchronous result as shown here.

const check_auth = (params) => {
  const headers = params.__ow_headers
  const auth = headers['authorization']

  if (!is_auth_valid(auth)) {
    const response = { statusCode: 401, body: "Authentication credentials are invalid." }
    return { error: response }
  }

  return params
}

handling errors - using promises

If a rejected Promise is used to return an error from an asynchronous operation, the promise result needs to contain the HTTP response properties as top-level properties, rather than under an error parent. This is because the Node.js runtime automatically serialises the promise value to an error property on the activation result.

const check_auth = (params) => {
  const headers = params.__ow_headers
  const auth = headers['authorization']

  if (!is_auth_valid(auth)) {
    const response = { statusCode: 401, body: "Authentication credentials are invalid." }
    return Promise.reject(response)
  }

  return params
}

conclusion

Creating web actions from sequences is a novel way to implement the “HTTP middleware” pattern on serverless platforms. Surrounding route handlers with actions that pre-process HTTP requests for common tasks allows route handlers to drop boilerplate code and focus on the core business logic.

In my application, I wanted to use this pattern for custom HTTP authentication validation.

When the HTTP request contains the correct credentials, the request is passed along unmodified. When the credentials are invalid, the action needs to stop sequence processing and return a HTTP 401 response.

Working out how to do this wasn’t immediately obvious from the documentation. HTTP response parameters need to be included under the error property for synchronous results. I have now opened a PR to improve the project documentation about this.

Pluggable Event Providers for Apache OpenWhisk

Recently I presented my work building “pluggable event providers” for Apache OpenWhisk to the open-source community on the bi-weekly video meeting.

This was based on my experience building a new event provider for Apache OpenWhisk, which led me to prototype an easier way to add event sources to the platform whilst cutting down on the boilerplate code required.

Slides from the talk are here and there’s also a video recording available.

This blog post is an overview of what I talked about on the call, explaining the background for the project and what was built. Based on positive feedback from the community, I have now open-sourced both components of the experiment and will be merging them back upstream into Apache OpenWhisk in future.

pluggable event providers - why?

At the end of last year, I was asked to prototype an S3-compatible Object Store event source for Apache OpenWhisk. Reviewing the existing event providers helped me understand how they work and what was needed to build a new event source.

This led me to an interesting question…

Why do we have relatively few community contributions for event sources?

Most of the existing event sources in the project were contributed by IBM. There hasn’t been a new event source from an external community member. This is in stark contrast to additional platform runtimes. Support for PHP, Ruby, DotNet, Go and many more languages all came from community contributions.

Digging into the source code for the existing feed providers, I came to the following conclusions….

  • Trigger feed providers are not simple to implement.
  • Documentation on how existing providers work is lacking.

Feed providers can feel a bit like magic to users. You call the wsk CLI with a feed parameter and that’s it, the platform handles everything else. But what actually happens to bind triggers to external event sources?

Let’s start by explaining how trigger feeds are implemented in Apache OpenWhisk, before moving onto my idea to make contributing new feed providers easier.

how trigger feeds work

Users normally interact with trigger feeds using the wsk CLI. Whilst creating a trigger, the feed parameter can be included to connect that trigger to an external event source. Feed provider options are provided as additional CLI parameters.

wsk trigger create periodic \
  --feed /whisk.system/alarms/alarm \
  --param cron "*/2 * * * *" \
  --param trigger_payload "{…}" \
  --param startDate "2019-01-01T00:00:00.000Z" \
  --param stopDate "2019-01-31T23:59:00.000Z"

But what are those trigger feed identifiers used with the feed parameter?

It turns out they are just normal actions which have been shared in a public package!

The CLI creates the trigger (using the platform API) and then invokes the referenced feed action. Invocation parameters include the following values used to manage the trigger feed lifecycle.

  • lifecycleEvent - Feed operation (CREATE, READ, UPDATE, DELETE, PAUSE, or UNPAUSE).
  • triggerName - Trigger identifier.
  • authKey - API key provided to invoke trigger.

Custom feed parameters from the user are also included in the event parameters.
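To make this concrete, here is a minimal sketch (not the real provider code) of a Node.js feed action dispatching on the lifecycle event. The register_trigger and remove_trigger helpers are hypothetical stand-ins for provider-specific logic.

// hypothetical stand-ins for provider-specific logic
const register_trigger = async (triggerName, authKey, params) => ({ response: 'created' })
const remove_trigger = async triggerName => ({ response: 'deleted' })

// sketch of a feed action handling trigger lifecycle events
const main = async params => {
  const { lifecycleEvent, triggerName, authKey } = params

  switch (lifecycleEvent) {
    case 'CREATE':
      // store the trigger (and custom feed parameters) in the provider backend
      return register_trigger(triggerName, authKey, params)
    case 'DELETE':
      // remove the registration so events stop firing this trigger
      return remove_trigger(triggerName)
    default:
      return { error: `unsupported lifecycle event: ${lifecycleEvent}` }
  }
}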

This is the entire interaction of the platform with the feed provider.

Providers are responsible for the full management lifecycle of trigger feed event sources. They have to maintain the list of registered triggers and auth keys, manage connections to user-provided event sources, fire triggers upon external events, handle retries and back-offs in cases of rate-limiting and much more.

The feed action used with a trigger is stored as a custom annotation on that trigger. This allows the CLI to call the same feed action to remove the event binding when the trigger is deleted.

trigger management

Reading the source code for the existing feed providers, nearly all of the code is responsible for handling the lifecycle of trigger management events, rather than integrating with the external event source.

Despite this, all of the existing providers are in separate repositories and don’t share code explicitly, although the same source files have been replicated in different repos.

The CouchDB feed provider is a good example of how feed providers can be implemented.

couchdb feed provider

The CouchDB trigger feed provider uses a public action to handle the lifecycle events from the wsk CLI.

This action just proxies the incoming requests to a separate web action. The web action implements the logic to handle the trigger lifecycle event. It uses a CouchDB database to store registered triggers. Based upon the lifecycle event details, the web action updates the database document for that trigger.

The feed provider also runs a separate Docker container, which handles listening to CouchDB change feeds using user-provided credentials. It uses the changes feed from the trigger management database, modified by the web action, to listen for triggers being added, removed, disabled or re-enabled.

When database change events occur, the container fires triggers on the platform with the event details.

building a new event provider?

Having understood how feed providers work (and how the existing providers were designed), I started to think about the new event source for an S3-compatible object store.

Realising ~90% of the code between providers was the same, I wondered if there was a different approach to creating new event providers, rather than cloning an existing provider and changing the small amount of code used to interact with the event sources.

What about building a generic event provider with a pluggable event source?

This generic event provider would handle all the trigger management logic, which isn’t specific to individual event sources. The event source plugin would manage connecting to external event sources and then firing triggers as events occurred. Event source plugins would implement a standard interface and be registered dynamically during startup.

advantages

Using this approach would make it much easier to contribute and maintain new event sources.

  • Users would be able to create new event sources with a few lines of custom integration code, rather than replicating all the generic trigger lifecycle management code.

  • Maintaining a single repo for the generic event provider is easier than having the same code copied and pasted in multiple independent repositories.

I started hacking away at the existing CouchDB event provider to replace the event source integration with a generic plugin interface. Having completed this, I then wrote a new S3-compatible event source using the plugin model. After a couple of weeks I had something working….

generic event provider

The generic event provider is based on the existing CouchDB feed provider source code. The project contains the stateful container code and feed package actions (public & web). It uses the same platform services (CouchDB and Redis) as the existing provider to maintain trigger details.

The event provider plugin is integrated through the EVENT_PROVIDER environment variable. The name should refer to a Node.js module from NPM with the following interface.

// initialise plugin instance (must be a JS constructor)
module.exports = function (trigger_manager, logger) {
    // register new trigger feed
    const add = async (trigger_id, trigger_params) => {}
    // remove existing trigger feed
    const remove = async trigger_id => {}

    return { add, remove }
}

// validate feed parameters
module.exports.validate = async trigger_params => {}

When a new trigger is added to the trigger feed database, the details will be passed to the add method. Trigger parameters are used to set up listening to the external event source. When external events occur, the trigger_manager can be used to fire triggers automatically.

When users delete triggers with feeds, the trigger will be removed from the database. This will lead to the remove method being called. Plugins should stop listening to messages for this event source.

firing trigger events

As events arrive from the external source, the plugin can use the trigger_manager instance, passed in through the constructor, to fire triggers with the trigger identifier.

The trigger_manager parameter exposes two async functions:

  • fireTrigger(id, params) - fire the trigger identified by id (as passed into the add method) with the event parameters.
  • disableTrigger(id, status_code, message) - disable trigger feed due to external event source issues.

Both functions handle the retry logic and error handling for those operations. They should be used by the event provider plugin to fire triggers when events arrive from external sources and to disable triggers when there are issues with the external event source.

validating event source parameters

This static function on the plugin constructor is used to validate incoming trigger feed parameters for correctness, e.g. checking authentication credentials for an event source. It is passed the trigger parameters from the user.
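Pulling these pieces together, a (hypothetical) minimal plugin might look something like the sketch below. The poll_event_source function and the endpoint parameter are assumptions, standing in for a real event source SDK and its configuration.

// stand-in for a real event source SDK call returning new events
const poll_event_source = async trigger_params => []

// plugin constructor implementing the interface described above
module.exports = function (trigger_manager, logger) {
  // trigger_id -> polling interval handle
  const feeds = new Map()

  // register a new trigger feed and start listening for events
  const add = async (trigger_id, trigger_params) => {
    const interval = setInterval(async () => {
      try {
        const events = await poll_event_source(trigger_params)
        for (const evt of events) {
          await trigger_manager.fireTrigger(trigger_id, evt)
        }
      } catch (err) {
        logger.error(err)
        await trigger_manager.disableTrigger(trigger_id, 500, err.message)
      }
    }, 60000)

    feeds.set(trigger_id, interval)
  }

  // stop listening when the trigger feed is deleted
  const remove = async trigger_id => {
    clearInterval(feeds.get(trigger_id))
    feeds.delete(trigger_id)
  }

  return { add, remove }
}

// reject feeds with missing parameters before they are registered
module.exports.validate = async trigger_params => {
  if (!trigger_params.endpoint) {
    throw new Error('missing endpoint parameter')
  }
}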

S3 event feed provider

Using this new generic event provider, I was able to create an event source for an S3-compatible object store. Most importantly, this new event source was implemented using just ~300 lines of JavaScript! This is much smaller than the 7500 lines of code in the generic event provider.

The feed provider polls buckets on an interval using the ListObjects API call. Results are cached in Redis to allow comparison between intervals. Comparing the differences in file names and ETags between listings allows file change events to be detected.
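Here is a rough sketch of that comparison logic, assuming current and previous are objects mapping object keys to ETags, built from the latest ListObjects results and the cached copy in Redis.

// classify bucket changes by comparing the latest listing with the cached one
const diff_listings = (current, previous) => {
  const events = []

  // new or modified files
  for (const [key, etag] of Object.entries(current)) {
    if (!(key in previous)) {
      events.push({ Key: key, status: 'added' })
    } else if (previous[key] !== etag) {
      events.push({ Key: key, status: 'modified' })
    }
  }

  // deleted files
  for (const key of Object.keys(previous)) {
    if (!(key in current)) {
      events.push({ Key: key, status: 'deleted' })
    }
  }

  return events
}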

Users can call the feed provider with a bucket name, endpoint, API key and polling interval.

wsk trigger create test-s3-trigger --feed /<PROVIDER_NS>/s3-trigger-feed/changes --param bucket <BUCKET_NAME> --param interval <MINS> --param s3_endpoint <S3_ENDPOINT> --param s3_apikey <COS_KEY>

As bucket files change, triggers are fired with events in the following format.

{
  "file": {
    "ETag": "\"fb47672a6f7c34339ca9f3ed55c6e3a9\"",
    "Key": "file-86.txt",
    "LastModified": "2018-12-19T08:33:27.388Z",
    "Owner": {
      "DisplayName": "80a2054e-8d16-4a47-a46d-4edf5b516ef6",
      "ID": "80a2054e-8d16-4a47-a46d-4edf5b516ef6"
    },
    "Size": 25,
    "StorageClass": "STANDARD"
  },
  "status": "deleted"
}

Pssst - if you are using IBM Cloud Functions - I actually have this deployed and running so you can try it out. Use the /james.thomas@uk.ibm.com_dev/s3-trigger-feed/changes feed action name. This package is only available in the London region.

next steps

Feedback on the call about my experiment was overwhelmingly positive. Based upon this, I’ve now open-sourced both the generic event provider and S3 event source plugin to allow the community to evaluate the project further.

I’d like to build a few more example event providers to validate the approach further before moving towards contributing this code back upstream.

If you want to try this generic event provider out with your own install of OpenWhisk, please see the documentation in the README for how to get started.

If you want to build new event sources, please see the instructions in the generic feed provider repository and take a look at the S3 plugin for an example to follow.

CouchDB Filters With OpenWhisk Triggers

Imagine you have an OpenWhisk action to send emails to users to verify their email addresses. User profiles, containing email addresses and verification statuses, are maintained in a CouchDB database.

{
    ...
    "email": {
        "address": "user@host.com",
        "status": "unverified"
    }
}

Setting up a CouchDB trigger feed allows the email action to be invoked when the user profile changes. When user profiles have unverified email addresses, the action can send verification emails.

Whilst this works fine - it will result in a lot of unnecessary invocations. All modifications to user profiles, not just the email field, will result in the action being invoked. This will incur a cost despite the action having nothing to do.

How can we restrict document change events to just those we care about?

CouchDB filter functions to the rescue 🦸‍♂️🦸‍♀️.

CouchDB Filter Functions

Filter functions are JavaScript functions executed against (potential) change feed events. The function is invoked with each document update. The return value is evaluated as a boolean. If true, the document is published on the changes feed. Otherwise, the event is filtered out of the changes feed.

example

Filter functions are created through design documents. Function source strings are stored as properties under the filters document attribute. Key names are used as filter identifiers.

Filter functions should have the following interface.

function(doc, req){
    // document passes test
    if (doc.property == 'value'){
        return true;
    }

    // ... else ignore document update
    return false;
}

doc is the modified document object and req contains (optional) request parameters.
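As an aside, req lets a filter be parameterised from the changes request query string. For example, this (hypothetical) filter only passes documents whose type matches a type query parameter.

function(doc, req){
    return doc.type === req.query.type;
}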

Let’s now explain how to create a filter function to restrict profile update events to just those with unverified email addresses…

Filtering Profile Updates

user profile documents

In this example, email addresses are stored in user profile documents under the email property. address contains the user’s email address and status records the verification status (unverified or verified).

When a new user is added, or an existing user changes their email address, the status attribute is set to unverified. This indicates a verification message needs to be sent to the email address.

{
    ...
    "email": {
        "address": "user@host.com",
        "status": "unverified"
    }
}

unverified email filter

Here is the CouchDB filter function that will ignore document updates with verified email addresses.

function(doc){
    if (doc.email.status == 'unverified'){
        return true;
    }

    return false;
}

design document with filters

Save the following JSON document in CouchDB. This creates a new design document (profile) containing a filter function (unverified-emails).

{
  "_id": "_design/profile",
  "filters": {
    "unverified-emails": "function (doc) {\n  if (doc.email.status == 'unverified') {\n    return true\n  }\n  return false\n}"
  },
  "language": "javascript"
}

trigger feed with filter

Once the design document is created, the filter name can be used as a trigger feed parameter.

wsk trigger create verify_emails --feed /_/myCloudant/changes \
--param dbname user_profiles \
--param filter "profile/unverified-emails"

The trigger only fires when a profile change contains an unverified email address. No more unnecessary invocations, which saves us money! 😎

caveats

“Why are users getting multiple verification emails?” 😑

If a user changes their profile information before clicking the verification link, whilst keeping the same email address, an additional email will be sent.

This is because the status field is still in the unverified state when the next document update occurs. Filter functions are stateless and can’t decide if this email address has already been seen.

Instead of leaving the status field as unverified, the email action should change the state to another value, e.g. pending, to indicate the verification email has been sent.

Any further document updates, whilst waiting for the verification response, won’t pass the filter and users won’t receive multiple emails. 👍
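Here is a minimal sketch of that state change in the email action, assuming the nano CouchDB client and that the trigger event exposes the changed document id as params.id (adjust for your feed’s actual event format).

const Nano = require('nano')

// assumption: COUCH_URL points at the CouchDB instance holding user profiles
const db = Nano(process.env.COUCH_URL).use('user_profiles')

const main = async params => {
  const profile = await db.get(params.id)

  // ...send the verification email here...

  // mark the address as pending so further profile updates
  // no longer pass the unverified-emails filter
  profile.email.status = 'pending'
  await db.insert(profile)

  return { status: 'pending' }
}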

Conclusion

CouchDB filters are an easy way to subscribe to a subset of events from the changes feed. Combining CouchDB trigger feeds with filters allows actions to ignore irrelevant document updates. Multiple trigger feeds can be set up from a single database using filter functions.

As well as saving unnecessary invocations (and therefore money), this can simplify data models. A single database can be used to store all documents, rather than having to split different types into multiple databases, whilst still supporting changes feeds per document type.

This is an awesome feature of CouchDB!

Large (Java) Applications on Apache OpenWhisk

This blog post will explain how to run large Java applications on Apache OpenWhisk.

Java actions are deployed from JAR files containing application class files. External libraries can be used by bundling those dependencies into a fat JAR file. The JAR file must be less than the maximum action size of 48MB.

So, what if the application uses lots of external libraries and the JAR file is larger than 48MB? 🤔

Apache OpenWhisk’s support for custom Docker runtimes provides a workaround. In a previous blog post, we showed how this feature could be used with Python applications which rely on lots of external libraries.

Using the same approach with Java, a custom Java runtime can be created with additional libraries pre-installed. Those libraries do not need to be included in the application JAR, which then only contains the application class files. This should hopefully reduce the JAR file to under the action size limit.

Let’s walk through an example to show how this works….

Example Java Class using External Libraries

import com.google.gson.JsonObject;
import org.apache.commons.text.WordUtils;

public class Capitialize {
    public static JsonObject main(JsonObject args) {
        String name = args.getAsJsonPrimitive("message").getAsString();
        JsonObject response = new JsonObject();
        response.addProperty("capitalized", WordUtils.capitalize(name));
        return response;
    }
}

This example Java action capitalises sentences from the input event. It uses the Apache Commons Text library to handle capitalisation of input strings. This external library will be installed in the runtime, rather than bundled in the application JAR file.

Build Custom Java Runtime

  • Clone the Apache OpenWhisk Java runtime repository.
git clone https://github.com/apache/incubator-openwhisk-runtime-java
  • Edit the core/java8/proxy/build.gradle file and update the dependencies configuration with extra dependencies needed in the runtime.
dependencies {
    compile 'com.google.code.gson:gson:2.6.2'
    compile 'org.apache.commons:commons-text:1.6' // <-- the additional library
}

Note: com.google.code.gson:gson:2.6.2 is used by the runtime to handle JSON encoding/decoding. Do not remove this dependency.

  • Execute the following command to build the custom Docker image.
./gradlew core:java8:distDocker

Push Image To Docker Hub

If the build process succeeds, a local Docker image named java8action should be available. This needs to be pushed to Docker Hub to allow Apache OpenWhisk to use it.

  • Tag the local image with your Docker Hub username.
docker tag java8action <DOCKERHUB_USERNAME>/java8action
  • Push the tagged custom image to Docker Hub.
docker push <DOCKERHUB_USERNAME>/java8action

Create OpenWhisk Action With Custom Runtime

  • Compile the Java source file.
javac Capitialize.java
  • Create the application JAR from the class file.
jar cvf capitialize.jar Capitialize.class
  • Create the Java action with the custom runtime.
wsk action create capitialize capitialize.jar --main Capitialize --docker <DOCKERHUB_USERNAME>/java8action

--main is the name of the class containing the action handler in the JAR file. --docker is the Docker image name for the custom runtime.

Test it out!

  • Execute the capitialize action with input text to return capitalised sentences.
wsk action invoke capitialize -b -r -p message "this is a sentence"

If this works, the following JSON should be printed to the console.

{
    "capitalized": "This Is A Sentence"
}

The external library has been used in the application without including it in the application JAR file! 💯💯💯

Conclusion

Apache OpenWhisk supports running Java applications using fat JARs, which bundle application source code and external dependencies. JAR files cannot be more than 48MB, which can be challenging when applications use lots of external libraries.

If application source files and external libraries result in JAR files larger than this limit, Apache OpenWhisk’s support for custom Docker runtimes provides a solution for running large Java applications on the platform.

By building a custom Java runtime, extra libraries can be pre-installed in the runtime. These dependencies do not need to be included in the application JAR file, which reduces the file size to under the action size limit. 👍

Provisioning IBM Cloud Services With Terraform

This blog post will teach you how to provision applications services on IBM Cloud with Terraform.

Terraform is an open-source “infrastructure-as-code” tool. It allows cloud resources to be defined using a declarative configuration file. The Terraform CLI then uses this file to automatically provision and maintain cloud infrastructure needed by your application. This allows the creation of reproducible environments in the cloud across your application life cycle.

IBM Cloud created an official provider plugin for Terraform. This allows IBM Cloud services to be declared in Terraform configuration files. This is a much better approach than using the CLI or IBM Cloud UI to create application services manually.

This post explains the following steps needed to set up Terraform with IBM Cloud.

  • Install Terraform CLI tools and IBM Cloud Provider Plugin.
  • Create API keys for platform access.
  • Terraform configuration for IBM Cloud services.
  • Terraform CLI commands to provision IBM Cloud services.

Ready? Let’s go! 😎😎😎

Install Terraform

Download and install the Terraform CLI for your platform from the Terraform website. Once installed, the terraform command will be available.

$ terraform
Usage: terraform [-version] [-help] <command> [args]
...

Install IBM Cloud Terraform Plugin

  • Download the IBM Cloud Terraform plugin binary from the Github releases page.
  • Unzip the release archive to extract the plugin binary (terraform-provider-ibm_vX.Y.Z).
  • Move the binary into the Terraform plugins directory for the platform.
    • Linux/Unix/OS X: ~/.terraform.d/plugins
    • Windows: %APPDATA%\terraform.d\plugins

IBM Cloud Authentication Credentials

IBM Cloud’s Terraform provider plugin needs authentication credentials to interact with the platform. This is best handled by creating an API key and exporting it as an environment variable. API keys can be created from the IBM Cloud CLI or the web site.

using the cli

ibmcloud iam api-key-create terraform-api-key

The apikey property in the JSON output is the API key value.

{
  "name": "terraform-api-key",
  "description": "...",
  "apikey": "xxx-yyy-zzz",
  "createdAt": "...",
  "locked": false,
  "uuid": "..."
}

Store this value securely. API keys cannot be retrieved after creation!

using the web site

  • From the IAM Users page, select a user account.
  • Under the “API keys” table, click the “Create an IBM Cloud API Key” button.
  • Give the key a name and (optional) description.
  • Make a note of the API key value returned. API keys cannot be retrieved after creation.

exporting as an environment variable

  • Expose the API key as an environment variable to provide credentials to Terraform.
export BM_API_KEY=API_KEY_VALUE

Terraform configuration

We can now start to write configuration files to describe IBM Cloud services we want to provision. Terraform configuration files are human-readable text files, ending with the .tf extension, which contain HashiCorp Configuration Language (HCL) syntax.

IBM Cloud platform services come in two flavours: IAM managed resource instances and older Cloud Foundry-based service instances. This is due to the history of IBM Cloud starting as Bluemix, a Cloud Foundry-based cloud platform. Both platform services types can be provisioned using Terraform.

Most IBM Cloud platform services are available today as “resource instances”.

create new configuration file

  • Create a new infra.tf file which contains the following syntax.
provider "ibm" {}

add resource instances

Resource instances can be added to the configuration file as follows.

resource "ibm_resource_instance" "resource_instance_name" {
  name              = "test"
  service           = "service-id"
  plan              = "service-plan"
  location          = "region-info"
}
  • resource_instance_name - identifier for this service in the configuration, referenced by service keys.
  • name - user-provided service name used by the platform to identify the service.
  • service - service identifier on the platform (can be found in the service documentation page).
  • plan - service plan used for billing.
  • location - cloud region used during service provisioning.

Here is an example of provisioning a Cloudant database using the ibm_resource_instance configuration.

resource "ibm_resource_instance" "cloudant" {
  name              = "my-cloudant-db"
  service           = "cloudantnosqldb"
  plan              = "lite"
  location          = "us-south"
}

Other parameters are supported for resource configuration, see the docs for more details…

add resource keys

Applications accessing resource instances need service credentials. Access keys can also be provisioned using Terraform configuration.

resource "ibm_resource_key" "resource_key_name" {
  name                 = "my-key-name"
  role                 = "<IAM_ROLE>"
  resource_instance_id = "${ibm_resource_instance.resource_instance_name.id}"
}
  • name - user-provided key name used by the platform to identify the credentials.
  • role - IBM Cloud IAM roles (as supported by the service, e.g. Writer or Reader).

Here is an example of provisioning a resource key for the Cloudant example from above.

resource "ibm_resource_key" "cloudant_key" {
  name                  = "my-db-key"
  role                  = "Manager"
  resource_instance_id  = "${ibm_resource_instance.cloudant.id}"
}

(optional) add services instances to configuration

Use the following configuration to provision older Cloud Foundry services.

resource "ibm_service_instance" "service_instance_name" {
  name       = "test"
  space_guid = "cf-space-guid"
  service    = "service-id"
  plan       = "service-plan"
}
  • service_instance_name - identifier for this service in the configuration, referenced by service keys.
  • name - user-provided service name used by the platform to identify the service.
  • service - service identifier on the platform (can be found in the service documentation page).
  • plan - service plan used for billing.

(optional) add service instance keys

Applications accessing service instances need service credentials. Service keys can also be provisioned using Terraform configuration.

resource "ibm_service_key" "service_key_name" {
  name                 = "my-key-name"
  service_instance_guid = "${ibm_service_instance.service_instance_name.id}"
}
  • name - user-provided key name used by the platform to identify the credentials.
  • service_instance_guid - Service instance GUID.

add output configuration

Accessing service keys and other service details is handled with output configuration in Terraform files.

output "app_credentials" {
  value = "${ibm_resource_key.resource_key_name.credentials}"
}

Output values can be logged to the console using the Terraform CLI.

Here is an example of accessing Cloudant credentials provisioned in the example above.

output "cloudant_credentials" {
  value = "${ibm_resource_key.cloudant_key.credentials}"
}
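Once the configuration has been applied, the same value can be printed on demand using the standard terraform output command.

terraform output cloudant_credentials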

Run Terraform commands

Having finished the configuration file describing our application services, we can now use the Terraform CLI to provision those services!

  • Initialise the Terraform working directory containing the configuration file.
terraform init
  • Validate the configuration file for syntax errors.
terraform validate
  • Display the platform changes to be executed on the configuration file.
terraform plan

Here is the example output from running that command with the Cloudant database example.

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + ibm_resource_instance.cloudant
      id:                   <computed>
      location:             "us-south"
      name:                 "my-cloudant-db"
      plan:                 "lite"
      service:              "cloudantnosqldb"
      status:               <computed>

  + ibm_resource_key.cloudant_key
      id:                   <computed>
      credentials.%:        <computed>
      name:                 "my-db-key"
      parameters.%:         <computed>
      resource_instance_id: "${ibm_resource_instance.cloudant.id}"
      role:                 "Manager"
      status:               <computed>

Plan: 2 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------
  • Execute the planned changes using apply.
terraform apply -auto-approve

Terraform will now provision the platform services and resource keys, and output credentials to the console.

Here is the example output from running that command with the Cloudant database example.

ibm_resource_instance.cloudant: Creating...
  location: "" => "us-south"
  name:     "" => "my-cloudant-db"
  plan:     "" => "lite"
  service:  "" => "cloudantnosqldb"
  status:   "" => "<computed>"
ibm_resource_instance.cloudant: Still creating... (10s elapsed)
ibm_resource_instance.cloudant: Still creating... (20s elapsed)
ibm_resource_instance.cloudant: Creation complete after 21s (ID: ...)
ibm_resource_key.cloudant_key: Creating...
  credentials.%:        "" => "<computed>"
  name:                 "" => "my-db-key"
  parameters.%:         "" => "<computed>"
  resource_instance_id: "" => "crn:v1:bluemix:public:cloudantnosqldb:us-south:a/...::"
  role:                 "" => "Manager"
  status:               "" => "<computed>"
ibm_resource_key.cloudant_key: Creation complete after 8s (ID: ...)

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

cloudant_credentials = {
  apikey = <API_KEY_VALUE>
  host = <DB_HOST>
  ...
}

API keys from the cloudant_credentials output section can be used by applications to interact with the provisioned database! 👍👍👍

Conclusion

Provisioning cloud services using Terraform is a great way to manage application resources on IBM Cloud.

Application resources are defined in a declarative configuration file, following the “infrastructure-as-code” approach to managing cloud environments. This configuration is maintained in the application’s source code repository to enable reproducible environments.

IBM Cloud provides an official provider plugin for Terraform. This allows IBM Cloud services to be defined through custom configuration primitives. Developers can then use the Terraform CLI to provision new resources and extract service keys needed to access those services. 💯💯💯