Serverless Logs With Elasticsearch

Serverless platforms can seem like magic.

They take your code and turn it into scalable microservices in the cloud, without you having to set up or manage any infrastructure.

No provisioning VMs. No configuring Linux environments. No upgrading middleware packages.

Which is wonderful until something goes wrong with your microservices in production…

“Let me just log into the machine.”

Serverless platforms do not allow this.

No tracing system calls. No running top. No connecting a debugger to the process. You can’t even grep through the logs!

Many of the tools and techniques we use to diagnose bugs rely on having access to the environment.

Fortunately, we do still have access to logging output generated by our serverless functions. Phew.

Storing, searching and analysing these logs is crucial to efficiently diagnosing and fixing issues on serverless platforms.

In this blog post, we’re going to look at using a popular open-source solution to manage the logs being generated by our serverless functions. This solution is also known as “The ELK Stack”.

TL;DR: There is now a Logstash input plugin for OpenWhisk, which will automatically index serverless application logs into Elasticsearch. See here for usage instructions: https://github.com/jthomas/logstash-input-openwhisk

Elasticsearch, Logstash and Kibana

…are the three open-source projects that, when combined, are known as The ELK Stack. Together, they provide a scalable search engine for indexed documents.

Elasticsearch “is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.”

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). If you store them in Elasticsearch, you can view and analyze them with Kibana.

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch.
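
To make the pipeline concrete, here’s a minimal sketch of a Logstash configuration that reads events from stdin and indexes them into a local Elasticsearch instance (the localhost:9200 endpoint is an assumption, adjust it for your environment):

input {
  # read raw events from standard input
  stdin { }
}

output {
  # index each event into Elasticsearch (assumed local endpoint)
  elasticsearch { hosts => ["localhost:9200"] }
}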

The ELK Stack is a perfect solution for managing logs from our serverless functions.

But how do we configure this solution to automatically index logs from our serverless platform?

Let’s start by looking at the serverless platform we’re using…

OpenWhisk

OpenWhisk is an open-source serverless platform developed by IBM. Developers deploy functions that execute in response to external events, e.g. database updates, messages on a queue or HTTP requests. The platform invokes these functions on-demand in milliseconds, rather than having services sitting idle waiting for requests to arrive.

Let’s walk through an example.

Serverless Functions

Here’s a sample serverless function which returns a greeting to the user. The code logs the invocation parameters and response message.

function main (params) {
  console.log('invoked with parameters:', params)

  const user = params.user || 'Donald Trump'
  const response = { greeting: `Hello ${user}` }

  console.log('returns: ', response)
  return response
}

Deploying this serverless function to OpenWhisk and invoking it generates an activation record.

$ wsk action create logs logs.js
ok: created action logs
$ wsk action invoke logs -b -r -p user 'Bernie Sanders'
{
    "greeting": "Hello Bernie Sanders"
}
$ wsk activation list
activations
2adbbbcc0242457f80dc51944dcd2039                 logs
...

OpenWhisk activation records are available through the platform API. Each record contains the stdout and stderr logs generated during the serverless function invocation.
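
For instance, here’s a sketch of fetching recent activation records with curl (a hedged example: the endpoint shown is the IBM Bluemix deployment, $AUTH is the user:key credential reported by wsk property get --auth, and paths may differ on other deployments):

# retrieve the ten most recent activation records, including full documents
$ curl -u $AUTH \
  "https://openwhisk.ng.bluemix.net/api/v1/namespaces/_/activations?limit=10&docs=true"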

Serverless Logs

Retrieving the activation record for the previous invocation, we can see the output generated by the calls to console.log.

$ wsk activation get 2adbbbcc0242457f80dc51944dcd2039
ok: got activation 2adbbbcc0242457f80dc51944dcd2039
{
    "namespace": "james.thomas@uk.ibm.com",
    "name": "logs",
    "version": "0.0.3",
    "publish": false,
    "subject": "james.thomas@uk.ibm.com",
    "activationId": "2adbbbcc0242457f80dc51944dcd2039",
    "start": 1477925373990,
    "end": 1477925374063,
    "response": {
        "status": "success",
        "statusCode": 0,
        "success": true,
        "result": {
            "greeting": "Hello Bernie Sanders"
        }
    },
    "logs": [
        "2016-10-31T14:49:34.059745626Z stdout: invoked with parameters: {}",
        "2016-10-31T14:49:34.061228724Z stdout: returns:  { greeting: 'Hello Donald Trump' }"
    ],
    ...
}

OpenWhisk stores these records indefinitely, making them available for retrieval by the activation id.
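
The wsk CLI also has a shortcut for printing just the log lines from a record:

$ wsk activation logs 2adbbbcc0242457f80dc51944dcd2039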

However, developers need more than the ability to retrieve individual records to be effective at diagnosing and resolving issues with serverless functions.

Forwarding these logs to Elasticsearch will enable us to run full-text search across all logs generated, quickly retrieve all output for a particular serverless function, set up monitoring dashboards and much more…

Using Logstash will allow us to ingest and transform OpenWhisk logs into Elasticsearch documents.

Logstash Input Plugins

Logstash supports a huge variety of event sources through the use of a plugin mechanism. These plugins handle retrieving the external events and converting them to Elasticsearch documents.

Logstash has a huge repository of official and community-supported input plugins. These plugins ingest everything from log files, syslog streams and databases to message queues, websockets and much more.
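
You can see which input plugins are bundled with your installation using the plugin manager (the --group flag is available in recent Logstash versions):

$ bin/logstash-plugin list --group input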

HTTP Polling Input Plugin

Logstash already has an input plugin for pulling events from an HTTP URL by polling. Users provide the URL in the Logstash configuration, along with the polling schedule. Logstash will automatically retrieve and ingest the JSON response as an event stream.

input {
  http_poller {
    urls => {
      "my_events" => "http://localhost:8000/events"
    }
    # Poll site every 10s
    interval => 10
    request_timeout => 60
    codec => "json"
  }
}

Great, so we can just configure this plugin to call the OpenWhisk API for retrieving activation records and automatically ingest them into Elasticsearch?

Unfortunately not…

Polling OpenWhisk Logs?

Each time the client calls the API to retrieve the activation records, we want to retrieve only those records that have occurred since the last poll. This ensures we are not ingesting the same records more than once.

The OpenWhisk API for retrieving activation records supports a query parameter (since) which restricts results to those that occurred after the parameter value’s timestamp.

Updating this parameter in the polling URL to the timestamp of the last poll ensures we only retrieve new activation records.
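
As an illustrative sketch, using the end timestamp from the activation record above, the next poll would need to request something like this:

# only return activation records that occurred after the previous poll
GET /api/v1/namespaces/_/activations?since=1477925374063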

Unfortunately, the HTTP input plugin does not support setting dynamic query string parameters.

This means we cannot use the existing plugin to efficiently ingest OpenWhisk logs into Elasticsearch.

So we started work on a new plugin to support this behaviour…

OpenWhisk Input Plugin

This input plugin drains logs from OpenWhisk into Elasticsearch.

Install the plugin with the following command.

$ bin/logstash-plugin install logstash-input-openwhisk

Once the plugin is installed, you need to configure Logstash with your platform endpoint and user credentials.

This sample configuration will poll the OpenWhisk platform for new logs every fifteen minutes and index them into Elasticsearch. Each activation record will be a separate document.

input {
  openwhisk {
    # Mandatory Configuration Parameters
    hostname => "openwhisk.ng.bluemix.net"
    username => "sample_user@email.com"
    password => "some_password"
    # Supports "cron", "every", "at" and "in" schedules via the rufus scheduler
    schedule => { "every" => "15m"}
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

The plugin supports the same configuration values for the schedule parameter as the HTTP input plugin.
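
For example, a cron-style schedule could be used in place of the interval shown above (a sketch; the cron fields here are minute, hour, day of month, month, day of week, plus a timezone):

# poll at the top of every hour
schedule => { "cron" => "0 * * * * UTC" }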

More examples of using the plugin are available in the examples directory in the project repository.
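
As a side note, the configuration above indexes one document per activation record. If you’d prefer one document per log line, a Logstash filter along these lines could work (an untested sketch, assuming the logs field holds the array of strings shown in the activation record earlier):

filter {
  # emit a separate event for each entry in the record's logs array
  split { field => "logs" }
  # extract the timestamp, output stream and message from each log line
  grok {
    match => { "logs" => "%{TIMESTAMP_ISO8601:log_time} %{WORD:stream}: %{GREEDYDATA:log_message}" }
  }
}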

Demonstration

Here’s a demonstration of the OpenWhisk input plugin being used in the ELK Stack. As we invoke serverless functions in OpenWhisk, Kibana shows the activation records appearing in the dashboard. Logstash is polling the logs API and ingesting the records into Elasticsearch in real time.

Conclusion

Developers using serverless platforms have no access to the infrastructure environment running their code. Debugging production bugs relies on using logging output to diagnose and resolve issues.

Elasticsearch, Logstash and Kibana have become a popular, scalable open-source solution for log management and analysis.

Using the Logstash input plugin for OpenWhisk, serverless logs will be automatically indexed into Elasticsearch in real time. Developers can use the Kibana frontend to easily diagnose and monitor issues in production.

In the next post, we’ll look at using Docker to set up Elasticsearch, Logstash and Kibana with our custom OpenWhisk plugin.

Until then… 😎