Cloud Foundry Application Monitoring Bot For Slack

Cloud Foundry makes it so easy to build, deploy and manage applications that it can be a struggle just to keep up with development progress…

“Who is restarting this application?” “What is this new service instance?” “When did this application instance run out of memory?”

Development teams are increasingly using Slack to collaborate on projects, with custom bots, triggered through channel messages, to manage and monitor applications. This approach, popularised by GitHub, has become known as “ChatOps”. Using group chat for development projects gives greater operational visibility to everyone in the team.

Slack has exploded in use over the past two years, recently signing up more than a million active users. The platform publishes an API for writing bots that respond automatically to messages, allowing users to write custom integrations for external services.

Users can register webhooks to receive channel messages, based upon keyword triggers, and allow bots to reply with new channel messages. The platform also provides a WebSocket channel to registered bots for real-time communication.

Could we write a custom bot for monitoring applications on the Cloud Foundry platform?

The bot would publish notifications about applications and services into group channels, helping keep teams updated with platform events in real-time.

Cloud Foundry Monitoring APIs

Cloud Foundry provides access to the platform through a series of RESTful APIs, exposed by the Cloud Controller component. User commands from the CF CLI tool are translated into calls to these APIs.

Tip: Setting the CF_TRACE environment variable to true will show the API calls generated by the CLI commands.

Platform user account credentials are used to obtain OAuth2 tokens for authenticating service calls.
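As a rough illustration of that token exchange, the sketch below builds (but does not send) a password-grant request for the platform's UAA server. It assumes the same "cf" OAuth client with an empty secret that the CF CLI uses; the function name buildTokenRequest and the UAA URL are hypothetical and only there for the example.

```javascript
// Sketch only: builds a password-grant token request without sending it.
// Assumptions: the UAA URL is a placeholder, and the 'cf' client with an
// empty secret (as used by the CF CLI) is allowed on your platform.
function buildTokenRequest (uaaUrl, username, password) {
  var body = 'grant_type=password' +
    '&username=' + encodeURIComponent(username) +
    '&password=' + encodeURIComponent(password)

  return {
    url: uaaUrl.replace(/\/+$/, '') + '/oauth/token',
    method: 'POST',
    headers: {
      // HTTP Basic auth for the client: id 'cf', empty secret.
      'Authorization': 'Basic ' + Buffer.from('cf:').toString('base64'),
      'Content-Type': 'application/x-www-form-urlencoded',
      'Accept': 'application/json'
    },
    body: body
  }
}
```

The JSON response should carry an access_token that is then sent as a bearer token in the Authorization header of Cloud Controller calls.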

Looking at the documentation, there’s an endpoint for retrieving all platform events. This API is used to retrieve events for an application when using the CF CLI events command. Events can be filtered by the application, event type and timestamps. Responses include events about changes to applications, services and service instances.

Polling this API, with timestamp filtering to ignore old events, we can retrieve a continuous stream of new platform events.
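A minimal sketch of that polling loop might look like the following. It assumes the v2 events endpoint (/v2/events) with a "timestamp>" query filter, and that each returned resource carries its details under an entity property. The fetchJson argument is a hypothetical helper that takes a URL and returns a promise of the parsed, authenticated JSON response; pagination is ignored for brevity.

```javascript
// Build the events URL, keeping only events newer than `since`
// (an ISO 8601 UTC timestamp such as '2015-01-01T00:00:00Z').
function eventsUrl (apiUrl, since) {
  var filter = 'timestamp>' + since
  return apiUrl + '/v2/events?order-direction=asc&q=' + encodeURIComponent(filter)
}

// Pass every event on a page to onEvent and return the newest timestamp
// seen, which becomes the filter value for the next poll.
function handlePage (page, since, onEvent) {
  var latest = since
  page.resources.forEach(function (resource) {
    onEvent(resource)
    // Same-format ISO 8601 UTC strings sort correctly as plain strings.
    if (resource.entity.timestamp > latest) latest = resource.entity.timestamp
  })
  return latest
}

// Poll on a fixed interval, starting from "now" so old events are ignored.
// fetchJson is a hypothetical helper: (url) => Promise of parsed JSON.
function startPolling (fetchJson, apiUrl, onEvent, intervalMs) {
  var since = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z')
  return setInterval(function () {
    fetchJson(eventsUrl(apiUrl, since)).then(function (page) {
      since = handlePage(page, since, onEvent)
    }).catch(function (err) {
      console.error('Polling failed:', err)
    })
  }, intervalMs)
}
```

Tracking the newest timestamp seen, rather than the time of the last request, avoids missing events when the bot's clock and the platform's clock disagree.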

Slack Integration

Setting up a new bot integration for a Slack group provides you with a token you can use to authenticate with the Real-Time Messaging API. Rather than implementing the WebSocket-based API handler ourselves, we can use one of the many existing community libraries.

Using the Node.js client library, passing in the authentication token, we just need to implement callback handlers for the API events.

var Slack = require('slack-client')

// Add a bot at https://my.slack.com/services/new/bot and copy the token here.
var slackToken = 'xoxb-YOUR-TOKEN-HERE'
// Automatically reconnect after an error response from Slack.
var autoReconnect = true
// Automatically mark each message as read after it is processed.
var autoMark = true

var slack = new Slack(slackToken, autoReconnect, autoMark)

// Callback handlers for the Real-Time Messaging API events.
slack.on('message', function (message) { /* handle an incoming channel message */ })
slack.on('error', function (err) { console.error('Slack error:', err) })
slack.on('open', function () { /* connection established */ })

slack.login()

When platform events occur, we forward these to any channels the bot is registered in.
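The forwarding step could be sketched roughly as below. It assumes the client object exposes a channels map whose entries have name, is_member and send(text), which is how I understand the slack-client library used above to behave, and that events carry type, actee_name, actor_name and timestamp under entity. The formatEvent and broadcast names are hypothetical.

```javascript
// Turn a platform event into a one-line channel message.
// The entity field names are assumptions about the events API response.
function formatEvent (resource) {
  var e = resource.entity
  return '[' + e.timestamp + '] ' + e.type + ' on ' + e.actee_name +
    ' by ' + (e.actor_name || 'unknown')
}

// Send the text to every channel the bot has joined and return the
// names of the channels that were notified.
function broadcast (slack, text) {
  var notified = []
  Object.keys(slack.channels).forEach(function (id) {
    var channel = slack.channels[id]
    if (channel.is_member) {
      channel.send(text)
      notified.push(channel.name)
    }
  })
  return notified
}
```

Wired together, an event handler would simply call broadcast(slack, formatEvent(resource)) for each new event.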

Plugging together the Cloud Foundry event monitoring code with the Slack bot integration, cfbot was born…

cfbot

This Cloud Foundry monitoring bot can be deployed to… Cloud Foundry!

You will need to register the bot with your Slack group to receive an authentication token. This token, along with login details for a platform account, needs to be stored as user-provided service credentials. The bot will read these service credentials on deployment and start monitoring for events.
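For readers unfamiliar with user-provided services, here is a small sketch of how an application can read such credentials from the VCAP_SERVICES environment variable that Cloud Foundry sets for bound services. The service name and credential keys shown are hypothetical; the names cfbot actually expects are documented in the project README.

```javascript
// Look up the credentials of a user-provided service by name.
// vcapJson is the raw value of process.env.VCAP_SERVICES.
function findCredentials (vcapJson, serviceName) {
  var services = JSON.parse(vcapJson || '{}')
  var provided = services['user-provided'] || []
  for (var i = 0; i < provided.length; i++) {
    if (provided[i].name === serviceName) return provided[i].credentials
  }
  return null // service not bound to this application
}

// Usage (hypothetical service name):
// var credentials = findCredentials(process.env.VCAP_SERVICES, 'cfbot-credentials')
```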

Full installation instructions are available in the project README.

Usage

cfbot will monitor events from applications in all spaces and organisations that the user account has access to.

Users can filter the applications and events being reported using the apps and events commands. Both commands take application or event identifiers that are used to match incoming events. The wildcard ‘*’ identifier can be used to revert to matching all events.
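The matching behaviour described above can be illustrated with a small sketch. This is not cfbot's actual implementation: createFilter is a hypothetical helper, and exact string matching against the identifiers is an assumption.

```javascript
// A filter holds a list of identifiers; '*' matches everything.
function createFilter () {
  var identifiers = ['*'] // default: report all events

  return {
    // Replace the current identifiers; an empty list reverts to '*'.
    set: function (ids) {
      identifiers = ids.length ? ids.slice() : ['*']
    },
    // True when the value should be reported.
    matches: function (value) {
      return identifiers.indexOf('*') !== -1 || identifiers.indexOf(value) !== -1
    }
  }
}
```

One filter instance for application names and another for event types would be enough: an event is reported only when both filters match.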

The following events are currently registered:

  • App Creation and Deletion Events
  • App Lifecycle Events (start, stop, restart, restage)
  • Instance Crash Events
  • Service Creation, Deletion and Binding
  • Scaling (memory, CPU, disk)
  • Route Changes (map, unmap)

Other bots

Other people have written Cloud Foundry bots before cfbot. Here are the other projects I discovered that might be useful…