File Storage For Serverless Applications

Apr 27, 2018
openwhisk serverless object-storage stmm

“Where do you store files without a server?”

…is the most common question I get asked during Q&A after one of my “Introduction to Serverless Platforms” conference talks. Search for this question online and this is the answer you will often find:

“Use an object store for file storage and access using the S3-compatible interface. Provide direct access to files by making buckets public and return pre-signed URLs for uploading content. Easy, right?”

Giving people this answer often leads to the following response:

🤔🤔🤔

Developers who are not familiar with cloud platforms can often understand the benefits and concepts behind serverless, but don’t know the other cloud services needed to replicate application services from traditional (or server-full) architectures.

In this blog post, I want to explain why we do not use the file system for files in serverless applications and introduce the cloud services used to handle this.

serverless runtime file systems

Serverless runtimes do provide access to a filesystem with a (small) amount of ephemeral storage.

Serverless application deployment packages are extracted into this filesystem prior to execution. Uploading files into the environment relies on them being included within the application package. Serverless functions can read, modify and create files within this local file system.

These temporary file systems come with the following restrictions…

Maximum application package size limits additional files that can be uploaded.
Serverless platforms usually limit total usable space to around 512MB.
Modifications to the file system are lost once the platform discards the environment, which happens when it is no longer used for further invocations.
Concurrent executions of the same function use independent runtime environments and do not share filesystem storage.
There is no access to these temporary file systems outside the runtime environment.

All these limitations make the file system provided by serverless platforms unsuitable as a scalable storage solution for serverless applications.
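To make these restrictions concrete, here is a minimal sketch of a function that keeps a counter in the local file system. It assumes an OpenWhisk-style Python action with a `main(params)` entry point; the `scratch_path` parameter and the file name are hypothetical and exist only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside the runtime's ephemeral file system.
DEFAULT_SCRATCH = os.path.join(tempfile.gettempdir(), "invocation-counter.txt")


def main(params):
    """Count the invocations handled by *this* runtime environment only."""
    path = params.get("scratch_path", DEFAULT_SCRATCH)

    # Read the previous count if an earlier invocation left one behind.
    count = 0
    if os.path.exists(path):
        with open(path) as f:
            count = int(f.read().strip() or 0)

    # Persist the new count to local (ephemeral) storage.
    count += 1
    with open(path, "w") as f:
        f.write(str(count))

    return {"invocations_seen_by_this_environment": count}
```

The counter only grows while the same warm environment keeps handling requests. A concurrent invocation in another environment starts from its own empty file system, and the value disappears once the environment is discarded, so the number returned is never a reliable global count.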

So, what is the alternative?

object stores

Object stores manage data as objects, as opposed to other storage architectures like file systems which manage data as a file hierarchy. Object-storage systems allow retention of massive amounts of unstructured data, with simple retrieval and search capabilities.

https://en.wikipedia.org/wiki/Object_storage

Object stores provide “storage-as-a-service” solutions for cloud applications.

These services are used for file storage within serverless applications.

Unlike traditional file systems built on block storage devices, object storage services organise data objects in a flat namespace of containers, known as “buckets”. Each object within a bucket is identified by a unique identifier, known as a “key”. Metadata can also be stored alongside data objects for additional context.
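The bucket, key and metadata model can be sketched with a toy in-memory class. This is an illustration of the data model only, not how any real object store is implemented; the `Bucket` and `StoredObject` names and the example keys are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """Toy in-memory bucket: a flat mapping from key to object."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data, **metadata):
        # Writing to an existing key replaces the whole object.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket("user-uploads")
bucket.put("avatars/alice.png", b"<png bytes>", content_type="image/png")
bucket.put("avatars/bob.png", b"<png bytes>", content_type="image/png")
bucket.put("report.pdf", b"<pdf bytes>", content_type="application/pdf")
avatar_keys = bucket.list_keys(prefix="avatars/")
```

Note that `avatars/alice.png` is a single opaque key. The slash has no special meaning to the bucket; a directory-like view only appears when a client filters keys by prefix.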

Object stores are designed for simple programmatic access to files by applications, rather than interactive access by users.

advantages of an object store

scalable and elastic storage

Rather than offering a disk drive with a fixed amount of storage, object stores provide scalable and elastic storage for data objects. Users are charged based upon the amount of data stored, API requests and bandwidth used. Object stores are built to scale as storage needs grow towards the petabyte range.

simple http access

Object stores provide an HTTP-based API endpoint to interact with the data objects.

Rather than using standard library methods to access the file system, which translate into system calls to the operating system, files are available over a standard HTTP endpoint.

Client libraries provide a simple interface for interacting with the remote endpoints.
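As a rough sketch of what those client libraries do underneath, the snippet below builds the HTTP requests for uploading, downloading and deleting an object using only the Python standard library. It assumes path-style addressing (`/bucket/key`) as used by S3-compatible APIs. The endpoint, bucket and key are hypothetical, and authentication headers are deliberately left out, so these requests would be rejected by a real private bucket.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; a real service publishes its own endpoint URLs.
ENDPOINT = "https://s3.example.com"


def object_url(bucket, key):
    # Path-style addressing: the object lives at /<bucket>/<key>.
    return f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"


def upload_request(bucket, key, body):
    # Uploading an object is an HTTP PUT with the file contents as the body.
    return Request(object_url(bucket, key), data=body, method="PUT")


def download_request(bucket, key):
    # Reading an object is a plain HTTP GET.
    return Request(object_url(bucket, key), method="GET")


def delete_request(bucket, key):
    # Removing an object is an HTTP DELETE.
    return Request(object_url(bucket, key), method="DELETE")


# The requests are only constructed here, not sent.
req = upload_request("photos", "2018/cat 1.jpg", b"<jpeg bytes>")
```

Passing one of these objects to `urllib.request.urlopen` would send it. In practice a client library is still the better choice, because it also takes care of request signing, retries and multi-part uploads.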

expose direct access to files

Files stored in object storage can be made publicly accessible. Client applications can access files directly without needing to use an application backend as a proxy.

Special URLs can also be generated to provide temporary access to files for external clients. Clients can even use these URLs to directly upload and modify files. URLs are set to expire after a fixed amount of time.
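The idea behind these expiring URLs can be sketched with an HMAC signature over the HTTP method, the object path and an expiry time. This is a simplified illustration of the principle, not the signing algorithm used by S3-compatible services; the secret, host name and query parameter names are all assumptions made for this example.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a secret known only to the application and the storage service.
SECRET = b"hypothetical-secret-key"


def _signature(method, path, expires):
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method, path, now, ttl_seconds):
    """Return a URL that grants `method` access to `path` until it expires."""
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def is_valid(method, url, now):
    """Check performed by the storage side before serving the request."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        supplied = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(method, parts.path, expires)
    # Constant-time comparison avoids leaking signature bytes through timing.
    matches = hmac.compare_digest(supplied.encode(), expected.encode())
    return now <= expires and matches


upload_url = presign("PUT", "/photos/cat.jpg", now=1000, ttl_seconds=60)
```

Because the method, path and expiry are all covered by the signature, a client cannot reuse an upload URL to read a file, point it at a different object or extend its lifetime. The application never has to hand out its own credentials.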

ibm cloud object storage

IBM Cloud provides an object storage service called IBM Cloud Object Storage. This service provides the following features concerning resiliency, reliability and cost.

data resiliency

Bucket contents can be stored using one of the following automatic data resiliency options.

Cross Region. Store data across three regions within a geographic area.
Regional. Store data in multiple data centres within a single geographic region.
Single Data Centre. Store data across multiple devices in a single data centre.

Cross Region is the best choice for “regional concurrent access and highest availability”. Regional is used for “high availability and performance”. Single Data Centre is appropriate “when data locality matters most”.

storage classes

Data access patterns can be used to save costs by choosing the appropriate storage class for data storage.

IBM Cloud Object Storage offers the following storage classes: Standard, Vault, Cold Vault, Flex.

Standard class is used for workloads with frequent data access. Vault and Cold Vault are used with infrequent data retrieval and data archiving workloads. Flex is a mixed storage class for workloads where access patterns are more difficult to predict.

costs

Storage class and data resiliency options are used to calculate the cost of service usage.

Storage is charged based upon the amount of data storage used, operational requests (GET, POST, PUT…) and outgoing public bandwidth.

Storage classes affect the price of data retrieval operations and storage costs. Storage classes used for archiving, e.g. Cold Vault, charge less for data storage and more for operational requests. Storage classes used for frequent access, e.g. Standard, charge more for data storage and less for operational requests.

Higher resiliency data storage is more expensive than lower resiliency storage.
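The trade-off between storage classes can be worked through with a small calculation. The rates below are invented placeholder numbers chosen only to show the shape of the trade-off; they are not IBM Cloud prices, and the `FREQUENT` and `ARCHIVE` names are hypothetical stand-ins for a frequent-access class and an archive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    storage_per_gb: float      # charged per GB stored per month
    per_1000_requests: float   # charged per 1,000 operational requests
    egress_per_gb: float       # charged per GB of outgoing public bandwidth


# Placeholder rates, NOT real prices: archive storage is cheaper to keep
# but more expensive to access.
FREQUENT = Rates(storage_per_gb=0.03, per_1000_requests=0.005, egress_per_gb=0.09)
ARCHIVE = Rates(storage_per_gb=0.01, per_1000_requests=0.05, egress_per_gb=0.09)


def monthly_cost(rates, stored_gb, requests, egress_gb):
    """Sum the three charge components described above."""
    return (stored_gb * rates.storage_per_gb
            + requests / 1000 * rates.per_1000_requests
            + egress_gb * rates.egress_per_gb)


# Workload A: a large archive that is almost never read.
archive_workload = dict(stored_gb=1000, requests=1_000, egress_gb=0)
# Workload B: a small set of files that is read constantly.
busy_workload = dict(stored_gb=10, requests=10_000_000, egress_gb=50)
```

With these made-up rates, the archive class is cheaper for workload A because storage dominates the bill, while the frequent-access class is cheaper for workload B because request charges dominate. Real pricing should always be taken from the provider's current price list.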

lite plan

IBM Cloud Object Storage provides a generous free tier (25GB storage per month, 5GB public bandwidth) for Lite account users. IBM Cloud Lite accounts provide perpetual access to a free set of IBM Cloud resources. Lite accounts do not expire after a time period or need a credit card to sign up.

conclusion

Serving files from serverless runtimes is often accomplished using object storage services.

Object stores provide a scalable and cost-effective service for managing files without using storage infrastructure directly. Storing files in an object store provides simple access from serverless runtimes and even allows the files to be made directly accessible to end users.

In the next blog posts, I’m going to show you how to set up IBM Cloud Object Storage and access files from serverless applications on IBM Cloud Functions. I’ll be demonstrating this approach for both the Node.js and Swift runtimes.