The public part of the GPG key used to sign the packages is available at RPM-GPG-KEY-Ceph-Community <https://linuxsoft.cern.ch/repos/RPM-GPG-KEY-Ceph-Community>
Also note that Ceph now builds against OpenSSL 3.5.0, which may affect EL package users who are on distros that still reference an older version.
This release fixes a critical BlueStore regression in version 17.2.8 (#63122). Users running that release are advised to upgrade at their earliest convenience.
This is the seventh backport (hotfix) release in the Reef series. We recommend that all users update to this release.
This release fixes a critical BlueStore regression in versions 18.2.5 and 18.2.6 (https://github.com/ceph/ceph/pull/61653). Users running either of those releases are advised to upgrade at their earliest convenience.
This release also includes several other important BlueStore fixes:
[reef] os/bluestore: fix _extend_log seq advance (pr#61653, Pere Diaz Bou)
blk/kerneldevice: notify_all only required when discard_drain wait for condition (pr#62152, Yite Gu)
os/bluestore: Fix ExtentDecoderPartial::_consume_new_blob (pr#62054, Adam Kupczyk)
os/bluestore: Fix race in BlueFS truncate / remove (pr#62840, Adam Kupczyk)
This article explains how to set up a test Ceph cluster that runs on a single-node Minikube cluster.
Docker has been chosen as the driver of the Minikube cluster on Mac M1 due to its reliability and simplicity. By choosing Docker, we avoid the complexities of virtualization, the difficulties of firewall configuration (bootpd), and the cost of x86 emulation.
Docker runs ARM-native containers directly. This improves performance and compatibility and lowers cost, which is important in resource-intensive systems such as Rook and Ceph.
```bash
brew install docker
brew install colima
colima start
```
```bash
brew install minikube
minikube start --disk-size=20g --driver docker
```
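Before continuing, it can help to sanity-check that the single-node cluster came up. A minimal sketch using standard Minikube commands (the aarch64 check is only an expectation on Apple Silicon hosts):

```bash
# Confirm the Minikube node and its core components are running
minikube status

# Optionally confirm the node is ARM-native (typically prints aarch64 on an M1 host)
minikube ssh -- uname -m
```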
```bash
curl -LO "https://dl.k8s.io/release/v1.26.1/bin/darwin/arm64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin/
```
```bash
minikube ssh
sudo mkdir /mnt/disks

# Create an empty file of size 10GB to mount disk as ceph osd
sudo dd if=/dev/zero of=/mnt/disks/mydisk.img bs=1M count=10240

sudo apt update
sudo apt upgrade
sudo apt-get install qemu-utils

# List the nbd devices
lsblk | grep nbd

# If you are unable to see the nbd device, load the NBD (Network Block Device) kernel module.
sudo modprobe nbd max_part=8

# To bind nbd device to the file
# Note: Please check there is no necessary data in /dev/nbdx, otherwise back up that data.
sudo qemu-nbd --format raw -c /dev/nbd0 /mnt/disks/mydisk.img
```
```bash
lsblk | grep nbd0
```
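When you later want to rebuild the backing disk or tear the setup down, the NBD binding can be released again. A small sketch, assuming the same /dev/nbd0 device as above:

```bash
# Detach the backing file from the NBD device when it is no longer needed
sudo qemu-nbd --disconnect /dev/nbd0
```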
```bash
git clone https://github.com/rook/rook.git
cd rook/deploy/examples/
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl get pods -n rook-ceph
```
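Optionally, instead of polling `kubectl get pods`, you can block until the operator rollout completes. A minimal sketch, assuming the default `rook-ceph-operator` deployment name from operator.yaml:

```bash
# Wait until the Rook operator deployment is fully rolled out
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator
```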
Edit the storage section of cluster-test.yaml so that it references the nbd device created earlier:

```yaml
storage:
  useAllNodes: false
  useAllDevices: false
  nodes:
    - name: minikube # node name of minikube node
      devices:
        - name: /dev/nbd0 # device name being used
  allowDeviceClassUpdate: true
  allowOsdCrushWeightUpdate: false
```
```bash
kubectl create -f cluster-test.yaml
kubectl -n rook-ceph get pod
```
If the rook-ceph-mon, rook-ceph-mgr, or rook-ceph-osd pods are not created, refer to Ceph common issues for more information.
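If a pod stays in Pending or CrashLoopBackOff, a couple of generic kubectl checks usually point to the cause. A sketch, where `<pod-name>` is a placeholder for the stuck pod:

```bash
# Watch pod state while the operator creates the mon, mgr, and OSD pods
kubectl -n rook-ceph get pods -w

# Inspect the events and status of a stuck pod
kubectl -n rook-ceph describe pod <pod-name>
```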
To verify that the cluster is in a healthy state, connect to the Rook Toolbox.
```bash
kubectl create -f toolbox.yaml
kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
```
Run the `ceph status` command and ensure the following:
```
bash-5.1$ ceph -s
  cluster:
    id:     f89dd5e5-e2bb-44e8-8969-659f0fc9dc55
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 7m)
    mgr: a(active, since 5m)
    osd: 1 osds: 1 up (since 6m), 1 in (since 6m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   27 MiB used, 10 GiB / 10 GiB avail
    pgs:     1 active+clean
```
If the cluster is not healthy, refer to the Ceph common issues for potential solutions.
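A few read-only commands from the toolbox help narrow down what is unhealthy; a short sketch using standard Ceph CLI commands:

```bash
# Show the specific health warnings or errors
ceph health detail

# Confirm the OSD backed by /dev/nbd0 is up and in
ceph osd tree

# Check raw and per-pool capacity
ceph df
```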
Footnote:
Thanks to Yuval Lifshitz for providing all the support and guidance to write this article.
References:
https://rook.io/docs/rook/latest/Getting-Started/quickstart/
This is the sixth backport (hotfix) release in the Reef series. We recommend that all users update to this release.
ceph-volume: A bug related to cryptsetup version handling has been fixed.
Related tracker: https://tracker.ceph.com/issues/66393
RADOS: A bug related to IPv6 support is now fixed.
Related tracker: https://tracker.ceph.com/issues/67517
The Crimson project continues to progress, with the Squid release marking the first technical preview available for Crimson. The Tentacle release introduces a host of improvements and new functionalities that enhance the robustness, performance, and usability of both Crimson-OSD and the Seastore object store. Below, we highlight some of the recent work included in the latest release, moving us closer to fully replacing the existing Classical OSD in the future. If you're new to the Crimson project, please visit the project page for more information and resources.
Over 100 pull requests have been merged since the Squid code freeze. There is a dedicated and ongoing effort to stabilize and strengthen recovery scenarios and critical paths.
For more details on recent PRs, visit the Crimson GitHub project page.
A cron job has been added to run the full Crimson-RADOS test suite twice a week. Frequent suite runs help us spot any regressions early, as changes to Crimson can be delicate. Additionally, Backfill and PGLog-based recovery test cases have been added.
See the latest test runs.
OSD scheduler: Integrate a recovery operation throttler to improve Quality of Service (QoS) when handling both client and recovery/backfill operations. This is the initial step towards fully supporting the mClock scheduler used by the Classical OSD; a short example of the corresponding Classical-OSD setting follows this list. For more details, see the pull request.
Allow for Per-object Processing: Rework the client I/O pipeline to enable concurrent processing of one request per object. Writes still need to serialize at submission time. For random reads with high concurrency, this results in increased throughput on a single OSD with Seastore. For more details, see the pull request.
PG Splitting and Merging: Splitting existing placement groups (PGs) allows the cluster to scale over time as storage requirements increase. Ongoing work to support the PG auto-scaler feature is targeted for inclusion in the T release. For more details, see the pull request.
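As referenced in the OSD scheduler item above, the Classical OSD already exposes this QoS model through the standard osd_mclock_profile option. The sketch below shows that existing knob (a Classical-OSD setting, not a Crimson-specific command), which the Crimson work is an initial step toward honoring:

```bash
# On the Classical OSD, bias the mClock scheduler toward recovery/backfill work
ceph config set osd osd_mclock_profile high_recovery_ops

# Inspect the profile currently in effect for a specific OSD
ceph config get osd.0 osd_mclock_profile
```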
From Seastar's documentation:
The simplest way to write efficient asynchronous code with Seastar is to use coroutines. Coroutines don’t share most of the pitfalls of traditional continuations, and so are the preferred way to write new code.
With the introduction of coroutine support in C++20, new code added to Crimson is strongly encouraged to be implemented with coroutines. This approach makes the code more readable and helps avoid pitfalls such as managing variable lifetimes across continuations.
For an example of a rewritten code section, see the commit.
The common components used by both the Classical OSD and Crimson might need slight adjustments to work with both architectures. For example, Crimson does not require mutexes due to its lockless shared-nothing design, so ceph::mutex is overridden to use a "dummy_mutex" under the hood. To differentiate between the two types when compiling the OSD, we've introduced the WITH_SEASTAR macro.
Crimson supports both native (Seastar-based) object stores, such as SeaStore, and non-native object stores, like BlueStore. To accommodate this, further adjustments to how our common components are used were necessary. For this reason, the WITH_ALIENSTORE macro was used in combination with the WITH_SEASTAR macro.
To prevent technical debt from accumulating and to make the code more readable for other developers, the two macros above were replaced with a single WITH_CRIMSON macro that supports all the nuances mentioned.
For more details, see the pull request.
Crimson's architecture is based on the Seastar framework. Our usage of the framework requires some Crimson-specific modifications, which is why we use our own fork of Seastar (located at ceph/seastar). To keep up with recent fixes and updates to the upstream framework, our submodule is updated regularly.
Seastar provides reactor configuration options that can affect the behavior of the OSD. We've exposed some of these options as Ceph configurables, supported via ceph.conf. For more details, see the pull request.
From the kernel documentation:
aio-nr is the running total of the number of events specified on the io_setup system call for all currently active aio contexts.
Due to Crimson's architecture, the default value may sometimes not be sufficient when deploying multiple Crimson-OSDs on the same host. Therefore, we've updated both package-based and Cephadm deployments to increase this value.
For more details, see the pull request.
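As a concrete illustration (the exact values used by the deployments are not reproduced here), the relevant counters live under /proc/sys/fs and can be inspected or raised with sysctl:

```bash
# Current number of AIO events reserved across all active contexts
cat /proc/sys/fs/aio-nr

# System-wide ceiling that io_setup() calls are checked against
cat /proc/sys/fs/aio-max-nr

# Raise the ceiling (example value only; size it for the number of Crimson OSDs per host)
sudo sysctl -w fs.aio-max-nr=1048576
```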
Lookup Optimizations: SeaStore is responsible for managing RADOS objects, including their data and metadata. As such, optimizing lookup operations is a critical aspect of performance improvements. These optimizations primarily focus on the B+ tree implementations, balancing performance and complexity. For more details, see: Encapsulate lba pointer PR or OMap and pg log optimization PR.
Data Overwrite Optimizations: It is common for write sizes to differ from the arrangement of low-level blocks in SeaStore. Various optimization strategies can be applied, and it remains worthwhile to explore better approaches. For more details, see the RBM inplace rewrite PR or the Control memory copy PR.
General Performance Enhancements: Performance issues can arise in various areas. The best approach is to conduct extensive testing, gain a deeper understanding of system behaviors, and prioritize addressing the most impactful problems. For example, see:
Periodic Status Reports: To monitor and understand internal operations, periodic reports can be enabled in the logs to summarize the latest status from various aspects. These reports are primarily for development and optimization purposes and are continuously evolving. For more details, see some of the recent stats enhancements: Reactor Utilization, Disk-IO, Transaction, or LRU Cache.
Enhanced Checksum Support: Integrity checks are crucial due to the unpredictable reliability of disks. Recent enhancements include Read Checksum support.
Random Block Manager: SeaStore aims to support various storage mediums, particularly those capable of random writes without significantly sacrificing bandwidth. This allows SeaStore to avoid sequential writes, making certain tasks more efficient. For more details, see RBM's inplace-rewrite or the Checksum offload.
Code Cleanup and Bug Fixes: To ensure the project's sustainability during the development of important features and optimizations, maintaining understandable code is crucial. Despite the inevitable increase in complexity, we continuously fix unexpected bugs, review logic, and take action whenever parts of the code become unclear. This sometimes leads to major refactors, as developing and reviewing based on inappropriate or overly complicated structures can be more challenging. These efforts typically account for nearly half of the total work, and sometimes even more.
SeaStore CI: Tests have been implemented to ensure confidence in promoting SeaStore as the default option.
This is the second backport release in the Squid series. We recommend all users update to this release.
This is the fifth backport release in the Reef series. We recommend that all users update to this release.
RBD: The try-netlink mapping option for rbd-nbd has become the default and is now deprecated. If the NBD netlink interface is not supported by the kernel, then the mapping is retried using the legacy ioctl interface.
RADOS: A new command, ceph osd rm-pg-upmap-primary-all, has been added that allows users to clear all pg-upmap-primary mappings in the osdmap when desired.
Related trackers:
(reintroduce) test/librados: fix LibRadosIoECPP.CrcZeroWrite (pr#61395, Samuel Just, Nitzan Mordechai)
.github: sync the list of paths for rbd label, expand tests label to qa/* (pr#57727, Ilya Dryomov)
fix formatter buffer out-of-bounds (pr#61105, liubingrun)
[reef] os/bluestore: introduce allocator state histogram (pr#61318, Igor Fedotov)
[reef] qa/multisite: stabilize multisite testing (pr#60402, Shilpa Jagannath, Casey Bodley)
[reef] qa/rgw: the rgw/verify suite runs java tests last (pr#60849, Casey Bodley)
[RGW] Fix the handling of HEAD requests that do not comply with RFC standards (pr#59122, liubingrun)
a series of optimizations for kerneldevice discard (pr#59048, Adam Kupczyk, Joshua Baergen, Gabriel BenHanokh, Matt Vandermeulen)
Add Containerfile and build.sh to build it (pr#60228, Dan Mick)
add RBD Mirror monitoring alerts (pr#56552, Arun Kumar Mohan)
AsyncMessenger.cc : improve error messages (pr#61402, Anthony D'Atri)
AsyncMessenger: Don't decrease l_msgr_active_connections if it is negative (pr#60445, Mohit Agrawal)
blk/aio: fix long batch (64+K entries) submission (pr#58675, Yingxin Cheng, Igor Fedotov, Adam Kupczyk, Robin Geuze)
blk/KernelDevice: using join() to wait thread end is more safe (pr#60615, Yite Gu)
bluestore/bluestore_types: avoid heap-buffer-overflow in another way to keep code uniformity (pr#58817, Rongqi Sun)
BlueStore: Improve fragmentation score metric (pr#59263, Adam Kupczyk)
build-with-container fixes exec bit, dnf cache dir option (pr#61913, John Mulligan)
build-with-container: fixes and enhancements (pr#62162, John Mulligan)
build: Make boost_url a list (pr#58315, Adam Emerson)
ceph-mixin: Update mixin to include alerts for the nvmeof gateway(s) (pr#56948, Adam King, Paul Cuzner)
ceph-volume: allow zapping partitions on multipath devices (pr#62178, Guillaume Abrioux)
ceph-volume: create LVs when using partitions (pr#58220, Guillaume Abrioux)
ceph-volume: do source devices zapping if they're detached (pr#58996, Igor Fedotov)
ceph-volume: fix set_dmcrypt_no_workqueue() (pr#58997, Guillaume Abrioux)
ceph-volume: Fix unbound var in disk.get_devices() (pr#59262, Zack Cerza)
ceph-volume: fix unit tests errors (pr#59956, Guillaume Abrioux)
ceph-volume: update functional testing (pr#56857, Guillaume Abrioux)
ceph-volume: use importlib from stdlib on Python 3.8 and up (pr#58005, Guillaume Abrioux, Kefu Chai)
ceph-volume: use os.makedirs for mkdir_p (pr#57472, Chen Yuanrun)
ceph.spec.in: remove command-with-macro line (pr#57357, John Mulligan)
ceph.spec.in: we need jsonnet for all distroes for make check (pr#60076, Kyr Shatskyy)
ceph_mon: Fix MonitorDBStore usage (pr#54150, Matan Breizman)
ceph_test_rados_api_misc: adjust LibRadosMiscConnectFailure.ConnectTimeout timeout (pr#58137, Lucian Petrut)
cephadm/services/ingress: configure security user in keepalived template (pr#61151, Bernard Landon)
cephadm: add idmap.conf to nfs sample file (pr#59453, Adam King)
cephadm: added check for --skip-firewalld to section on adding explicit Ports to firewalld (pr#57519, Michaela Lang)
cephadm: CephExporter doesn't bind to IPv6 in dual stack (pr#59461, Mouratidis Theofilos)
cephadm: change loki/promtail default image tags (pr#57475, Guillaume Abrioux)
cephadm: disable ms_bind_ipv4 if we will enable ms_bind_ipv6 (pr#61714, Dan van der Ster, Joshua Blanch)
cephadm: emit warning if daemon's image is not to be used (pr#61721, Matthew Vernon)
cephadm: fix cephadm shell --name <daemon-name> for stopped/failed daemon (pr#56490, Adam King)
cephadm: fix apparmor profiles with spaces in the names (pr#61712, John Mulligan)
cephadm: fix host-maintenance command always exiting with a failure (pr#59454, John Mulligan)
cephadm: have agent check for errors before json loading mgr response (pr#59455, Adam King)
cephadm: make bootstrap default to "global" section for public_network setting (pr#61918, Adam King)
cephadm: pin pyfakefs version for tox tests (pr#56762, Adam King)
cephadm: pull container images from quay.io (pr#60474, Guillaume Abrioux)
cephadm: rgw: allow specifying the ssl_certificate by filepath (pr#61922, Alexander Hussein-Kershaw)
cephadm: Support Docker Live Restore (pr#61916, Michal Nasiadka)
cephadm: turn off cgroups_split setting when bootstrapping with --no-cgroups-split (pr#61716, Adam King)
cephadm: use importlib.metadata for querying ceph_iscsi's version (pr#58323, Zac Dover)
CephContext: acquire _fork_watchers_lock in notify_post_fork() (issue#63494, pr#59266, Venky Shankar)
cephfs-journal-tool: Add preventive measures to avoid fs corruption (pr#57761, Jos Collin)
cephfs-mirror: use monotonic clock (pr#56701, Jos Collin)
cephfs-shell: excute cmd 'rmdir_helper' reported error (pr#58812, teng jie)
cephfs-shell: fixing cephfs-shell test failures (pr#60410, Neeraj Pratap Singh)
cephfs-shell: prints warning, hangs and aborts when launched (pr#58088, Rishabh Dave)
cephfs-top: fix exceptions on small/large sized windows (pr#59898, Jos Collin)
cephfs: add command "ceph fs swap" (pr#54942, Rishabh Dave)
cephfs: Fixed a bug in the readdir_cache_cb function that may have us… (pr#58805, Tod Chen)
cephfs_mirror, qa: fix mirror daemon doesn't restart when blocklisted or failed (pr#58632, Jos Collin)
cephfs_mirror, qa: fix test failure test_cephfs_mirror_cancel_mirroring_and_readd (pr#60182, Jos Collin)
cephfs_mirror: 'ceph fs snapshot mirror ls' command (pr#60178, Jos Collin)
cephfs_mirror: fix crash in update_fs_mirrors() (pr#57451, Jos Collin)
cephfs_mirror: increment sync_failures when sync_perms() and sync_snaps() fails (pr#57437, Jos Collin)
cephfs_mirror: provide metrics for last successful snapshot sync (pr#59071, Jos Collin)
client: check mds down status before getting mds_gid_t from mdsmap (pr#58492, Yite Gu, Dhairya Parmar)
client: clear resend_mds only after sending request (pr#57174, Patrick Donnelly)
client: disallow unprivileged users to escalate root privileges (pr#61379, Xiubo Li, Venky Shankar)
client: do not proceed with I/O if filehandle is invalid (pr#58397, Venky Shankar, Dhairya Parmar)
client: Fix leading / issue with mds_check_access (pr#58982, Kotresh HR, Rishabh Dave)
client: Fix opening and reading of symlinks (pr#60373, Anoop C S)
client: flush the caps release in filesystem sync (pr#59397, Xiubo Li)
client: log debug message when requesting unmount (pr#56955, Patrick Donnelly)
client: Prevent race condition when printing Inode in ll_sync_inode (pr#59620, Chengen Du)
client: set LIBMOUNT_FORCE_MOUNT2=always (pr#58529, Jakob Haufe)
cls/cas/cls_cas_internal: Initialize 'hash' value before decoding (pr#59237, Nitzan Mordechai)
cls/user: reset stats only returns marker when truncated (pr#60165, Casey Bodley)
cmake/arrow: don't treat warnings as errors (pr#57375, Casey Bodley)
cmake: use ExternalProjects to build isa-l and isa-l_crypto libraries (pr#60108, Casey Bodley)
common,osd: Use last valid OSD IOPS value if measured IOPS is unrealistic (pr#60659, Sridhar Seshasayee)
common/admin_socket: add a command to raise a signal (pr#54357, Leonid Usov)
common/dout: fix FTBFS on GCC 14 (pr#59056, Radoslaw Zarzynski)
common/Formatter: dump inf/nan as null (pr#60061, Md Mahamudur Rahaman Sajib)
common/options: Change HDD OSD shard configuration defaults for mClock (pr#59972, Sridhar Seshasayee)
common/pick_address: check if address in subnet all public address (pr#57590, Nitzan Mordechai)
common/StackStringStream: update pointer to newly allocated memory in overflow() (pr#57362, Rongqi Sun)
common/TrackedOp: do not count the ops marked as nowarn (pr#58744, Xiubo Li)
common/TrackedOp: rename and raise prio of slow op perfcounter (pr#59280, Yite Gu)
common: fix md_config_cacher_t (pr#61403, Ronen Friedman)
common: use close_range on Linux (pr#61625, edef)
container/build.sh: don't require repo creds on NO_PUSH (pr#61582, Dan Mick)
container/build.sh: fix up org vs. repo naming (pr#61581, Dan Mick)
container/build.sh: remove local container images (pr#62065, Dan Mick)
container/Containerfile: replace CEPH_VERSION label for backward compat (pr#61580, Dan Mick)
container: add label ceph=True back (pr#61612, John Mulligan)
containerized build tools [V2] (pr#61683, John Mulligan, Ernesto Puerta)
debian pkg: record python3-packaging dependency for ceph-volume (pr#59201, Kefu Chai, Thomas Lamprecht)
debian: add ceph-exporter package (pr#56541, Shinya Hayashi)
debian: add missing bcrypt to ceph-mgr .requires to fix resulting package dependencies (pr#54662, Thomas Lamprecht)
debian: recursively adjust permissions of /var/lib/ceph/crash (pr#58458, Max Carrara)
doc,mailmap: update my email / association to ibm (pr#60339, Patrick Donnelly)
doc/ceph-volume: add spillover fix procedure (pr#59541, Zac Dover)
doc/cephadm/services: Re-improve osd.rst (pr#61953, Anthony D'Atri)
doc/cephadm/upgrade: ceph-ci containers are hosted by quay.ceph.io (pr#58681, Casey Bodley)
doc/cephadm: add default monitor images (pr#57209, Zac Dover)
doc/cephadm: add malformed-JSON removal instructions (pr#59664, Zac Dover)
doc/cephadm: Clarify "Deploying a new Cluster" (pr#60810, Zac Dover)
doc/cephadm: clean "Adv. OSD Service Specs" (pr#60680, Zac Dover)
doc/cephadm: correct note (pr#61529, Zac Dover)
doc/cephadm: edit "Using Custom Images" (pr#58941, Zac Dover)
doc/cephadm: how to get exact size_spec from device (pr#59431, Zac Dover)
doc/cephadm: improve "Activate Existing OSDs" (pr#61748, Zac Dover)
doc/cephadm: improve "Activate Existing OSDs" (pr#61726, Zac Dover)
doc/cephadm: link to "host pattern" matching sect (pr#60645, Zac Dover)
doc/cephadm: Reef default images procedure (pr#57236, Zac Dover)
doc/cephadm: remove downgrade reference from upgrade docs (pr#57086, Adam King)
doc/cephadm: simplify confusing math proposition (pr#61575, Zac Dover)
doc/cephadm: Update operations.rst (pr#60638, rhkelson)
doc/cephfs: add cache pressure information (pr#59149, Zac Dover)
doc/cephfs: add doc for disabling mgr/volumes plugin (pr#60497, Rishabh Dave)
doc/cephfs: add metrics to left pane (pr#57736, Zac Dover)
doc/cephfs: disambiguate "Reporting Free Space" (pr#56872, Zac Dover)
doc/cephfs: disambiguate two sentences (pr#57704, Zac Dover)
doc/cephfs: disaster-recovery-experts cleanup (pr#61447, Zac Dover)
doc/cephfs: document purge queue and its perf counters (pr#61194, Dhairya Parmar)
doc/cephfs: edit "Cloning Snapshots" in fs-volumes.rst (pr#57666, Zac Dover)
doc/cephfs: edit "Disabling Volumes Plugin" (pr#60468, Rishabh Dave)
doc/cephfs: edit "Dynamic Subtree Partitioning" (pr#58910, Zac Dover)
doc/cephfs: edit "is mount helper present" (pr#58579, Zac Dover)
doc/cephfs: edit "Layout Fields" text (pr#59022, Zac Dover)
doc/cephfs: edit "Pinning Subvolumes..." (pr#57663, Zac Dover)
doc/cephfs: edit 2nd 3rd of mount-using-kernel-driver (pr#61059, Zac Dover)
doc/cephfs: edit 3rd 3rd of mount-using-kernel-driver (pr#61081, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (pr#61424, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (2 of x) (pr#61444, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (3 of x) (pr#61454, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (4 of x) (pr#61480, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (5 of x) (pr#61500, Zac Dover)
doc/cephfs: edit disaster-recovery-experts (6 of x) (pr#61522, Zac Dover)
doc/cephfs: edit first 3rd of mount-using-kernel-driver (pr#61042, Zac Dover)
doc/cephfs: edit front matter in client-auth.rst (pr#57122, Zac Dover)
doc/cephfs: edit front matter in mantle.rst (pr#57792, Zac Dover)
doc/cephfs: edit fs-volumes.rst (1 of x) (pr#57418, Zac Dover)
doc/cephfs: edit fs-volumes.rst (1 of x) followup (pr#57427, Zac Dover)
doc/cephfs: edit fs-volumes.rst (2 of x) (pr#57543, Zac Dover)
doc/cephfs: edit grammar in snapshots.rst (pr#61460, Zac Dover)
doc/cephfs: edit vstart warning text (pr#57815, Zac Dover)
doc/cephfs: fix "file layouts" link (pr#58876, Zac Dover)
doc/cephfs: fix "OSD capabilities" link (pr#58893, Zac Dover)
doc/cephfs: fix typo (pr#58469, spdfnet)
doc/cephfs: improve "layout fields" text (pr#59251, Zac Dover)
doc/cephfs: improve cache-configuration.rst (pr#59215, Zac Dover)
doc/cephfs: improve ceph-fuse command (pr#56968, Zac Dover)
doc/cephfs: rearrange subvolume group information (pr#60436, Indira Sawant)
doc/cephfs: refine client-auth (1 of 3) (pr#56780, Zac Dover)
doc/cephfs: refine client-auth (2 of 3) (pr#56842, Zac Dover)
doc/cephfs: refine client-auth (3 of 3) (pr#56851, Zac Dover)
doc/cephfs: s/mountpoint/mount point/ (pr#59295, Zac Dover)
doc/cephfs: s/mountpoint/mount point/ (pr#59287, Zac Dover)
doc/cephfs: s/subvolumegroups/subvolume groups (pr#57743, Zac Dover)
doc/cephfs: separate commands into sections (pr#57669, Zac Dover)
doc/cephfs: streamline a paragraph (pr#58775, Zac Dover)
doc/cephfs: take Anthony's suggestion (pr#58360, Zac Dover)
doc/cephfs: update cephfs-shell link (pr#58371, Zac Dover)
doc/cephfs: use 'p' flag to set layouts or quotas (pr#60483, TruongSinh Tran-Nguyen)
doc/dev/developer_guide/essentials: update mailing lists (pr#62376, Laimis Juzeliunas)
doc/dev/peering: Change acting set num (pr#59063, qn2060)
doc/dev/release-process.rst: New container build/release process (pr#60972, Dan Mick)
doc/dev/release-process.rst: note new 'project' arguments (pr#57644, Dan Mick)
doc/dev: add "activate latest release" RTD step (pr#59655, Zac Dover)
doc/dev: add formatting to basic workflow (pr#58738, Zac Dover)
doc/dev: add note about intro of perf counters (pr#57758, Zac Dover)
doc/dev: add target links to perf_counters.rst (pr#57734, Zac Dover)
doc/dev: edit "Principles for format change" (pr#58576, Zac Dover)
doc/dev: Fix typos in encoding.rst (pr#58305, N Balachandran)
doc/dev: improve basic-workflow.rst (pr#58938, Zac Dover)
doc/dev: instruct devs to backport (pr#61064, Zac Dover)
doc/dev: link to ceph.io leads list (pr#58106, Zac Dover)
doc/dev: origin of Labeled Perf Counters (pr#57914, Zac Dover)
doc/dev: remove "Stable Releases and Backports" (pr#60273, Zac Dover)
doc/dev: repair broken image (pr#57008, Zac Dover)
doc/dev: s/to asses/to assess/ (pr#57423, Zac Dover)
doc/dev_guide: add needs-upgrade-testing label info (pr#58730, Zac Dover)
doc/developer_guide: update doc about installing teuthology (pr#57750, Rishabh Dave)
doc/foundation.rst: update Intel point of contact (pr#61032, Neha Ojha)
doc/glossary.rst: add "Dashboard Plugin" (pr#60897, Zac Dover)
doc/glossary.rst: add "OpenStack Swift" and "Swift" (pr#57942, Zac Dover)
doc/glossary: add "ceph-ansible" (pr#59008, Zac Dover)
doc/glossary: add "ceph-fuse" entry (pr#58944, Zac Dover)
doc/glossary: add "DC" (Data Center) to glossary (pr#60876, Zac Dover)
doc/glossary: add "flapping OSD" (pr#60865, Zac Dover)
doc/glossary: add "object storage" (pr#59425, Zac Dover)
doc/glossary: add "PLP" to glossary (pr#60504, Zac Dover)
doc/glossary: add "Prometheus" (pr#58978, Zac Dover)
doc/glossary: Add "S3" (pr#57983, Zac Dover)
doc/governance: add exec council responsibilites (pr#60140, Zac Dover)
doc/governance: add Zac Dover's updated email (pr#60135, Zac Dover)
doc/install: fix typos in openEuler-installation doc (pr#56413, Rongqi Sun)
doc/install: Keep the name field of the created user consistent with … (pr#59757, hejindong)
doc/man/8/radosgw-admin: add get lifecycle command (pr#57160, rkhudov)
doc/man: add missing long option switches (pr#57707, Patrick Donnelly)
doc/man: edit ceph-bluestore-tool.rst (pr#59683, Zac Dover)
doc/man: supplant "wsync" with "nowsync" as the default (pr#60200, Zac Dover)
doc/mds: improve wording (pr#59586, Piotr Parczewski)
doc/mgr/dashboard: fix TLS typo (pr#59032, Mindy Preston)
doc/mgr: Add root CA cert instructions to rgw.rst (pr#61885, Anuradha Gadge, Zac Dover)
doc/mgr: edit \\"Overview\\" in dashboard.rst (pr#57336, Zac Dover)
doc/mgr: edit "Resolve IP address to hostname before redirect" (pr#57296, Zac Dover)
doc/mgr: explain error message - dashboard.rst (pr#57109, Zac Dover)
doc/mgr: remove Zabbix 1 information (pr#56798, Zac Dover)
doc/monitoring: Improve index.rst (pr#62266, Anthony D'Atri)
doc/rados/operations: Clarify stretch mode vs device class (pr#62078, Anthony D'Atri)
doc/rados/operations: improve crush-map-edits.rst (pr#62318, Anthony D'Atri)
doc/rados/operations: Improve health-checks.rst (pr#59583, Anthony D'Atri)
doc/rados/operations: Improve pools.rst (pr#61729, Anthony D'Atri)
doc/rados/operations: remove vanity cluster name reference from crush… (pr#58948, Anthony D'Atri)
doc/rados/operations: rephrase OSDs peering (pr#57157, Piotr Parczewski)
doc/rados/troubleshooting: Improve log-and-debug.rst (pr#60825, Anthony D'Atri)
doc/rados/troubleshooting: Improve troubleshooting-pg.rst (pr#62321, Anthony D'Atri)
doc/rados: add "pgs not deep scrubbed in time" info (pr#59734, Zac Dover)
doc/rados: add blaum_roth coding guidance (pr#60538, Zac Dover)
doc/rados: add bucket rename command (pr#57027, Zac Dover)
doc/rados: add confval directives to health-checks (pr#59872, Zac Dover)
doc/rados: add link to messenger v2 info in mon-lookup-dns.rst (pr#59795, Zac Dover)
doc/rados: add options to network config ref (pr#57916, Zac Dover)
doc/rados: add osd_deep_scrub_interval setting operation (pr#59803, Zac Dover)
doc/rados: add pg-states and pg-concepts to tree (pr#58050, Zac Dover)
doc/rados: add stop monitor command (pr#57851, Zac Dover)
doc/rados: add stretch_rule workaround (pr#58182, Zac Dover)
doc/rados: correct "full ratio" note (pr#60738, Zac Dover)
doc/rados: credit Prashant for a procedure (pr#58258, Zac Dover)
doc/rados: document manually passing search domain (pr#58432, Zac Dover)
doc/rados: document unfound object cache-tiering scenario (pr#59381, Zac Dover)
doc/rados: edit "Placement Groups Never Get Clean" (pr#60047, Zac Dover)
doc/rados: edit troubleshooting-osd.rst (pr#58272, Zac Dover)
doc/rados: explain replaceable parts of command (pr#58060, Zac Dover)
doc/rados: fix outdated value for ms_bind_port_max (pr#57048, Pierre Riteau)
doc/rados: fix sentences in health-checks (2 of x) (pr#60932, Zac Dover)
doc/rados: fix sentences in health-checks (3 of x) (pr#60950, Zac Dover)
doc/rados: followup to PR#58057 (pr#58162, Zac Dover)
doc/rados: improve leader/peon monitor explanation (pr#57959, Zac Dover)
doc/rados: improve pg_num/pgp_num info (pr#62057, Zac Dover)
doc/rados: make sentences agree in health-checks.rst (pr#60921, Zac Dover)
doc/rados: pool and namespace are independent osdcap restrictions (pr#61524, Ilya Dryomov)
doc/rados: PR#57022 unfinished business (pr#57265, Zac Dover)
doc/rados: remove dual-stack docs (pr#57073, Zac Dover)
doc/rados: remove redundant pg repair commands (pr#57040, Zac Dover)
doc/rados: s/cepgsqlite/cephsqlite/ (pr#57247, Zac Dover)
doc/rados: standardize markup of "clean" (pr#60501, Zac Dover)
doc/rados: update how to install c++ header files (pr#58308, Pere Diaz Bou)
doc/radosgw/config-ref: fix lc worker thread tuning (pr#61438, Laimis Juzeliunas)
doc/radosgw/multisite: fix Configuring Secondary Zones -> Updating the Period (pr#60333, Casey Bodley)
doc/radosgw/s3: correct eTag op match tables (pr#61309, Anthony D'Atri)
doc/radosgw: disambiguate version-added remarks (pr#57141, Zac Dover)
doc/radosgw: Improve archive-sync-module.rst (pr#60853, Anthony D'Atri)
doc/radosgw: Improve archive-sync-module.rst more (pr#60868, Anthony D'Atri)
doc/radosgw: s/zonegroup/pools/ (pr#61557, Zac Dover)
doc/radosgw: update Reef S3 action list (pr#57365, Zac Dover)
doc/radosgw: update rgw_dns_name doc (pr#60886, Zac Dover)
doc/radosgw: use 'confval' directive for reshard config options (pr#57024, Casey Bodley)
doc/rbd/rbd-exclusive-locks: mention incompatibility with advisory locks (pr#58864, Ilya Dryomov)
doc/rbd: add namespace information for mirror commands (pr#60270, N Balachandran)
doc/rbd: fix typos in NVMe-oF docs (pr#58188, N Balachandran)
doc/rbd: use https links in live import examples (pr#61604, Ilya Dryomov)
doc/README.md - add ordered list (pr#59799, Zac Dover)
doc/README.md: create selectable commands (pr#59835, Zac Dover)
doc/README.md: edit "Build Prerequisites" (pr#59638, Zac Dover)
doc/README.md: improve formatting (pr#59786, Zac Dover)
doc/README.md: improve formatting (pr#59701, Zac Dover)
doc/releases: add actual_eol for quincy (pr#61360, Zac Dover)
doc/releases: Add ordering comment to releases.yml (pr#62193, Anthony D'Atri)
doc/rgw/d3n: pass cache dir volume to extra_container_args (pr#59768, Mark Kogan)
doc/rgw/notification: persistent notification queue full behavior (pr#59234, Yuval Lifshitz)
doc/rgw/notifications: specify which event types are enabled by default (pr#54500, Yuval Lifshitz)
doc/security: remove old GPG information (pr#56914, Zac Dover)
doc/security: update CVE list (pr#57018, Zac Dover)
doc/src: add inline literals (``) to variables (pr#57937, Zac Dover)
doc/src: invadvisable is not a word (pr#58190, Doug Whitfield)
doc/start/os-recommendations: remove 16.2.z support for CentOS 7 (pr#58721, gukaifeng)
doc/start: Add Beginner\'s Guide (pr#57822, Zac Dover)
doc/start: add links to Beginner\'s Guide (pr#58203, Zac Dover)
doc/start: add tested container host oses (pr#58713, Zac Dover)
doc/start: add vstart install guide (pr#60462, Zac Dover)
doc/start: Edit Beginner\'s Guide (pr#57845, Zac Dover)
doc/start: fix "are are" typo (pr#60709, Zac Dover)
doc/start: fix wording & syntax (pr#58364, Piotr Parczewski)
doc/start: Mention RGW in Intro to Ceph (pr#61927, Anthony D'Atri)
doc/start: remove "intro.rst" (pr#57949, Zac Dover)
doc/start: remove mention of Centos 8 support (pr#58390, Zac Dover)
doc/start: s/http/https/ in links (pr#57871, Zac Dover)
doc/start: s/intro.rst/index.rst/ (pr#57903, Zac Dover)
doc/start: separate package and container support tables (pr#60789, Zac Dover)
doc/start: separate package chart from container chart (pr#60699, Zac Dover)
doc/start: update mailing list links (pr#58684, Zac Dover)
doc: add snapshots in docs under Cephfs concepts (pr#61247, Neeraj Pratap Singh)
doc: Amend dev mailing list subscribe instructions (pr#58697, Paulo E. Castro)
doc: clarify availability vs integrity (pr#58131, Gregory O'Neill)
doc: clarify superuser note for ceph-fuse (pr#58615, Patrick Donnelly)
doc: Clarify that there are no tertiary OSDs (pr#61731, Anthony D'Atri)
doc: clarify use of location: in host spec (pr#57647, Matthew Vernon)
doc: Correct link to \\"Device management\\" (pr#58489, Matthew Vernon)
doc: Correct link to Prometheus docs (pr#59560, Matthew Vernon)
doc: correct typo (pr#57884, Matthew Vernon)
doc: document metrics exported by CephFS (pr#57724, Jos Collin)
doc: Document the Windows CI job (pr#60034, Lucian Petrut)
doc: Document which options are disabled by mClock (pr#60672, Niklas Hambüchen)
doc: documenting the feature that scrub clear the entries from damage… (pr#59079, Neeraj Pratap Singh)
doc: explain the consequence of enabling mirroring through monitor co… (pr#60526, Jos Collin)
doc: fix email (pr#60234, Ernesto Puerta)
doc: fix incorrect radosgw-admin subcommand (pr#62005, Toshikuni Fukaya)
doc: fix typo (pr#59992, N Balachandran)
doc: Fixes a typo in controllers section of hardware recommendations (pr#61179, Kevin Niederwanger)
doc: fixup #58689 - document SSE-C iam condition key (pr#62298, dawg)
doc: Improve doc/radosgw/placement.rst (pr#58974, Anthony D'Atri)
doc: improve tests-integration-testing-teuthology-workflow.rst (pr#61343, Vallari Agrawal)
doc: s/Whereas,/Although/ (pr#60594, Zac Dover)
doc: SubmittingPatches-backports - remove backports team (pr#60298, Zac Dover)
doc: Update "Getting Started" to link to start not install (pr#59908, Matthew Vernon)
doc: update Key Idea in cephfs-mirroring.rst (pr#60344, Jos Collin)
doc: update nfs doc for Kerberos setup of ganesha in Ceph (pr#59940, Avan Thakkar)
doc: update tests-integration-testing-teuthology-workflow.rst (pr#59549, Vallari Agrawal)
doc: Upgrade and unpin some python versions (pr#61932, David Galloway)
doc:update e-mail addresses governance (pr#60085, Tobias Fischer)
docs/rados/operations/stretch-mode: warn device class is not supported (pr#59100, Kamoltat Sirivadhna)
docs: removed centos 8 and added squid to the build matrix (pr#58902, Yuri Weinstein)
exporter: fix regex for rgw sync metrics (pr#57658, Avan Thakkar)
exporter: handle exceptions gracefully (pr#57371, Divyansh Kamboj)
fix issue with bucket notification test (pr#61881, Yuval Lifshitz)
global: Call getnam_r with a 64KiB buffer on the heap (pr#60126, Adam Emerson)
install-deps.sh, do_cmake.sh: almalinux is another el flavour (pr#58522, Dan van der Ster)
install-deps: save and restore user's XDG_CACHE_HOME (pr#56993, luo rixin)
kv/RocksDBStore: Configure compact-on-deletion for all CFs (pr#57402, Joshua Baergen)
librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove (pr#59282, Chen Yuanrun)
librbd/crypto/LoadRequest: clone format for migration source image (pr#60170, Ilya Dryomov)
librbd/crypto: fix issue when live-migrating from encrypted export (pr#59151, Ilya Dryomov)
librbd/migration/HttpClient: avoid reusing ssl_stream after shut down (pr#61094, Ilya Dryomov)
librbd/migration: prune snapshot extents in RawFormat::list_snaps() (pr#59660, Ilya Dryomov)
librbd: add rbd_diff_iterate3() API to take source snapshot by ID (pr#62129, Ilya Dryomov, Vinay Bhaskar Varada)
librbd: avoid data corruption on flatten when object map is inconsistent (pr#61167, Ilya Dryomov)
librbd: clear ctx before initiating close in Image::{aio_,}close() (pr#61526, Ilya Dryomov)
librbd: create rbd_trash object during pool initialization and namespace creation (pr#57603, Ramana Raja)
librbd: diff-iterate shouldn't crash on an empty byte range (pr#58211, Ilya Dryomov)
librbd: disallow group snap rollback if memberships don't match (pr#58207, Ilya Dryomov)
librbd: don't crash on a zero-length read if buffer is NULL (pr#57570, Ilya Dryomov)
librbd: fix a crash in get_rollback_snap_id (pr#62045, Ilya Dryomov, N Balachandran)
librbd: fix a deadlock on image_lock caused by Mirror::image_disable() (pr#62127, Ilya Dryomov)
librbd: fix mirror image status summary in a namespace (pr#61831, Ilya Dryomov)
librbd: make diff-iterate in fast-diff mode aware of encryption (pr#58345, Ilya Dryomov)
librbd: make group and group snapshot IDs more random (pr#57091, Ilya Dryomov)
librbd: stop filtering async request error codes (pr#61644, Ilya Dryomov)
Links to Jenkins jobs in PR comment commands / Remove deprecated commands (pr#62037, David Galloway)
log: save/fetch thread name infra (pr#60728, Milind Changire, Patrick Donnelly)
Make mon addrs consistent with mon info (pr#60750, shenjiatong)
mds/client: return -ENODATA when xattr doesn\'t exist for removexattr (pr#58770, Xiubo Li)
mds/purgequeue: add l_pq_executed_ops counter (pr#58328, shimin)
mds: Add fragment to scrub (pr#56895, Christopher Hoffman)
mds: batch backtrace updates by pool-id when expiring a log segment (issue#63259, pr#60689, Venky Shankar)
mds: cephx path restriction incorrectly rejects snapshots of deleted directory (pr#59519, Patrick Donnelly)
mds: check relevant caps for fs include root_squash (pr#57343, Patrick Donnelly)
mds: CInode::item_caps used in two different lists (pr#56886, Dhairya Parmar)
mds: defer trim() until after the last cache_rejoin ack being received (pr#56747, Xiubo Li)
mds: do remove the cap when seqs equal or larger than last issue (pr#58295, Xiubo Li)
mds: don't add counters in warning for standby-replay MDS (pr#57834, Rishabh Dave)
mds: don't stall the asok thread for flush commands (pr#57560, Leonid Usov)
mds: fix session/client evict command (issue#68132, pr#58726, Venky Shankar, Neeraj Pratap Singh)
mds: fix the description for inotable testing only options (pr#57115, Xiubo Li)
mds: getattr just waits the xlock to be released by the previous client (pr#60692, Xiubo Li)
mds: Implement remove for ceph vxattrs (pr#58350, Christopher Hoffman)
mds: inode_t flags may not be protected by the policylock during set_vxattr (pr#57177, Patrick Donnelly)
mds: log at a lower level when stopping (pr#57227, Kotresh HR)
mds: misc fixes for MDSAuthCaps code (pr#60207, Xiubo Li)
mds: prevent scrubbing for standby-replay MDS (pr#58493, Neeraj Pratap Singh)
mds: relax divergent backtrace scrub failures for replicated ancestor inodes (issue#64730, pr#58502, Venky Shankar)
mds: set the correct WRLOCK flag always in wrlock_force() (pr#58497, Xiubo Li)
mds: set the proper extra bl for the create request (pr#58528, Xiubo Li)
mds: some request errors come from errno.h rather than fs_types.h (pr#56664, Patrick Donnelly)
mds: try to choose a new batch head in request_clientup() (pr#58842, Xiubo Li)
mds: use regular dispatch for processing beacons (pr#57683, Patrick Donnelly)
mds: use regular dispatch for processing metrics (pr#57681, Patrick Donnelly)
mgr/BaseMgrModule: Optimize CPython Call in Finish Function (pr#55110, Nitzan Mordechai)
mgr/cephadm: add "original_weight" parameter to OSD class (pr#59411, Adam King)
mgr/cephadm: add command to expose systemd units of all daemons (pr#61915, Adam King)
mgr/cephadm: Allows enabling NFS Ganesha NLM (pr#56909, Teoman ONAY)
mgr/cephadm: ceph orch host drain command to return error for invalid hostname (pr#61919, Shweta Bhosale)
mgr/cephadm: cleanup iscsi and nvmeof keyrings upon daemon removal (pr#59459, Adam King)
mgr/cephadm: create OSD daemon deploy specs through make_daemon_spec (pr#61923, Adam King)
mgr/cephadm: fix flake8 test failures (pr#58076, Nizamudeen A)
mgr/cephadm: fix typo with vrrp_interfaces in keepalive setup (pr#61904, Adam King)
mgr/cephadm: make client-keyring deploying ceph.conf optional (pr#59451, Adam King)
mgr/cephadm: make setting --cgroups=split configurable for adopted daemons (pr#59460, Gilad Sid)
mgr/cephadm: make SMB and NVMEoF upgrade last in staggered upgrade (pr#59462, Adam King)
mgr/cephadm: mgr orchestrator module raise exception if there is trailing tab in yaml file (pr#61921, Shweta Bhosale)
mgr/cephadm: set OSD cap for NVMEoF daemon to "profile rbd" (pr#57234, Adam King)
mgr/cephadm: Update multi-site configs before deploying daemons on rgw service create (pr#60350, Aashish Sharma)
mgr/cephadm: use double quotes for NFSv4 RecoveryBackend in ganesha conf (pr#61924, Adam King)
mgr/cephadm: use host address while updating rgw zone endpoints (pr#59947, Aashish Sharma)
mgr/dashboard: add a custom warning message when enabling feature (pr#61038, Nizamudeen A)
mgr/dashboard: add absolute path validation for pseudo path of nfs export (pr#57637, avanthakkar)
mgr/dashboard: add cephfs rename REST API (pr#60729, Yite Gu)
mgr/dashboard: add dueTime to rgw bucket validator (pr#58247, Nizamudeen A)
mgr/dashboard: add NFS export button for subvolume/ grp (pr#58657, Avan Thakkar)
mgr/dashboard: add prometheus federation config for mullti-cluster monitoring (pr#57255, Aashish Sharma)
mgr/dashboard: Administration > Configuration > Some of the config options are not updatable at runtime (pr#61182, Naman Munet)
mgr/dashboard: bump follow-redirects from 1.15.3 to 1.15.6 in /src/pybind/mgr/dashboard/frontend (pr#56877, dependabot[bot])
mgr/dashboard: Changes for Sign out text to Login out (pr#58989, Prachi Goel)
mgr/dashboard: Cloning subvolume not listing _nogroup if no subvolume (pr#59952, Dnyaneshwari talwekar)
mgr/dashboard: critical confirmation modal changes (pr#61980, Naman Munet)
mgr/dashboard: disable deleting bucket with objects (pr#61973, Naman Munet)
mgr/dashboard: exclude cloned-deleted RBD snaps (pr#57219, Ernesto Puerta)
mgr/dashboard: fix clone async validators with different groups (pr#58338, Nizamudeen A)
mgr/dashboard: fix dashboard not visible on disabled anonymous access (pr#56965, Nizamudeen A)
mgr/dashboard: fix doc links in rgw-multisite (pr#60155, Pedro Gonzalez Gomez)
mgr/dashboard: fix duplicate grafana panels when on mgr failover (pr#56929, Avan Thakkar)
mgr/dashboard: fix edit bucket failing in other selected gateways (pr#58245, Nizamudeen A)
mgr/dashboard: fix handling NaN values in dashboard charts (pr#59962, Aashish Sharma)
mgr/dashboard: Fix Latency chart data units in rgw overview page (pr#61237, Aashish Sharma)
mgr/dashboard: fix readonly landingpage (pr#57752, Pedro Gonzalez Gomez)
mgr/dashboard: fix setting compression type while editing rgw zone (pr#59971, Aashish Sharma)
mgr/dashboard: fix snap schedule delete retention (pr#56862, Ivo Almeida)
mgr/dashboard: fix total objects/Avg object size in RGW Overview Page (pr#61458, Aashish Sharma)
mgr/dashboard: Fix variable capitalization in embedded rbd-details panel (pr#62209, Juan Ferrer Toribio)
mgr/dashboard: Forbid snapshot name "." and any containing "/" (pr#59994, Dnyaneshwari Talwekar)
mgr/dashboard: handle infinite values for pools (pr#61097, Afreen)
mgr/dashboard: introduce server side pagination for osds (pr#60295, Nizamudeen A)
mgr/dashboard: Move features to advanced section and expand by default rbd config section (pr#56921, Afreen)
mgr/dashboard: nfs export enhancement for CEPHFS (pr#58475, Avan Thakkar)
mgr/dashboard: pin lxml to fix run-dashboard-tox-make-check failure (pr#62256, Nizamudeen A)
mgr/dashboard: remove cherrypy_backports.py (pr#60633, Nizamudeen A)
mgr/dashboard: remove minutely from retention (pr#56917, Ivo Almeida)
mgr/dashboard: remove orch required decorator from host UI router (list) (pr#59852, Naman Munet)
mgr/dashboard: service form hosts selection only show up to 10 entries (pr#59761, Naman Munet)
mgr/dashboard: snapshot schedule repeat frequency validation (pr#56880, Ivo Almeida)
mgr/dashboard: Update and correct zonegroup delete notification (pr#61236, Aashish Sharma)
mgr/dashboard: update period after migrating to multi-site (pr#59963, Aashish Sharma)
mgr/dashboard: update translations for reef (pr#60358, Nizamudeen A)
mgr/dashboard: When configuring the RGW Multisite endpoints from the UI allow FQDN(Not only IP) (pr#62354, Aashish Sharma)
mgr/dashboard: Wrong(half) uid is observed in dashboard (pr#59876, Dnyaneshwari Talwekar)
mgr/dashboard: Zone details showing incorrect data for data pool values and compression info for Storage Classes (pr#59877, Aashish Sharma)
mgr/diskprediction_local: avoid more mypy errors (pr#62369, John Mulligan)
mgr/diskprediction_local: avoid mypy error (pr#61292, John Mulligan)
mgr/k8sevents: update V1Events to CoreV1Events (pr#57994, Nizamudeen A)
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state (pr#58513, Cory Snyder)
mgr/nfs: Don't crash ceph-mgr if NFS clusters are unavailable (pr#58283, Anoop C S, Ponnuvel Palaniyappan)
mgr/nfs: scrape nfs monitoring endpoint (pr#61719, avanthakkar)
mgr/orchestrator: fix encrypted flag handling in orch daemon add osd (pr#61720, Yonatan Zaken)
mgr/pybind/object_format: fix json-pretty being marked invalid (pr#59458, Adam King)
mgr/rest: Trim requests array and limit size (pr#59371, Nitzan Mordechai)
mgr/rgw: Adding a retry config while calling zone_create() (pr#61717, Kritik Sachdeva)
mgr/rgw: fix error handling in rgw zone create (pr#61713, Adam King)
mgr/rgw: fix setting rgw realm token in secondary site rgw spec (pr#61715, Adam King)
mgr/snap_schedule: correctly fetch mds_max_snaps_per_dir from mds (pr#59648, Milind Changire)
mgr/snap_schedule: restore yearly spec to lowercase y (pr#57446, Milind Changire)
mgr/stats: initialize mx_last_updated in FSPerfStats (pr#57441, Jos Collin)
mgr/status: Fix 'fs status' json output (pr#60188, Kotresh HR)
mgr/vol : shortening the name of helper method (pr#60369, Neeraj Pratap Singh)
mgr/vol: handle case where clone index entry goes missing (pr#58556, Rishabh Dave)
mgr: fix subuser creation via dashboard (pr#62087, Hannes Baum)
mgr: remove out&down osd from mgr daemons (pr#54533, shimin)
Modify container/ software to support release containers and the promotion of prerelease containers (pr#60961, Dan Mick)
mon, osd, *: expose upmap-primary in OSDMap::get_features() (pr#57794, Radoslaw Zarzynski)
mon, osd: add command to remove invalid pg-upmap-primary entries (pr#62191, Laura Flores)
mon, qa: suites override ec profiles with --yes_i_really_mean_it; monitors accept that (pr#59274, Radoslaw Zarzynski, Radosław Zarzyński)
mon,cephfs: require confirmation flag to bring down unhealthy MDS (pr#57837, Rishabh Dave)
mon/ElectionLogic: tie-breaker mon ignore proposal from marked down mon (pr#58687, Kamoltat)
mon/LogMonitor: Use generic cluster log level config (pr#57495, Prashant D)
mon/MDSMonitor: fix assert crash in fs swap (pr#57373, Patrick Donnelly)
mon/MonClient: handle ms_handle_fast_authentication return (pr#59307, Patrick Donnelly)
mon/MonmapMonitor: do not propose on error in prepare_update (pr#56400, Patrick Donnelly)
mon/OSDMonitor: Add force-remove-snap mon command (pr#59404, Matan Breizman)
mon/OSDMonitor: fix rmsnap command (pr#56431, Matan Breizman)
mon/OSDMonitor: relax cap enforcement for unmanaged snapshots (pr#61602, Ilya Dryomov)
mon/scrub: log error details of store access failures (pr#61345, Yite Gu)
mon: add created_at and ceph_version_when_created meta (pr#56681, Ryotaro Banno)
mon: do not log MON_DOWN if monitor uptime is less than threshold (pr#56408, Patrick Donnelly)
mon: fix fs set down to adjust max_mds only when cluster is not down (pr#59705, chungfengz)
mon: Remove any pg_upmap_primary mapping during remove a pool (pr#59270, Mohit Agrawal)
mon: stuck peering since warning is misleading (pr#57408, shreyanshjain7174)
mon: validate also mons and osds on {rm-,}pg-upmap-primary (pr#59275, Radosław Zarzyński)
msg/async: Encode message once features are set (pr#59286, Aishwarya Mathuria)
msg/AsyncMessenger: re-evaluate the stop condition when woken up in 'wait()' (pr#53717, Leonid Usov)
msg: always generate random nonce; don't try to reuse PID (pr#53269, Radoslaw Zarzynski)
msg: insert PriorityDispatchers in sorted position (pr#61507, Casey Bodley)
node-proxy: make the daemon discover endpoints (pr#58483, Guillaume Abrioux)
nofail option in fstab not supported (pr#52985, Leonid Usov)
orch: refactor boolean handling in drive group spec (pr#61914, Guillaume Abrioux)
os/bluestore: add perfcount for bluestore/bluefs allocator (pr#59103, Yite Gu)
os/bluestore: add some slow count for bluestore (pr#59104, Yite Gu)
os/bluestore: allow use BtreeAllocator (pr#59499, tan changzhi)
os/bluestore: enable async manual compactions (pr#58741, Igor Fedotov)
os/bluestore: expand BlueFS log if available space is insufficient (pr#57241, Pere Diaz Bou)
os/bluestore: Fix BlueRocksEnv attempts to use POSIX (pr#61112, Adam Kupczyk)
os/bluestore: fix btree allocator (pr#59264, Igor Fedotov)
os/bluestore: fix crash caused by dividing by 0 (pr#57197, Jrchyang Yu)
os/bluestore: fix the problem of l_bluefs_log_compactions double recording (pr#57194, Wang Linke)
os/bluestore: fix the problem that _estimate_log_size_N calculates the log size incorrectly (pr#61892, Wang Linke)
os/bluestore: Improve documentation introduced by #57722 (pr#60894, Anthony D'Atri)
os/bluestore: Make truncate() drop unused allocations (pr#60237, Adam Kupczyk, Igor Fedotov)
os/bluestore: set rocksdb iterator bounds for Bluestore::_collection_list() (pr#57625, Cory Snyder)
os/bluestore: Warning added for slow operations and stalled read (pr#59466, Md Mahamudur Rahaman Sajib)
os/store_test: Retune tests to current code (pr#56139, Adam Kupczyk)
os: introduce ObjectStore::refresh_perf_counters() method (pr#55136, Igor Fedotov)
os: remove unused btrfs_ioctl.h and tests (pr#60612, Casey Bodley)
osd/OSDMonitor: check svc is writeable before changing pending (pr#57067, Patrick Donnelly)
osd/PeeringState: introduce osd_skip_check_past_interval_bounds (pr#60284, Matan Breizman)
osd/perf_counters: raise prio of before queue op perfcounter (pr#59105, Yite Gu)
osd/scheduler: add mclock queue length perfcounter (pr#59034, zhangjianwei2)
osd/scrub: Change scrub cost to average object size (pr#59629, Aishwarya Mathuria)
osd/scrub: decrease default deep scrub chunk size (pr#59792, Ronen Friedman)
osd/scrub: reduce osd_requested_scrub_priority default value (pr#59886, Ronen Friedman)
osd/SnapMapper: fix _lookup_purged_snap (pr#56813, Matan Breizman)
osd/TrackedOp: Fix TrackedOp event order (pr#59108, YiteGu)
osd: Add memstore to unsupported objstores for QoS (pr#59285, Aishwarya Mathuria)
osd: adding \'reef\' to pending_require_osd_release (pr#60981, Philipp Hufangl)
osd: always send returnvec-on-errors for client\'s retry (pr#59273, Radoslaw Zarzynski)
osd: avoid watcher remains after "rados watch" is interrupted (pr#58846, weixinwei)
osd: bump versions of decoders for upmap-primary (pr#58802, Radoslaw Zarzynski)
osd: CEPH_OSD_OP_FLAG_BYPASS_CLEAN_CACHE flag is passed from ECBackend (pr#57621, Md Mahamudur Rahaman Sajib)
osd: Change PG Deletion cost for mClock (pr#56475, Aishwarya Mathuria)
osd: do not assert on fast shutdown timeout (pr#55135, Igor Fedotov)
osd: ensure async recovery does not drop a pg below min_size (pr#54550, Samuel Just)
osd: fix for segmentation fault on OSD fast shutdown (pr#57615, Md Mahamudur Rahaman Sajib)
osd: full-object read CRC mismatch due to 'truncate' modifying oi.size w/o clearing 'data_digest' (pr#57588, Samuel Just, Matan Breizman, Nitzan Mordechai, jiawd)
osd: make _set_cache_sizes ratio aware of cache_kv_onode_ratio (pr#55220, Raimund Sacherer)
osd: optimize extent comparison in PrimaryLogPG (pr#61336, Dongdong Tao)
osd: Report health error if OSD public address is not within subnet (pr#55697, Prashant D)
pybind/ceph_argparse: Fix error message for ceph tell command (pr#59197, Neeraj Pratap Singh)
pybind/mgr/mirroring: Fix KeyError: 'directory_count' in daemon status (pr#57763, Jos Collin)
pybind/mgr: disable sqlite3/python autocommit (pr#57190, Patrick Donnelly)
pybind/rados: fix missed changes for PEP484 style type annotations (pr#54358, Igor Fedotov)
pybind/rbd: expose CLONE_FORMAT and FLATTEN image options (pr#57309, Ilya Dryomov)
python-common: fix valid_addr on python 3.11 (pr#61947, John Mulligan)
python-common: handle "anonymous_access: false" in to_json of Grafana spec (pr#59457, Adam King)
qa/cephadm: use reef image as default for test_cephadm workunit (pr#56714, Adam King)
qa/cephadm: wait a bit before checking rgw daemons upgraded w/ ceph versions (pr#61917, Adam King)
qa/cephfs: a bug fix and few missing backport for caps_helper.py (pr#58340, Rishabh Dave)
qa/cephfs: add mgr debugging (pr#56415, Patrick Donnelly)
qa/cephfs: add more ignorelist entries (issue#64746, pr#56022, Venky Shankar)
qa/cephfs: add probabilistic ignorelist for pg_health (pr#56666, Patrick Donnelly)
qa/cephfs: CephFSTestCase.create_client() must keyring (pr#56836, Rishabh Dave)
qa/cephfs: fix test_single_path_authorize_on_nonalphanumeric_fsname (pr#58560, Rishabh Dave)
qa/cephfs: fix TestRenameCommand and unmount the clinet before failin… (pr#59399, Xiubo Li)
qa/cephfs: ignore variant of MDS_UP_LESS_THAN_MAX (pr#58789, Patrick Donnelly)
qa/cephfs: ignore when specific OSD is reported down during upgrade (pr#60390, Rishabh Dave)
qa/cephfs: ignorelist clog of MDS_UP_LESS_THAN_MAX (pr#56403, Patrick Donnelly)
qa/cephfs: improvements for "mds fail" and "fs fail" (pr#58563, Rishabh Dave)
qa/cephfs: remove dependency on centos8/rhel8 entirely (pr#59054, Venky Shankar)
qa/cephfs: switch to ubuntu 22.04 for stock kernel testing (pr#62492, Venky Shankar)
qa/cephfs: use different config options to generate MDS_TRIM (pr#59375, Rishabh Dave)
qa/distros: reinstall nvme-cli on centos 9 nodes (pr#59463, Adam King)
qa/distros: remove centos 8 from supported distros (pr#57932, Guillaume Abrioux, Casey Bodley, Adam King, Laura Flores)
qa/fsx: use a specified sha1 to build the xfstest-dev (pr#57557, Xiubo Li)
qa/mgr/dashboard: fix test race condition (pr#59697, Nizamudeen A, Ernesto Puerta)
qa/multisite: add boto3.client to the library (pr#60850, Shilpa Jagannath)
qa/rgw/crypt: disable failing kmip testing (pr#60701, Casey Bodley)
qa/rgw/sts: keycloak task installs java manually (pr#60418, Casey Bodley)
qa/rgw: avoid 'user rm' of keystone users (pr#62104, Casey Bodley)
qa/rgw: barbican uses branch stable/2023.1 (pr#56819, Casey Bodley)
qa/rgw: bump keystone/barbican from 2023.1 to 2024.1 (pr#61022, Casey Bodley)
qa/rgw: fix s3 java tests by forcing gradle to run on Java 8 (pr#61054, J. Eric Ivancich)
qa/rgw: force Hadoop to run under Java 1.8 (pr#61121, J. Eric Ivancich)
qa/rgw: pull Apache artifacts from mirror instead of archive.apache.org (pr#61102, J. Eric Ivancich)
qa/standalone/mon/mon_cluster_log.sh: retry check for log line (pr#60780, Shraddha Agrawal, Naveen Naidu)
qa/standalone/scrub: increase status updates frequency (pr#59975, Ronen Friedman)
qa/suites/krbd: drop pre-single-major and move "layering only" coverage (pr#57464, Ilya Dryomov)
qa/suites/krbd: stress test for recovering from watch errors for -o exclusive (pr#58856, Ilya Dryomov)
qa/suites/rados/singleton: add POOL_APP_NOT_ENABLED to ignorelist (pr#57487, Laura Flores)
qa/suites/rados/thrash-old-clients: update supported releases and distro (pr#57999, Laura Flores)
qa/suites/rados/thrash/workloads: remove cache tiering workload (pr#58413, Laura Flores)
qa/suites/rados/verify/validater/valgrind: increase op thread timeout (pr#54527, Matan Breizman)
qa/suites/rados/verify/validater: increase heartbeat grace timeout (pr#58786, Sridhar Seshasayee)
qa/suites/rados: Cancel injectfull to allow cleanup (pr#59157, Brad Hubbard)
qa/suites/rbd/iscsi: enable all supported container hosts (pr#60088, Ilya Dryomov)
qa/suites/rbd: override extra_system_packages directly on install task (pr#57765, Ilya Dryomov)
qa/suites/upgrade/reef-p2p/reef-p2p-parallel: increment upgrade to 18.2.2 (pr#58411, Laura Flores)
qa/suites: add "mon down" log variations to ignorelist (pr#61711, Laura Flores)
qa/suites: drop --show-reachable=yes from fs:valgrind tests (pr#59069, Jos Collin)
qa/tasks/ceph_manager.py: Rewrite test_pool_min_size (pr#59268, Kamoltat)
qa/tasks/cephadm: enable mon_cluster_log_to_file (pr#55431, Dan van der Ster)
qa/tasks/nvme_loop: update task to work with new nvme list format (pr#61027, Adam King)
qa/tasks/qemu: Fix OS version comparison (pr#58170, Zack Cerza)
qa/tasks: Include stderr on tasks badness check (pr#61434, Christopher Hoffman, Ilya Dryomov)
qa/tasks: watchdog should terminate thrasher (pr#59193, Nitzan Mordechai)
qa/tests: added client-upgrade-reef-squid tests (pr#58447, Yuri Weinstein)
qa/upgrade: fix checks to make sure upgrade is still in progress (pr#61718, Adam King)
qa/workunits/rbd: avoid caching effects in luks-encryption.sh (pr#58853, Ilya Dryomov)
qa/workunits/rbd: wait for resize to be applied in rbd-nbd (pr#62218, Ilya Dryomov)
qa: account for rbd_trash object in krbd_data_pool.sh + related ceph{,adm} task fixes (pr#58540, Ilya Dryomov)
qa: add a YAML to ignore MGR_DOWN warning (pr#57565, Dhairya Parmar)
qa: Add multifs root_squash testcase (pr#56690, Rishabh Dave, Kotresh HR)
qa: add support/qa for cephfs-shell on CentOS 9 / RHEL9 (pr#57162, Patrick Donnelly)
qa: adjust expected io_opt in krbd_discard_granularity.t (pr#59231, Ilya Dryomov)
qa: barbican: restrict python packages with upper-constraints (pr#59326, Tobias Urdin)
qa: cleanup snapshots before subvolume delete (pr#58332, Milind Changire)
qa: disable mon_warn_on_pool_no_app in fs suite (pr#57920, Patrick Donnelly)
qa: do the set/get attribute on the remote filesystem (pr#59828, Jos Collin)
qa: enable debug logs for fs:cephadm:multivolume subsuite (issue#66029, pr#58157, Venky Shankar)
qa: enhance per-client labelled perf counters test (pr#58251, Jos Collin, Rishabh Dave)
qa: failfast mount for better performance and unblock fs volume ls (pr#59920, Milind Changire)
qa: fix error reporting string in assert_cluster_log (pr#55391, Dhairya Parmar)
qa: fix krbd_msgr_segments and krbd_rxbounce failing on 8.stream (pr#57030, Ilya Dryomov)
qa: fix log errors for cephadm tests (pr#58421, Guillaume Abrioux)
qa: fixing tests in test_cephfs_shell.TestShellOpts (pr#58111, Neeraj Pratap Singh)
qa: ignore cluster warnings generated from forward-scrub task (issue#48562, pr#57611, Venky Shankar)
qa: ignore container checkpoint/restore related selinux denials for centos9 (issue#64616, pr#56019, Venky Shankar)
qa: ignore container checkpoint/restore related selinux denials for c… (issue#67118, issue#66640, pr#58809, Venky Shankar)
qa: ignore human-friendly POOL_APP_NOT_ENABLED in clog (pr#56951, Patrick Donnelly)
qa: ignore PG health warnings in CephFS QA (pr#58172, Patrick Donnelly)
qa: ignore variation of PG_DEGRADED health warning (pr#58231, Patrick Donnelly)
qa: ignore warnings variations (pr#59618, Patrick Donnelly)
qa: increase debugging for snap_schedule (pr#57172, Patrick Donnelly)
qa: increase the http postBuffer size and disable sslVerify (pr#53628, Xiubo Li)
qa: load all dirfrags before testing altname recovery (pr#59522, Patrick Donnelly)
qa: relocate subvol creation overrides and test (pr#59923, Milind Changire)
qa: suppress __trans_list_add valgrind warning (pr#58791, Patrick Donnelly)
qa: suppress Leak_StillReachable mon leak in centos 9 jobs (pr#58692, Laura Flores)
qa: switch to use the merge fragment for fscrypt (pr#55857, Xiubo Li)
qa: test test_kill_mdstable for all mount types (pr#56953, Patrick Donnelly)
qa: unmount clients before damaging the fs (pr#57524, Patrick Donnelly)
qa: use centos9 for fs:upgrade (pr#58113, Venky Shankar, Dhairya Parmar)
qa: wait for file creation before changing mode (issue#67408, pr#59686, Venky Shankar)
rbd-mirror: clean up stale pool replayers and callouts better (pr#57306, Ilya Dryomov)
rbd-mirror: fix possible recursive lock of ImageReplayer::m_lock (pr#62043, N Balachandran)
rbd-mirror: use correct ioctx for namespace (pr#59772, N Balachandran)
rbd-nbd: use netlink interface by default (pr#62175, Ilya Dryomov, Ramana Raja)
rbd: "rbd bench" always writes the same byte (pr#59501, Ilya Dryomov)
rbd: amend "rbd {group,} rename" and "rbd mirror pool" command descriptions (pr#59601, Ilya Dryomov)
rbd: handle --{group,image}-namespace in "rbd group image {add,rm}" (pr#61171, Ilya Dryomov)
rbd: open images in read-only mode for "rbd mirror pool status --verbose" (pr#61169, Ilya Dryomov)
Revert "reef: rgw/amqp: lock erase and create connection before emplace" (pr#59016, Rongqi Sun)
Revert "rgw/auth: Fix the return code returned by AuthStrategy," (pr#61405, Casey Bodley, Pritha Srivastava)
rgw/abortmp: Race condition on AbortMultipartUpload (pr#61133, Casey Bodley, Artem Vasilev)
rgw/admin/notification: add command to dump notifications (pr#58070, Yuval Lifshitz)
rgw/amqp: lock erase and create connection before emplace (pr#59018, Rongqi Sun)
rgw/amqp: lock erase and create connection before emplace (pr#58715, Rongqi Sun)
rgw/archive: avoid duplicating objects when syncing from multiple zones (pr#59341, Shilpa Jagannath)
rgw/auth: ignoring signatures for HTTP OPTIONS calls (pr#60455, Tobias Urdin)
rgw/beast: fix crash observed in SSL stream.async_shutdown() (pr#57425, Mark Kogan)
rgw/http/client-side: disable curl path normalization (pr#59258, Oguzhan Ozmen)
rgw/http: finish_request() after logging errors (pr#59440, Casey Bodley)
rgw/iam: fix role deletion replication (pr#59126, Alex Wojno)
rgw/kafka: refactor topic creation to avoid rd_kafka_topic_name() (pr#59764, Yuval Lifshitz)
rgw/kafka: set message timeout to 5 seconds (pr#56158, Yuval Lifshitz)
rgw/lc: make lc worker thread name shorter (pr#61485, lightmelodies)
rgw/lua: add lib64 to the package search path (pr#59343, Yuval Lifshitz)
rgw/lua: add more info on package install errors (pr#59127, Yuval Lifshitz)
rgw/multisite: allow PutACL replication (pr#58546, Shilpa Jagannath)
rgw/multisite: avoid writing multipart parts to the bucket index log (pr#57127, Juan Zhu)
rgw/multisite: don't retain RGW_ATTR_OBJ_REPLICATION_TRACE attr on copy_object (pr#58764, Shilpa Jagannath)
rgw/multisite: Fix use-after-move in retry logic in logbacking (pr#61329, Adam Emerson)
rgw/multisite: metadata polling event based on unmodified mdlog_marker (pr#60793, Shilpa Jagannath)
rgw/notifications/test: fix rabbitmq and kafka issues in centos9 (pr#58312, Yuval Lifshitz)
rgw/notifications: cleanup all coroutines after sending the notification (pr#59354, Yuval Lifshitz)
rgw/rados: don't rely on IoCtx::get_last_version() for async ops (pr#60097, Casey Bodley)
rgw/rgw_rados: fix server side-copy orphans tail-objects (pr#61367, Adam Kupczyk, Gabriel BenHanokh, Daniel Gryniewicz)
rgw/s3select: s3select response handler refactor (pr#57229, Seena Fallah, Gal Salomon)
rgw/sts: changing identity to boost::none, when role policy (pr#59346, Pritha Srivastava)
rgw/sts: fix to disallow unsupported JWT algorithms (pr#62046, Pritha Srivastava)
rgw/swift: preserve dashes/underscores in swift user metadata names (pr#56615, Juan Zhu, Ali Maredia)
rgw/test/kafka: let consumer read events from the beginning (pr#61595, Yuval Lifshitz)
rgw: add versioning status during radosgw-admin bucket stats (pr#59261, J. Eric Ivancich)
rgw: append query string to redirect URL if present (pr#61160, Seena Fallah)
rgw: compatibility issues on BucketPublicAccessBlock (pr#59125, Seena Fallah)
rgw: cumulatively fix 6 AWS SigV4 request failure cases (pr#58435, Zac Dover, Casey Bodley, Ali Maredia, Matt Benjamin)
rgw: decrement qlen/qactive perf counters on error (pr#59669, Mark Kogan)
rgw: Delete stale entries in bucket indexes while deleting obj (pr#61061, Shasha Lu)
rgw: do not assert on thread name setting failures (pr#58058, Yuval Lifshitz)
rgw: fix bucket link operation (pr#61052, Yehuda Sadeh)
RGW: fix cloud-sync not being able to sync folders (pr#56554, Gabriel Adrian Samfira)
rgw: fix CompleteMultipart error handling regression (pr#57301, Casey Bodley)
rgw: fix data corruption when rados op return ETIMEDOUT (pr#61093, Shasha Lu)
rgw: Fix LC process stuck issue (pr#61531, Soumya Koduri, Tongliang Deng)
rgw: fix the Content-Length in response header of static website (pr#60741, xiangrui meng)
rgw: fix user.rgw.user-policy attr remove by modify user (pr#59134, ivan)
rgw: increase log level on abort_early (pr#59124, Seena Fallah)
rgw: invalidate and retry keystone admin token (pr#59075, Tobias Urdin)
rgw: keep the tails when copying object to itself (pr#62656, Jane Zhu)
rgw: link only radosgw with ALLOC_LIBS (pr#60733, Matt Benjamin)
rgw: load copy source bucket attrs in putobj (pr#59415, Seena Fallah)
rgw: modify string match_wildcards with fnmatch (pr#57901, zhipeng li, Adam Emerson)
rgw: optimize gc chain size calculation (pr#58168, Wei Wang)
rgw: S3 Delete Bucket Policy should return 204 on success (pr#61432, Simon Jürgensmeyer)
rgw: swift: tempurl fixes for ceph (pr#59356, Casey Bodley, Marcus Watts)
rgw: update options yaml file so LDAP uri isn't an invalid example (pr#56721, J. Eric Ivancich)
rgw: when there are a large number of multiparts, the unorder list result may miss objects (pr#60745, J. Eric Ivancich)
rgwfile: fix lock_guard decl (pr#59351, Matt Benjamin)
run-make-check: use get_processors in run-make-check script (pr#58872, John Mulligan)
src/ceph-volume/ceph_volume/devices/lvm/listing.py : lvm list filters with vg name (pr#58998, Pierre Lemay)
src/exporter: improve usage message (pr#61332, Anthony D'Atri)
src/mon/ConnectionTracker.cc: Fix dump function (pr#60004, Kamoltat)
src/pybind/mgr/pg_autoscaler/module.py: fix 'pg_autoscale_mode' output (pr#59444, Kamoltat)
suites: test should ignore osd_down warnings (pr#59146, Nitzan Mordechai)
test/cls_lock: expired lock before unlock and start check (pr#59271, Nitzan Mordechai)
test/lazy-omap-stats: Convert to boost::regex (pr#57456, Brad Hubbard)
test/librbd/fsx: switch to netlink interface for rbd-nbd (pr#61259, Ilya Dryomov)
test/librbd/test_notify.py: conditionally ignore some errors (pr#62688, Ilya Dryomov)
test/librbd: clean up unused TEST_COOKIE variable (pr#58549, Rongqi Sun)
test/rbd_mirror: clear Namespace::s_instance at the end of a test (pr#61959, Ilya Dryomov)
test/rbd_mirror: flush watch/notify callbacks in TestImageReplayer (pr#61957, Ilya Dryomov)
test/rgw/multisite: add meta checkpoint after bucket creation (pr#60977, Casey Bodley)
test/rgw/notification: use real ip address instead of localhost (pr#59304, Yuval Lifshitz)
test/rgw: address potential race condition in reshard testing (pr#58793, J. Eric Ivancich)
test/store_test: fix deferred writing test cases (pr#55778, Igor Fedotov)
test/store_test: fix DeferredWrite test when prefer_deferred_size=0 (pr#56199, Igor Fedotov)
test/store_test: get rid off assert_death (pr#55774, Igor Fedotov)
test/store_test: refactor spillover tests (pr#55200, Igor Fedotov)
test: ceph daemon command with asok path (pr#61481, Nitzan Mordechai)
test: Create ParallelPGMapper object before start threadpool (pr#58920, Mohit Agrawal)
Test: osd-recovery-space.sh extends the wait time for "recovery toofull" (pr#59043, Nitzan Mordechai)
teuthology/bluestore: Fix running of compressed tests (pr#57094, Adam Kupczyk)
tool/ceph-bluestore-tool: fix wrong keyword for 'free-fragmentation' … (pr#62124, Igor Fedotov)
tools/ceph_objectstore_tool: Support get/set/superblock (pr#55015, Matan Breizman)
tools/cephfs: recover alternate_name of dentries from journal (pr#58232, Patrick Donnelly)
tools/objectstore: check for wrong coll open_collection (pr#58734, Pere Diaz Bou)
valgrind: update suppression for SyscallParam under call_init (pr#52611, Casey Bodley)
win32_deps_build.sh: pin zlib tag (pr#61630, Lucian Petrut)
workunit/dencoder: dencoder test forward incompat fix (pr#61750, NitzanMordhai, Nitzan Mordechai)
In Part One we introduced the concepts behind Ceph’s replication strategies, emphasizing the benefits of a stretch cluster for achieving zero data loss (RPO=0). In Part Two we will focus on the practical steps for deploying a two-site stretch cluster plus a tie-breaker Monitor using cephadm.
In a stretch architecture, the network plays a crucial role in maintaining the overall health and performance of the cluster.
Ceph supports Layer 3 routed networks, enabling communication among Ceph servers and components across subnets and CIDRs at each data center / site.
Ceph standalone or stretch clusters can be configured with two distinct networks:
The single public network must be accessible across all three sites, including the tie-breaker site, since all Ceph services rely on it.
The cluster network is only needed across the two sites that house OSDs and should not be configured at the tie-breaker site.
Unstable networking between the OSD sites will cause availability and performance issues in the cluster.
The network must not only be accessible 100% of the time but also provide consistent latency (low jitter).
Frequent spikes in latency can lead to unstable clusters, affecting client performance with issues including OSD flapping, loss of Monitor quorum, and slow (blocked) requests.
A maximum 10ms RTT (network packet Round Trip Time) is tolerated between the data sites where OSDs are located.
Up to 100ms RTT is acceptable for the tie-breaker site, which can be deployed as a VM or at a cloud provider if security policies allow.
If the tie-breaker node is in the cloud or on a remote network across a WAN, it is recommended to:
Set up a VPN among the data sites and the tie-breaker site for the public network.
Enable encryption in transit using Ceph messenger v2 encryption, which secures communication among Monitors and other Ceph components.
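A quick way to sanity-check these recommendations is to measure the inter-site RTT and, if the tie-breaker is remote, to require secure (encrypted) messenger v2 connections. A minimal sketch, assuming hypothetical host names from this post's lab and that all daemons already speak msgr v2; the exact modes you choose should follow your security policy:
# Measure RTT from a DC1 node to a DC2 node and to the tie-breaker site
ping -c 10 ceph-node-03.cephlab.com
ping -c 10 ceph-node-06.cephlab.com

# Require encrypted msgr v2 connections for cluster, service, and client traffic
ceph config set global ms_cluster_mode secure
ceph config set global ms_service_mode secure
ceph config set global ms_client_mode secure

# Confirm the active setting
ceph config get mon ms_cluster_mode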
Every write operation in Ceph practices strong consistency. Written data must be persisted to all configured OSDs in the relevant placement group's acting set before success can be acknowledged to the client.
This adds, at a minimum, the network's RTT (Round Trip Time) between sites to the latency of every client write operation. Note that these replication writes (sub-ops) from the primary OSD to secondary OSDs happen in parallel.
For example, if the RTT between sites is 6 ms, every write operation will have at least 6 ms of additional latency due to replication between sites.
The inter-site bandwidth (throughput) also constrains recovery. When a node fails, roughly 67% of recovery traffic will be remote: two thirds of the data is read from OSDs at the other site, consuming the shared inter-site bandwidth alongside client I/O.
Ceph designates a primary OSD for each placement group (PG). All client writes go through this primary OSD, which may reside in a different data center than the client or RGW instance.
By default, all reads go through the primary OSD, which can increase cross-site latency.
The read_from_local_replica feature allows RGW and RBD clients to read from a replica at the same (local) site instead of always reading from the primary OSD, which has a 50% chance of being at the other site.
This minimizes cross-site latency, reduces inter-site bandwidth usage, and improves performance for read-heavy workloads.
Available since Squid for both block (RBD) and object (RGW) storage. Local reads are not yet implemented for CephFS clients.
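For RBD clients, localized reads are enabled through the librbd replica read policy plus a client CRUSH location. A sketch under the assumption that the clients in question run in DC1; adapt the location to your own CRUSH hierarchy:
# Prefer the closest replica for reads instead of always going to the primary OSD
ceph config set client rbd_read_from_replica_policy localize

# In each client's ceph.conf, describe where the client sits in the CRUSH map,
# for example on a host located in DC1:
#   [client]
#   crush_location = datacenter=DC1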
The hardware requirements and recommendations for stretch clusters are identical to those for traditional (standalone, non-stretch) deployments, with a few exceptions that will be discussed below.
Ceph in stretch mode recommends all-flash (SSD) configurations. HDD media are not recommended for any stretch Ceph cluster role. You have been warned.
Ceph in stretch mode requires replication with size=4 as the data replication policy. Erasure coding or replication with fewer copies is not supported. Plan accordingly for the raw and usable storage capacities that you must provision.
Clusters with multiple device classes are not supported. A CRUSH rule containing type replicated class hdd will not work. If any CRUSH rule specifies a device class (typically ssd but potentially nvme), all CRUSH rules must specify that device class.
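As an illustration, if all OSDs in the cluster carry the ssd device class, a stretch CRUSH rule that pins that class could look like the sketch below (the id value is arbitrary but must be unique; the full procedure for editing and injecting the CRUSH map appears later in this post):
rule stretch_rule_ssd {
    id 2
    type replicated
    step take default class ssd
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}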
Local-only non-stretch pools are not supported. That is, neither site may provision a pool that does not extend to the other site.
Ceph services, including Monitors, OSDs, and RGWs, must be placed to eliminate single points of failure and ensure that the cluster can withstand the loss of an entire site without impacting client access to data.
Monitors: At least five Monitors are required, two per data site and one at the tie-breaker site. This strategy maintains quorum by ensuring that more than 50% of the Monitors are available even when an entire site is offline.
Managers: Configure two Managers per data site, four in total. Four Managers are recommended to provide high availability, with an active/passive pair available at the surviving site in case of a data site failure.
OSDs: Distributed equally across data sites. Custom CRUSH rules must be created when configuring stretch mode, placing two copies at each site, four total for a two-site stretch cluster.
RGWs: Four RGW instances, two per data site, are recommended at minimum to ensure high availability for object storage from the remaining site in case of a site failure.
MDS: The minimum recommended number of CephFS Metadata Server instances is four, two per data site. In the case of a site failure, we will still have two MDS services at the remaining site, one active and the other acting as a standby.
NFS: Four NFS server instances, two per data site, are recommended at minimum to ensure high availability for the shared filesystem when a site goes offline.
During the cluster bootstrap process with the cephadm deployment tool, we can utilize a service definition YAML file to handle most cluster configuration in a single step.
The stretched.yml file below provides an example template for deploying a Ceph cluster configured in stretch mode. This is just an example and must be customized to fit your specific deployment's details and needs.
service_type: host
addr: ceph-node-00.cephlab.com
hostname: ceph-node-00
labels:
  - mon
  - osd
  - rgw
  - mds
location:
  root: default
  datacenter: DC1
---
service_type: host
addr: ceph-node-01.cephlab.com
hostname: ceph-node-01
labels:
  - mon
  - mgr
  - osd
  - mds
location:
  root: default
  datacenter: DC1
---
service_type: host
addr: ceph-node-02.cephlab.com
hostname: ceph-node-02
labels:
  - osd
  - rgw
location:
  root: default
  datacenter: DC1
---
service_type: host
addr: ceph-node-03.cephlab.com
hostname: ceph-node-03
labels:
  - mon
  - osd
location:
  root: default
  datacenter: DC2
---
service_type: host
addr: ceph-node-04.cephlab.com
hostname: ceph-node-04
labels:
  - mon
  - mgr
  - osd
  - mds
location:
  root: default
  datacenter: DC2
---
service_type: host
addr: ceph-node-05.cephlab.com
hostname: ceph-node-05
labels:
  - osd
  - rgw
  - mds
location:
  root: default
  datacenter: DC2
---
service_type: host
addr: ceph-node-06.cephlab.com
hostname: ceph-node-06
labels:
  - mon
---
service_type: mon
service_name: mon
placement:
  label: mon
spec:
  crush_locations:
    ceph-node-00:
      - datacenter=DC1
    ceph-node-01:
      - datacenter=DC1
    ceph-node-03:
      - datacenter=DC2
    ceph-node-04:
      - datacenter=DC2
    ceph-node-06:
      - datacenter=DC3
---
service_type: mgr
service_name: mgr
placement:
  label: mgr
---
service_type: mds
service_id: cephfs
placement:
  label: "mds"
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
spec:
  data_devices:
    all: true
placement:
  label: "osd"
With the specification file customized for your deployment, run the cephadm bootstrap command. Note that we pass the YAML specification file with --apply-spec stretched.yml so that all services are deployed and configured in one step.
# cephadm bootstrap --registry-json login.json --dashboard-password-noupdate --mon-ip 192.168.122.12 --apply-spec stretched.yml --allow-fqdn-hostname
Once complete, verify that the cluster recognizes all hosts and their appropriate labels:
# ceph orch host ls
HOST          ADDR             LABELS                  STATUS
ceph-node-00  192.168.122.12   _admin,mon,osd,rgw,mds
ceph-node-01  192.168.122.179  mon,mgr,osd
ceph-node-02  192.168.122.94   osd,rgw,mds
ceph-node-03  192.168.122.180  mon,osd,mds
ceph-node-04  192.168.122.138  mon,mgr,osd
ceph-node-05  192.168.122.175  osd,rgw,mds
ceph-node-06  192.168.122.214  mon
Add the _admin label to at least one node in each datacenter so that you can run Ceph CLI commands. This way, even if you lose an entire datacenter, you can execute Ceph admin commands from a surviving host. It is not uncommon to assign the _admin label to all cluster nodes.
# ceph orch host label add ceph-node-03 _admin
Added label _admin to host ceph-node-03
# ceph orch host label add ceph-node-06 _admin
Added label _admin to host ceph-node-06
# ssh ceph-node-03 ls /etc/ceph
ceph.client.admin.keyring
ceph.conf
Ceph, when configured in stretch mode, requires all pools to use the replication data protection strategy with size=4. This means two copies of data at each site, ensuring availability when an entire site goes down.
Ceph uses the CRUSH map to determine where to place data replicas. The CRUSH map logically represents the physical hardware layout, organized in a hierarchy of bucket types that include datacenters, rooms, and most often racks and hosts. To configure a stretch mode CRUSH map, we define two datacenters under the default CRUSH root, then place the host buckets within the appropriate datacenter CRUSH bucket.
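If you are not using a spec file that records host locations, the same hierarchy can be built by hand with the standard CRUSH bucket commands. A sketch using this post's datacenter and host names:
# Create the datacenter buckets and attach them to the default root
ceph osd crush add-bucket DC1 datacenter
ceph osd crush add-bucket DC2 datacenter
ceph osd crush move DC1 root=default
ceph osd crush move DC2 root=default

# Move each OSD host under its datacenter (repeat for every host)
ceph osd crush move ceph-node-00 datacenter=DC1
ceph osd crush move ceph-node-03 datacenter=DC2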
The following example shows a stretch mode CRUSH map featuring two datacenters, DC1 and DC2, each with three Ceph OSD hosts. We get this topology right out of the box, thanks to the spec file we used during bootstrap, where we specify the location of each host in the CRUSH map.
# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-node-00
 0    hdd  0.04880              osd.0            up   1.00000  1.00000
 1    hdd  0.04880              osd.1            up   1.00000  1.00000
-4         0.09760          host ceph-node-01
 3    hdd  0.04880              osd.3            up   1.00000  1.00000
 7    hdd  0.04880              osd.7            up   1.00000  1.00000
-5         0.09760          host ceph-node-02
 2    hdd  0.04880              osd.2            up   1.00000  1.00000
 5    hdd  0.04880              osd.5            up   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-node-03
 4    hdd  0.04880              osd.4            up   1.00000  1.00000
 6    hdd  0.04880              osd.6            up   1.00000  1.00000
-8         0.09760          host ceph-node-04
10    hdd  0.04880              osd.10           up   1.00000  1.00000
11    hdd  0.04880              osd.11           up   1.00000  1.00000
-9         0.09760          host ceph-node-05
 8    hdd  0.04880              osd.8            up   1.00000  1.00000
 9    hdd  0.04880              osd.9            up   1.00000  1.00000
Here, we have two datacenters, DC1 and DC2. A third datacenter, DC3, houses the tie-breaker monitor on ceph-node-06 but does not host OSDs.
To achieve our goal of having two copies per site, we define a stretched CRUSH rule to assign to our Ceph RADOS pools.
Install the ceph-base package to get the crushtool binary, here demonstrated on a RHEL system:
# dnf -y install ceph-base
# ceph osd getcrushmap > crush.map.bin
# crushtool -d crush.map.bin -o crush.map.txt
Edit the crush.map.txt file to add a new rule at the end of the file, taking care that the numeric rule id attribute must be unique:
rule stretch_rule {
    id 1
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}
# crushtool -c crush.map.txt -o crush2.map.bin
# ceph osd setcrushmap -i crush2.map.bin
# ceph osd crush rule ls
replicated_rule
stretch_rule
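If you need to point a specific pool at the new rule manually (for example, a pool created before stretch mode is enabled), it can be assigned explicitly. A sketch, using the rbdpool pool that appears later in this post; as we will see, entering stretch mode also adjusts existing pools to size=4 with the stretch rule on its own:
# Assign the stretch rule and a four-copy policy to an existing pool
ceph osd pool set rbdpool crush_rule stretch_rule
ceph osd pool set rbdpool size 4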
Thanks to our bootstrap spec file, the Monitors are labeled according to the data center to which they belong. This labeling ensures Ceph can maintain quorum even if one data center experiences an outage. In such cases, the tie-breaker Monitor in DC3 acts in concert with the Monitors at the surviving data site to maintain the cluster's Monitor quorum.
# ceph mon dump | grep location
0: [v2:192.168.122.12:3300/0,v1:192.168.122.12:6789/0] mon.ceph-node-00; crush_location {datacenter=DC1}
1: [v2:192.168.122.214:3300/0,v1:192.168.122.214:6789/0] mon.ceph-node-06; crush_location {datacenter=DC3}
2: [v2:192.168.122.138:3300/0,v1:192.168.122.138:6789/0] mon.ceph-node-04; crush_location {datacenter=DC2}
3: [v2:192.168.122.180:3300/0,v1:192.168.122.180:6789/0] mon.ceph-node-03; crush_location {datacenter=DC2}
4: [v2:192.168.122.179:3300/0,v1:192.168.122.179:6789/0] mon.ceph-node-01; crush_location {datacenter=DC1}
When running a stretch cluster across three sites, an asymmetric network failure may affect communication between only one pair of sites. This can result in an unresolvable Monitor election storm, in which no Monitor can be elected as the leader.
To avoid this problem, we will change our election strategy from the classic approach to a connectivity-based one. The connectivity mode assesses the connection scores each Monitor provides for its peers and elects the Monitor with the highest score. This mode is specifically designed to handle network partitioning (also known as a netsplit), which may occur when your cluster is spread across multiple data centers and all links connecting one site to another are lost.
# ceph mon dump | grep election
election_strategy: 1
# ceph mon set election_strategy connectivity
# ceph mon dump | grep election
election_strategy: 3
You can check monitor scores with a command of the following form:
# ceph daemon mon.{name} connection scores dump
To learn more about the Monitor connectivity election strategy, check out this excellent video from Greg Farnum. Further information is also available here.
To enter stretch mode, run the following command:
# ceph mon enable_stretch_mode ceph-node-06 stretch_rule datacenter
Where:
ceph-node-06 is the tiebreaker (arbiter) monitor in DC3.
stretch_rule is the CRUSH rule that enforces two copies in each data center.
datacenter is our failure domain.
Check the updated MON configuration:
# ceph mon dump
epoch 20
fsid 90441880-e868-11ef-b468-52540016bbfa
last_changed 2025-02-11T14:44:10.163933+0000
created 2025-02-11T11:08:51.178952+0000
min_mon_release 19 (squid)
election_strategy: 3
stretch_mode_enabled 1
tiebreaker_mon ceph-node-06
disallowed_leaders ceph-node-06
0: [v2:192.168.122.12:3300/0,v1:192.168.122.12:6789/0] mon.ceph-node-00; crush_location {datacenter=DC1}
1: [v2:192.168.122.214:3300/0,v1:192.168.122.214:6789/0] mon.ceph-node-06; crush_location {datacenter=DC3}
2: [v2:192.168.122.138:3300/0,v1:192.168.122.138:6789/0] mon.ceph-node-04; crush_location {datacenter=DC2}
3: [v2:192.168.122.180:3300/0,v1:192.168.122.180:6789/0] mon.ceph-node-03; crush_location {datacenter=DC2}
4: [v2:192.168.122.179:3300/0,v1:192.168.122.179:6789/0] mon.ceph-node-01; crush_location {datacenter=DC1}
Ceph specifically disallows the tie-breaker monitor from ever assuming the leader role. The tie-breaker’s sole purpose is to provide an additional vote to maintain quorum when one primary site fails, preventing a split-brain scenario. By design, it resides in a separate, often smaller environment (perhaps a cloud VM) and may have higher network latency and fewer resources. Allowing it to become the leader could undermine performance and consistency. Therefore, Ceph marks the tie-breaker monitor as a disallowed leader (note the disallowed_leaders field in the output above), ensuring that the data sites retain primary control of the cluster while benefiting from the tie-breaker quorum vote.
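Should the tie-breaker host itself ever need to be replaced, recent Ceph releases let you promote another Monitor that has been given a location outside the two data sites. A sketch with a hypothetical replacement Monitor on ceph-node-07; depending on the release, the last command may expect the mon.<name> form:
# Give the replacement Monitor a CRUSH location outside DC1 and DC2
ceph mon set_location ceph-node-07 datacenter=DC3

# Designate it as the new tie-breaker
ceph mon set_new_tiebreaker ceph-node-07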
When stretch mode is enabled, Object Storage Daemons (OSDs) will only activate Placement Groups (PGs) when they peer across data centers, provided both are available. The following constraints apply:
The number of replicas (each pool's size attribute) will increase from the default of 3 to 4, with the expectation of two copies at each site.
OSDs are permitted to connect only to monitors within the same datacenter.
New monitors cannot join the cluster unless their location is specified.
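Because of the last point above, any Monitor added after stretch mode is enabled must be told where it lives before it can join. A sketch with a hypothetical additional Monitor host ceph-node-07 placed in DC1; with cephadm, the same result is achieved by extending crush_locations in the mon service spec used at bootstrap:
# Declare the new Monitor's CRUSH location so it is allowed to join the stretch cluster
ceph mon set_location ceph-node-07 datacenter=DC1
The pool listing below also confirms the first constraint: existing pools now report size 4 with the stretch CRUSH rule.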
# ceph osd pool ls detail
pool 1 '.mgr' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 199 lfor 199/199/199 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 12.12
pool 2 'rbdpool' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 199 lfor 199/199/199 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 3.38
Inspect the placement groups (PGs) for a specific pool ID and confirm which OSDs are in the acting set:
# ceph pg dump pgs_brief | grep 2.c
dumped pgs_brief
2.c     active+clean  [2,3,6,9]  2  [2,3,6,9]  2
In this example, PG 2.c has OSDs 2 and 3 from DC1, and OSDs 6 and 9 from DC2.
You can confirm the location of those OSDs with the ceph osd tree command:
# ceph osd tree | grep -Ev '(osd.1|osd.7|osd.5|osd.4|osd.0|osd.8)'
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-node-00
-4         0.09760          host ceph-node-01
 3    hdd  0.04880              osd.3            up   1.00000  1.00000
-5         0.09760          host ceph-node-02
 2    hdd  0.04880              osd.2            up   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-node-03
 6    hdd  0.04880              osd.6            up   1.00000  1.00000
-8         0.09760          host ceph-node-04
-9         0.09760          host ceph-node-05
 9    hdd  0.04880              osd.9            up   1.00000  1.00000
Here each PG has two replicas in DC1 and two in DC2, which is a core concept of stretch mode.
By deploying a two-site stretch cluster with a third-site tie-breaker Monitor, you ensure that data remains highly available even during the outage of an entire data center. Leveraging a single specification file allows for automatic and consistent service placement across both sites, covering Monitors, OSDs, and other Ceph components. The connectivity election strategy also helps maintain a stable quorum by prioritizing well-connected Monitors. Combining these elements (careful CRUSH configuration, correct labeling, and an appropriate data protection strategy) results in a resilient storage architecture that handles inter-site failures without compromising data integrity or service continuity.
In the final part of our series we will test the stretch cluster under real-world failure conditions. We will explore how Ceph automatically shifts into a degraded state when a complete site goes offline, the impact on client I/O during the outage, and the recovery process once the site is restored, ensuring zero data loss.
The authors would like to thank IBM for supporting the community with our time to create these posts.
When considering replication, disaster recovery, and backup + restore, we choose from multiple strategies with varying SLAs for data and application recovery. Key factors include the Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Synchronous replication provides the lowest RPO, which means zero data loss. Ceph can implement synchronous replication among sites by stretching the Ceph cluster across multiple data centers.
Asynchronous replication inherently implies a non-zero RPO. With Ceph, async multisite replication involves replicating data to another Ceph cluster. Each Ceph storage access method (object, block, and file) has its own asynchronous replication method implemented at the service level.
Asynchronous Replication: Replication occurs at the service level (RBD, CephFS, or RGW), typically across fully independent Ceph clusters.
Synchronous Replication (“Stretch Cluster”): Replication is performed at the RADOS (cluster) layer, so writes must be completed in every site before an acknowledgment is sent to clients.
Both methods have distinct advantages and disadvantages, as well as different performance profiles and recovery considerations. Before discussing Ceph stretch clusters in detail, here is an overview of these replication modes.
Asynchronous replication is driven at the service layer. Each site provisions a complete, standalone Ceph cluster and maintains independent copies of the data.
RGW Multisite: Each site deploys one or more independent RGW zones. Changes are propagated asynchronously between sites using the RGW multisite replication framework. This replication is not journal-based. Instead, it relies on log-based replication, where each RGW tracks changes through a log of operations (sync logs), and these logs are replayed at peer sites to replicate data.
RBD Mirroring: Block data is mirrored either using a journal-based approach (as with OpenStack) or a snapshot-based approach (as with ODF/OCP), depending on your requirements for performance, crash consistency, and scheduling.
CephFS Snapshot Mirroring (in active development): Uses snapshots to replicate file data at configurable intervals.
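As a brief illustration of how service-level this replication is, snapshot-based RBD mirroring is enabled per pool and per image with the rbd CLI. A minimal sketch, assuming a hypothetical pool rbdpool and image vm-disk-1, with the peer cluster already bootstrapped:
# Enable mirroring on the pool in image mode (each image opts in individually)
rbd mirror pool enable rbdpool image

# Enable snapshot-based mirroring for a single image
rbd mirror image enable rbdpool/vm-disk-1 snapshot

# Create mirror snapshots on a schedule, e.g. every 30 minutes
rbd mirror snapshot schedule add --pool rbdpool --image vm-disk-1 30m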
Asynchronous replication is well-suited for architectures with significant network latency between locations. This approach allows applications to continue operating without waiting for remote writes to complete. However, it is important to note that this strategy inherently implies a non-zero Recovery Point Objective (RPO), meaning there will be some delay before remote sites are consistent with the primary. As a result, a site failure could lead to loss of recently written data that is still in flight.
To explore Ceph's asynchronous replication, please check out our prior blog posts: Object storage Multisite Replication.
A stretch cluster is a single Ceph cluster deployed across multiple data centers or availability zones. Write operations return to clients only once persisted at all sites, or enough sites to meet each logical pool's replication schema requirement. This provides:
RPO = 0: No data loss if one site fails since every client write is synchronously replicated and will be replayed when a failed site comes back online.
Single cluster management: No special client-side replication configuration is needed: regular Ceph tools and workflows are applied.
A stretch cluster has strict networking requirements: a maximum 10ms RTT between sites. Because writes to OSDs must travel between sites before an acknowledgment is returned to the client, latency is critical. Network instability, insufficient bandwidth, and latency spikes can degrade performance and risk data integrity.
Ceph Stretch Clusters provide benefits that make them a good option for critical applications that require maximum uptime and resilience:
Fault Tolerance: a stretch cluster will handle the failure of an entire site transparently without impacting client operations. It can sustain a double site failure without data loss.
Strong Consistency: In a three-site setup, uploaded data immediately becomes visible and accessible to all AZs/sites. Strong consistency enables clients at each site to always see the latest data.
Simple setup and day two operations: One of the best features of stretch clusters is straightforward operation. They are like any standard, single-site cluster in most ways. Also, no manual intervention is required to recover from a site failure, making them easy to manage and deploy.
Stretch clusters can be complemented with multisite asynchronous replication for cross-region data replication.
It is, however, essential to consider the caveats of Ceph stretch clusters:
Networking is crucial: Inter-site networking shortcomings, including flapping, latency spikes, and insufficient bandwidth, impact performance and data integrity.
Performance: Write operation latency is increased by the RTT of the two most distant sites. When deploying across three sites, the pool data protection strategy should be configured for replication with a size value of 6, which means a write amplification of six OSD operations per client write. We must set workload expectations accordingly. For example, a high-IOPS, low-latency OLTP database workload will likely struggle if storing data in a stretch cluster.
Replica 6 (or Replica 4 for a two-site stretch) is recommended for reliability: We keep six (or four) copies of data. Erasure coding is not currently an option due to performance impact, inter-site network demands, and the nuances of ensuring simultaneous strong consistency and high availability. This in turn means that the total usable capacity available for a given amount of raw underlying storage must be carefully considered relative to a conventional single-site cluster.
Single cluster across all sites: If data is damaged by a software or user issue, including accidental deletion, on the single stretch cluster, the data seen by all sites will be affected.
A stretch cluster depends on robust networking to operate optimally. A suboptimal network configuration will impact performance and data integrity.
Equal Latency Across Sites: The sites are connected through a highly available L2 or L3 network infrastructure, where the latency among the data availability zones/sites is similar. The RTT is ideally less than 10ms. Inconsistent network latency (jitter) will degrade cluster performance.
Reliable L2/L3 network with minimal latency spikes: Provide inter-site path diversity and redundancy (full mesh or redundant transit).
Sufficient Bandwidth: The network should have adequate bandwidth to handle replication, client, and recovery traffic. Network bandwidth must scale with cluster growth: as we add nodes, we must also increase inter-site network throughput to maintain performance.
Networking QoS is beneficial: Without QoS, a noisy neighbor sending or receiving substantial inter-site traffic can degrade cluster stability.
Global Load Balancer: Object storage that uses S3 RESTful endpoints needs a GLB to redirect client requests in case of a site failure.
Performance: Each client write will experience at least the latency of the highest RTT between sites. For example, in a three-site stretch cluster with a 1.5 ms RTT between sites and the client and primary OSD at different sites, every operation incurs that inter-site round trip.
Each data center (or availability zone) houses a share of the OSDs in a three-site stretch cluster. Two data replicas are stored in each zone, so the CRUSH pool's size parameter is 6. This allows the cluster to serve client operations with zero data unavailability or loss when an entire site goes offline. Some highlights are below:
No Tiebreaker: Because there are three full data sites (OSDs in all sites), the Monitors can form quorum with any two sites able to reach each other.
Enhanced Resilience: Survives a complete site failure plus one additional OSD or node failure at surviving sites.
Network Requirements: L3 routing is recommended, and at most 10ms RTT is required among the three sites.
To delve deeply into Ceph 3-site stretch configurations, check out this excellent Cephalocon video from Kamoltat Sirivadhna.
For deployments where only two data centers have low-latency connectivity, place OSDs in those two data centers with the third site elsewhere hosting a tie-breaker Monitor. This may even be a VM at a cloud provider. This ensures that the cluster maintains a quorum when a single site fails.
Two low latency main sites: each hosting half of the total OSD capacity.
One tie-breaker site: hosts the tie-breaker Monitor.
Replicas: Pool data protection strategy of replication with size=4, which means two replicas per data center.
Latency: At most 10 ms RTT between the main, OSD-containing data centers. The tie-breaker site can tolerate much higher latency (e.g., 100 ms RTT).
Improved netsplit handling: Prevents a split-brain scenario.
SSD OSDs required: HDD OSDs are not supported.
Ceph supports both asynchronous and synchronous replication strategies, each with specific trade-offs among recovery objectives, operational complexity, and networking demands. Asynchronous replication (RBD Mirroring, RGW Multisite, and CephFS Snapshot Mirroring) provides flexibility and easy geo-deployment but carries a non-zero RPO. In contrast, a stretch cluster delivers RPO=0 by synchronously writing to multiple data centers, ensuring no data loss but requiring robust, low-latency inter-site connectivity and increased replication overhead including higher operation latency.
Whether you choose to deploy a three-site or two-site with a tie-breaker design, a stretch cluster can seamlessly handle the loss of an entire data center with minimal operational intervention. However, it is crucial to consider the stringent networking requirements (both latency and bandwidth) and the higher capacity overhead of replication with size=4. For critical applications where continuous availability and zero RPO are top priorities, the additional planning and resources for a stretch cluster may be well worth the investment. If a modest but nonzero RPO is acceptable, say if one data center is intended only for archival or as a reduced-performance disaster recovery site, asynchronous replication may be appealing in that capacity-efficient erasure coding may be used at both sites.
In our next post (part 2 of this series), we will explore two-site stretch clusters with a tie-breaker. We’ll provide practical steps for setting up Ceph across multiple data centers, discussing essential network and hardware considerations. Additionally, we will conduct a hands-on deployment, demonstrating how to automate the bootstrap of the cluster using a spec file. We will also cover how to configure CRUSH rules and enable stretch mode.
The authors would like to thank IBM for supporting the community with our time to create these posts.
In Part 2 we explored the hands-on deployment of a two-site Ceph cluster with a tie-breaker site and Monitor using a custom service definition file, CRUSH rules, and service placements.
In this final installment, we’ll test that configuration by examining what happens when an entire data center fails.
A key objective of any two-site stretch cluster design is to ensure that applications remain fully operational even if one data center goes offline. With synchronous replication, the cluster can handle client requests transparently, maintaining a Recovery Point Objective (RPO) of zero and preventing data loss, even during a complete site failure.
Our third and final post in this series will explore how Ceph automatically detects and isolates a failing data center. The cluster transitions into stretch degraded mode, with the tie-breaker Monitor ensuring quorum. During this time, replication constraints are temporarily adjusted to keep services available at the surviving site.
Once the offline data center is restored, we will demonstrate how the cluster seamlessly regains its complete stretch configuration, restoring full redundancy and synchronization operations without manual intervention. End users and storage administrators experience minimal disruption and zero data loss throughout this process.
The cluster is working as expected, our monitors are in quorum, and the acting set for our PGs includes four OSDs, two from each site. Our pools are configured with the replication rule, size=4, and min_size=2.
# ceph -s
  cluster:
    id:     90441880-e868-11ef-b468-52540016bbfa
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph-node-00,ceph-node-06,ceph-node-04,ceph-node-03,ceph-node-01 (age 43h)
    mgr: ceph-node-01.osdxwj(active, since 10d), standbys: ceph-node-04.vtmzkz
    osd: 12 osds: 12 up (since 10d), 12 in (since 2w)

  data:
    pools:   2 pools, 33 pgs
    objects: 23 objects, 42 MiB
    usage:   1.4 GiB used, 599 GiB / 600 GiB avail
    pgs:     33 active+clean

# ceph quorum_status --format json-pretty | jq .quorum_names
[
  "ceph-node-00",
  "ceph-node-06",
  "ceph-node-04",
  "ceph-node-03",
  "ceph-node-01"
]

# ceph pg map 2.1
osdmap e264 pg 2.1 (2.1) -> up [1,3,9,11] acting [1,3,9,11]

# ceph osd pool ls detail | tail -2
pool 2 'rbdpool' replicated size 4 min_size 2 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 199 lfor 199/199/199 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 3.38
We will display a diagram for each phase to describe the various stages during a failure.
At this point, something unexpected happens, and we lose access to all nodes in DC1:
Here is an excerpt from the Monitor logs on one of the remaining sites: Monitors in DC1 are considered down and are removed from the quorum:
2025-02-18T14:14:22.206+0000 7f05459fc640 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 2/5 mons down, quorum ceph-node-06,ceph-node-04,ceph-node-03
2025-02-18T14:14:22.206+0000 7f05459fc640 0 log_channel(cluster) log [WRN] : mon.ceph-node-00 (rank 0) addr [v2:192.168.122.12:3300/0,v1:192.168.122.12:6789/0] is down (out of quorum)
2025-02-18T14:14:22.206+0000 7f05459fc640 0 log_channel(cluster) log [WRN] : mon.ceph-node-01 (rank 4) addr [v2:192.168.122.179:3300/0,v1:192.168.122.179:6789/0] is down (out of quorum)
The Monitor running on ceph-node-03 in DC2 calls for a monitor election, proposes itself, and is accepted as the new leader:
2025-02-18T14:14:33.087+0000 7f0548201640 0 log_channel(cluster) log [INF] : mon.ceph-node-03 calling monitor election
2025-02-18T14:14:33.087+0000 7f0548201640 1 paxos.3).electionLogic(141) init, last seen epoch 141, mid-election, bumping
2025-02-18T14:14:38.098+0000 7f054aa06640 0 log_channel(cluster) log [INF] : mon.ceph-node-03 is new leader, mons ceph-node-06,ceph-node-04,ceph-node-03 in quorum (ranks 1,2,3)
Each Ceph OSD heartbeats other OSDs at random intervals of less than six seconds. If a peer OSD does not send a heartbeat within a 20-second grace period, the checking OSD considers the peer OSD to be down and reports this to a Monitor, which will then update the cluster map.
By default, two OSDs from different hosts must report to the Monitors that another OSD is down before the Monitors acknowledge the failure. This helps prevent false alarms, flapping, and cascading issues. However, all reporting OSDs may happen to be hosted in a rack with a malfunctioning switch that affects connectivity with other OSDs. To avoid false alarms, we regard the reporting peers as a potential subcluster experiencing issues.
The Monitors' OSD reporter subtree level groups reporting peers into subclusters based on their common ancestor type in the CRUSH map. By default, two reports from different subtrees are needed to declare an OSD down.
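The timers and reporter requirements described above are ordinary configuration options and can be inspected (and, with care, tuned). A reference sketch; option names are those found in current Ceph releases, and the commented defaults are the stock values:
# OSD-to-OSD heartbeat interval and the grace period before a peer is reported down
ceph config get osd osd_heartbeat_interval       # default: 6 seconds
ceph config get osd osd_heartbeat_grace          # default: 20 seconds

# How many reporters, and from which CRUSH subtree level, are needed
# before the Monitors mark an OSD down
ceph config get mon mon_osd_min_down_reporters       # default: 2
ceph config get mon mon_osd_reporter_subtree_level   # default: host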
2025-02-18T14:14:29.233+0000 7f0548201640 1 mon.ceph-node-03@3(leader).osd e264 prepare_failure osd.0 [v2:192.168.122.12:6804/636515504,v1:192.168.122.12:6805/636515504] from osd.10 is reporting failure:1
2025-02-18T14:14:29.235+0000 7f0548201640 0 log_channel(cluster) log [DBG] : osd.0 reported failed by osd.10
2025-02-18T14:14:31.792+0000 7f0548201640 1 mon.ceph-node-03@3(leader).osd e264 we have enough reporters to mark osd.0 down
2025-02-18T14:14:31.844+0000 7f054aa06640 0 log_channel(cluster) log [WRN] : Health check failed: 2 osds down (OSD_DOWN)
2025-02-18T14:14:31.844+0000 7f054aa06640 0 log_channel(cluster) log [WRN] : Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
In the output of the ceph status command, we can see that quorum is maintained by ceph-node-06, ceph-node-04, and ceph-node-03:
# ceph -s | grep mon
    2/5 mons down, quorum ceph-node-06,ceph-node-04,ceph-node-03
mon: 5 daemons, quorum ceph-node-06,ceph-node-04,ceph-node-03 (age 10s), out of quorum: ceph-node-00, ceph-node-01
We see via the ceph osd tree command that the OSDs in DC1 are marked down:
# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                STATUS  REWEIGHT  PRI-AFF
-1         0.58557  root default
-3         0.29279      datacenter DC1
-2         0.09760          host ceph-node-00
 0    hdd  0.04880              osd.0          down   1.00000  1.00000
 1    hdd  0.04880              osd.1          down   1.00000  1.00000
-4         0.09760          host ceph-node-01
 3    hdd  0.04880              osd.3          down   1.00000  1.00000
 7    hdd  0.04880              osd.7          down   1.00000  1.00000
-5         0.09760          host ceph-node-02
 2    hdd  0.04880              osd.2          down   1.00000  1.00000
 5    hdd  0.04880              osd.5          down   1.00000  1.00000
-7         0.29279      datacenter DC2
-6         0.09760          host ceph-node-03
 4    hdd  0.04880              osd.4            up   1.00000  1.00000
 6    hdd  0.04880              osd.6            up   1.00000  1.00000
-8         0.09760          host ceph-node-04
10    hdd  0.04880              osd.10           up   1.00000  1.00000
11    hdd  0.04880              osd.11           up   1.00000  1.00000
-9         0.09760          host ceph-node-05
 8    hdd  0.04880              osd.8            up   1.00000  1.00000
 9    hdd  0.04880              osd.9            up   1.00000  1.00000
Ceph raises the OSD_DATACENTER_DOWN health warning when an entire site fails. This indicates that one CRUSH datacenter is unavailable due to a network outage, power loss, or other issue. From the Monitor logs:
2025-02-18T14:14:32.910+0000 7f054aa06640 0 log_channel(cluster) log [WRN] : Health check failed: 1 datacenter (6 osds) down (OSD_DATACENTER_DOWN)
We can see the same from the ceph status command.
# ceph -s
  cluster:
    id:     90441880-e868-11ef-b468-52540016bbfa
    health: HEALTH_WARN
            3 hosts fail cephadm check
            We are missing stretch mode buckets, only requiring 1 of 2 buckets to peer
            2/5 mons down, quorum ceph-node-06,ceph-node-04,ceph-node-03
            1 datacenter (6 osds) down
            6 osds down
            3 hosts (6 osds) down
            Degraded data redundancy: 46/92 objects degraded (50.000%), 18 pgs degraded, 33 pgs undersized
When an entire data center fails in a two-site stretch scenario, Ceph enters stretch degraded mode. You’ll see a Monitor log entry like this:
2025-02-18T14:14:32.992+0000 7f05459fc640 0 log_channel(cluster) log [WRN] : Health check failed: We are missing stretch mode buckets, only requiring 1 of 2 buckets to peer (DEGRADED_STRETCH_MODE)
Stretch degraded mode is self-managing. It kicks in when the Monitors confirm that an entire CRUSH datacenter is unreachable. Administrators do not need to promote or demote any site or DC manually: Ceph updates the OSD map and PG states automatically. Once the cluster enters degraded stretch mode, the following actions unfold automatically.
Stretch degraded mode means that Ceph no longer requires an acknowledgment from offline OSDs in the failed data center to complete writes or to bring placement groups (PGs) to an active state.
In stretch mode, Ceph implements a specific stretch peering rule that mandates the participation of at least one OSD from each site in the acting set before a placement group (PG) can transition from peering to active+clean. This rule ensures that new write operations are not acknowledged if one site is completely offline, thereby preventing split-brain scenarios and ensuring consistent site replication.
Once in degraded mode, Ceph temporarily modifies the CRUSH rule so that only the surviving site is needed to activate PGs, allowing client operations to continue seamlessly.
# ceph pg dump pgs_brief | grep 2.11
dumped pgs_brief
2.11     active+undersized+degraded  [8,11]  8  [8,11]  8
When one site goes offline, Ceph automatically lowers the pool’s min_size attribute from 2 to 1, allowing each placement group (PG) to remain active and clean with only one available replica. Were min_size to remain at 2, the surviving site could not maintain active PGs after losing half of its local replicas, leading to a freeze in client I/O. By temporarily dropping min_size to 1, Ceph ensures that the cluster can tolerate an OSD failure at the remaining site and continue to serve reads and writes until the offline site returns.
It’s essential to note that running temporarily with min_size=1 means that only one copy of data needs to be available until the offline site recovers. While this keeps the service operational, it also increases the risk of data loss if the surviving site experiences additional failures. A Ceph cluster with SSD media ensures fast recovery and minimizes the risk of data unavailability or loss when an additional component fails during stretch degraded operation.
# ceph osd pool ls detail
pool 1 '.mgr' replicated size 4 min_size 1 crush_rule 1 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 302 lfor 302/302/302 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 11.76
pool 2 'rbdpool' replicated size 4 min_size 1 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 302 lfor 302/302/302 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 2.62
All PGs for which the primary OSD fails will experience a short blip in client operations until the affected OSDs are declared down and the acting set is modified per stretch mode.
Clients continue reading and writing data from the surviving site's two copies, ensuring service availability and RPO=0 for all writes.
When the offline data center returns to service, its OSDs rejoin the cluster, and Ceph automatically moves back from degraded stretch mode to full stretch mode. The process involves recovery and backfill to restore each placement group (PG) to the correct replica count of 4.
When an OSD has valid PG logs (and is only briefly down), Ceph performs incremental recovery by copying only the new updates from other replicas. When OSDs are down for a long time and the PG logs don’t contain a full set of deltas, Ceph initiates an OSD backfill operation to copy the entire PG. This systematically scans all RADOS objects in the authoritative replicas and updates the returning OSDs with changes that occurred while they were unavailable.
Recovery and backfill entail additional I/O as data is transferred between sites to restore full redundancy. This is why including the recovery throughput in your network calculations and planning is essential. Ceph is designed to throttle these operations via configurable mClock recovery/backfill settings so that it does not overwhelm client I/O. We want to return to HEALTH_OK as soon as possible to ensure data availability and durability, so adequate inter-site bandwidth is crucial. This means not only bandwidth for daily reads and writes, but also for peaks when components fail or the cluster is expanded.
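With the mClock scheduler, the balance between client and recovery traffic is chosen via a profile rather than individual sleep settings. A sketch of temporarily favoring recovery after a failed site returns, and switching back once the cluster is healthy again; the profile names are the standard mClock profiles:
# Check the active mClock profile on the OSDs
ceph config get osd osd_mclock_profile

# Temporarily prioritize recovery/backfill over client I/O
ceph config set osd osd_mclock_profile high_recovery_ops

# Return to the default balance once recovery completes
ceph config set osd osd_mclock_profile balanced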
Once all affected PGs have finished recovery or backfill, they return to active+clean with the required two copies per site up to date and available. Ceph then reverts the temporary changes made during degraded mode (e.g. min_size=1 back to the standard min_size=2). The cluster’s degraded stretch mode warning disappears once complete, signaling that full redundancy has been restored.
In this quick demo we run an application that constantly reads from and writes to an RBD block volume. The blue and green dots are the application's reads and writes, along with their latency. On the left and right of the dashboard we have the status of the DCs, and individual servers are shown as down when they become inaccessible. In the demo we can see how we lose an entire site and our application reports only 27 seconds of delayed I/O: the time it takes to detect and confirm that the OSDs are down. Once the site is recovered, we can see that PGs are recovered using the replicas at the remaining site.
In this final installment, we’ve seen how a two-site stretch cluster reacts to a data center outage. It automatically transitions into a degraded state to keep services online and seamlessly recovers when the failed site returns. With automatic down marking, relaxed peering rules, lowered min_size values, and synchronization of modified data once connectivity returns, Ceph handles these events with minimal manual intervention and no data loss.
The authors would like to thank IBM for supporting the community with our time to create these posts.
Businesses integrate datasets from multiple sources to derive valuable business insights. Conventional analytics infrastructures, often reliant on specialized hardware, can lead to data silos, lack scalability, and result in escalating costs over time.
The rise of modern analytics architectures in public cloud-based SaaS environments has helped overcome many limitations, allowing for efficient operations and the ability to adapt dynamically to changing workload demands without compromising performance.
However, despite these advancements, not all organizations can realistically shift entirely to a cloud-based environment. Several crucial reasons exist for retaining data on-premises, such as regulatory compliance, security concerns, latency, and cost considerations.
Consequently, many organizations are exploring the benefits of hybrid cloud architectures, making their datasets from on-premises object-based data lake environments available to Cloud SaaS data platforms including Snowflake.
Snowflake is a cloud-based data platform that enhances data-driven insights by allowing governed access to vast amounts of data for collaboration and analysis. Thanks to its native support for the S3 API, it can unify diverse data sources and integrate seamlessly with on-premises solutions including Ceph. This integration enables Snowflake to leverage Ceph\'s robust and scalable storage capabilities, effectively bringing cloud data warehouse functionalities into the on-premises environment while ensuring comprehensive data control and security.
Ceph is open-source, software-defined, runs on industry-standard hardware, and has best-in-class coverage of the lingua franca of object storage: the AWS S3 API. Ceph was designed from the ground up as an object store, contrasting with approaches that bolt S3 API servers onto a distributed file system. With Ceph, data placement is by algorithm instead of by lookup. This allows Ceph to scale well into the billions of objects, even on modestly sized clusters. Data stored in Ceph is protected with efficient erasure coding, with in-flight and at-rest checksums, encryption, and robust access control that thoughtfully integrates with enterprise identity systems. Ceph is the perfect complement to Snowflake for establishing a security-first hybrid cloud data lake environment.
Ceph is a supported S3 compatible storage solution for Snowflake. Using Ceph\'s S3-compatible APIs, enterprises can configure Snowflake to access data stored on Ceph through external S3 stages or external S3 tables, enabling efficient queries without requiring data migration to and from the cloud.
Ceph Object Storage is the perfect platform for creating data lakes or lakehouses with key advantages:
Cost-effectiveness: Ceph utilizes commodity hardware and open-source software to reduce upfront infrastructure costs and enable incremental and evolutionary upgrades and expansion over time without forklifts or downtime.
High scalability: Ceph allows horizontal scaling to accommodate large volumes of growing data in a data lake or lakehouse.
High flexibility: Ceph can handle various data types, including structured, semi-structured, and unstructured data, including text, images, video, and sensor data, making it versatile and appropriate for data lakes.
High availability: Ceph is designed to provide durability and reliability for information stored in a data lake or lakehouse. Data is always accessible despite hardware failures or disruptions in the network. Ceph offers data replication across multiple geographic locations, providing redundancy and fault tolerance to prevent data loss.
High performance: Ceph enables parallel data access and processing through integration with data analytics frameworks to enable high throughput and low latency for data ingestion and processing within a data lake or lakehouse. Ceph Object also provides a cache data accelerator (D3N) and query pushdown with S3 Select.
Data governance: Ceph provides efficient management of metadata to enforce data governance policies, track data lineage, monitor data usage, and provide valuable information about the data stored in the data lake, including format and data source.
Security: Ceph has a broad security feature set: encryption at rest and over the wire, external identity integration, Secure Token Service, IAM roles/policies, per-object granular authorization, Object Lock, versioning, and MFA delete.
The most common way of accessing external S3 object storage from Snowflake is to create an External Stage and then use the Stage to copy the data into Snowflake or access it directly using an External Table.
Next, we will provide two simple examples for reference using an on-prem Ceph cluster:
Our Ceph cluster has an S3 Object Gateway configured at s3.cephlabs.blue, and we have a bucket named ecommtrans containing a CSV-formatted file named transactions.
$ aws s3 ls s3://ecommtrans/transactions/ \\n2024-06-04 11:33:54 13096729 transaction_data_20240604112945.csv\\n
The CSV file has the following format:
client_id,transaction_id,item_id,transaction_date,country,customer_type,item_description,category,quantity,total_amount,marketing_campaign,returned\\n799315,f47b56a5-2392-4d7c-a3fe-fad18c8b0901,a06210e5-217f-4c3d-8ab9-06e1d8f605e2,2024-03-17 20:35:26,DK,Returning,Smartwatch,Electronics,3,1790.2,,False\\n858067,9351638c-9d23-4d32-9218-69bbba6b258d,858aa970-9a95-4c99-8b64-d783129dd5cb,2024-02-13 16:18:42,ES,New,Dress,Clothing,4,196.96,,False\\n528665,7cc494c8-a19d-4771-9686-989d7dfa4c96,0bb7529b-59e8-4d15-adb8-c224b7d7d5b9,2024-03-04 \\n
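The SQL in the next sections references an external stage named CEPH_INGEST_STAGE that points at the Ceph endpoint. Below is a minimal sketch of creating such a stage; the credentials are placeholders, and the exact parameters should be verified against the Snowflake documentation for S3-compatible storage:
CREATE OR REPLACE STAGE CEPH_INGEST_STAGE\\n URL = \'s3compat://ecommtrans/\'\\n ENDPOINT = \'s3.cephlabs.blue\'\\n CREDENTIALS = (AWS_KEY_ID = \'<access-key>\' AWS_SECRET_KEY = \'<secret-key>\');\\n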
For the COPY INTO approach, we open a new SQL worksheet in the Snowflake UI and run the following SQL code:
CREATE OR REPLACE TABLE onprem_database_ingest.raw.transactions\\n(\\n client_id VARCHAR(16777216),\\n transaction_id VARCHAR(16777216),\\n item_id VARCHAR(16777216),\\n transaction_date TIMESTAMP_NTZ(9),\\n country VARCHAR(16777216),\\n customer_type VARCHAR(16777216),\\n item_description VARCHAR(16777216),\\n category VARCHAR(16777216),\\n quantity NUMBER(38,0),\\n total_amount NUMBER(38,0),\\n marketing_campaign VARCHAR(16777216),\\n returned BOOLEAN\\n);\\n\\nLIST @CEPH_INGEST_STAGE/transactions/;\\n\\n---> copy the transactions file into the transactions table\\n\\nCOPY INTO onprem_database_ingest.raw.transactions\\nFROM @CEPH_INGEST_STAGE/transactions/;\\n\\n-- Sample query to verify the setup\\n\\nSELECT * FROM onprem_database_ingest.raw.transactions\\nLIMIT 10;\\n
For the External Table approach, we open a new SQL worksheet in the Snowflake UI and run the following SQL code:
transactions/;\\n\\n-- Create the External Table with defining expressions for each column\\n\\nCREATE OR REPLACE EXTERNAL TABLE onprem_database_ingest_trans.raw.trans_external\\n(\\n client_id STRING AS (VALUE:\\"c1\\"::STRING),\\n transaction_id STRING AS (VALUE:\\"c2\\"::STRING),\\n item_id STRING AS (VALUE:\\"c3\\"::STRING),\\n transaction_date TIMESTAMP AS (VALUE:\\"c4\\"::TIMESTAMP),\\n country STRING AS (VALUE:\\"c5\\"::STRING),\\n customer_type STRING AS (VALUE:\\"c6\\"::STRING),\\n item_description STRING AS (VALUE:\\"c7\\"::STRING),\\n category STRING AS (VALUE:\\"c8\\"::STRING),\\n quantity NUMBER AS (VALUE:\\"c9\\"::NUMBER),\\n total_amount NUMBER AS (VALUE:\\"c10\\"::NUMBER),\\n marketing_campaign STRING AS (VALUE:\\"c11\\"::STRING),\\n returned BOOLEAN AS (VALUE:\\"c12\\"::BOOLEAN)\\n)\\n\\nLOCATION = @CEPH_INGEST_STAGE_TRANS/transactions/\\nFILE_FORMAT = (TYPE = \'CSV\' FIELD_OPTIONALLY_ENCLOSED_BY = \'\\"\' SKIP_HEADER = 1 FIELD_DELIMITER = \',\' NULL_IF = (\'\'))\\nREFRESH_ON_CREATE = FALSE\\nAUTO_REFRESH = FALSE\\nPATTERN = \'.*.csv\';\\n\\n-- Refresh the metadata for the external table\\n\\nALTER EXTERNAL TABLE onprem_database_ingest_trans.raw.trans_external REFRESH;\\n\\n-- Sample query to verify the setup\\n\\nSELECT * FROM onprem_database_ingest_trans.raw.trans_external\\n\\nLIMIT 10;~\\n
Hybrid cloud architectures are increasingly popular, incorporating on-premises solutions including Ceph and cloud-based SaaS platforms including Snowflake. Ceph, which is now supported by Snowflake as an S3-compatible store, makes it possible to access on-premises data lake datasets, enhancing Snowflake\'s data warehousing capabilities. This integration establishes a secure, scalable, cost-effective hybrid data lake environment.
The authors would like to thank IBM for supporting the community with our time to create these posts.
The new S3 bucket logging feature introduced as a Technology Preview in Squid 19.2.2 makes tracking, monitoring, and securing bucket operations more straightforward than ever. It aligns with the S3 self-service use case, enabling end users to configure and manage their application storage access logging through the familiar S3 API. This capability empowers users to monitor access patterns, detect unauthorized activities, and analyze usage trends without needing direct intervention from administrators.
By leveraging Ceph\'s logging features, users gain actionable insights through logs stored in dedicated buckets, offering flexibility and granularity in tracking operations.
It’s important to note that this feature is not designed to provide real-time performance metrics and monitoring: we have the observability stack provided by Ceph for those needs.
In AWS, the equivalent to Ceph\'s Bucket Logging S3 API is referenced as S3 Server Access Logging.
In this blog, we will build an example interactive Superset dashboard for our application with the log data generated when enabling S3 bucket logging.
Application Compliance and Auditing
Application teams in regulated industries (finance, healthcare, insurance, etc.) must maintain detailed access logs to meet compliance requirements and ensure traceability for data operations.
Security and Intrusion Detection
Monitor bucket access patterns to identify unauthorized activities, detect anomalies, and respond to potential security breaches.
Per Application Usage Analytics
Generate detailed insights into buckets, including which objects are frequently accessed, peak traffic times, and operation patterns.
End User Cost Optimizations
Track resource usage, such as the number of GET, PUT, and DELETE requests, to optimize storage and operational costs.
Self-Service Monitoring for End Users
In a self-service S3-as-a-service setup, end users can configure logging to have a historical view of their activity, helping them manage their data and detect issues independently of the administrators.
Change Tracking for Incremental Backups (Journal Mode Specific)
With journal mode enabled, all changes in a bucket are logged before the operations are complete, creating a reliable change log. Backup applications can use this log to inventory changes to perform efficient incremental backups. Here is an example Rclone PR from Yuval Lifshitz that uses the bucket logging feature to allow more efficient incremental copying from S3.
Standard mode: log records are written to the log bucket after the operation is completed. If the logging operation fails, it does so silently without notifying the client.
Journal mode: log records are written to the log bucket before the operation is completed. If logging fails, the operation is halted, and an error is returned to the client. An exception is for multi-delete and delete operations, where the operation may succeed even if logging fails. Note that logs may reflect successful writes even if the operation fails.
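The logging mode is selected per source bucket in the same put-bucket-logging call used later in this post. A hedged sketch of a journal-mode configuration is shown below; the LoggingType attribute is part of the Ceph extension to the S3 API, so verify the attribute names against the RGW bucket logging documentation for your release:
{\\n \\"LoggingEnabled\\": {\\n \\"TargetBucket\\": \\"shooterlogs\\",\\n \\"TargetPrefix\\": \\"shooterapp1\\",\\n \\"LoggingType\\": \\"Journal\\"\\n }\\n}\\n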
As context, I have a Ceph Object Gateway (RGW) service running on a Squid cluster.
# ceph version\\nceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)\\n# ceph orch ls rgw\\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT \\nrgw.default ?:8000 4/4 6m ago 8M ceph04;ceph03;ceph02;ceph06;count:4\\n
I have an IAM account named analytic_ap
, and a root user for the account named rootana
: the S3 user profile for rootana
has already been configured via the AWS CLI.
Through the RGW endpoint using the IAM API (no RGW admin intervention required) I will create a new user with an attached managed policy of AmazonS3FullAccess
so that this user can access all buckets in the analytic_ap
account.
# aws --profile rootana iam create-user --user-name app_admin_shooters\\n{\\n \\"User\\": {\\n\\"Path\\": \\"/\\",\\n\\"UserName\\": \\"app_admin_shooters\\",\\n\\"UserId\\": \\"d915f592-6cbc-4c4c-adf2-900c499e8a4a\\",\\n \\"Arn\\": \\"arn:aws:iam::RGW46950437120753278:user/app_admin_shooters\\",\\n\\"CreateDate\\": \\"2025-01-23T08:26:44.086883+00:00\\"\\n }\\n}\\n\\n# aws --profile rootana iam create-access-key --user-name app_admin_shooters\\n{\\n \\"AccessKey\\": {\\n\\"UserName\\": \\"app_admin_shooters\\",\\n\\"AccessKeyId\\": \\"YI80WC6HTMHMY958G3EO\\",\\n\\"Status\\": \\"Active\\",\\n \\"SecretAccessKey\\": \\"67Vp071aBf92fJiEe8pBtV6RYqtWBhSceneeZVLH\\",\\n\\"CreateDate\\": \\"2025-01-23T08:27:03.268781+00:00\\"\\n }\\n}\\n\\n# aws --profile rootana iam attach-user-policy --user-name app_admin_shooters --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess\\n
I configured a new profile via the AWS CLI with the credentials of the S3 end user we just created named app_admin_shooters
, which with the managed policy attached to the user gives me access to the S3 resources available in the account:
# aws --profile app_admin_shooters s3 ls\\n#\\n
Ok, with everything set, let’s create three source buckets, one for each of three different shooter games/applications, plus one logging destination bucket named shooterlogs:
# aws --profile app_admin_shooters s3 mb s3://shooterlogs\\nmake_bucket: shooterlogs\\n# aws --profile app_admin_shooters s3 mb s3://shooterapp1\\nmake_bucket: shooterapp1\\n# aws --profile app_admin_shooters s3 mb s3://shooterapp2\\nmake_bucket: shooterapp2\\n# aws --profile app_admin_shooters s3 mb s3://shooterapp3\\nmake_bucket: shooterapp3\\n
Now let’s enable bucket logging for each of my shooter app buckets. I will use the target bucket shooterlogs, and to organize the logs for each shooterapp bucket, I will use a TargetPrefix with the name of the source bucket.
# cat << EOF > enable_logging.json\\n{\\n \\"LoggingEnabled\\": {\\n \\"TargetBucket\\": \\"shooterlogs\\",\\n \\"TargetPrefix\\": \\"shooterapp1\\"\\n }\\n}\\nEOF\\n
Once the JSON file is ready we can apply it to our buckets, using the sed command in each iteration to change the TargetPrefix.
# aws --profile app_admin_shooters s3api put-bucket-logging --bucket shooterapp1 --bucket-logging-status file://enable_logging.json\\n# sed -i \'s/shooterapp1/shooterapp2/\' enable_logging.json\\n# aws --profile app_admin_shooters s3api put-bucket-logging --bucket shooterapp2 --bucket-logging-status file://enable_logging.json\\n# sed -i \'s/shooterapp2/shooterapp3/\' enable_logging.json\\n# aws --profile app_admin_shooters s3api put-bucket-logging --bucket shooterapp3 --bucket-logging-status file://enable_logging.json\\n
We can list the logging configuration for a bucket with the following s3api command.
# aws --profile app_admin_shooters s3api get-bucket-logging --bucket shooterapp1\\n{\\n \\"LoggingEnabled\\": {\\n \\"TargetBucket\\": \\"shooterlogs\\",\\n \\"TargetPrefix\\": \\"shooterapp1\\",\\n \\"TargetObjectKeyFormat\\": {\\n \\"SimplePrefix\\": {}\\n }\\n }\\n}\\n
We will PUT some objects into our first bucket shooterapp1, and also delete a set of objects so that we can create logs in our log bucket.
# for i in {1..20} ; do aws --profile app_admin_shooters s3 cp /etc/hosts s3://shooterapp1/file${i} ; done\\nupload: ../etc/hosts to s3://shooterapp1/file1 \\nupload: ../etc/hosts to s3://shooterapp1/file2 \\n… \\n# for i in {1..5} ; do aws --profile app_admin_shooters s3 rm s3://shooterapp1/file${i} ; done\\ndelete: s3://shooterapp1/file1\\ndelete: s3://shooterapp1/file2\\n…\\n
When we check our configured log bucket shooterlogs, it’s empty. Why!?
# aws --profile app_admin_shooters s3 ls s3://shooterlogs/\\n# \\n
Note from the docs: For performance reasons, even though the log records are written to persistent storage, the log object will appear in the log bucket only after some configurable amount of time (or if the maximum object size of 128MB is reached). This time (in seconds) could be set per source bucket via a Ceph extension to the REST API, or globally via the rgw_bucket_logging_obj_roll_time
configuration option. If not set, the default time is 5 minutes. Adding a log object to the log bucket is done \\"lazily\\", meaning that if no more records are written to the object, it may remain outside the log bucket even after the configured time has passed.
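If waiting five minutes between log objects is too coarse for your use case, the global default can be lowered with the option named above; the value below is only an example:
# Roll log objects every 60 seconds instead of the default 300\\nceph config set client.rgw rgw_bucket_logging_obj_roll_time 60\\n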
If we don’t want to wait for the object roll time (defaults to 5 minutes), we can force a flush of the log buffer with the radosgw-admin
command:
# radosgw-admin bucket logging flush --bucket shooterapp1\\n
When we check again, the object with the logs for bucket shooterapp1
is there as expected:
# aws --profile app_admin_shooters s3 ls s3://shooterlogs/\\n2025-01-23 08:28:16 8058 shooterapp12025-01-23-13-21-00-A54CQC9GIO7O4F9D\\n# aws --profile app_admin_shooters s3 cp s3://shooterlogs/shooterapp12025-01-23-13-21-00-A54CQC9GIO7O4F9D - | cat\\nRGW46950437120753278 shooterapp1 [23/Jan/2025:13:21:00 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323278.12172315742054314872 REST.GET.get_bucket_logging - \\"GET /shooterapp1?logging HTTP/1.1\\" 200 - - - - 14ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\nRGW46950437120753278 shooterapp1 [23/Jan/2025:13:23:33 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323242.15617167555539888584 REST.PUT.put_obj file1 \\"PUT /shooterapp1/file1 HTTP/1.1\\" 200 - 333 333 - 19ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\n…\\nRGW46950437120753278 shooterapp1 [23/Jan/2025:13:24:01 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323242.18353336346755391699 REST.DELETE.delete_obj file1 \\"DELETE /shooterapp1/file1 HTTP/1.1\\" 204 NoContent - 333 - 11ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\nRGW46950437120753278 shooterapp1 [23/Jan/2025:13:24:02 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.311105.12134465030800156375 REST.DELETE.delete_obj file2 \\"DELETE /shooterapp1/file2 HTTP/1.1\\" 204 NoContent - 333 - 11ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\nRGW46950437120753278 shooterapp1 [23/Jan/2025:13:24:03 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323260.3289411001891924009 REST.DELETE.delete_obj file3 \\"DELETE /shooterapp1/file3 HTTP/1.1\\" 204 NoContent - 333 - 9ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - \\n
NOTE: To explore the output fields available with the current implementation of the bucket logging feature, check out the documentation.
Let\'s see if bucket logging works for our other bucket, shooterapp2
.
# for i in {1..3} ; do aws --profile app_admin_shooters s3 cp /etc/hosts s3://shooterapp2/file${i} ; done\\nupload: ../etc/hosts to s3://shooterapp2/file1\\nupload: ../etc/hosts to s3://shooterapp2/file2\\nupload: ../etc/hosts to s3://shooterapp2/file3\\n# for i in {1..3} ; do aws --profile app_admin_shooters s3 cp s3://shooterapp2/file${i} - ; done\\n# radosgw-admin bucket logging flush --bucket shooterapp2\\nflushed pending logging object \'shooterapp22025-01-23-10-01-57-TJNTA3FU60TS21MK\' to target bucket \'shooterlogs\'\\n
Checking our configured log bucket, we can see that we now have two objects in the bucket with the prefix of the source bucket name.
# aws --profile app_admin_shooters s3 ls s3://shooterlogs/\\n2025-01-23 08:28:16 8058 shooterapp12025-01-23-13-21-00-A54CQC9GIO7O4F9D\\n2025-01-23 10:01:57 2628 shooterapp22025-01-23-15-00-48-FIE6B8NNMANFTTFI\\n# aws --profile app_admin_shooters s3 cp s3://shooterlogs/shooterapp22025-01-23-15-00-48-FIE6B8NNMANFTTFI - | cat\\nRGW46950437120753278 shooterapp2 [23/Jan/2025:15:00:48 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323242.10550516265852869740 REST.PUT.put_obj file1 \\"PUT /shooterapp2/file1 HTTP/1.1\\" 200 - 333 333 - 22ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\nRGW46950437120753278 shooterapp2 [23/Jan/2025:15:00:49 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a \\n…\\nRGW46950437120753278 shooterapp2 [23/Jan/2025:15:01:36 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323278.16364063589570559207 REST.HEAD.get_obj file1 \\"HEAD /shooterapp2/file1 HTTP/1.1\\" 200 - - 333 - 4ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\nRGW46950437120753278 shooterapp2 [23/Jan/2025:15:01:36 +0000] - d915f592-6cbc-4c4c-adf2-900c499e8a4a fcabdf4a-86f2-452f-a13f-e0902685c655.323242.2016501269767674837 REST.GET.get_obj file1 \\"GET /shooterapp2/file1 HTTP/1.1\\" 200 - - 333 - 3ms - - - - - - - s3.cephlabs.com.s3.cephlabs.com - -\\n
In this guide, we\'ll walk you through setting up Trino to query your application logs stored in S3-compatible storage and explore some powerful SQL queries you can use to analyze your data.
We already have Trino up and running, with the Hive connector configured to use S3A to access our S3 endpoint; see the Trino setup instructions for details.
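For reference, here is a minimal sketch of what such a Hive catalog definition (for example etc/catalog/hive.properties) could look like. The metastore address, endpoint, and credentials are placeholders, and the property names should be checked against the Trino Hive connector documentation for your version:
connector.name=hive\\nhive.metastore.uri=thrift://metastore-host:9083\\nhive.s3.endpoint=https://s3.cephlabs.com\\nhive.s3.path-style-access=true\\nhive.s3.aws-access-key=<access-key>\\nhive.s3.aws-secret-key=<secret-key>\\n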
The first step is configuring an external table pointing to your log data stored in an S3-compatible bucket: in our example, shooterlogs
. Below is an example of how the table is created:
trino> SHOW CREATE TABLE hive.default.raw_logs;\\n Create Table \\n----------------------------------------------\\n CREATE TABLE hive.default.raw_logs ( \\n line varchar \\n ) \\n WITH ( \\n external_location = \'s3a://shooterlogs/\', \\n format = \'TEXTFILE\' \\n )\\n
To make the data more usable you can parse each log line into meaningful fields including account ID, bucket name, operation type, HTTP response code, etc. To simplify querying and reuse, I will create a view that encapsulates the log parsing logic:
trino> CREATE VIEW hive.default.log_summary AS\\n -> SELECT\\n -> split(line, \' \')[1] AS account_id, -- Account ID\\n -> split(line, \' \')[2] AS bucket_name, -- Bucket Name\\n -> split(line, \' \')[3] AS timestamp, -- Timestamp\\n -> split(line, \' \')[6] AS user_id, -- User ID\\n -> split(line, \' \')[8] AS operation_type, -- Operation Type\\n -> split(line, \' \')[9] AS object_key, -- Object Key\\n -> regexp_extract(line, \'\\"([^\\"]+)\\"\', 1) AS raw_http_request, -- Raw HTTP Request\\n -> CAST(regexp_extract(line, \'\\"[^\\"]+\\" ([0-9]+) \', 1) AS INT) AS http_status, -- HTTP Status\\n -> CAST(CASE WHEN split(line, \' \')[14] = \'-\' THEN NULL ELSE split(line, \' \')[14] END AS BIGINT) AS object_size, -- Object Size\\n -> CASE WHEN split(line, \' \')[17] = \'-\' THEN NULL ELSE split(line, \' \')[17] END AS request_duration, -- Request Duration\\n -> regexp_extract(line, \'[0-9]+ms\', 0) AS request_time, -- Request Time (e.g., 22ms)\\n -> regexp_extract(line, \' ([^ ]+) [^ ]+ [^ ]+$\', 1) AS hostname -- Hostname (third-to-last field)\\n -> FROM hive.default.raw_logs;\\n -> \\nCREATE VIEW\\n
With the view in place, you can quickly write queries to summarize log data. For example:
trino> SELECT operation_type, COUNT(*) AS operation_count\\n -> FROM hive.default.log_summary\\n -> GROUP BY operation_type;\\n operation_type | operation_count \\n-----------------------------+-----------------\\n REST.DELETE.delete_obj | 5 \\n REST.HEAD.get_obj | 3 \\n REST.GET.get_bucket_logging | 1 \\n REST.PUT.put_obj | 23 \\n REST.GET.list_bucket | 1 \\n REST.GET.get_obj | 3 \\n(6 rows)\\n
With the view we created in Trino, we can perform historical monitoring and analyze S3 bucket activity effectively. Below is an example list of potential visualizations you can create to provide actionable insights into bucket usage and access patterns.
I will use Superset in this example, but another visualization tool could achieve the same outcome. I have a running instance of Superset, and I have configured Trino as a source database for Superset.
Here is the query used and the resulting graph in a Superset dashboard. We present per-bucket operation type counts and average latency.
SELECT operation_type AS operation_type, bucket_name AS bucket_name, sum(\\"Operations Count\\") AS \\"SUM(Operations Count)\\" \\nFROM (SELECT \\n bucket_name, \\n operation_type, \\n COUNT(*) AS \\"Operations Count\\", \\n AVG(CAST(regexp_extract(request_time, \'[0-9]+\', 0) AS DOUBLE)) AS \\"Average Latency\\"\\nFROM hive.default.log_summary\\nGROUP BY bucket_name, operation_type\\nORDER BY \\"Operations Count\\" DESC\\n) AS virtual_table GROUP BY operation_type, bucket_name ORDER BY \\"SUM(Operations Count)\\" DESC\\nLIMIT 1000;\\n
Another example related to HTTP requests is presenting the distribution of HTTP request codes in a pie chart.
Here we show the top users making requests to the shooter
application buckets.
These are just some basic graph examples you can create. Far more advanced graphs can be built with the features available in Superset.
The introduction of the S3 bucket logging feature is a game-changer for storage access management. This feature provides transparency and control by empowering end users to configure and manage their application access logs. With the ability to log operations at the bucket level, users can monitor activity, troubleshoot issues, and enhance their security posture—all tailored to their specific requirements without admin intervention.
To showcase its potential, we explored how tools like Trino and Superset can analyze and visualize the log data generated by S3 Bucket Logging. These are just examples of the many possibilities the bucket logging feature enables.
Ceph developers are working hard to improve bucket logging for future releases, including bug fixes, enhancements, and better AWS S3 compatibility. Stay Tuned!
The authors would like to thank IBM for supporting the community via our time to create these posts.
In this article we analyze the results of performance benchmarks conducted on Trino with the Ceph Object S3 Select feature enabled, using TPC-DS benchmark queries at 1TB and 3TB scale. We demonstrate that, on average, queries run 2.5x faster. In some cases we achieved a 9x improvement, and across all queries the amount of data processed over the network was reduced by 144TB compared to running Trino without S3 Select enabled. Combining IBM Storage Ceph\'s S3 Select with Trino/Presto can enhance data lake performance, reduce costs, and simplify data access for organizations.
We would like to thank Gal Salomon and Tim Wilkinson for conducting the TPC-DS benchmarking and providing us with these results.
Trino is a distributed SQL query engine that allows users to query data from multiple sources using a single SQL statement. It provides data warehouse-like capabilities directly on a data lake.
You may have encountered references to Trino, Presto, and PrestoDB, all of which originated from the same project. Presto was the initial project from Facebook, which was open-sourced in 2013. PrestoSQL became a community-based open-source project in 2018 and was rebranded to Trino in 2020.
Presto is an essential tool for data engineers who require a fast query engine for their higher-level Business Intelligence (BI) tools.
Ceph provides the S3 API S3 Select feature. S3 Select significantly improves the efficiency of SQL queries of data stored in S3-compatible object storage. By pushing the query to the Ceph cluster, S3 Select can dramatically enhance performance, processing queries faster and minimizing network and CPU resource costs. S3 Select and Trino are horizontally scalable, handling increasing data volumes and user queries without sacrificing performance. Trino\'s support for SQL and S3 Select\'s ability to query data in place enables users to access and analyze data without complex data movement or transformation tasks.
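To get a feel for what pushdown looks like at the API level, here is a standalone request of the kind a query engine issues on our behalf, sent with the AWS CLI against an RGW endpoint. The endpoint, bucket, and object names are placeholders:
aws --endpoint-url https://s3.example.com s3api select-object-content --bucket tpcds --key store_sales.csv --expression \\"SELECT COUNT(*) FROM S3Object s\\" --expression-type SQL --input-serialization \'{\\"CSV\\": {\\"FileHeaderInfo\\": \\"USE\\"}}\' --output-serialization \'{\\"CSV\\": {}}\' /dev/stdout\\n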
Ceph\'s Object Datacenter-Data-Delivery Network (D3N) feature uses high-speed storage such as NVMe SSDs or DRAM to cache datasets on the access side. D3N improves the performance of big-data jobs running in analysis clusters by accelerating recurring reads from the data lake or lakehouse.
We executed the following 72 TPC-DS queries at three different scale factors, 1TB, 2TB and 3TB, to characterize performance and resource consumption. The datasets were in uncompressed CSV format. We executed each query numerous times with and without S3 Select and ensured consistent results by monitoring the standard deviations for each run.
If you’re interested in exploring this topic further, please check out Gal Salomon\'s GitHub repository, where you will find instructions on how to set up a testing environment with Trino and Ceph. Instructions are also provided for the TPC-DS benchmarking tools used for this benchmark.
The hardware used for the benchmark was the following:
These S3 Select settings were adjusted:
The Trino engine processes complex queries by dividing the original query into multiple parallel S3 Select requests. These requests split the requested table (an S3 object) into equal ranges that are then distributed across our Ceph cluster\'s RGW service. The load balancer efficiently channels requests among Ceph Object Gateways, ensuring optimal performance and scalability for our data processing needs.
This next section provides an overview of TPC-DS benchmark results. These results help us understand how the Ceph Object S3 Select feature yields substantial benefits when working with CSV datasets. The benefits include improved query times and reduced data processing. We have included a diagram below that shows the total network traffic reduction achieved by using S3 Select. We can save 144TB of network traffic by utilizing this feature.
The following graph shows the per-query speedup achieved using S3 Select for the 3TB scale dataset. The X axis value is the query number from the above repository and the Y axis value is the speed improvement for each query. During testing we observed that enabling S3 Select improved all 72 queries. The query achieving the most speedup was 9 times faster, and the overall average improvement was around 2.5x.
When S3 Select is enabled we offload computational work to the Ceph Object Gateways, so as expected they saw increased CPU usage when executing the queries with S3 Select enabled. However, the CPU utilization remained at an acceptable level. The increase in memory demand with S3 Select enabled was barely noticeable, with an average increase of 2.50%. Pushdown can process objects of any size since it does so in chunks without preloading the entire object.
Query number 9 was able to reduce the network data processing by 18TB. The total reduction in processed data across all 72 queries was 144 TB when enabling S3 Select.
In this post, we shared the results of our benchmark testing, where we ran 72 TPC-DS queries at 1TB and 3TB scale. We have found that utilizing Ceph Object S3 Select pushdown performance optimizations enables queries to complete more quickly than before with significantly lower resource demands. With Trino and S3 Select, you can push the computational work of projection and predicate operations to Ceph, achieving up to 9x performance improvement in query runtime, with an average of 2.5x. This significantly reduces data transfer across the network, saving 144TB of network traffic for the 72 executed queries. Organizations can enhance data lake performance, reduce costs, and simplify data access by combining Ceph S3 Select with Trino and Presto.
The authors would like to thank IBM for supporting the community with our time to create these posts.
The Ceph Foundation and Ambassadors are busy building a full calendar of Ceph events for 2025!
As many of you know, Ceph has always thrived on its amazing community. Events like Ceph Days and Cephalocon are key opportunities for all of us to learn, connect, and share experiences.
Following the success of Cephalocon at CERN and Ceph Days in India, we’ve announced Ceph Days London & Silicon Valley -- check out https://ceph.io/en/community/events/ to get involved. And watch that space -- Ceph Days in Seattle, New York, and Berlin will be announced soon!
Looking forward, we need your help to help shape our future events... and to plan our Cephalocon! If you have a moment, please share your thoughts in our Ceph Events survey: https://forms.gle/Rm41d547Rb59S8xf9
Looking forward to seeing you at an event soon!
In a world where data must be quickly accessed and protected across multiple geographical regions, multi-site replication is a critical feature of object storage. Whether running global applications or maintaining a robust disaster recovery plan, replicating objects between different regions is essential for redundancy and business continuity.
Ceph Object Storage multi-site replication is asynchronous and log-based. The nature of async replication can make it challenging to validate where your data currently resides or to confirm that it has fully replicated to all remote zones. This is not acceptable for certain applications and use cases that require near-strong consistency on write, with all objects replicated and available at all sites before they are made accessible to the application or user.
As a side note, complete consistency on write can be provided by a Ceph stretch cluster, which replicates synchronously, but this has its limitations for a geo-dispersed deployment because latency is a key factor for stretch clusters. If you need geo-replication, you will thus often implement multi-site async replication for object storage.
Application owners often need to know if an object is already available in the destination zone before triggering subsequent actions (e.g., analytics jobs, further data processing, or regulatory archiving). Storage operations teams may want clear insight into how long replication takes, enabling them to alert and diagnose slow or faulty network links. Automation and data pipelines might require a programmatic way to track replication status before proceeding with the next step in a workflow.
To address this visibility gap, Ceph Squid introduces new HTTP response headers that expose exactly where each object is in the replication process:
x-amz-replication-status
A quick way to determine if the object is pending replication, in progress, or already fully replicated. The status might show PENDING, COMPLETED, or REPLICA depending on configuration.
x-rgw-replicated-from
Shows the source zone from which the object was initially replicated.
x-rgw-replicated-at
Provides a timestamp indicating when the object was successfully replicated. By comparing this to the object’s Last-Modified header, you get an instant measure of replication latency that is valuable for real-time monitoring or performance tuning.
These headers enable a programmatic and deterministic way to know whether data has propagated to the target zone. It’s vital to note that the primary use case for these new HTTP replication headers is to query the status of objects during ingest to help the application make decisions based on the replication status of the objects. This is not intended for infra teams to check the replication status of all objects by scanning through billions of objects.
Developers can integrate these headers into their application logic. After uploading an object, the application can poll the x-amz-replication-status
header to ensure the object is fully available in the destination zone before triggering subsequent actions.
An automated job (sometimes called a synthetic test or canary test) can periodically upload and delete an object, checking how long replication takes. If latency breaches a certain threshold, the operations team can be alerted to investigate potential network or configuration issues.
While polling headers is often the most straightforward approach, you may leverage Ceph S3 bucket notifications for certain replication-related events. Integrating these with a message broker like Kafka can help orchestrate larger, event-driven workflows. For more information see the Ceph docs on S3 notifications.
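As a rough sketch of that approach, a topic with a Kafka push endpoint can be created through the RGW endpoint and the bucket subscribed to sync-related events. The endpoint, broker, zonegroup, and especially the event name are assumptions here, so check the bucket notification documentation for your release before relying on them:
# Create a topic whose push endpoint is a Kafka broker (endpoint and broker are placeholders)\\naws --endpoint-url http://ceph-node-00:8088 sns create-topic --name replication-events --attributes \'{\\"push-endpoint\\": \\"kafka://kafka-broker:9092\\", \\"kafka-ack-level\\": \\"broker\\"}\'\\n\\n# Subscribe the bucket to replication-related events (event name assumed; verify for your release)\\naws --endpoint-url http://ceph-node-00:8088 s3api put-bucket-notification-configuration --bucket bucket1 --notification-configuration \'{\\"TopicConfigurations\\": [{\\"Id\\": \\"sync-events\\", \\"TopicArn\\": \\"arn:aws:sns:<zonegroup>::replication-events\\", \\"Events\\": [\\"s3:ObjectSynced:*\\"]}]}\'\\n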
I have multi-site replication set up between two Ceph clusters. They are part of a zonegroup named multizg, and we have bidirectional full-zone replication configured between zone1 and zone2.
# radosgw-admin sync info\\n{\\n \\"sources\\": [\\n {\\n \\"id\\": \\"all\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"*\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"*\\"\\n },\\n \\"dests\\": [\\n {\\n \\"id\\": \\"all\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"*\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"*\\"\\n },\\n...\\n
For detailed information about Ceph Object Storage multisite replication, see the blog series that covers this feature in-depth, from architecture to setup and fine-tuning:
multisite part1
multisite part2
multisite part3
multisite part4
multisite part5
multisite part6
multisite part7
multisite part8
A simple way to view these new replication headers is to use s3cmd
with the --debug
flag, which prints raw HTTP response headers from the Ceph Object Gateway. By filtering for rgw-
or x-amz-
lines, we can easily spot replication-related information.
Let\'s check it out. I uploaded an object into zone1
:
# s3cmd --host ceph-node-00:8088 put /etc/hosts s3://bucket1/file20\\nupload: \'/etc/hosts\' -> \'s3://bucket1/file20\' [1 of 1]\\n 640 of 640 100% in 0s 7.63 KB/s done\\n
When I check the object\'s status on the source zone where I uploaded the object, it’s in the PENDING
state, indicating the object is still replicating. Eventually, once replication is complete, the status will transition to COMPLETED
in the source zone and REPLICA
in the destination zone.
# s3cmd --host ceph-node-00:8088 --debug info s3://bucket1/file20 2>&1 | grep -B 2 \'rgw-\'\\n \'x-amz-replication-status\': \'PENDING\',\\n \'x-amz-request-id\': \'tx00000f2948c72a2d2fb8e-0067a5c961-35964-zone1\',\\n \'x-rgw-object-type\': \'Normal\'},\\n
Now, let’s check on the destination zone endpoint:
# s3cmd --host ceph-node-05:8088 --debug info s3://bucket1/file20 2>&1 | grep -B 2 \'rgw-\'\\n \'x-amz-replication-status\': \'REPLICA\',\\n \'x-amz-request-id\': \'tx00000a98cf7b6a584b95b-0067a5cac9-29779-zone2\',\\n \'x-rgw-object-type\': \'Normal\',\\n \'x-rgw-replicated-at\': \'Fri, 07 Feb 2025 08:50:07 GMT\',\\n \'x-rgw-replicated-from\': \'b6c9ca95-6683-42a5-9dff-ba209039c61b:bucket1:b6c9ca95-6683-42a5-9dff-ba209039c61b.32035.1\'},\\n
Here, the relevant headers tell us:
x-amz-replication-status: REPLICA
x-rgw-replicated-at: \'Fri, 07 Feb 2025 08:50:07 GMT\'
x-rgw-replicated-from: 8f8c3759-aaaf-4e6d-b346-...:bucket1:...
Let\'s check that the status of the object in the source site has moved into the COMPLETED state:
# s3cmd --host ceph-node-00:8088 --debug info s3://bucket1/file20 2>&1 | grep x-amz-replication-status\\n \'x-amz-replication-status\': \'COMPLETED\',\\n
This straightforward polling mechanism—via HEAD
or info requests—can be incorporated into application workflows to confirm full replication before taking further actions. Let’s check out a basic example.
Imagine a Content Delivery Network (CDN) scenario where you must replicate files globally to ensure low-latency access for end users across multiple geographic regions. An application in one region uploads media assets (images, videos, or static website content) that must be replicated to other RGW zones before we can make them available to the end users for consumption.
Here is a code snippet with an example of using the Python library boto3
to upload media content to a site, then poll the replication status of our newly uploaded media content by querying the replication status header. Once the object has been replicated we print out relevant information including source and destination RGW zones and replication latency.
Application Example output:
The new replication headers in Ceph Squid Object Storage mark a significant step forward in giving developers, DevOps teams, and storage administrators more granular control over and visibility into multisite replication. By querying the x-amz-replication-status, x-rgw-replicated-from, and x-rgw-replicated-at headers, applications can confirm that objects have fully synchronized before proceeding with downstream workflows. This simple yet powerful capability can streamline CDN distribution, data analytics pipelines, and other use cases that demand multisite consistency.
Note that some features described here may not be available before the Squid 19.2.2 release.
The authors would like to thank IBM for supporting the community with our time to create these posts.
A block is a small, fixed-sized piece of data, like a 512-byte chunk. Imagine breaking a book into many pages: each page is a \\"block\\". When you put all the pages together, you get a complete book, just like combining many smaller data blocks creates a larger storage unit.
Block-based storage is commonly used in computing, and in devices including:
Ceph RADOS Block Devices are a type of storage served by a Ceph cluster that works like a virtual physical storage drive. Instead of storing data on a single device, it spreads the data across multiple storage nodes and devices (called OSDs) in a Ceph cluster. This makes it efficient, scalable, and reliable.
Ceph Block Devices have some amazing features:
Clients use the librbd
library to talk to the components of a Ceph cluster and manage data efficiently.
In short, Ceph Block Devices provide fast, scalable, and reliable storage for modern computing needs, ensuring that data is always available, accurate, and safe.
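As a quick refresher before we get into migration, creating and inspecting an RBD image takes only a few commands; the pool and image names below are examples:
# Create a pool, initialize it for RBD, and create a 10 GiB image\\nceph osd pool create rbdpool\\nrbd pool init rbdpool\\nrbd create rbdpool/image1 --size 10G\\n\\n# Inspect the image\\nrbd info rbdpool/image1\\n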
As a storage administrator, you have the power to seamlessly move (live-migrate) RBD images within your Ceph cluster or to a different Ceph cluster. Think of this like moving a file from one folder to another on your computer, but in this case, it happens within or between large, distributed storage clusters.
Note that RBD images are also known as volumes, a term that is less likely to be confused with graphics files containing the likeness of Taylor Swift or cats.
Note: Linux KRBD kernel clients currently do not support live migration.
Want to migrate data from an external source or storage provider? No problem! You can:
When you start a live migration, here’s what happens behind the scenes:
Live migration of RBD images in Ceph allows you to move storage seamlessly between pools, RBD namespaces, and clusters in different formats with minimal downtime. Let\'s break it down into three simple steps, along with the necessary commands to execute them.
Before starting the migration, a new target image is created and linked to the source image.
Syntax:
rbd migration prepare SOURCE_POOL_NAME/SOURCE_IMAGE_NAME TARGET_POOL_NAME/TARGET_IMAGE_NAME\\n
Example:
rbd migration prepare source_pool1/source_image1 target_pool1/target_image1\\n
Initiate the import-only live migration process by running the rbd migration prepare
command with the --import-only
flag and either --source-spec
or --source-spec-path
option, passing a JSON file that describes how to access the source image data.
[ceph: root@rbd-client /]# cat testspec.json\\n {\\n \\"type\\": \\"raw\\",\\n \\"stream\\": {\\n \\"type\\": \\"s3\\",\\n \\"url\\": \\"http://host_ip:80/testbucket1/image.raw\\",\\n \\"access_key\\": \\"Access key\\",\\n \\"secret_key\\": \\"Secret Key\\"}\\n}\\n
Syntax:
rbd migration prepare --import-only --source-spec-path \\"JSON_FILE\\" TARGET_POOL_NAME/TARGET_IMAGE_NAME\\n
Example:
[ceph: root@rbd-client /]# rbd migration prepare --import-only --source-spec-path \\"testspec.json\\" target_pool/target_image\\n
Check the status of the migration with the rbd status
command:
[ceph: root@rbd-client /]# rbd status target_pool/target_image\\nWatchers: none\\nMigration:\\nsource: {\\"stream\\":{\\"access_key\\":\\"RLJOCP6345BGB38YQXI5\\",\\"secret_key\\":\\"oahWRB2ote2rnLy4dojYjDrsvaBADriDDgtSfk6o\\",\\"type\\":\\"s3\\",\\"url\\":\\"http://10.74.253.18:80/testbucket1/image.raw\\"},\\"type\\":\\"raw\\"}\\ndestination: targetpool1/sourceimage1 (b13865345e66)\\nstate: prepared\\n
After preparation is complete, Ceph starts deep copying all existing data from the source image to the target image.
Syntax:
rbd migration execute TARGET_POOL_NAME/TARGET_IMAGE_NAME\\n
Example:
rbd migration execute target_pool1/target_image1\\n
After the data has been fully transferred, commit or abort the migration.
Committing the migration removes all links between the source and target images.
Syntax:
rbd migration commit TARGET_POOL_NAME/TARGET_IMAGE_NAME\\n
Example:
rbd migration commit target_pool1/target_image1\\n
Migrations can be cancelled. Cancelling a migration will cause the following to happen:
Syntax:
rbd migration abort TARGET_POOL_NAME/TARGET_IMAGE_NAME\\n
Example:
rbd migration abort targetpool1/targetimage1\\n
The following example shows how to migrate data from one Ceph cluster to another, here named c1 and c2:
[ceph: root@rbd-client /]# cat /tmp/native_spec\\n{\\n \\"cluster_name\\": \\"c1\\",\\n \\"type\\": \\"native\\",\\n \\"pool_name\\": \\"pool1\\",\\n \\"image_name\\": \\"image1\\",\\n \\"snap_name\\": \\"snap1\\"\\n}\\n[ceph: root@rbd-client /]# rbd migration prepare --import-only --source-spec-path /tmp/native_spec c2pool1/c2image1 --cluster c2\\n[ceph: root@rbd-client /]# rbd migration execute c2pool1/c2image1 --cluster c2\\nImage migration: 100% complete...done.\\n[ceph: root@rbd-client /]# rbd migration commit c2pool1/c2image1 --cluster c2\\nCommit image migration: 100% complete...done.\\n
Live migration supports three primary formats:
The native format does not include the stream since it utilizes native Ceph operations. For example, to import from the image rbd/ns1/image1@snap1
, the source specification can be constructed as below:
{\\n\\"type\\": \\"native\\",\\n\\"pool_name\\": \\"rbd\\",\\n\\"pool_namespace\\": \\"ns1\\",\\n\\"image_name\\": \\"image1\\",\\n\\"snap_name\\": \\"snap1\\"\\n}\\n
The QCOW format describes a QEMU copy-on-write (QCOW) block device. QCOW v1 and v2 formats are currently supported, with the exception of certain features including compression, encryption, backing files, and external data files. Use the QCOW format with any supported stream source:
{\\n \\"type\\": \\"qcow\\",\\n \\"stream\\": {\\n \\"type\\": \\"file\\",\\n \\"file_path\\": \\"/mnt/image.qcow\\"\\n }\\n}\\n
{\\n \\"type\\": \\"raw\\",\\n \\"stream\\": {\\n \\"type\\": \\"file\\",\\n \\"file_path\\": \\"/mnt/image-head.raw\\"\\n },\\n \\"snapshots\\": [\\n {\\n \\"type\\": \\"raw\\",\\n \\"name\\": \\"snap1\\",\\n \\"stream\\": {\\n \\"type\\": \\"file\\",\\n \\"file_path\\": \\"/mnt/image-snap1.raw\\"\\n }\\n },\\n (optional oldest to newest ordering of snapshots)\\n}\\n
Multiple stream types are available for importing from various data sources:
Use a file
stream to import from a locally accessible POSIX file source.
{\\n <format-specific parameters>\\n \\"stream\\": {\\n \\"type\\": \\"file\\",\\n \\"file_path\\": \\"FILE_PATH\\"\\n }\\n}\\n
Use an HTTP
stream to import from an HTTP or HTTPS web server.
{\\n <format-specific parameters>\\n \\"stream\\": {\\n \\"type\\": \\"http\\",\\n \\"url\\": \\"URL_PATH\\"\\n }\\n}\\n
Use an s3
stream to import from an S3 bucket.
{\\n <format-specific parameters>\\n \\"stream\\": {\\n \\"type\\": \\"s3\\",\\n \\"url\\": \\"URL_PATH\\",\\n \\"access_key\\": \\"ACCESS_KEY\\",\\n \\"secret_key\\": \\"SECRET_KEY\\"\\n }\\n}\\n
Use an NBD
stream to import from a remote NBD export.
{\\n <format-specific parameters>\\n \\"stream\\": {\\n \\"type\\": \\"nbd\\",\\n \\"uri\\": \\"<nbd-uri>\\"\\n }\\n}\\n
The nbd-uri parameter must follow the NBD URI specification. The default NBD port is tcp/10809.
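For example, a TCP export and a UNIX-socket export could be referenced as follows; the host, export, and socket names are placeholders:
\\"uri\\": \\"nbd://nbd-server:10809/my-export\\"\\n\\"uri\\": \\"nbd+unix:///my-export?socket=/tmp/nbd.sock\\"\\n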
Disaster Recovery and Data Migration
Scenario: An organization runs mission-critical applications on a primary Ceph cluster in one data center. Due to an impending maintenance window, potential hardware failure, or a disaster event, they need to migrate RBD images to a secondary Ceph cluster in a different location.
Benefit: Live migration ensures that applications using RBD volumes can continue functioning with minimal downtime and no data loss during the transition to the secondary cluster.
Bursting and Workload Distribution
Scenario: An organization operates a Ceph cluster that accommodates routine workloads but occasionally requires extra capacity during peak usage. By migrating RBD images to an external Ceph cluster (possibly deployed in a cloud) they can temporarily scale out operations and then scale back.
Benefit: Dynamic workload balancing helps admins leverage external resources only when needed, reducing operational costs and improving scalability.
Data Center Migration
Scenario: An organization is migrating infrastructure from one physical data center to another due to an upgrade, consolidation, or relocation. All RBD images from the source Ceph cluster need to be moved to a destination Ceph cluster in the new location.
Benefit: Live migration minimizes disruptions to services during data center migrations, maintaining application availability.
Compliance and Data Sovereignty
Scenario: An organization must comply with local data residency regulations that require sensitive data to be stored within specific geographic boundaries. Data held in RBD images thus must be migrated from a general-purpose Ceph cluster to one dedicated to and within the regulated region.
Benefit: The live migration feature enables seamless relocation of RBD data without halting ongoing operations, ensuring compliance with regulations.
Multi-Cluster Load and Capacity Balancing
Scenario: An organization runs multiple Ceph clusters to handle high traffic workloads. To prevent overloading any single cluster, they redistribute RBD images among clusters as workload patterns shift.
Benefit: Live migration allows for efficient rebalancing of workloads across Ceph clusters, optimizing resource utilization and performance.
Dev/Test to Production Migration
Scenario: Developers run test environments on a dedicated Ceph cluster. After testing is complete, production-ready RBD images can be migrated to the production Ceph cluster without data duplication or downtime.
Benefit: Simplifies the process of promoting test data to production while maintaining data integrity.
Hardware Lifecycle Management
Scenario: A Ceph cluster is running on older hardware that is nearing the end of its lifecycle. The admin plans to migrate RBD images to a new Ceph cluster with upgraded hardware for better performance and reliability.
Benefit: Live migration facilitates a smooth transition from legacy to modern infrastructure without impacting application uptime.
Note: In many situations one can incrementally replace 100% of Ceph cluster hardware in situ without downtime or migration, but in others it may be desirable to stand up a new, independent cluster and migrate data between the two.
Global Data Replication
Scenario: An enterprise has Ceph clusters distributed across locations to improve latency for regional end users. RBD images can be migrated from one region to another based on data center additions or closures, changes in user traffic patterns, or business priorities.
Benefit: Enhances user experience by moving data closer to the point of consumption while maintaining data consistency.
Ceph live migration of RBD images provides a seamless and efficient way to move storage data and workloads without disrupting operations. By leveraging native Ceph operations and external stream sources, administrators can ensure smooth and flexible data migration processes.
The authors would like to thank IBM for supporting the community with our time to create these posts.
This is the first backport release in the Squid series. We recommend all users update to this release.
CephFS: The command fs subvolume create now allows tagging subvolumes by supplying the option --earmark with a unique identifier needed for NFS or SMB services. The earmark string for a subvolume is empty by default. To remove an already present earmark, an empty string can be assigned to it. Additionally, the commands ceph fs subvolume earmark set, ceph fs subvolume earmark get, and ceph fs subvolume earmark rm have been added to set, get and remove the earmark for a given subvolume.
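A hedged sketch of how this could look on the CLI is shown below. The create form with --earmark follows the release note, while the earmark subcommand arguments are assumptions that should be checked against the command help:
# Create a subvolume earmarked for SMB at creation time\\nceph fs subvolume create cephfs subvol1 --earmark smb\\n\\n# Inspect the earmark later (argument order assumed; see \'ceph fs subvolume earmark get -h\')\\nceph fs subvolume earmark get cephfs subvol1\\n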
CephFS: Expanded removexattr support for CephFS virtual extended attributes. Previously one had to use setxattr to restore the default in order to \\"remove\\". You may now properly use removexattr to remove. You can also now remove layout on the root inode, which then will restore the layout to the default.
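For example, with the file system mounted, removing a previously set directory layout now behaves as expected; the paths below are examples:
# Remove a custom layout from a directory so it inherits the default again\\nsetfattr -x ceph.dir.layout /mnt/cephfs/mydir\\n\\n# Removing the layout on the root inode restores the default layout\\nsetfattr -x ceph.dir.layout /mnt/cephfs\\n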
RADOS: A performance bottleneck in the balancer mgr module has been fixed.
Related Tracker: https://tracker.ceph.com/issues/68657
RADOS: Based on tests performed at scale on an HDD-based Ceph cluster, it was found that scheduling with mClock was not optimal with multiple OSD shards. For example, in the test cluster with multiple OSD node failures, the client throughput was found to be inconsistent across test runs coupled with multiple reported slow requests. However, the same test with a single OSD shard and with multiple worker threads yielded significantly better results in terms of consistency of client and recovery throughput across multiple test runs. Therefore, as an interim measure until the issue with multiple OSD shards (or multiple mClock queues per OSD) is investigated and fixed, the following change to the default HDD OSD shard configuration is made:
osd_op_num_shards_hdd = 1 (was 5)
osd_op_num_threads_per_shard_hdd = 5 (was 1)
For more details, see https://tracker.ceph.com/issues/66289.
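To confirm what your OSDs are using after an upgrade, or to pin the values explicitly, the standard config commands apply; note that shard count changes generally require an OSD restart to take effect:
ceph config get osd osd_op_num_shards_hdd\\nceph config get osd osd_op_num_threads_per_shard_hdd\\n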
mgr/REST: The REST manager module will trim requests based on the \'max_requests\' option. Without this feature, and in the absence of manual deletion of old requests, the accumulation of requests in the array can lead to Out Of Memory (OOM) issues, resulting in the Manager crashing.
doc/rgw/notification: add missing admin commands (pr#60609, Yuval Lifshitz)
squid: [RGW] Fix the handling of HEAD requests that do not comply with RFC standards (pr#59123, liubingrun)
squid: a series of optimizations for kerneldevice discard (pr#59065, Adam Kupczyk, Joshua Baergen, Gabriel BenHanokh, Matt Vandermeulen)
squid: Add Containerfile and build.sh to build it (pr#60229, Dan Mick)
squid: AsyncMessenger: Don\'t decrease l_msgr_active_connections if it is negative (pr#60447, Mohit Agrawal)
squid: blk/aio: fix long batch (64+K entries) submission (pr#58676, Yingxin Cheng, Igor Fedotov, Adam Kupczyk, Robin Geuze)
squid: blk/KernelDevice: using join() to wait thread end is more safe (pr#60616, Yite Gu)
squid: bluestore/bluestore_types: avoid heap-buffer-overflow in another way to keep code uniformity (pr#58816, Rongqi Sun)
squid: ceph-bluestore-tool: Fixes for multilple bdev label (pr#59967, Adam Kupczyk, Igor Fedotov)
squid: ceph-volume: add call to ceph-bluestore-tool zap-device
(pr#59968, Guillaume Abrioux)
squid: ceph-volume: add new class UdevData (pr#60091, Guillaume Abrioux)
squid: ceph-volume: add TPM2 token enrollment support for encrypted OSDs (pr#59196, Guillaume Abrioux)
squid: ceph-volume: do not convert LVs\'s symlink to real path (pr#58954, Guillaume Abrioux)
squid: ceph-volume: do source devices zapping if they\'re detached (pr#58964, Guillaume Abrioux, Igor Fedotov)
squid: ceph-volume: drop unnecessary call to get\\\\_single\\\\_lv()
(pr#60353, Guillaume Abrioux)
squid: ceph-volume: fix dmcrypt activation regression (pr#60734, Guillaume Abrioux)
squid: ceph-volume: fix generic activation with raw osds (pr#59598, Guillaume Abrioux)
squid: ceph-volume: fix OSD lvm/tpm2 activation (pr#59953, Guillaume Abrioux)
squid: ceph-volume: pass self.osd_id to create_id() call (pr#59622, Guillaume Abrioux)
squid: ceph-volume: switch over to new disk sorting behavior (pr#59623, Guillaume Abrioux)
squid: ceph.spec.in: we need jsonnet for all distroes for make check (pr#60075, Kyr Shatskyy)
squid: cephadm/services/ingress: fixed keepalived config bug (pr#58381, Bernard Landon)
Squid: cephadm: bootstrap should not have \\"This is a development version of cephadm\\" message (pr#60880, Shweta Bhosale)
squid: cephadm: emit warning if daemon\'s image is not to be used (pr#59929, Matthew Vernon)
squid: cephadm: fix apparmor profiles with spaces in the names (pr#58542, John Mulligan)
squid: cephadm: pull container images from quay.io (pr#60354, Guillaume Abrioux)
squid: cephadm: Support Docker Live Restore (pr#59933, Michal Nasiadka)
squid: cephadm: update default image and latest stable release (pr#59827, Adam King)
squid: cephfs,mon: fix bugs related to updating MDS caps (pr#59672, Rishabh Dave)
squid: cephfs-shell: excute cmd \'rmdir_helper\' reported error (pr#58810, teng jie)
squid: cephfs: Fixed a bug in the readdir_cache_cb function that may have us… (pr#58804, Tod Chen)
squid: cephfs_mirror: provide metrics for last successful snapshot sync (pr#59070, Jos Collin)
squid: cephfs_mirror: update peer status for invalid metadata in remote snapshot (pr#59406, Jos Collin)
squid: cephfs_mirror: use snapdiff api for incremental syncing (pr#58984, Jos Collin)
squid: client: calls to _ll_fh_exists() should hold client_lock (pr#59487, Venky Shankar)
squid: client: check mds down status before getting mds_gid_t from mdsmap (pr#58587, Yite Gu, Dhairya Parmar)
squid: cls/user: reset stats only returns marker when truncated (pr#60164, Casey Bodley)
squid: cmake: use ExternalProjects to build isa-l and isa-l_crypto libraries (pr#60107, Casey Bodley)
squid: common,osd: Use last valid OSD IOPS value if measured IOPS is unrealistic (pr#60660, Sridhar Seshasayee)
squid: common/dout: fix FTBFS on GCC 14 (pr#59055, Radoslaw Zarzynski)
squid: common/options: Change HDD OSD shard configuration defaults for mClock (pr#59973, Sridhar Seshasayee)
squid: corpus: update submodule with mark cls_rgw_reshard_entry forward_inco… (pr#58923, NitzanMordhai)
squid: crimson/os/seastore/cached_extent: add the "refresh" ability to lba mappings (pr#58957, Xuehan Xu)
squid: crimson/os/seastore/lba_manager: do batch mapping allocs when remapping multiple mappings (pr#58820, Xuehan Xu)
squid: crimson/os/seastore/onode: add hobject_t into Onode (pr#58830, Xuehan Xu)
squid: crimson/os/seastore/transaction_manager: consider inconsistency between backrefs and lbas acceptable when cleaning segments (pr#58837, Xuehan Xu)
squid: crimson/os/seastore: add checksum offload to RBM (pr#59298, Myoungwon Oh)
squid: crimson/os/seastore: add writer level stats to RBM (pr#58828, Myoungwon Oh)
squid: crimson/os/seastore: track transactions/conflicts/outstanding periodically (pr#58835, Yingxin Cheng)
squid: crimson/osd/pg_recovery: push the iteration forward after finding unfound objects when starting primary recoveries (pr#58958, Xuehan Xu)
squid: crimson: access coll_map under alien tp with a lock (pr#58841, Samuel Just)
squid: crimson: audit and correct epoch captured by IOInterruptCondition (pr#58839, Samuel Just)
squid: crimson: simplify obc loading by locking excl for load and demoting to needed lock (pr#58905, Matan Breizman, Samuel Just)
squid: debian pkg: record python3-packaging dependency for ceph-volume (pr#59202, Kefu Chai, Thomas Lamprecht)
squid: doc,mailmap: update my email / association to ibm (pr#60338, Patrick Donnelly)
squid: doc/ceph-volume: add spillover fix procedure (pr#59540, Zac Dover)
squid: doc/cephadm: add malformed-JSON removal instructions (pr#59663, Zac Dover)
squid: doc/cephadm: Clarify "Deploying a new Cluster" (pr#60809, Zac Dover)
squid: doc/cephadm: clean "Adv. OSD Service Specs" (pr#60679, Zac Dover)
squid: doc/cephadm: correct "ceph orch apply" command (pr#60432, Zac Dover)
squid: doc/cephadm: how to get exact size_spec from device (pr#59430, Zac Dover)
squid: doc/cephadm: link to "host pattern" matching sect (pr#60644, Zac Dover)
squid: doc/cephadm: Update operations.rst (pr#60637, rhkelson)
squid: doc/cephfs: add cache pressure information (pr#59148, Zac Dover)
squid: doc/cephfs: add doc for disabling mgr/volumes plugin (pr#60496, Rishabh Dave)
squid: doc/cephfs: edit "Disabling Volumes Plugin" (pr#60467, Zac Dover)
squid: doc/cephfs: edit "Layout Fields" text (pr#59021, Zac Dover)
squid: doc/cephfs: edit 3rd 3rd of mount-using-kernel-driver (pr#61080, Zac Dover)
squid: doc/cephfs: improve "layout fields" text (pr#59250, Zac Dover)
squid: doc/cephfs: improve cache-configuration.rst (pr#59214, Zac Dover)
squid: doc/cephfs: rearrange subvolume group information (pr#60435, Indira Sawant)
squid: doc/cephfs: s/mountpoint/mount point/ (pr#59294, Zac Dover)
squid: doc/cephfs: s/mountpoint/mount point/ (pr#59289, Zac Dover)
squid: doc/cephfs: use 'p' flag to set layouts or quotas (pr#60482, TruongSinh Tran-Nguyen)
squid: doc/dev/peering: Change acting set num (pr#59062, qn2060)
squid: doc/dev/release-checklist: check telemetry validation (pr#59813, Yaarit Hatuka)
squid: doc/dev/release-checklists.rst: enable rtd for squid (pr#59812, Neha Ojha)
squid: doc/dev/release-process.rst: New container build/release process (pr#60971, Dan Mick)
squid: doc/dev: add "activate latest release" RTD step (pr#59654, Zac Dover)
squid: doc/dev: instruct devs to backport (pr#61063, Zac Dover)
squid: doc/dev: remove "Stable Releases and Backports" (pr#60272, Zac Dover)
squid: doc/glossary.rst: add "Dashboard Plugin" (pr#60896, Zac Dover)
squid: doc/glossary: add "ceph-ansible" (pr#59007, Zac Dover)
squid: doc/glossary: add "flapping OSD" (pr#60864, Zac Dover)
squid: doc/glossary: add "object storage" (pr#59424, Zac Dover)
squid: doc/glossary: add "PLP" to glossary (pr#60503, Zac Dover)
squid: doc/governance: add exec council responsibilites (pr#60139, Zac Dover)
squid: doc/governance: add Zac Dover's updated email (pr#60134, Zac Dover)
squid: doc/install: Keep the name field of the created user consistent with … (pr#59756, hejindong)
squid: doc/man: edit ceph-bluestore-tool.rst (pr#59682, Zac Dover)
squid: doc/mds: improve wording (pr#59585, Piotr Parczewski)
squid: doc/mgr/dashboard: fix TLS typo (pr#59031, Mindy Preston)
squid: doc/rados/operations: Improve health-checks.rst (pr#59582, Anthony D'Atri)
squid: doc/rados/troubleshooting: Improve log-and-debug.rst (pr#60824, Anthony D'Atri)
squid: doc/rados: add "pgs not deep scrubbed in time" info (pr#59733, Zac Dover)
squid: doc/rados: add blaum_roth coding guidance (pr#60537, Zac Dover)
squid: doc/rados: add confval directives to health-checks (pr#59871, Zac Dover)
squid: doc/rados: add link to messenger v2 info in mon-lookup-dns.rst (pr#59794, Zac Dover)
squid: doc/rados: add osd_deep_scrub_interval setting operation (pr#59802, Zac Dover)
squid: doc/rados: correct "full ratio" note (pr#60737, Zac Dover)
squid: doc/rados: document unfound object cache-tiering scenario (pr#59380, Zac Dover)
squid: doc/rados: edit "Placement Groups Never Get Clean" (pr#60046, Zac Dover)
squid: doc/rados: fix sentences in health-checks (2 of x) (pr#60931, Zac Dover)
squid: doc/rados: fix sentences in health-checks (3 of x) (pr#60949, Zac Dover)
squid: doc/rados: make sentences agree in health-checks.rst (pr#60920, Zac Dover)
squid: doc/rados: standardize markup of "clean" (pr#60500, Zac Dover)
squid: doc/radosgw/multisite: fix Configuring Secondary Zones -> Updating the Period (pr#60332, Casey Bodley)
squid: doc/radosgw/qat-accel: Update and Add QATlib information (pr#58874, Feng, Hualong)
squid: doc/radosgw: Improve archive-sync-module.rst (pr#60852, Anthony D'Atri)
squid: doc/radosgw: Improve archive-sync-module.rst more (pr#60867, Anthony D'Atri)
squid: doc/radosgw: Improve config-ref.rst (pr#59578, Anthony D'Atri)
squid: doc/radosgw: improve qat-accel.rst (pr#59179, Anthony D\'Atri)
squid: doc/radosgw: s/Poliicy/Policy/ (pr#60707, Zac Dover)
squid: doc/radosgw: update rgw_dns_name doc (pr#60885, Zac Dover)
squid: doc/rbd: add namespace information for mirror commands (pr#60269, N Balachandran)
squid: doc/README.md - add ordered list (pr#59798, Zac Dover)
squid: doc/README.md: create selectable commands (pr#59834, Zac Dover)
squid: doc/README.md: edit \\"Build Prerequisites\\" (pr#59637, Zac Dover)
squid: doc/README.md: improve formatting (pr#59785, Zac Dover)
squid: doc/README.md: improve formatting (pr#59700, Zac Dover)
squid: doc/rgw/account: Handling notification topics when migrating an existing user into an account (pr#59491, Oguzhan Ozmen)
squid: doc/rgw/d3n: pass cache dir volume to extra_container_args (pr#59767, Mark Kogan)
squid: doc/rgw/notification: clarified the notification_v2 behavior upon upg… (pr#60662, Yuval Lifshitz)
squid: doc/rgw/notification: persistent notification queue full behavior (pr#59233, Yuval Lifshitz)
squid: doc/start: add supported Squid distros (pr#60557, Zac Dover)
squid: doc/start: add vstart install guide (pr#60461, Zac Dover)
squid: doc/start: fix "are are" typo (pr#60708, Zac Dover)
squid: doc/start: separate package chart from container chart (pr#60698, Zac Dover)
squid: doc/start: update os-recommendations.rst (pr#60766, Zac Dover)
squid: doc: Correct link to Prometheus docs (pr#59559, Matthew Vernon)
squid: doc: Document the Windows CI job (pr#60033, Lucian Petrut)
squid: doc: Document which options are disabled by mClock (pr#60671, Niklas Hambüchen)
squid: doc: documenting the feature that scrub clear the entries from damage… (pr#59078, Neeraj Pratap Singh)
squid: doc: explain the consequence of enabling mirroring through monitor co… (pr#60525, Jos Collin)
squid: doc: fix email (pr#60233, Ernesto Puerta)
squid: doc: fix typo (pr#59991, N Balachandran)
squid: doc: Harmonize 'mountpoint' (pr#59291, Anthony D'Atri)
squid: doc: s/Whereas,/Although/ (pr#60593, Zac Dover)
squid: doc: SubmittingPatches-backports - remove backports team (pr#60297, Zac Dover)
squid: doc: Update "Getting Started" to link to start not install (pr#59907, Matthew Vernon)
squid: doc: update Key Idea in cephfs-mirroring.rst (pr#60343, Jos Collin)
squid: doc: update nfs doc for Kerberos setup of ganesha in Ceph (pr#59939, Avan Thakkar)
squid: doc: update tests-integration-testing-teuthology-workflow.rst (pr#59548, Vallari Agrawal)
squid: doc:update e-mail addresses governance (pr#60084, Tobias Fischer)
squid: docs/rados/operations/stretch-mode: warn device class is not supported (pr#59099, Kamoltat Sirivadhna)
squid: global: Call getnam_r with a 64KiB buffer on the heap (pr#60127, Adam Emerson)
squid: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove (pr#59284, Chen Yuanrun)
squid: librbd/crypto/LoadRequest: clone format for migration source image (pr#60171, Ilya Dryomov)
squid: librbd/crypto: fix issue when live-migrating from encrypted export (pr#59145, Ilya Dryomov)
squid: librbd/migration/HttpClient: avoid reusing ssl_stream after shut down (pr#61095, Ilya Dryomov)
squid: librbd/migration: prune snapshot extents in RawFormat::list_snaps() (pr#59661, Ilya Dryomov)
squid: librbd: avoid data corruption on flatten when object map is inconsistent (pr#61168, Ilya Dryomov)
squid: log: save/fetch thread name infra (pr#60279, Milind Changire)
squid: Make mon addrs consistent with mon info (pr#60751, shenjiatong)
squid: mds/QuiesceDbManager: get requested state of members before iterating… (pr#58912, junxiang Mu)
squid: mds: CInode::item_caps used in two different lists (pr#56887, Dhairya Parmar)
squid: mds: encode quiesce payload on demand (pr#59517, Patrick Donnelly)
squid: mds: find a new head for the batch ops when the head is dead (pr#57494, Xiubo Li)
squid: mds: fix session/client evict command (pr#58727, Neeraj Pratap Singh)
squid: mds: only authpin on wrlock when not a locallock (pr#59097, Patrick Donnelly)
squid: mgr/balancer: optimize 'balancer status detail' (pr#60718, Laura Flores)
squid: mgr/cephadm/services/ingress Fix HAProxy to listen on IPv4 and IPv6 (pr#58515, Bernard Landon)
squid: mgr/cephadm: add "original_weight" parameter to OSD class (pr#59410, Adam King)
squid: mgr/cephadm: add --no-exception-when-missing flag to cert-store cert/key get (pr#59935, Adam King)
squid: mgr/cephadm: add command to expose systemd units of all daemons (pr#59931, Adam King)
squid: mgr/cephadm: bump monitoring stacks version (pr#58711, Nizamudeen A)
squid: mgr/cephadm: make ssh keepalive settings configurable (pr#59710, Adam King)
squid: mgr/cephadm: redeploy when some dependency daemon is add/removed (pr#58383, Redouane Kachach)
squid: mgr/cephadm: Update multi-site configs before deploying daemons on rgw service create (pr#60321, Aashish Sharma)
squid: mgr/cephadm: use host address while updating rgw zone endpoints (pr#59948, Aashish Sharma)
squid: mgr/client: validate connection before sending (pr#58887, NitzanMordhai)
squid: mgr/dashboard: add cephfs rename REST API (pr#60620, Yite Gu)
squid: mgr/dashboard: Add group field in nvmeof service form (pr#59446, Afreen Misbah)
squid: mgr/dashboard: add gw_groups support to nvmeof api (pr#59751, Nizamudeen A)
squid: mgr/dashboard: add gw_groups to all nvmeof endpoints (pr#60310, Nizamudeen A)
squid: mgr/dashboard: add restful api for creating crush rule with type of 'erasure' (pr#59139, sunlan)
squid: mgr/dashboard: Changes for Sign out text to Login out (pr#58988, Prachi Goel)
Squid: mgr/dashboard: Cloning subvolume not listing _nogroup if no subvolume (pr#59951, Dnyaneshwari talwekar)
squid: mgr/dashboard: custom image for kcli bootstrap script (pr#59879, Pedro Gonzalez Gomez)
squid: mgr/dashboard: Dashboard not showing Object/Overview correctly (pr#59038, Aashish Sharma)
squid: mgr/dashboard: Fix adding listener and null issue for groups (pr#60078, Afreen Misbah)
squid: mgr/dashboard: fix bucket get for s3 account owned bucket (pr#60466, Nizamudeen A)
squid: mgr/dashboard: fix ceph-users api doc (pr#59140, Nizamudeen A)
squid: mgr/dashboard: fix doc links in rgw-multisite (pr#60154, Pedro Gonzalez Gomez)
squid: mgr/dashboard: fix gateways section error:”404 - Not Found RGW Daemon not found: None” (pr#60231, Aashish Sharma)
squid: mgr/dashboard: fix group name bugs in the nvmeof API (pr#60348, Nizamudeen A)
squid: mgr/dashboard: fix handling NaN values in dashboard charts (pr#59961, Aashish Sharma)
squid: mgr/dashboard: fix lifecycle issues (pr#60378, Pedro Gonzalez Gomez)
squid: mgr/dashboard: Fix listener deletion (pr#60292, Afreen Misbah)
squid: mgr/dashboard: fix setting compression type while editing rgw zone (pr#59970, Aashish Sharma)
Squid: mgr/dashboard: Forbid snapshot name "." and any containing "/" (pr#59995, Dnyaneshwari Talwekar)
squid: mgr/dashboard: handle infinite values for pools (pr#61096, Afreen)
squid: mgr/dashboard: ignore exceptions raised when no cert/key found (pr#60311, Nizamudeen A)
squid: mgr/dashboard: Increase maximum namespace count to 1024 (pr#59717, Afreen Misbah)
squid: mgr/dashboard: introduce server side pagination for osds (pr#60294, Nizamudeen A)
squid: mgr/dashboard: mgr/dashboard: Select no device by default in EC profile (pr#59811, Afreen Misbah)
Squid: mgr/dashboard: multisite sync policy improvements (pr#59965, Naman Munet)
Squid: mgr/dashboard: NFS Export form fixes (pr#59900, Dnyaneshwari Talwekar)
squid: mgr/dashboard: Nvme mTLS support and service name changes (pr#59819, Afreen Misbah)
squid: mgr/dashboard: provide option to enable pool based mirroring mode while creating a pool (pr#58638, Aashish Sharma)
squid: mgr/dashboard: remove cherrypy_backports.py (pr#60632, Nizamudeen A)
Squid: mgr/dashboard: remove orch required decorator from host UI router (list) (pr#59851, Naman Munet)
squid: mgr/dashboard: Rephrase dedicated pool helper in rbd create form (pr#59721, Aashish Sharma)
Squid: mgr/dashboard: RGW multisite sync remove zones fix (pr#59825, Naman Munet)
squid: mgr/dashboard: rm nvmeof conf based on its daemon name (pr#60604, Nizamudeen A)
Squid: mgr/dashboard: service form hosts selection only show up to 10 entries (pr#59760, Naman Munet)
squid: mgr/dashboard: show non default realm sync status in rgw overview page (pr#60232, Aashish Sharma)
squid: mgr/dashboard: Show which daemons failed in CEPHADM_FAILED_DAEMON healthcheck (pr#59597, Aashish Sharma)
Squid: mgr/dashboard: sync policy's in Object >> Multi-site >> Sync-policy, does not show the zonegroup to which policy belongs to (pr#60346, Naman Munet)
Squid: mgr/dashboard: The subvolumes are missing from the dropdown menu on the "Create NFS export" page (pr#60356, Dnyaneshwari Talwekar)
Squid: mgr/dashboard: unable to edit pipe config for bucket level policy of bucket (pr#60293, Naman Munet)
squid: mgr/dashboard: Update nvmeof microcopies (pr#59718, Afreen Misbah)
squid: mgr/dashboard: update period after migrating to multi-site (pr#59964, Aashish Sharma)
squid: mgr/dashboard: update translations for squid (pr#60367, Nizamudeen A)
squid: mgr/dashboard: use grafana server instead of grafana-server in grafana 10.4.0 (pr#59722, Aashish Sharma)
Squid: mgr/dashboard: Wrong(half) uid is observed in dashboard when user created via cli contains $ in its name (pr#59693, Dnyaneshwari Talwekar)
squid: mgr/dashboard: Zone details showing incorrect data for data pool values and compression info for Storage Classes (pr#59596, Aashish Sharma)
Squid: mgr/dashboard: zonegroup level policy created at master zone did not sync to non-master zone (pr#59892, Naman Munet)
squid: mgr/nfs: generate user_id & access_key for apply_export(CephFS) (pr#59896, Avan Thakkar, avanthakkar, John Mulligan)
squid: mgr/orchestrator: fix encrypted flag handling in orch daemon add osd (pr#59473, Yonatan Zaken)
squid: mgr/rest: Trim requests array and limit size (pr#59372, Nitzan Mordechai)
squid: mgr/rgw: Adding a retry config while calling zone_create() (pr#59138, Kritik Sachdeva)
squid: mgr/rgwam: use realm/zonegroup/zone method arguments for period update (pr#59945, Aashish Sharma)
squid: mgr/volumes: add earmarking for subvol (pr#59894, Avan Thakkar)
squid: Modify container/ software to support release containers and the promotion of prerelease containers (pr#60962, Dan Mick)
squid: mon/ElectionLogic: tie-breaker mon ignore proposal from marked down mon (pr#58669, Kamoltat)
squid: mon/MonClient: handle ms_handle_fast_authentication return (pr#59306, Patrick Donnelly)
squid: mon/OSDMonitor: Add force-remove-snap mon command (pr#59402, Matan Breizman)
squid: mon/OSDMonitor: fix get_min_last_epoch_clean() (pr#55865, Matan Breizman)
squid: mon: Remove any pg_upmap_primary mapping during remove a pool (pr#58914, Mohit Agrawal)
squid: msg: insert PriorityDispatchers in sorted position (pr#58991, Casey Bodley)
squid: node-proxy: fix a regression when processing the RedFish API (pr#59997, Guillaume Abrioux)
squid: node-proxy: make the daemon discover endpoints (pr#58482, Guillaume Abrioux)
squid: objclass: deprecate cls_cxx_gather (pr#57819, Nitzan Mordechai)
squid: orch: disk replacement enhancement (pr#60486, Guillaume Abrioux)
squid: orch: refactor boolean handling in drive group spec (pr#59863, Guillaume Abrioux)
squid: os/bluestore: enable async manual compactions (pr#58740, Igor Fedotov)
squid: os/bluestore: Fix BlueFS allocating bdev label reserved location (pr#59969, Adam Kupczyk)
squid: os/bluestore: Fix ceph-bluestore-tool allocmap command (pr#60335, Adam Kupczyk)
squid: os/bluestore: Fix repair of multilabel when collides with BlueFS (pr#60336, Adam Kupczyk)
squid: os/bluestore: Improve documentation introduced by #57722 (pr#60893, Anthony D\'Atri)
squid: os/bluestore: Multiple bdev labels on main block device (pr#59106, Adam Kupczyk)
squid: os/bluestore: Mute warnings (pr#59217, Adam Kupczyk)
squid: os/bluestore: Warning added for slow operations and stalled read (pr#59464, Md Mahamudur Rahaman Sajib)
squid: osd/scheduler: add mclock queue length perfcounter (pr#59035, zhangjianwei2)
squid: osd/scrub: decrease default deep scrub chunk size (pr#59791, Ronen Friedman)
squid: osd/scrub: exempt only operator scrubs from max_scrubs limit (pr#59020, Ronen Friedman)
squid: osd/scrub: reduce osd_requested_scrub_priority default value (pr#59885, Ronen Friedman)
squid: osd: fix require_min_compat_client handling for msr rules (pr#59492, Samuel Just, Radoslaw Zarzynski)
squid: PeeringState.cc: Only populate want_acting when num_osds < bucket_max (pr#59083, Kamoltat)
squid: qa/cephadm: extend iscsi teuth test (pr#59934, Adam King)
squid: qa/cephfs: fix TestRenameCommand and unmount the clinet before failin… (pr#59398, Xiubo Li)
squid: qa/cephfs: ignore variant of MDS_UP_LESS_THAN_MAX (pr#58788, Patrick Donnelly)
squid: qa/distros: reinstall nvme-cli on centos 9 nodes (pr#59471, Adam King)
squid: qa/rgw/multisite: specify realm/zonegroup/zone args for 'account create' (pr#59603, Casey Bodley)
squid: qa/rgw: bump keystone/barbican from 2023.1 to 2024.1 (pr#61023, Casey Bodley)
squid: qa/rgw: fix s3 java tests by forcing gradle to run on Java 8 (pr#61053, J. Eric Ivancich)
squid: qa/rgw: force Hadoop to run under Java 1.8 (pr#61120, J. Eric Ivancich)
squid: qa/rgw: pull Apache artifacts from mirror instead of archive.apache.org (pr#61101, J. Eric Ivancich)
squid: qa/standalone/scrub: fix the searched-for text for snaps decode errors (pr#58967, Ronen Friedman)
squid: qa/standalone/scrub: increase status updates frequency (pr#59974, Ronen Friedman)
squid: qa/standalone/scrub: remove TEST_recovery_scrub_2 (pr#60287, Ronen Friedman)
squid: qa/suites/crimson-rados/perf: add ssh keys (pr#61109, Nitzan Mordechai)
squid: qa/suites/rados/thrash-old-clients: Add noscrub, nodeep-scrub to ignorelist (pr#58629, Kamoltat)
squid: qa/suites/rados/thrash-old-clients: test with N-2 releases on centos 9 (pr#58607, Laura Flores)
squid: qa/suites/rados/verify/validater: increase heartbeat grace timeout (pr#58785, Sridhar Seshasayee)
squid: qa/suites/rados: Cancel injectfull to allow cleanup (pr#59156, Brad Hubbard)
squid: qa/suites/rbd/iscsi: enable all supported container hosts (pr#60089, Ilya Dryomov)
squid: qa/suites: drop --show-reachable=yes from fs:valgrind tests (pr#59068, Jos Collin)
squid: qa/task: update alertmanager endpoints version (pr#59930, Nizamudeen A)
squid: qa/tasks/mgr/test_progress.py: deal with pre-exisiting pool (pr#58263, Kamoltat)
squid: qa/tasks/nvme_loop: update task to work with new nvme list format (pr#61026, Adam King)
squid: qa/upgrade: fix checks to make sure upgrade is still in progress (pr#59472, Adam King)
squid: qa: adjust expected io_opt in krbd_discard_granularity.t (pr#59232, Ilya Dryomov)
squid: qa: ignore container checkpoint/restore related selinux denials for c… (issue#67117, issue#66640, pr#58808, Venky Shankar)
squid: qa: load all dirfrags before testing altname recovery (pr#59521, Patrick Donnelly)
squid: qa: remove all bluestore signatures on devices (pr#60021, Guillaume Abrioux)
squid: qa: suppress __trans_list_add valgrind warning (pr#58790, Patrick Donnelly)
squid: RADOS: Generalize stretch mode pg temp handling to be usable without stretch mode (pr#59084, Kamoltat)
squid: rbd-mirror: use correct ioctx for namespace (pr#59771, N Balachandran)
squid: rbd: "rbd bench" always writes the same byte (pr#59502, Ilya Dryomov)
squid: rbd: amend "rbd {group,} rename" and "rbd mirror pool" command descriptions (pr#59602, Ilya Dryomov)
squid: rbd: handle --{group,image}-namespace in "rbd group image {add,rm}" (pr#61172, Ilya Dryomov)
squid: rgw/beast: optimize for accept when meeting error in listenning (pr#60244, Mingyuan Liang, Casey Bodley)
squid: rgw/http: finish_request() after logging errors (pr#59439, Casey Bodley)
squid: rgw/kafka: refactor topic creation to avoid rd_kafka_topic_name() (pr#59754, Yuval Lifshitz)
squid: rgw/lc: Fix lifecycle not working while bucket versioning is suspended (pr#61138, Trang Tran)
squid: rgw/multipart: use cls_version to avoid racing between part upload and multipart complete (pr#59678, Jane Zhu)
squid: rgw/multisite: metadata polling event based on unmodified mdlog_marker (pr#60792, Shilpa Jagannath)
squid: rgw/notifications: fixing radosgw-admin notification json (pr#59302, Yuval Lifshitz)
squid: rgw/notifications: free completion pointer using unique_ptr (pr#59671, Yuval Lifshitz)
squid: rgw/notify: visit() returns copy of owner string (pr#59226, Casey Bodley)
squid: rgw/rados: don't rely on IoCtx::get_last_version() for async ops (pr#60065, Casey Bodley)
squid: rgw: add s3select usage to log usage (pr#59120, Seena Fallah)
squid: rgw: decrement qlen/qactive perf counters on error (pr#59670, Mark Kogan)
squid: rgw: decrypt multipart get part when encrypted (pr#60130, sungjoon-koh)
squid: rgw: ignore zoneless default realm when not configured (pr#59445, Casey Bodley)
squid: rgw: load copy source bucket attrs in putobj (pr#59413, Seena Fallah)
squid: rgw: optimize bucket listing to skip past regions of namespaced entries (pr#61070, J. Eric Ivancich)
squid: rgw: revert account-related changes to get_iam_policy_from_attr() (pr#59221, Casey Bodley)
squid: rgw: RGWAccessKey::decode_json() preserves default value of 'active' (pr#60823, Casey Bodley)
squid: rgw: switch back to boost::asio for spawn() and yield_context (pr#60133, Casey Bodley)
squid: rgwlc: fix typo in getlc (ObjectSizeGreaterThan) (pr#59223, Matt Benjamin)
squid: RGW|BN: fix lifecycle test issue (pr#59010, Ali Masarwa)
squid: RGW|Bucket notification: fix for v2 topics rgw-admin list operation (pr#60774, Oshrey Avraham, Ali Masarwa)
squid: seastar: update submodule (pr#58955, Matan Breizman)
squid: src/ceph_release, doc: mark squid stable (pr#59537, Neha Ojha)
squid: src/crimson/osd/scrub: fix the null pointer error (pr#58885, junxiang Mu)
squid: src/mon/ConnectionTracker.cc: Fix dump function (pr#60003, Kamoltat)
squid: suites/upgrade/quincy-x: update the ignore list (pr#59624, Nitzan Mordechai)
squid: suites: adding ignore list for stray daemon (pr#58267, Nitzan Mordechai)
squid: suites: test should ignore osd_down warnings (pr#59147, Nitzan Mordechai)
squid: test/neorados: remove depreciated RemoteReads cls test (pr#58144, Laura Flores)
squid: test/rgw/notification: fixing backport issues in the tests (pr#60545, Yuval Lifshitz)
squid: test/rgw/notification: use real ip address instead of localhost (pr#59303, Yuval Lifshitz)
squid: test/rgw/notifications: don't check for full queue if topics expired (pr#59917, Yuval Lifshitz)
squid: test/rgw/notifications: fix test regression (pr#61119, Yuval Lifshitz)
squid: Test: osd-recovery-space.sh extends the wait time for "recovery toofull" (pr#59041, Nitzan Mordechai)
upgrade/cephfs/mds_upgrade_sequence: ignore osds down (pr#59865, Kamoltat Sirivadhna)
squid: rgw: Don't crash on exceptions from pool listing (pr#61306, Adam Emerson)
squid: container/Containerfile: replace CEPH_VERSION label for backward compact (pr#61583, Dan Mick)
squid: container/build.sh: fix up org vs. repo naming (pr#61584, Dan Mick)
squid: container/build.sh: don't require repo creds on NO_PUSH (pr#61585, Dan Mick)
Crimson is the project name for the new high-performance OSD architecture. Crimson is built on top of Seastar, an advanced, open-source C++ framework for high-performance server applications on modern hardware. Seastar implements I/O reactors in a shared-nothing architecture, using asynchronous computation primitives such as futures, promises, and coroutines. The I/O reactor threads are normally pinned to specific CPU cores in the system. However, to support interaction with legacy software (that is, blocking, non-reactor tasks), Seastar provides the mechanism of Alien threads, which bridge the non-reactor and reactor sides of the architecture. In Crimson, Alien threads are used to support BlueStore.
There are very good introductions to the project; in particular, check the videos by Sam Just and Matan Breizman on the Ceph community YouTube channel.
An important question from the performance point of view is the allocation of Seastar reactor threads to the available CPU cores. This is particularly important on modern NUMA (Non-Uniform Memory Access) architectures, where there is a latency penalty for accessing memory attached to a different CPU socket as opposed to local memory on the socket where the thread is running. We also want to ensure mutual exclusion between reactors and other non-reactor threads within the same CPU core, mainly because the Seastar reactor threads are non-blocking, whereas non-reactor threads are allowed to block.
As part of this PR, we introduced a new option, `--crimson-balance-cpu`, in the `vstart.sh` script to set the CPU allocation strategy for the OSD reactor threads.
By default, if the option is not given, vstart allocates the reactor threads in consecutive order across the available CPU cores.
It is worth mentioning that the `vstart.sh` script is used in developer mode only; it is very useful for experimenting, as in this case.
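For illustration, a single vstart invocation exercising the new option might look like the sketch below; the core counts and the `osd` balance mode are example values, and the full test driver shown later wraps this in a loop:

MDS=0 MON=1 OSD=8 MGR=1 /ceph/src/vstart.sh --new -x --localhost --without-dashboard \
    --crimson --crimson-smp 5 \
    --crimson-balance-cpu osd    # or "socket"; omit the flag for the default strategy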
The structure of this blog entry is as follows:
First, we briefly describe the hardware and the performance tests we executed, illustrated with some snippets. Readers familiar with Ceph might want to skip this section.
In the second section, we show the results of the performance tests, comparing the three CPU allocation strategies. We used the three backend classes supported by Crimson, which are:
Cyanstore: this is an in-memory, pure-reactor OSD class which does not exercise the physical drives in the system. The reason for using this OSD class is to saturate the memory access rate of the machine, in order to identify the highest I/O rate possible in Crimson without interference (that is, latencies) from physical drives.
Seastore: this is also a pure-reactor OSD class which exercises the physical NVMe drives in the machine. We expect that the overall performance of this class would be a fraction of that achieved by Cyanstore.
Bluestore: this is the default OSD class for Crimson, as well as for the classic OSD in Ceph. This class involves the participation of Alien threads, which is the Seastar technique for dealing with blocking thread pools.
The comparison results are interesting as they highlight both limitations and opportunities for performance optimisations.
In a nutshell, we want to measure performance using some typical client workloads (random 4K write, random 4K read, sequential 64K write, sequential 64K read) for a number of cluster configurations with a fixed number of OSDs, ranging over the number of I/O reactors (which implicitly ranges over the corresponding number of CPU cores). We want to compare across the existing object stores: Cyanstore (in memory), Seastore and Bluestore. The former two are "pure reactor", whilst the latter involves (blocking) Alien thread pools.
In terms of the client, we exercise an RBD volume of 10 GiB in size, using FIO for the typical workloads mentioned above. We synthesise the client results from the FIO .json output (I/O throughput and latency) and integrate them with measurements from the OSD process, typically CPU and memory utilisation (from the Linux top command). This workflow is illustrated in the following diagram.
In our actual experiments, we ranged over the number of OSDs (1, 3, 5, 8) as well as over the number of reactors (1, 2, 4, 6). Since the full set of results would be considerably large and would make reading this blog rather tedious, we decided to show only the representative configuration of 8 OSDs and 5 I/O reactors.
We used a single node cluster, with the following hardware and system configuration:
We build Ceph with the following options:
# ./do_cmake.sh -DWITH_SEASTAR=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
At a high level, we initiate the performance tests as follows:
/root/bin/run_balanced_crimson.sh -t cyan
This will run the test plan for the Cyanstore object backend, producing data for response curves over the three CPU allocation strategies. The argument `-t` is used to specify the object storage backend: `cyan`, `sea` and `blue` for Cyanstore, Seastore and Bluestore, respectively.
crimson_be_table["cyan"]="--cyanstore"
crimson_be_table["sea"]="--seastore --seastore-devs ${STORE_DEVS}"
crimson_be_table["blue"]="--bluestore --bluestore-devs ${STORE_DEVS}"
Once the execution completes, the results are archived in .zip files according to the workloads and saved in the output directory, which can be specified with the `-d` option (`/tmp` by default).
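Putting the two options together, a sweep over all three backends could be driven with a small loop like the following (the output directories are example paths):

# Run the full test plan once per Crimson backend, archiving results separately
for be in cyan sea blue; do
    /root/bin/run_balanced_crimson.sh -t ${be} -d /tmp/crimson_cpu_runs_${be}
done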
cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread.zip cyan_8osd_5reactor_8fio_bal_socket_rc_1procs_seqread.zip cyan_8osd_6reactor_8fio_bal_osd_rc_1procs_randread.zip cyan_8osd_6reactor_8fio_bal_socket_rc_1procs_seqread.zip
cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randwrite.zip cyan_8osd_5reactor_8fio_bal_socket_rc_1procs_seqwrite.zip cyan_8osd_6reactor_8fio_bal_osd_rc_1procs_randwrite.zip cyan_8osd_6reactor_8fio_bal_socket_rc_1procs_seqwrite.zip
cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_seqread.zip
Each archive contains the result output files and measurements from that workload execution.
- cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread.json: the (combined) FIO output file, which contains the I/O throughput and latency measurements. It also has the CPU and MEM utilisation from the OSD process integrated.
- cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread_cpu_avg.json: the OSD and FIO CPU and MEM utilisation averages. These have been collected from the OSD process and the FIO client process via top (30 samples over 5 minutes).
- cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread_diskstat.json: the diskstat output. A sample is taken before and after the test; the .json contains the differences.
- cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread_top.json: the output from top, parsed via jc to produce a .json. Note: jc does not yet support individual CPU core utilisation, so we have to rely on the overall CPU utilisation (per thread).
- new_cluster_dump.json: the output from the `ceph tell ${osd_i} dump_metrics` command, which contains the individual OSD performance metrics.
- FIO_cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread_top_cpu.plot: the plot mechanically generated from the top output, showing the CPU utilisation over time for the FIO client.
- OSD_cyan_8osd_5reactor_8fio_bal_osd_rc_1procs_randread_top_mem.plot: the plot mechanically generated from the top output, showing the MEM utilisation over time for the OSD process.

To produce the post-processing and side-by-side comparison, the following script is run:
# /root/bin/pp_balanced_cpu_cmp.sh -d /tmp/_seastore_8osd_5_6_reactor_8fio_rc_cmp \
    -t sea -o seastore_8osd_5vs6_reactor_8fio_cpu_cmp.md
The arguments are: the input directory that contains the runs we want to compare, the type of object store backend, and the output .md to produce.
We will show the comparisons produced in the next section.
We end this section by looking behind the curtains of the above scripts, showing details on preconditioning the drives, creation of the cluster, execution of FIO and the metrics collected.
To ensure that the drives are in a consistent state, we run a write workload using FIO with the `steadystate` option. This option ensures that the drives reach a steady state before the actual performance tests are run. We precondition up to 70 percent of the total capacity of each drive.
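A minimal preconditioning invocation along these lines might look like the sketch below; the device name, runtime and steady-state thresholds are illustrative, not the exact job file (`rbd_fio_examples/randwrite64k.fio`) used in our runs:

# Illustrative preconditioning run: random writes until IOPS settle within 2% of the mean
fio --name=precondition --filename=/dev/nvme0n1p2 --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=64k --iodepth=32 --size=70% \
    --steadystate=iops:2% --steadystate_duration=300 --steadystate_ramp_time=60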
We take a measurement of the diskstats before and after the test, and we use the `diskstat_diff.py` script to calculate the difference. The script is available in the ceph repository under `src/tools/contrib`.
# jc --pretty /proc/diskstats > /tmp/blue_8osd_6reactor_192at_8fio_socket_cond.json
# fio rbd_fio_examples/randwrite64k.fio && jc --pretty /proc/diskstats \
 | python3 diskstat_diff.py -d /tmp/ -a blue_8osd_6reactor_192at_8fio_socket_cond.json

Jobs: 8 (f=8): [w(8)][30.5%][w=24.3GiB/s][w=398k IOPS][eta 13m:41s]
nvme0n1p2: (groupid=0, jobs=8): err= 0: pid=375444: Fri Jan 31 11:43:35 2025
  write: IOPS=397k, BW=24.2GiB/s (26.0GB/s)(8742GiB/360796msec); 0 zone resets
  slat (nsec): min=1543, max=823010, avg=5969.62, stdev=2226.84
  clat (usec): min=57, max=50322, avg=5152.50, stdev=2982.28
  lat (usec): min=70, max=50328, avg=5158.47, stdev=2982.27
  clat percentiles (usec):
   | 1.00th=[ 281], 5.00th=[ 594], 10.00th=[ 1037], 20.00th=[ 2008],
   | 30.00th=[ 3032], 40.00th=[ 4080], 50.00th=[ 5145], 60.00th=[ 6194],
   | 70.00th=[ 7242], 80.00th=[ 8291], 90.00th=[ 9241], 95.00th=[ 9634],
   | 99.00th=[10028], 99.50th=[10421], 99.90th=[14091], 99.95th=[16188],
   | 99.99th=[19268]
  bw ( MiB/s): min=15227, max=24971, per=100.00%, avg=24845.68, stdev=88.47, samples=5768
  iops : min=243638, max=399547, avg=397527.12, stdev=1415.43, samples=5768
  lat (usec) : 100=0.01%, 250=0.61%, 500=3.18%, 750=2.90%, 1000=2.88%
  lat (msec) : 2=10.28%, 4=19.25%, 10=59.90%, 20=1.00%, 50=0.01%
  lat (msec) : 100=0.01%
  cpu : usr=19.80%, sys=15.60%, ctx=104026691, majf=0, minf=2647
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
  submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
  complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
  issued rwts: total=0,143224767,0,0 short=0,0,0,0 dropped=0,0,0,0
  latency : target=0, window=0, percentile=100.00%, depth=256
  steadystate : attained=yes, bw=24.3GiB/s (25.5GB/s), iops=398k, iops mean dev=1.215%

Run status group 0 (all jobs):
  WRITE: bw=24.2GiB/s (26.0GB/s), 24.2GiB/s-24.2GiB/s (26.0GB/s-26.0GB/s), io=8742GiB (9386GB), run=360796-360796msec
We use the `vstart.sh` script to create the cluster, with the appropriate options for Crimson.
# CPU allocation strategies
declare -A bal_ops_table
bal_ops_table["default"]=""
bal_ops_table["bal_osd"]=" --crimson-balance-cpu osd"
bal_ops_table["bal_socket"]="--crimson-balance-cpu socket"
We essentially traverse over the order of the CPU strategies, for each of the Crimson backends. In the snippet, we iterate over the number of OSDs and reactors, and set the CPU allocation strategy with the new option.
IMPORTANT: Notice that we set the list of CPU cores available to vstart with the `VSTART_CPU_CORES` variable. We use this to ensure we "reserve" some CPUs for the FIO client (since we are using a single-node cluster).
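For example, on a hypothetical 64-core host the reactor cores and the FIO cores could be kept disjoint along these lines (the actual ranges depend on the machine's topology):

# Cores 0-55 for the cluster (vstart/reactors), cores 56-63 reserved for the FIO client
export VSTART_CPU_CORES="0-55"
export FIO_CORES="56-63"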
# Run balanced vs default CPU core/reactor distribution in Crimson using either Cyan, Seastore or Bluestore
fun_run_bal_vs_default_tests() {
  local OSD_TYPE=$1
  local NUM_ALIEN_THREADS=7 # default
  local title=""

  for KEY in default bal_osd bal_socket; do
    for NUM_OSD in 8; do
      for NUM_REACTORS in 5 6; do
        title="(${OSD_TYPE}) $NUM_OSD OSD crimson, $NUM_REACTORS reactor, fixed FIO 8 cores, response latency "

        cmd="MDS=0 MON=1 OSD=${NUM_OSD} MGR=1 taskset -ac '${VSTART_CPU_CORES}' /ceph/src/vstart.sh \
          --new -x --localhost --without-dashboard\
          --redirect-output ${crimson_be_table[${OSD_TYPE}]} --crimson --crimson-smp ${NUM_REACTORS}\
          --no-restart ${bal_ops_table[${KEY}]}"
        # Alien setup for Bluestore, see below.

        test_name="${OSD_TYPE}_${NUM_OSD}osd_${NUM_REACTORS}reactor_8fio_${KEY}_rc"
        echo "${cmd}" | tee >> ${RUN_DIR}/${test_name}_cpu_distro.log
        echo $test_name
        eval "$cmd" >> ${RUN_DIR}/${test_name}_cpu_distro.log
        echo "Sleeping for 20 secs..."

        sleep 20
        fun_show_grid $test_name
        fun_run_fio $test_name
        /ceph/src/stop.sh --crimson
        sleep 60
      done
    done
  done
}
For Bluestore we have a special case: we set the number of alien threads to be 4 times the total number of reactor cores (the number of OSDs times crimson-smp).
if [ "$OSD_TYPE" == "blue" ]; then
    NUM_ALIEN_THREADS=$(( 4 * NUM_OSD * NUM_REACTORS ))
    title="${title} alien_num_threads=${NUM_ALIEN_THREADS}"
    cmd="${cmd} --crimson-alien-num-threads $NUM_ALIEN_THREADS"
    test_name="${OSD_TYPE}_${NUM_OSD}osd_${NUM_REACTORS}reactor_${NUM_ALIEN_THREADS}at_8fio_${KEY}_rc"
fi
Once the cluster is online, we create the pools and the RBD volume.
We first take some measurements of the cluster, then we create a single RBD pool and volume(s) as appropriate. We also show the status of the cluster, the pools and the PGs.
# Take some measurements
if pgrep crimson; then
  bin/ceph daemon -c /ceph/build/ceph.conf osd.0 dump_metrics > /tmp/new_cluster_dump.json
else
  bin/ceph daemon -c /ceph/build/ceph.conf osd.0 perf dump > /tmp/new_cluster_dump.json
fi

# Create the pools
bin/ceph osd pool create rbd
bin/ceph osd pool application enable rbd rbd
[ -z "$NUM_RBD_IMAGES" ] && NUM_RBD_IMAGES=1
for (( i=0; i<$NUM_RBD_IMAGES; i++ )); do
  bin/rbd create --size ${RBD_SIZE} rbd/fio_test_${i}
  rbd du fio_test_${i}
done
bin/ceph status
bin/ceph osd dump | grep 'replicated size'
# Show a pool's utilization statistics:
rados df
# Turn off auto scaler for existing and new pools - stops PGs being split/merged
bin/ceph osd pool set noautoscale
# Turn off balancer to avoid moving PGs
bin/ceph balancer off
# Turn off deep scrub
bin/ceph osd set nodeep-scrub
# Turn off scrub
bin/ceph osd set noscrub
Here is an example of the default pools shown after the cluster has been created. Notice the default replicated size, as Crimson does not support erasure coding yet.
pool 'rbd' created
enabled application 'rbd' on pool 'rbd'
NAME PROVISIONED USED
fio_test_0 10 GiB 0 B
  cluster:
    id: da51b911-7229-4eae-afb5-a9833b978a68
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 97s)
    mgr: x(active, since 94s)
    osd: 8 osds: 8 up (since 52s), 8 in (since 60s)

  data:
    pools: 2 pools, 33 pgs
    objects: 2 objects, 449 KiB
    usage: 214 MiB used, 57 TiB / 57 TiB avail
    pgs: 27.273% pgs unknown
         21.212% pgs not active
         17 active+clean
         9 unknown
         7 creating+peering

pool 1 '.mgr' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode off last_change 15 flags hashpspool,nopgchange,crimson stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 7.89
pool 2 'rbd' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode off last_change 33 flags hashpspool,nopgchange,selfmanaged_snaps,crimson stripe_width 0 application rbd read_balance_score 1.50
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
.mgr 449 KiB 2 0 6 0 0 0 41 35 KiB 55 584 KiB 0 B 0 B
rbd 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B

total_objects 2
total_used 214 MiB
total_avail 57 TiB
total_space 57 TiB
noautoscale is set, all pools now have autoscale off
nodeep-scrub is set
noscrub is set
We have written some basic infrastructure via stand-alone tools to drive FIO, the Linux flexible I/O exerciser. All of these tools are publicly available in my GitHub project repo here.
In essence, this basic infrastructure consists of:
A set of predefined FIO configuration files for the different workloads (random 4K write, random 4K read; sequential 64K write, sequential 64K read). These can be automatically generated on demand, especially for multiple clients, multiple RBD volumes, etc.
A set of performance test profiles, namely response latency curves which produce throughput and latency measurements for a range of I/O depths, with resource utilisation integrated. We can also produce quick latency target tests, which are useful to identify the maximum I/O throughput for a given latency target.
A set of monitoring routines to measure resource utilisation (CPU, MEM) of the FIO client and the OSD process. We use the `top` command and parse its output with `jc` to produce a .json file. We integrate this with the FIO output into a single .json file and generate gnuplot scripts dynamically. We also take a snapshot of the diskstats before and after the test and calculate the difference, and we aggregate the FIO traces into gnuplot charts.
We use the FIO `iodepth` option to control the number of I/O requests issued to the device. Since we are interested in response latency curves (a.k.a. hockey-stick performance curves), we traverse queue depths from one to sixty-four. We use a single job per RBD volume (but this could also be made variable if required).

# Option -w (WORKLOAD) is used as index for these:
declare -A m_s_iodepth=( [hockey]="1 2 4 8 16 24 32 40 52 64" ...)
declare -A m_s_numjobs=( [hockey]="1" ... )
# Prime the volume(s) with a write workload
RBD_NAME=fio_test_$i RBD_SIZE="64k" fio ${FIO_JOBS}rbd_prime.fio 2>&1 >/dev/null &
echo "== priming $RBD_NAME ==";
...
wait;
IMPORTANT: the attentive reader will notice the use of the `taskset` command to bind the FIO client to a set of CPU cores. This ensures that the FIO client does not interfere with the reactors of the OSD process. The order of execution of the workloads is important to ensure reproducibility.
for job in $RANGE_NUMJOBS; do
  for io in $RANGE_IODEPTH; do

    # Take diskstats measurements before FIO instances
    jc --pretty /proc/diskstats > ${DISK_STAT}
    ...
    for (( i=0; i<${NUM_PROCS}; i++ )); do

      export TEST_NAME=${TEST_PREFIX}_${job}job_${io}io_${BLOCK_SIZE_KB}_${map[${WORKLOAD}]}_p${i};
      echo "== $(date) == ($io,$job): ${TEST_NAME} ==";
      echo fio_${TEST_NAME}.json >> ${OSD_TEST_LIST}
      fio_name=${FIO_JOBS}${FIO_JOB_SPEC}${map[${WORKLOAD}]}.fio

      # Execute FIO
      LOG_NAME=${log_name} RBD_NAME=fio_test_${i} IO_DEPTH=${io} NUM_JOBS=${job} \
        taskset -ac ${FIO_CORES} fio ${fio_name} --output=fio_${TEST_NAME}.json \
        --output-format=json 2> fio_${TEST_NAME}.err &
      fio_id["fio_${i}"]=$!
      global_fio_id+=($!)
    done # loop NUM_PROCS
    sleep 30; # ramp up time
    ...
    fun_measure "${all_pids}" ${top_out_name} ${TOP_OUT_LIST} &
    ...
    wait;
    # Measure the diskstats after the completion of FIO
    jc --pretty /proc/diskstats | python3 /root/bin/diskstat_diff.py -a ${DISK_STAT}

    # Exit the loops if the latency disperses too much from the median
    if [ "$RESPONSE_CURVE" = true ] && [ "$RC_SKIP_HEURISTIC" = false ]; then
      mop=${mode[${WORKLOAD}]}
      covar=$(jq ".jobs | .[] | .${mop}.clat_ns.stddev/.${mop}.clat_ns.mean < 0.5 and \
        .${mop}.clat_ns.mean/1000000 < ${MAX_LATENCY}" fio_${TEST_NAME}.json)
      if [ "$covar" != "true" ]; then
        echo "== Latency std dev too high, exiting loops =="
        break 2
      fi
    fi
  done # loop IODEPTH
done # loop NUM_JOBS
The basic monitoring routine is shown below, which is executed concurrently as FIO progresses.
fun_measure() {
  local PID=$1 # comma separated list of pids
  local TEST_NAME=$2
  local TEST_TOP_OUT_LIST=$3

  top -b -H -1 -p "${PID}" -n ${NUM_SAMPLES} >> ${TEST_NAME}_top.out
  echo "${TEST_NAME}_top.out" >> ${TEST_TOP_OUT_LIST}
}
We have written a custom profile for top, so we get information about the parent process id, last CPU the thread was executed on, etc. (which are not normally shown by default). We also plan to extend jc to support individual CPU core utilisation.
We extended and implemented new tools in CBT (the Ceph Benchmarking Tool) as standalone tools, since they can be used both on a local laptop and on the client endpoints. Further proofs of concept are in progress.
In this section, we show the performance results for the three CPU allocation strategies across the three object storage backends. We show the results for the 8 OSDs and 5 reactors configuration.
It is interesting to point out that no single CPU allocation strategy has a significant advantage over the others; rather, different workloads seem to benefit from different CPU allocation strategies. The results are consistent across the different object storage backends for most of the workloads.
The response latency curves are extended with y-error bars describing the standard deviation of the latency. This is useful to observe how far the latency disperses from the average. For all the results shown, we disabled the heuristic mentioned above so that we see all the data points as requested (from iodepth 1 to 64).
For each workload, we show the comparison of the three CPU allocation strategies across the three object storage backends. At the end, we compare the results for a single CPU allocation strategy across the three object storage backends.
We first show the CPU and MEM utilisation for the OSD process and then the FIO client.
(Figures: for each workload and backend combination, paired charts of OSD CPU and OSD MEM utilisation, followed by FIO CPU and FIO MEM utilisation, comparing the three CPU allocation strategies.)
We briefly show the comparison of the default CPU allocation strategy across the three storage backends. We choose the default CPU allocation strategy as it is the one currently used in the field/community.
(Figures: response latency curves for Random read 4K, Random write 4K, Sequential read 64K and Sequential write 64K, comparing the three object storage backends under the default CPU allocation strategy.)
In this blog entry, we have shown the performance results for the three CPU allocation strategies across the three object storage backends. It is interesting that none of the CPU allocation strategies significantly outperforms the existing default strategy, which is a bit of a surprise. Please note that, in order to keep a disciplined methodology, we used the same Ceph build (with the commit hash cited above) across all the tests, to ensure only a single parameter is modified at each step, as appropriate. Hence, these results do not represent the latest Seastore development progress yet.
This is my very first blog entry, and I hope you have found it useful. I am very thankful to the Ceph community for their support and guidance in this, my first year of being part of such a vibrant community, especially to Matan Breizman, Yingxin Cheng, Aishwarya Mathuria, Josh Durgin, Neha Ojha and Bill Scales. I am looking forward to the next blog entry, in which we will take a deep dive into the internals of performance metrics in the Crimson OSD. We will try to use some flamegraphs on the major code landmarks, and leverage the existing tools to identify latencies per component.
In the data-driven world, the demand for faster, more efficient storage solutions is escalating. As businesses, cloud providers, and data centers look to handle ever-growing volumes of data, the performance of storage becomes a critical factor. One of the most promising innovations in this space is NVMe over TCP (NVMe/TCP aka NVMeoF), which allows the deployment of high-performance Non-Volatile Memory Express (NVMe) storage devices over traditional TCP/IP networks. This blog delves into Ceph and the performance of our newest block protocol: NVMe over TCP, its benefits, challenges, and the outlook for this technology. We will explore performance profiles and nodes populated with NVMe SSDs to detail a design optimized for high performance.
Before diving into performance specifics, let’s clarify the key technologies involved:
NVMe (Non-Volatile Memory Express) is a protocol designed to provide fast data access to storage media by leveraging the high-speed PCIe (Peripheral Component Interconnect Express) bus. NVMe reduces latency, improves throughput, and enhances overall storage performance compared to legacy storage like SATA and SAS, while maintaining a price point that is at most slightly increased on a $/TB basis. Comparatively speaking, when concerned with performance, scale and throughput, NVMe drives are the clear cost-performer in this arena.
TCP/IP (Transmission Control Protocol/Internet Protocol) is one of the pillars of modern networking. It is a reliable, connection-oriented protocol that ensures data is transmitted correctly across networks. TCP is known for its robustness and widespread use, making it an attractive option for connecting NVMe devices over long distances and in cloud environments.
Ceph brings NVMe over TCP to market, offering NVMe speed and low-latency access to networked storage without the need for specialized transports or hardware like Fibre Channel, InfiniBand or RDMA.
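Because the transport is plain TCP/IP, a standard Linux initiator can attach to a Ceph NVMe/TCP gateway with nothing more than nvme-cli; the addresses and subsystem NQN below are placeholders:

# Discover and connect to a subsystem exposed by an NVMe/TCP gateway (example values)
nvme discover -t tcp -a 192.168.1.10 -s 4420
nvme connect -t tcp -a 192.168.1.10 -s 4420 -n nqn.2016-06.io.spdk:cnode1
nvme list    # the namespace now appears as a regular /dev/nvmeXnY block device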
The performance of NVMe over TCP largely depends on the underlying network infrastructure, storage architecture and design, and the workload being handled. However, there are a few key factors to keep in mind:
Let’s define some important terms in the Ceph world to ensure that we see which parameters can move the needle for performance and scale.
OSD (Object Storage Daemon) is the object storage daemon for the Ceph software defined storage system. It manages data on physical storage drives with redundancy and provides access to that data over the network. For the purposes of this article, we can state that an OSD is the software service that manages disk IO for a given physical device.
Reactor / Reactor Core: this is an event handling model in software development that comprises an event loop running a single thread which handles IO requests for NVMe/TCP. By default, we begin with 4 reactor core threads, but this model is tunable via software parameters.
BDevs_per_cluster: BDev is short for block device; this driver is how the NVMe gateways talk to Ceph RBD images. This is important because, by default, the NVMe/TCP gateway leverages 32 BDevs in a single cluster context per librbd client (`bdevs_per_cluster=32`), or storage client connecting to the underlying volume. This tunable parameter can be adjusted to provide scaling all the way to a 1:1 ratio of NVMe volume to librbd client, creating an uncontested path to performance for a given volume at the expense of more compute resources.
Starting off strong, below we see how adding drives (OSDs) and nodes to a Ceph cluster increases I/O performance across the board. A 4-node Ceph cluster with 24 drives per node can provide over 450,000 IOPS with a 70:30 read/write profile, using a 16K block size with 32 FIO clients. That is over 100K IOPS per node on average! This trend scales linearly as nodes and drives are added, showing a top end of nearly 1,000,000 IOPS with a 12-node, 288-OSD cluster. It is noteworthy that the higher-end numbers are achieved with 12 reactors and 1 librbd client per namespace (`bdevs_per_cluster=1`), which demonstrates how the addition of librbd clients enables more throughput to the OSDs serving the underlying RBD images and their mapped NVMe namespaces.
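For reference, a 70:30 16K random-mix FIO job against an NVMe/TCP-attached namespace might be sketched as follows; the device path, queue depth, job count and runtime are illustrative rather than the exact parameters used in the test above:

# Illustrative 70:30 read/write, 16K block size workload on an attached namespace
fio --name=mixed_70_30 --filename=/dev/nvme1n1 --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=70 --bs=16k --iodepth=32 --numjobs=4 \
    --time_based=1 --runtime=300 --group_reporting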
The next test below shows how tuning an environment to the underlying hardware can show massive improvements in software defined storage. We begin with a simple 4-node cluster, and show scale points of 16, 32, 64 and 96 OSDs. In this test the Ceph Object Storage Daemons have been mapped 1:1 directly to physical NVMe drives.
It may seem as though adding drives and nodes alone gains only a modicum of performance, but with software defined storage there is always a trade-off between server utilization and storage performance – in this case for the better. When the same cluster has the default reactor cores increased from 4 to 10 (thus consuming more CPU cycles), and `bdevs_per_cluster` is configured to increase software throughput via the addition of librbd clients, the performance nearly doubles. All this by simply tuning your environment to the underlying hardware and enabling Ceph to take advantage of this processing power.
The chart below shows the IOPS delivered at the three “t-shirt” sizes of tuned 4-node, 8-node and 12-node configurations, alongside a 4-node cluster with the defaults enabled for comparison. Again we see that, for <2ms latency workloads, Ceph scales linearly and in a dependable, predictable fashion. Note: as I/O becomes congested, at a certain point the workloads are still serviceable but with higher latency response times. Ceph continues to commit the required reads and writes, only plateauing once the existing platform design boundaries become saturated.
As storage needs continue to evolve, NVMe over TCP is positioned to become a key player in the high-performance storage landscape. With continued advancements in Ethernet speeds, TCP optimizations, and network infrastructure, NVMe over TCP will continue to offer compelling advantages for a wide range of applications, from enterprise data centers to edge computing environments.
Ceph is positioned to be the top performer in software defined storage for NVMe over TCP, by enabling not only high-performance, scale-out NVMe storage platforms, but also by enabling more performance on platform by user-controlled software enhancements and configuration.
Ceph’s NVMe over TCP Target offers a powerful, scalable, and cost-effective solution for high-performance storage networks.
The authors would like to thank IBM for supporting the community by giving us the time to create these posts.
SMB (Server Message Block) is a widely used network protocol that facilitates the sharing of files, printers, and other resources across a network. To seamlessly integrate SMB services within a Ceph environment, Ceph Squid introduces the powerful SMB Manager module, which enables users to deploy, manage, and control Samba services for SMB access to CephFS. This module offers a user-friendly interface for managing clusters of Samba services and SMB shares, with the flexibility to choose between two management methods: imperative and declarative. By enabling the SMB Manager module with the command `ceph mgr module enable smb`, administrators can efficiently streamline their SMB service operations, whether through the command line or via orchestration with YAML or JSON resource descriptions. With the new SMB Manager module, Ceph admins can effortlessly extend file services, providing robust SMB access to CephFS while enjoying enhanced control and scalability.
Admins can interact with the Ceph Manager SMB module using the following methods:
Imperative Method: Ceph commands to interact with the Ceph Manager SMB module.
Declarative Method: Resources specification in YAML or JSON format.
Create CephFS Volume/Subvolume
# ceph fs volume create cephfs\\n# ceph fs subvolumegroup create cephfs smb\\n# ceph fs subvolume create cephfs sv1 --group-name=smb --mode=0777\\n# ceph fs subvolume create cephfs sv2 --group-name=smb --mode=0777\\n
Enable SMB Management Module
# ceph mgr module enable smb\\n
Creating SMB Cluster/Share
# ceph smb cluster create smb1 user --define-user-pass=user1%passwd\\n# ceph smb share create smb1 share1 cephfs / --subvolume=smb/sv1\\n
Map a network drive from MS Windows clients
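For example, from a Windows command prompt the share created above could be mapped roughly as follows; the server name smb1-server.cephlab.example is a placeholder for the host where the orchestrator placed the Samba service:

net use Z: \\smb1-server.cephlab.example\share1 passwd /user:user1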
Create CephFS volume/subvolume
# ceph fs volume create cephfs\\n# ceph fs subvolumegroup create cephfs smb\\n# ceph fs subvolume create cephfs sv1 --group-name=smb --mode=0777\\n# ceph fs subvolume create cephfs sv2 --group-name=smb --mode=0777\\n
Enable SMB Management Module
# ceph mgr module enable smb\\n
Creating SMB Cluster/Share
# ceph smb apply -i - <<\'EOF\'\\n# --- Begin Embedded YAML\\n- resource_type: ceph.smb.cluster\\n cluster_id: smb1\\n auth_mode: user\\n user_group_settings:\\n - {source_type: resource, ref: ug1}\\n placement:\\n count: 1\\n- resource_type: ceph.smb.usersgroups\\n users_groups_id: ug1\\n values:\\n users:\\n - {name: user1, password: passwd}\\n - {name: user2, password: passwd}\\n groups: []\\n- resource_type: ceph.smb.share\\n cluster_id: smb1\\n share_id: share1\\n cephfs:\\n volume: cephfs\\n subvolumegroup: smb\\n subvolume: sv1\\n path: /\\n- resource_type: ceph.smb.share\\n cluster_id: smb1\\n share_id: share2\\n cephfs:\\n volume: cephfs\\n subvolumegroup: smb\\n subvolume: sv2\\n path: /\\n# --- End Embedded YAML\\nEOF\\n
Map a network drive from MS Windows clients
# ceph smb cluster create <cluster_id> {user} [--domain-realm=<domain_realm>] \\\\\\n [--domain-join-user-pass=<domain_join_user_pass>] \\\\\\n [--define-user-pass=<define_user_pass>] [--custom-dns=<custom_dns>]\\n
Example:
# ceph smb cluster create smb1 user --define_user_pass user1%passwd --placement label:smb --clustering default\\n
# ceph smb cluster create smb1 active-directory --domain_realm samba.qe --domain_join_user_pass Administrator%Redhat@123 --custom_dns 10.70.44.153 --placement label:smb --clustering default\\n
# ceph smb apply -i <input> [--format <value>]\\n
Example:
# ceph smb apply -i resources.yaml\\n
# ceph smb share create <cluster_id> <share_id> <cephfs_volume> <path> [<share_name>] [<subvolume>] [--readonly] [--format]\\n
Example:
# ceph smb share create smb1 share1 cephfs / --subvolume=smb/sv1\\n
Listing SMB Shares
# ceph smb share ls <cluster_id> [--format <value>]\\n
Example:
# ceph smb share ls smb1\\n
# ceph smb show [<resource_names>]\\n
Example:
# ceph smb show ceph.smb.cluster.smb1\\n
# ceph smb share rm <cluster_id> <share_id>\\n
Example:
# ceph smb share rm smb1 share1\\n
# ceph smb cluster rm <cluster_id>\\n
Example:
# ceph smb cluster rm smb1\\n
The Ceph SMB Manager module in Ceph Squid brings an innovative and efficient way to manage SMB services for CephFS file systems. Whether through imperative or declarative methods, users can easily create, manage, and control SMB clusters and shares. This integration simplifies the setup of Samba services, enhances scalability, and offers greater flexibility for administrators. With the ability to manage SMB access to CephFS seamlessly, users now have a streamlined process for providing secure and scalable file services.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
Deploying a production-ready object storage solution can be challenging, particularly when managing complex requirements including SSL/TLS encryption, optimal data placement, and multisite replication. During deployment, it’s easy to overlook configuration options that become crucial once the system is live in production.
Traditionally, configuring Ceph for high availability, security, and efficient data handling required users to manually adjust multiple parameters based on their needs, such as Multisite Replication, Encryption, and High Availability. This initial complexity made it tedious to achieve a production-ready Object Storage configuration.
To tackle these challenges, we have introduced several new features to Ceph\'s orchestrator that simplify the deployment of Ceph RGW and its associated services. Enhancing the Ceph Object Gateway and Ingress service specification files enables an out-of-the-box, production-ready RGW setup with just a few configuration steps. These enhancements include automated SSL/TLS configurations, virtual host bucket access support, erasure coding for cost-effective data storage, and more.
These improvements aim to provide administrators with a seamless deployment experience that ensures secure, scalable, and production-ready configurations for the Ceph Object Gateway and Ingress Service (load balancer).
In this blog post, we\'ll explore each of these new features, discuss the problems they solve, and demonstrate how they can be easily configured using cephadm
spec files to achieve a fully operational Ceph Object Gateway setup in minutes.
One of the major challenges in deploying RGW is ensuring seamless access to buckets using virtual host-style URLs. For applications and users that rely on virtual host bucket access, proper SSL/TLS certificates that include the necessary Subject Alternative Names (SANs) are crucial. To simplify this, we\'ve added the option to automatically generate self-signed certificates for the Object Gateway if the user does not provide custom certificates. These self-signed certificates include SAN entries that allow TLS/SSL to work seamlessly with virtual host bucket access.
Security is a top priority for any production-grade deployment, and the Ceph community has increasingly requested full TLS/SSL encryption from the client to the Object Gateway service. Previously, our ingress implementation only supported terminating SSL at the HAProxy level, which meant that communication between HAProxy and RGW could not be encrypted.
To address this, we\'ve added configurable options that allow users to choose whether to re-encrypt traffic between HAProxy and RGW or to use passthrough mode, where the TLS connection remains intact from the client to RGW. This flexibility allows users to achieve complete end-to-end encryption, ensuring sensitive data is always protected in transit.
In the past, Ceph multisite deployments involved running many commands to configure your Realm, zonegroup, and zone, and also establishing the relationship between the zones that will be involved in the Multisite replication. Thanks to the RGW manager module, the multisite bootstrap and configuration can now be done in two steps. There is an example in the Object Storage Replication blog post.
In the Squid release, we have also added the possibility of dedicating Object Gateways just to client traffic, configured through the cephadm
spec file with the RGW spec file option:
disable_multisite_sync_traffic: True\\n
The advantages of dedicating Ceph Object Gateways to specific tasks are covered in the blog post: Ceph Object Storage Multisite Replication Series. Part Three
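For reference, a minimal sketch of an RGW spec dedicated to client traffic with this option in place might look like the following; the service ID, placement label, and port are illustrative assumptions rather than values taken from this deployment:

service_type: rgw
service_id: client-traffic
placement:
  label: rgw
  count_per_host: 1
spec:
  rgw_realm: multisite
  rgw_zone: zone1
  rgw_frontend_port: 8000
  disable_multisite_sync_traffic: True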
Object Storage often uses Erasure Coding for the data pool to reduce the TCO of the object storage solution. We have included options for configuring erasure-coded (EC) pools in the spec file. This allows users to define the EC profile, device class, and failure domain for RGW data pools, which provides control over data placement and storage efficiency.
If you are new to Ceph and cephadm
, the Automating Ceph Cluster Deployments with Ceph: A Step-by-Step Guide Using Cephadm and Ansible (Part 1) blog post will give you a good overview of cephadm
and how we can define the desired state of Ceph services in a declarative YAML spec file to deploy and configure Ceph.
Below, we\'ll walk through the CLI commands required to deploy a production-ready RGW setup using the new features added to the cephadm
orchestrator.
The first step is to enable the RGW manager module. This module is required to manage RGW services through cephadm
.
# ceph mgr module enable rgw\\n
Next, we create a spec file for the Object Gateway service. This spec file includes realm, zone, and zonegroup settings, SSL/TLS, EC profile for the data pool, etc.
# cat << EOF > /root/rgw-client.spec\\nservice_type: rgw\\nservice_id: client\\nservice_name: rgw.client\\nplacement:\\n label: rgw\\n count_per_host: 1\\nnetworks:\\n - 192.168.122.0/24\\nspec:\\n rgw_frontend_port: 4443\\n rgw_realm: multisite\\n rgw_zone: zone1\\n rgw_zonegroup: multizg\\n generate_cert: true\\n ssl: true\\n zonegroup_hostnames:\\n - s3.cephlab.com\\n data_pool_attributes:\\n type: ec\\n k: 2\\n m: 2\\nextra_container_args:\\n - \\"--stop-timeout=120\\"\\nconfig:\\n rgw_exit_timeout_secs: \\"120\\"\\n rgw_graceful_stop: true\\nEOF\\n
In this spec file we specify that the RGW service should use erasure coding with a 2+2 profile (k: 2, m: 2)
for the data pool, which reduces storage costs compared to a replicated setup. We also generate a self-signed certificate (generate_cert: true)
for the RGW service to ensure secure SSL/TLS communication. With zonegroup_hostnames
, we enable virtual host bucket access using the specified domain bucket.s3.cephlab.com
. Thanks to the config parameter rgw_graceful_stop
, we configure graceful stopping of object gateway services. During a graceful stop, the service will wait until all client connections are closed (drained) subject to the specified 120 second timeout.
Once the spec file is created, we bootstrap RGW services. This step creates and deploys RGW services with the configuration specified in our spec file.
# ceph rgw realm bootstrap -i rgw-client.spec\\n
The realm bootstrap command will asynchronously apply the configuration defined in our spec file. Soon the RGW services will be up and running, and we can verify their status using the ceph orch ps command
.
# ceph orch ps --daemon_type rgw\\nNAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID\\nrgw.client.ceph-node-05.yquamf ceph-node-05.cephlab.com 192.168.122.175:4443 running (32m) 94s ago 32m 91.2M - 19.2.0-53.el9cp fda78a7e8502 a0c39856ddd8\\nrgw.client.ceph-node-06.zfsutg ceph-node-06.cephlab.com 192.168.122.214:4443 running (32m) 94s ago 32m 92.9M - 19.2.0-53.el9cp fda78a7e8502 82c21d350cb7\\n
This output shows that the RGW services run on the specified nodes and are accessible via the configured 4443/tcp
port.
To verify that the RGW data pools are correctly configured with erasure coding, we can use the following command:
# ceph osd pool ls detail | grep data\\npool 24 \'zone1.rgw.buckets.data\' erasure profile zone1_zone_data_pool_ec_profile size 4 min_size 3 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 258 lfor 0/0/256 flags hashpspool stripe_width 8192 application rgw\\n
To get more details about the erasure code profile used for the data pool, we can run the following:
# ceph osd erasure-code-profile get zone1_zone_data_pool_ec_profile\\ncrush-device-class=\\ncrush-failure-domain=host\\ncrush-num-failure-domains=0\\ncrush-osds-per-failure-domain=0\\ncrush-root=default\\njerasure-per-chunk-alignment=false\\nk=2\\nm=2\\nplugin=jerasure\\ntechnique=reed_sol_van\\nw=8\\n
This confirms that the erasure code profile is configured with k=2
and m=2
and uses the Reed-Solomon technique.
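To put the cost claim in numbers: with k=2 and m=2, each object is stored as 2 data chunks plus 2 coding chunks, so usable capacity is k/(k+m) = 50% of raw while still tolerating the loss of any two chunks, whereas 3-way replication yields roughly 33% usable capacity for a similar level of protection.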
Finally, we must configure the ingress service to load balance traffic to multiple RGW daemons. We create a spec file for the ingress service:
# cat << EOF > rgw-ingress.yaml\\nservice_type: ingress\\nservice_id: rgw\\nplacement:\\n hosts:\\n - ceph-node-06.cephlab.com\\n - ceph-node-07.cephlab.com\\nspec:\\n backend_service: rgw.client\\n virtual_ip: 192.168.122.152/24\\n frontend_port: 443\\n monitor_port: 1967\\n use_tcp_mode_over_rgw: True\\nEOF\\n
This spec file sets up the ingress service with the virtual (floating) IP (VIP) address 192.168.122.152
and specifies that it should use TCP mode for communication with the Object Gateway, ensuring that SSL/TLS is maintained throughout. With the backend_service
we specify the RGW service we want to use as the backend for HAproxy, as it is possible for a Ceph cluster to run multiple, unrelated RGW services.
Our ingress service stack uses keepalived
for HA of the VIP, and HAproxy takes care of the load balancing:
# ceph orch ps --service_name ingress.rgw\\nNAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID\\nhaproxy.rgw.ceph-node-06.vooxuh ceph-node-06.cephlab.com *:443,1967 running (58s) 46s ago 58s 5477k - 2.4.22-f8e3218 0d25561e922f 4cd458e1f6b0\\nhaproxy.rgw.ceph-node-07.krdmsb ceph-node-07.cephlab.com *:443,1967 running (56s) 46s ago 56s 5473k - 2.4.22-f8e3218 0d25561e922f 4d18247e7615\\nkeepalived.rgw.ceph-node-06.cwraia ceph-node-06.cephlab.com running (55s) 46s ago 55s 1602k - 2.2.8 6926947c161f 50fd6cf57187\\nkeepalived.rgw.ceph-node-07.svljiw ceph-node-07.cephlab.com running (53s) 46s ago 53s 1598k - 2.2.8 6926947c161f aaab5d79ffdd\\n
When we check the haproxy configuration on ceph-node-06
where the service is running, we confirm that we are using TCP passthrough for the backend configuration of our Object Gateway services.
# ssh ceph-node-06.cephlab.com cat /var/lib/ceph/93d766b0-ae6f-11ef-a800-525400ac92a7/haproxy.rgw.ceph-node-06.vooxuh/haproxy/haproxy.cfg | grep -A 10 \\"frontend frontend\\"\\n...\\nbackend backend\\n mode tcp\\n balance roundrobin\\n option ssl-hello-chk\\n server rgw.client.ceph-node-05.yquamf 192.168.122.175:4443 check weight 100 inter 2s\\n server rgw.client.ceph-node-06.zfsutg 192.168.122.214:4443 check weight 100 inter 2s\\n
To verify that the SSL/TLS configuration is working correctly, we can use curl
to test the endpoint. We can see that the CA is not trusted by our client system where we are running the curl command:
# curl https://192.168.122.152\\ncurl: (60) SSL certificate problem: unable to get local issuer certificate\\nMore details here: https://curl.se/docs/sslcerts.html\\ncurl failed to verify the legitimacy of the server and therefore could not\\nestablish a secure connection to it.\\n
To fix this, we need to add the cephadm root CA certificate to the trusted store of our client system:
# ceph orch cert-store get cert cephadm_root_ca_cert > /etc/pki/ca-trust/source/anchors/cephadm-root-ca.crt\\n# update-ca-trust\\n
After updating the trusted store, we can test again:
# curl https://s3.cephlab.com\\n<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><ListAllMyBucketsResult xmlns=\\"http://s3.amazonaws.com/doc/2006-03-01/\\"><Owner><ID>anonymous</ID></Owner><Buckets></Buckets></ListAllMyBucketsResult>\\n
This confirms that the SSL/TLS self-signed certificate configuration works correctly and that the RGW service is accessible using HTTPS. As you can see, we have configured our DNS subdomain s3.cephlab.com
and the wildcard *.s3.cephlab.com
to point to our VIP address 192.168.122.152
. Also, it\'s important to mention that you can have more than one VIP address configured so not all the traffic goes through a single haproxy LB node; when using a list of VIP IPs, you need to use the option: virtual_ips_list
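For illustration, a variant of the earlier ingress spec that uses a list of VIPs instead of a single virtual_ip could look like the sketch below; the second address is an assumption added only to show the syntax:

service_type: ingress
service_id: rgw
placement:
  hosts:
    - ceph-node-06.cephlab.com
    - ceph-node-07.cephlab.com
spec:
  backend_service: rgw.client
  virtual_ips_list:
    - 192.168.122.152/24
    - 192.168.122.153/24
  frontend_port: 443
  monitor_port: 1967
  use_tcp_mode_over_rgw: True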
These new features in the cephadm
orchestrator represent significant steps forward in making Ceph RGW deployments more accessible, secure, and production-ready. By automating complex configurations such as SSL/TLS encryption, virtual host bucket access, multisite replication, and erasure coding, administrators can now deploy an RGW setup ready for production with minimal manual intervention.
For further details on the Squid release, check Laura Flores' blog post.
Note that some features described here may not be available before the Squid 19.2.2 release.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the previous post of this series, we discussed everything related to load-balancing our RGW S3 endpoints. We covered multiple load-balancing techniques, including the bundled Ceph-provided load balancer, the Ingress service
. In this fifth article in this series, we will discuss multisite sync policy in detail.
In Ceph releases beginning with Quincy, Ceph Object Storage provides granular bucket-level replication, unlocking many valuable features. Users can enable or disable sync per individual bucket, enabling precise control over replication workflows. This empowers full-zone replication while opting out of the replication of specific buckets, replicating a single source bucket to multi-destination buckets, and implementing symmetrical and directional data flow configurations. The following diagram shows an example of the sync policy feature in action:
With our previous synchronization model, we would do full zone sync, meaning that all data and metadata would be synced between zones. The new sync policy feature gives us new flexibility and granularity that allows us to configure per-bucket replication.
Bucket sync policies apply to archive zones. Sync involving an archive zone is not bidirectional: all objects can be replicated from the active zone to the archive zone, but objects cannot be moved from the archive zone back to the active zone, because the archive zone is read-only. We will cover the archive zone in detail in part six of the blog series.
Here is a list of features available in the Quincy and Reef releases:
Here are some of the sync policy concepts that we need to understand before we get our hands dirty. A sync policy comprises the following components:
A sync policy group can be in three states:
Enabled: sync is allowed and enabled. Replication begins as soon as the policy is enabled. For example, we can enable full zonegroup sync and then disable (forbid) it on a per-bucket basis.
Allowed: sync is permitted but will not start. For example, we can configure the zonegroup policy as allowed and then enable per-bucket policy sync.
Forbidden: sync, as defined by this group, is not permitted.
We can configure sync policies (groups, flows, and pipes) at the zonegroup and bucket levels. A bucket sync policy is always a subset of the policy defined for the zonegroup to which the bucket belongs. So if, for example, we don't allow a flow at the zonegroup level, it won't work even if it is allowed at the bucket level. There are further details on the expected behaviour in the official documentation.
The following section will explain use of the new multisite sync policy feature. By default, once we set up multisite replication as we did in the initial post of this series, all metadata and data are replicated among the zones that are part of the zonegroup. We will call this sync method legacy
during the remainder of the article.
As we explained in the previous section, a sync policy is made up of a group, flow, and pipe. We first configure a zonegroup policy that is very lax and will allow bi-directional traffic for all buckets on all zones. Once in place we will add per-bucket sync policies that by design are a subset of the zonegroup policy, with more stringent rulesets.
We begin by adding the zonegroup policy. We create a new group called group1
and set the status to allowed
. Recall from the previous section that the zonegroup will allow sync traffic to flow. The policy will be set to allowed
and not enabled
. Data synchronization will not happen at the zonegroup level when in the allowed
state; the idea is to enable synchronization on a per-bucket basis.
[root@ceph-node-00 ~]# radosgw-admin sync group create --group-id=group1 --status=allowed --rgw-realm=multisite --rgw-zonegroup=multizg\\n
We now create a symmetrical/bi-directional flow, allowing data sync in both directions from our zones: zone1
and zone2
.
[root@ceph-node-00 ~]# radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-mirror --flow-type=symmetrical --zones=zone1,zone2\\n
Finally, we create a pipe. In the pipe, we specify the group-id to use, then set an asterisk wildcard for the source and destination buckets and zones, meaning that any zone and any bucket can act as the source or destination of the replicated data.
[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 --source-zones=\'*\' --source-bucket=\'*\' --dest-zones=\'*\' --dest-bucket=\'*\'\\n
Zonegroup sync policy modifications require a period update and commit; bucket sync policy modifications do not.
[root@ceph-node-00 ~]# radosgw-admin period update --commit\\n
Once we have committed the new period, all data sync in the zonegroup is going to stop because our zonegroup policy is set to Allowed
. If we had set it to enabled
, syncing would continue in the same way as with the initial multisite configuration we had.
Now we can enable sync on a per-bucket basis. We will create a bucket-level policy rule for the existing bucket testbucket
. Note that the bucket must exist before setting this policy, and admin commands that modify bucket policies must be run on the master zone. However, bucket sync policies do not require a period update. There is no need to change the data flow, as it is inherited from the zonegroup policy. A bucket policy flow will only be a subset of the flow defined in the zone group policy; the same happens with pipes.
We create the bucket:
[root@ceph-node-00 ~]# aws --endpoint https://s3.zone1.cephlab.com:443 s3 mb s3://testbucket\\nmake_bucket: testbucket\\n
Create a bucket sync group, using the --bucket
parameter to specify the bucket and setting the status to enabled
so that replication will be enabled for our bucket testbucket
[root@ceph-node-00 ~]# radosgw-admin sync group create --bucket=testbucket --group-id=testbucket-1 --status=enabled\\n
There is no need to specify a flow as we will inherit the flow from the zonegroup, so we need only to define a pipe for our bucket sync policy group called testbucket-1
. As soon as this command is applied, data sync replication will start for this bucket.
[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --bucket=testbucket --group-id=testbucket-1 --pipe-id=test-pipe1 --source-zones=\'*\' --dest-zones=\'*\'\\n
NOTE: You can safely ignore the following warning:
WARNING: cannot find source zone id for name=*
With the sync group get
command you can review your group, flow, and pipe configurations. We run the command at the zonegroup level, where we see that the status is allowed
.
\\"allowed\\"\\n
And we run the sync group get
command at the bucket level supplying the --bucket
parameter. In this case, the status is Enabled
for testbucket
:
[root@ceph-node-00 ~]# radosgw-admin sync group get --bucket testbucket | jq .[0].val.status\\n\\"Enabled\\"\\n
Another helpful command is sync info
. With sync info
, we can preview what sync replication will be implemented with our current configuration. So, for example, with our current zonegroup sync policy in the allowed
state, no sync will happen at the zonegroup level, so the sync info command will not show any sources or destinations configured.
[root@ceph-node-00 ~]# radosgw-admin sync info\\n{\\n \\"sources\\": [],\\n \\"dests\\": [],\\n \\"hints\\": {\\n \\"sources\\": [],\\n \\"dests\\": []\\n },\\n \\"resolved-hints-1\\": {\\n \\"sources\\": [],\\n \\"dests\\": []\\n },\\n \\"resolved-hints\\": {\\n \\"sources\\": [],\\n \\"dests\\": []\\n }\\n}\\n
We can also use the sync info
command at the bucket level, using the --bucket
parameter because we have configured a bidirectional pipe. We are going to have as sources zone2
-> zone1
and as destinations zone1
-> zone2
. This means that replication on the testbucket
bucket happens in both directions. If we PUT an object to testbucket
from zone1
it will be replicated to zone2
, and if we PUT an object to zone2
it will be replicated to zone1
.
[root@ceph-node-00 ~]# radosgw-admin sync info --bucket testbucket\\n{\\n \\"sources\\": [\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"user1\\"\\n }\\n }\\n ],\\n \\"dests\\": [\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"user1\\"\\n }\\n }\\n ],\\n
So if, for example, we only look at the sources, you can see that they vary depending on the cluster in which we run the radosgw-admin
command. For example from cluster2
(ceph-node04
), we see zone1
as the source:
[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin sync info --bucket testbucket | jq \'.sources[].source, .sources[].dest\'\\n{\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2\\"\\n}\\n{\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2\\"\\n}\\n
In cluster1
(ceph-node-00
), we see zone2
as the source:
[root@ceph-node-00 ~]# radosgw-admin sync info --bucket testbucket | jq \'.sources[].source, .sources[].dest\'\\n{\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2\\"\\n}\\n{\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2\\"\\n}\\n
Let’s perform a quick test with the AWS CLI, to validate the configuration and confirm that replication is working for testbucket
. We PUT an object in zone1
and check that it is replicated to zone2
:
[root@ceph-node-00 ~]# aws --endpoint https://s3.zone1.cephlab.com:443 s3 cp /etc/hosts s3://testbucket/firsfile\\nupload: ../etc/hosts to s3://testbucket/firsfile\\n
We can check the sync has finished with the radosgw-admin bucket sync checkpoint
command:
[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin bucket sync checkpoint --bucket testbucket\\n2024-02-02T02:17:26.858-0500 7f3f38729800 1 bucket sync caught up with source:\\n local status: [, , , 00000000004.531.6, , , , , , , ]\\n remote markers: [, , , 00000000004.531.6, , , , , , , ]\\n2024-02-02T02:17:26.858-0500 7f3f38729800 0 bucket checkpoint complete\\n
An alternate way to check sync status is to use the radosgw-admin bucket sync status
command:
[root@ceph-node-00 ~]# radosgw-admin bucket sync status --bucket=testbucket\\n realm beeea955-8341-41cc-a046-46de2d5ddeb9 (multisite)\\n zonegroup 2761ad42-fd71-4170-87c6-74c20dd1e334 (multizg)\\n zone 66df8c0a-c67d-4bd7-9975-bc02a549f13e (zone1)\\n bucket :testbucket[66df8c0a-c67d-4bd7-9975-bc02a549f13e.37124.2])\\n current time 2024-02-02T09:07:42Z\\n\\n source zone 7b9273a9-eb59-413d-a465-3029664c73d7 (zone2)\\n source bucket :testbucket[66df8c0a-c67d-4bd7-9975-bc02a549f13e.37124.2])\\n incremental sync on 11 shards\\n bucket is caught up with source\\n
We see that the object is available in zone2
.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://testbucket/\\n2024-01-09 06:27:24 233 firsfile\\n
Because the replication is bidirectional, we PUT an object in zone2
, and it is replicated to zone1
:
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 cp /etc/hosts s3://testbucket/secondfile\\nupload: ../etc/hosts to s3://testbucket/secondfile\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://testbucket/\\n2024-01-09 06:27:24 233 firsfile\\n2024-02-02 00:40:15 233 secondfile\\n
In part five of this series, we discussed Multisite Sync Policy and shared some hands-on examples of configuring granular bidirectional bucket replication. In part six, we will continue configuring multisite sync policies, including unidirectional replication with one source bucket and multiple destination buckets.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the previous episode of the series, we discussed configuring dedicated RGW services for client and replication requests. Additionally, we explored the performance enhancements the sync fairness feature offers. In the fourth article of this series, we will be talking about load balancing our freshly deployed RGW S3 endpoints to provide high availability and increased performance by distributing requests across individual RGW services.
In the previous installment, we configured four RGW instances: two dedicated to client S3 API requests and the rest to multisite replication requests. With this configuration, clients can connect to each RGW endpoint individually to use the HTTP RESTful S3 API. They could, for example, issue an S3 call like a LIST using the IP/FQDN of one of the nodes running an RGW service as the endpoint.
Here's an example with the AWS s3 client: $ aws --endpoint https://ceph-node02 s3 ls
. They will be able to access their buckets and data.
The problem is, what happens if ceph-node02
goes down? The user will start getting error messages and failed requests, even if the rest of the RGW services are running fine on the surviving nodes. To avoid this behaviour, providing high availability and increased performance, we need to configure a load balancer in front of our RGW services. Because the RGW endpoints are using the HTTP protocol, we have multiple well-known solutions to load balance HTTP requests. These include hardware-based commercial solutions as well as open-source software load balancers. We need to find a solution that will cover our performance needs depending on the size of our deployment and specific requirements. There are some great examples of different RadosGW load-balancing mechanisms in this github repository from Kyle Bader.
Each site\'s network infrastructure must offer ample bandwidth to support reading and writing of replicated objects or erasure-coded object shards. We recommend that the network fabric of each site has either zero (1:1) or minimal oversubscription (e.g., 2:1). One of the most used network topologies for Ceph cluster deployments is Leaf and Spine as it can provide the needed scalability.
Networking between zones participating in the same zone group will be utilized for asynchronous replication traffic. The inter-site bandwidth must be equal to or greater than ingest throughput to prevent synchronisation lag from growing and increasing the risk of data loss. Inter-site networking will not be relied on for read traffic or reconstitution of objects because all objects are locally durable. Path diversity is recommended for inter-site networking, as we generally speak of WAN connections. The inter-site networks should be routed (L3) instead of switched (L2 Extended Vlans) in order to provide independent networking stacks at each site. Finally, even if we are not doing so in our lab example, Ceph Object Gateway synchronization should be configured to use HTTPS endpoints to encrypt replication traffic with SSL/TLS in production.
Beginning with the Pacific release, Ceph provides a cephadm service called ingress
, which provides an easily deployable HA and load-balancing stack based on Keepalived and HAproxy.
The ingress service allows you to create a high-availability endpoint for RGW with a minimum of configuration options. The orchestrator will deploy and manage a combination of HAproxy and Keepalived to balance the load on the different configured floating virtual IPs.
The ingress service is deployed on multiple hosts; each host runs an HAproxy daemon and a Keepalived daemon.
By default, a single virtual IP address is automatically configured by Keepalived on one of the hosts. Having a single VIP means that all traffic for the load-balancer will flow through a single host. This is less than ideal for configurations that service a high number of client requests while maintaining high throughput. We recommend configuring one VIP address per ingress node. We can then, for example, configure round-robin DNS across all deployed VIPs to load balance requests across all VIPs. This provides the possibility to achieve higher throughput as we are using more than one host to load-balance client HTTP requests across our configured RGW services. Depending on the size and requirements of the deployment, the ingress service may not be adequate, and other more scalable solutions can be used to balance the requests, like BGP + ECMP.
In this post, we will configure the ingress load balancing service so we can load-balance S3 client HTTP requests across the public-facing RGW services running on nodes ceph-node-02
and ceph-node-03
in zone1
, and ceph-node-06
and ceph-node-07
in zone2
.
In the following diagram, we depict at a high level the new load balancer components we are adding to our previously deployed architecture. In this way we will provide HA and load balancing for S3 client requests.
The first step is to create, as usual, a cephadm service spec file. In this case, the service type will be ingress
. We specify our existing public RGW service name rgw.client-traffic
as well as the service_id
and backend_service
parameters. We can get the name of the cephadm services using the cephadm orch ls
command.
We will configure one VIP per ingress service daemon, and two nodes to manage the ingress service with VIPs per Ceph cluster. We will enable SSL/HTTPS for client connections terminating at the ingress service.
[root@ceph-node-00 ~]# ceph orch ls | grep rgw\\nrgw.client-traffic ?:8000 2/2 4m ago 3d count-per-host:1;label:rgw\\nrgw.multisite.zone1 ?:8000 2/2 9m ago 3d count-per-host:1;label:rgwsync\\n\\n[root@ceph-node-00 ~]# cat << EOF > rgw-ingress.yaml\\nservice_type: ingress\\nservice_id: rgw.client-traffic\\nplacement:\\n hosts:\\n - ceph-node-02.cephlab.com\\n - ceph-node-03.cephlab.com\\nspec:\\n backend_service: rgw.client-traffic\\n virtual_ips_list:\\n - 192.168.122.150/24\\n - 192.168.122.151/24\\n frontend_port: 443\\n monitor_port: 1967\\n ssl_cert: |\\n -----BEGIN CERTIFICATE-----\\n -----END CERTIFICATE-----\\n\\n -----BEGIN CERTIFICATE-----\\n -----END CERTIFICATE-----\\n -----BEGIN PRIVATE KEY-----\\n -----END PRIVATE KEY-----\\nEOF\\n\\n\\n[root@ceph-node-00 ~]# ceph orch apply -i rgw-ingress.yaml\\nScheduled ingress.rgw.client update...\\n
NOTE: From all the certificates that we add to the ssl_cert
list, the ingress service builds a single certificate file named haproxy.pem
. For the certificate to work, HAproxy requires that you add the certificates in the following order: cert.pem
first, then the chain certificate, and finally, the private key.
Soon we can see our HAproxy and Keepalived services running on ceph-node-[02/03]
:
[root@ceph-node-00 ~]# ceph orch ps | grep -i client\\nhaproxy.rgw.client.ceph-node-02.icdlxn ceph-node-02.cephlab.com *:443,1967 running (3d) 9m ago 3d 8904k - 2.4.22-f8e3218 0d25561e922f 9e3bc0e21b4b\\nhaproxy.rgw.client.ceph-node-03.rupwfe ceph-node-03.cephlab.com *:443,1967 running (3d) 9m ago 3d 9042k - 2.4.22-f8e3218 0d25561e922f 63cf75019c35\\nkeepalived.rgw.client.ceph-node-02.wvtzsr ceph-node-02.cephlab.com running (3d) 9m ago 3d 1774k - 2.2.8 6926947c161f 031802fc4bcd\\nkeepalived.rgw.client.ceph-node-03.rxqqio ceph-node-03.cephlab.com running (3d) 9m ago 3d 1778k - 2.2.8 6926947c161f 3d7539b1ab0f\\n
You can check the configuration of HAproxy from inside the container: it is using static round-robin load balancing between both of our client-facing RGWs configured as the backends. The frontend listens on port 443 with our certificate in the path /var/lib/haproxy/haproxy.pem
:
[root@ceph-node-02 ~]# podman exec -it ceph-haproxy-rgw-client-ceph-node-02-jpnuri cat /var/lib/haproxy/haproxy.cfg | grep -A 15 \\"frontend frontend\\"\\nfrontend frontend\\n bind *:443 ssl crt /var/lib/haproxy/haproxy.pem\\n default_backend backend\\n\\nbackend backend\\n option forwardfor\\n balance static-rr\\n option httpchk HEAD / HTTP/1.0\\n server rgw.client-traffic.ceph-node-02.yntfqb 192.168.122.94:8000 check weight 100\\n server rgw.client-traffic.ceph-node-03.enzkpy 192.168.122.180:8000 check weight 100\\n
For this example, we have configured basic DNS round robin using the CoreDNS loadbalance plugin. We are resolving s3.zone1.cephlab.com
across all configured ingress VIPs. As you can see with the following example, each request for s3.zone1.cephlab.com
resolves to a different Ingress VIP.
[root@ceph-node-00 ~]# ping -c 1 s3.zone1.cephlab.com\\nPING s3.cephlab.com (192.168.122.150) 56(84) bytes of data.\\n[root@ceph-node-00 ~]# ping -c 1 s3.zone1.cephlab.com\\nPING s3.cephlab.com (192.168.122.151) 56(84) bytes of data.\\n
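For reference, a minimal CoreDNS configuration that could produce this round-robin behaviour might look like the sketch below; the zone layout and forwarder are assumptions rather than values taken from the lab environment:

cephlab.com:53 {
    hosts {
        192.168.122.150 s3.zone1.cephlab.com
        192.168.122.151 s3.zone1.cephlab.com
        fallthrough
    }
    loadbalance round_robin
    forward . /etc/resolv.conf
}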
You can now point the S3 client to s3.zone1.cephlab.com
to access the RGW S3 API endpoint.
[root@ceph-node-00 ~]# aws --endpoint https://s3.zone1.cephlab.com:443 s3 ls\\n2024-01-04 13:44:00 firstbucket\\n
At this point, we have high availability and load balancing configured for zone1
. If we lose one server running the RGW service, client requests will be redirected to the remaining RGW service.
We need to do the same steps for the second Ceph cluster that hosts zone2
, so we will end up with a load-balanced endpoint per zone:
s3.zone1.cephlab.com\\ns3.zone2.cephlab.com\\n
As a final step, we could deploy a global load balancer (GLB). This is not part of the Ceph solution and should be provided by a third party; there are many DNS global load balancers available that implement various load balancing policies.
As we are using SSL/TLS on our per-site load balancers in the lab, if we were to configure a GLB we would need to implement TLS passthrough or re-encryption of client connections, so that connections remain encrypted from the client to the per-site load balancer. Using a GLB has significant advantages:
Taking advantage of the active/active nature of Ceph Object storage replication, you can provide users with a single S3 endpoint FQDN and then apply policy at the load balancer to send the user request to one site or the other. The load balancer could, for example, redirect the client to the S3 endpoint closest to their location.
If you need an active/passive disaster recovery approach, a GLB can enhance failover. Users will have a single S3 endpoint FQDN to use. During normal operations, they will always be redirected to the primary site. In case of site failure, the GLB will detect the failure of the primary site and redirect users transparently to the secondary site, enhancing user experience and reducing failover time.
In the following diagram we provide an example where we add a GLB with the FQDN s3.cephlab.com
. Clients connect to s3.cephlab.com
and will be redirected to one or the other site based on the applied policy at the GLB level.
In the load balancing ingress service examples we shared, we configured load balancing for S3 client endpoints, so the client HTTP requests are distributed among the available RGW services. We haven't yet discussed the RGWs serving multisite sync requests. In our previous installment, we configured two RGWs dedicated to multisite sync operations. How do we load-balance sync requests across the two RGWs if we don't have an ingress service or external load balancer configured?
RGW implements round-robin at the zonegroup and zone endpoint levels. We can configure a comma-separated list of RGW service IP addresses or hostnames, and the RGW service code will load-balance requests among the entries in the list.
Replication endpoints for our multizg
zone group:
[root@ceph-node-00 ~]# radosgw-admin zonegroup get | jq .endpoints\\n[\\n \\"http://ceph-node-04.cephlab.com:8000\\",\\n \\"http://ceph-node-05.cephlab.com:8000\\"\\n]\\n
Replication endpoints for our zone1 and zone2 zones:
[root@ceph-node-00 ~]# radosgw-admin zonegroup get | jq .zones[].endpoints\\n[\\n \\"http://ceph-node-00.cephlab.com:8000\\",\\n \\"http://ceph-node-01.cephlab.com:8000\\"\\n]\\n[\\n \\"http://ceph-node-04.cephlab.com:8000\\",\\n \\"http://ceph-node-05.cephlab.com:8000\\"\\n]\\n
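If these endpoint lists ever need to be adjusted by hand, for example after adding a dedicated sync RGW, a hedged sketch of the commands would be the following (hostnames are illustrative), remembering that zonegroup and zone changes must be committed with a period update:

radosgw-admin zonegroup modify --rgw-zonegroup=multizg \
    --endpoints=http://ceph-node-04.cephlab.com:8000,http://ceph-node-05.cephlab.com:8000
radosgw-admin zone modify --rgw-zone=zone2 \
    --endpoints=http://ceph-node-04.cephlab.com:8000,http://ceph-node-05.cephlab.com:8000
radosgw-admin period update --commit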
We can take another approach using a load balancer for multisite sync endpoints. For example, a dedicated ingress service or any other HTTP load balancer. If we take this approach, we would just have a single FQDN in the list of zonegroup and zone endpoints.
It depends. External load balancing could be better if the load balancer can offer at least the same throughput as round-robin across the configured dedicated RGW services. As an example, if our external load balancer is HAproxy running on a single VM with a single VIP and limited network throughput, we are better off using the RGW round-robin replication endpoint list option. For releases after a PR from early 2024 was merged, I would say that both options are ok. You need to trade the simplicity of just setting up a list of IPs for the endpoints, which is done for us automatically with the RGW manager module, against the more advanced features that a full-blown load balancer can offer.
In part four of this series, we discussed everything related to load-balancing our RGW S3 endpoints. We covered multiple load-balancing techniques, including the bundled Ceph-provided load balancer, the Ingress service
. In part five, we will detail the new Sync Policy feature that provides Object Multisite replication with a granular and flexible sync policy scheme.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In part seven of this Ceph Multisite series, we introduce Archive Zone concepts and architecture. We will share a hands-on example of attaching an archive zone to a running Ceph Object Multisite cluster.
Archive your critical object data residing on Ceph using the Archive Zone feature.
The Archive Zone uses the multisite replication and S3 object versioning features. In this way, it will keep all versions of each object available even when deleted from the production site.
With the archive zone, you can have object immutability without the overhead of enabling object versioning in your production zones, saving the space that the replicas of the versioned S3 objects would consume in non-archive zones, which may well be deployed on faster yet more expensive storage devices.
This can protect your data against logical or physical errors. It can save users from logical failures, for example, accidental deletion of a bucket in a production zone. It can protect your data from massive hardware failures or complete production site failure.
As the archive zone provides an immutable copy of your production data, it can serve as a key component of a ransomware protection strategy.
You can control the storage space usage of an archive zone through lifecycle policies of the production buckets, where you can define the number of versions you would like to keep for an object.
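As an illustration of such a policy, the sketch below expires noncurrent versions of objects after 30 days; the bucket name and values are assumptions reused from earlier examples, and keeping a fixed number of versions with the NewerNoncurrentVersions field is only possible on releases that support it:

cat << EOF > lifecycle.json
{
  "Rules": [
    {
      "ID": "trim-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}
EOF
aws --endpoint https://s3.zone1.cephlab.com:443 s3api put-bucket-lifecycle-configuration \
    --bucket testbucket --lifecycle-configuration file://lifecycle.json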
We can select on a per-bucket basis the data to send/replicate to the archive zone. If for example, we have pre-production buckets that don’t have any valuable data we can disable the archive zone replication on those buckets.
The archive zone as a zone in a multisite zonegroup can have a different setup than production zones, including its own set of pools and replication rules.
Ceph archive zones have the following main characteristics:
The archive zone S3 endpoint for data recovery can be configured on a private network that is only accessible to the operations administrator team. If recovery of a production object is required, the request would need to go through that team.
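For instance, assuming the archive zone exposes an S3 endpoint such as the hypothetical s3.archive.cephlab.com on that private network, the operations team could locate and retrieve an older version of an object roughly as follows (bucket and object names reuse earlier examples, and the version ID is a placeholder):

aws --endpoint https://s3.archive.cephlab.com:443 s3api list-object-versions \
    --bucket testbucket --prefix firsfile
aws --endpoint https://s3.archive.cephlab.com:443 s3api get-object \
    --bucket testbucket --key firsfile --version-id <version-id> firsfile.recovered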
We can add an archive zone to a Ceph Object Storage single site configuration. With this configuration, we can attach the archive zone to the running single zone, single Ceph cluster, as depicted in the following figure:
Or we can attach our archive zone to a Ceph Object Storage multisite configuration. If, for example, we have a realm/zonegroup replicating between two zones, we can add a third zone representing a third Ceph cluster. This is the architecture that we are going to use in our example, building on our work in the previous posts where we set up a Ceph Multisite replication cluster. We are now going to add a third zone to our zonegroup, configured as an immutable archive zone. An example of this architecture is shown in the following diagram.
Let’s start with our archive zone configuration. We have a freshly deployed third Ceph cluster running on four nodes named ceph-node-[08-11].cephlab.com
.
[root@ceph-node-08 ~]# ceph orch host ls\\nHOST ADDR LABELS STATUS\\nceph-node-08.cephlab.com ceph-node-08 _admin,osd,mon,mgr,rgwsync \\nceph-node-09.cephlab.com 192.168.122.135 osd,mon,mgr,rgwsync\\nceph-node-10.cephlab.com 192.168.122.204 osd,mon,mgr,rgw \\nceph-node-11.cephlab.com 192.168.122.194 osd,rgw \\n4 hosts in cluster\\n
The archive zone is not currently configurable using the Manager rgw
module, so we must run radosgw-admin
commands to configure it. First, we pull the information from our already-deployed multisite
realm. We use the zonegroup endpoint and the access and secret keys for our RGW multisite synchronization user. If you need to check the details of your sync user, you can run: radosgw-admin user info --uid sysuser-multisite
.
[root@ceph-node-08]# radosgw-admin realm pull --rgw-realm=multisite --url=http://ceph-node-01.cephlab.com:8000 --access-key=X1BLKQE3VJ1QQ27ORQP4 --secret=kEam3Fq5Wgf24Ns1PZXQPdqb5CL3GlsAwpKJqRjg --default\\n\\n[root@ceph-node-08]# radosgw-admin period pull --url=http://ceph-node-01.cephlab.com:8000 --access-key=X1BLKQE3VJ1QQ27ORQP4 --secret=kEam3Fq5Wgf24Ns1PZXQPdqb5CL3GlsAwpKJqRjg\\n
Once we have pulled the realm and the period locally, our third cluster has all required realm and zonegroup configuration. If we run radosgw-admin zonegroup get
, we will see all details of our current multisite setup. Moving forward we will configure a new zone named archive
. We provide the list of endpoints (the dedicated sync RGWs that we are going to deploy on our new cluster), the access and secret keys for the sync user, and last but not least the tier type. This flag defines that the new zone will be created as an archive zone.
[root@ceph-node-08]# radosgw-admin zone create --rgw-zone=archive --rgw-zonegroup=multizg --endpoints=http://ceph-node-08.cephlab.com:8000,http://ceph-node-09.cephlab.com:8000 --access-key=X1BLKQE3VJ1QQ27ORQP4 --secret=kEam3Fq5Wgf24Ns1PZXQPdqb5CL3GlsAwpKJqRjg --tier-type=archive --default\\n
With the new zone in place, we can update the period to push the new zone configuration to the rest of the zones in the zonegroup:
[root@ceph-node-08]# radosgw-admin period update --commit\\n
Using cephadm, we deploy two RGW services that will replicate data from production zones. In this example, we use the cephadm RGW CLI instead of a spec file to showcase a different way to configure your Ceph services with cephadm. Both new RGWs that we spin up will belong to the archive zone. Using the --placement
argument, we configure two RGW services that will run on ceph-node-08
and ceph-node-09
, the same nodes we configured as our zone replication endpoints via our previous commands.
[root@ceph-node-08 ~]# ceph orch apply rgw multi.archive --realm=multisite --zone=archive --placement=\\"2 ceph-node-08.cephlab.com ceph-node-09.cephlab.com\\" --port=8000\\n\\nScheduled rgw.multi.archive update...\\n
We can check the RGWs have started correctly:
[root@ceph-node-08]# ceph orch ps | grep archive\\nrgw.multi.archive.ceph-node-08.hratsi ceph-node-08.cephlab.com *:8000 running (10m) 10m ago 10m 80.5M - 18.2.0-131.el9cp 463bf5538482 44608611b391\\nrgw.multi.archive.ceph-node-09.lubyaa ceph-node-09.cephlab.com *:8000 running (10m) 10m ago 10m 80.7M - 18.2.0-131.el9cp 463bf5538482 d39dbc9b3351\\n
Once the new RGWs spin up, the new pools for the archive zone are created for us. Remember that if we want to use erasure coding for our RGW data pool, this is the moment to create it, before we enable replication from production to the archive zone. Otherwise, the data pool is created with the default data protection strategy: replication with three copies (R3).
[root@ceph-node-08]# ceph osd lspools | grep archive\\n8 archive.rgw.log\\n9 archive.rgw.control\\n10 archive.rgw.meta\\n11 archive.rgw.buckets.index\\n
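For example, pre-creating an erasure-coded data pool for the archive zone before any data lands in it could look like the following sketch; the EC profile values and PG count are illustrative, and the pool name follows the usual <zone>.rgw.buckets.data convention:

# Illustrative only -- adjust k/m, failure domain, and PG count to your cluster
ceph osd erasure-code-profile set rgwec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create archive.rgw.buckets.data 64 64 erasure rgwec42
ceph osd pool application enable archive.rgw.buckets.data rgw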
When we now check the sync status from one of our archive zone nodes, we see that there is currently no replication configured. This is because we are using sync policy
, and there is no zonegroup sync policy configured for the archive zone:
[root@ceph-node-08]# radosgw-admin sync status --rgw-zone=archive\\n realm beeea955-8341-41cc-a046-46de2d5ddeb9 (multisite)\\n zonegroup 2761ad42-fd71-4170-87c6-74c20dd1e334 (multizg)\\n zone bac4e4d7-c568-4676-a64c-f375014620ae (archive)\\n current time 2024-02-12T17:19:24Z\\nzonegroup features enabled: resharding\\n disabled: compress-encrypted\\n metadata sync syncing\\n full sync: 0/64 shards\\n incremental sync: 64/64 shards\\n metadata is caught up with master\\n data sync source: 66df8c0a-c67d-4bd7-9975-bc02a549f13e (zone1)\\n not syncing from zone\\n source: 7b9273a9-eb59-413d-a465-3029664c73d7 (zone2)\\n not syncing from zone\\n
Now we want to start replicating data to our archive zone, so we need to create a zonegroup policy. Recall from our previous post that we have a zonegroup policy configured to allow
replication at the zonegroup level, and then we configured replication on a per-bucket basis.
In this case, we will take a different approach with the archive zone. We are going to configure unidirectional sync at the zonegroup level, and set the policy status to enabled
so by default, all buckets in the zone zone1
will be replicated to the archive
zone.
As before, to create a sync policy we need a group, a flow, and a pipe. Let's create a new zonegroup sync policy group called grouparchive
:
[root@ceph-node-00 ~]# radosgw-admin sync group create --group-id=grouparchive --status=enabled \\n
We are creating a “directional” (unidirectional) flow that will replicate all data from zone1
to the archive
zone:
[root@ceph-node-00 ~]# radosgw-admin sync group flow create --group-id=grouparchive --flow-id=flow-archive --flow-type=directional --source-zone=zone1 --dest-zone=archive\\n
Finally, we create a pipe where we use a *
wildcard for all fields to avoid typing the full zone names. The *
represents all zones configured in the flow. We could have instead entered zone1
and archive
in the zone fields. The use of wildcards here helps avoid typos and generalizes the procedure.
[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --group-id=grouparchive --pipe-id=pipe-archive --source-zones=\'*\' --source-bucket=\'*\' --dest-zones=\'*\' --dest-bucket=\'*\'\\n
Zonegroup sync policies always need to be committed:
[root@ceph-node-00 ~]# radosgw-admin period update --commit\\n
When we check the configured zonegroup policies, we now see two groups, group1
from our previous blog posts and grouparchive
that we created and configured just now:
[root@ceph-node-00 ~]# radosgw-admin sync group get\\n[\\n {\\n \\"key\\": \\"group1\\",\\n \\"val\\": {\\n \\"id\\": \\"group1\\",\\n \\"data_flow\\": {\\n \\"symmetrical\\": [\\n {\\n \\"id\\": \\"flow-mirror\\",\\n \\"zones\\": [\\n \\"zone1\\",\\n \\"zone2\\"\\n ]\\n }\\n ]\\n },\\n \\"pipes\\": [\\n {\\n \\"id\\": \\"pipe1\\",\\n \\"source\\": {\\n \\"bucket\\": \\"*\\",\\n \\"zones\\": [\\n \\"*\\"\\n ]\\n },\\n \\"dest\\": {\\n \\"bucket\\": \\"*\\",\\n \\"zones\\": [\\n \\"*\\"\\n ]\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"\\"\\n }\\n }\\n ],\\n \\"status\\": \\"allowed\\"\\n }\\n },\\n {\\n \\"key\\": \\"grouparchive\\",\\n \\"val\\": {\\n \\"id\\": \\"grouparchive\\",\\n \\"data_flow\\": {\\n \\"directional\\": [\\n {\\n \\"source_zone\\": \\"zone1\\",\\n \\"dest_zone\\": \\"archive\\"\\n }\\n ]\\n },\\n \\"pipes\\": [\\n {\\n \\"id\\": \\"pipe-archive\\",\\n \\"source\\": {\\n \\"bucket\\": \\"*\\",\\n \\"zones\\": [\\n \\"*\\"\\n ]\\n },\\n \\"dest\\": {\\n \\"bucket\\": \\"*\\",\\n \\"zones\\": [\\n \\"*\\"\\n\\n ]\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"\\"\\n }\\n }\\n ],\\n \\"status\\": \\"enabled\\"\\n }\\n }\\n]\\n
When we check any bucket from zone1
(here we choose the unidirectional
bucket, but it could be any other), we see that we now have a new sync policy configured with the ID pipe-archive
. This comes from the zonegroup policy we just applied because this is unidirectional. We run the command from ceph-node-00
in zone1
. We see only the dests
field populated, with the source
being zone1
and the destination being the archive
zone.
[root@ceph-node-00 ~]# radosgw-admin sync info --bucket unidirectional\\n{\\n \\"sources\\": [],\\n \\"dests\\": [\\n {\\n \\"id\\": \\"pipe-archive\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"archive\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"\\"\\n }\\n },\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"user1\\"\\n }\\n },\\n
When we run the radosgw-admin sync status
command again, we see that the status for zone1
has changed from not syncing from zone
to synchronization enabled and data is caught up with source
.
[root@ceph-node-08 ~]# radosgw-admin sync status --rgw-zone=archive\\n realm beeea955-8341-41cc-a046-46de2d5ddeb9 (multisite)\\n zonegroup 2761ad42-fd71-4170-87c6-74c20dd1e334 (multizg)\\n zone bac4e4d7-c568-4676-a64c-f375014620ae (archive)\\n current time 2024-02-12T17:09:26Z\\nzonegroup features enabled: resharding\\n disabled: compress-encrypted\\n metadata sync syncing\\n full sync: 0/64 shards\\n incremental sync: 64/64 shards\\n metadata is caught up with master\\n data sync source: 66df8c0a-c67d-4bd7-9975-bc02a549f13e (zone1)\\n syncing\\n full sync: 0/128 shards\\n incremental sync: 128/128 shards\\n data is caught up with source\\n source: 7b9273a9-eb59-413d-a465-3029664c73d7 (zone2)\\n not syncing from zone\\n\\n
Now all data ingested into zone1
will be replicated to the archive
zone. With this configuration, we only have to set a unidirectional flow from zone1 to archive. If, for example, a new object is ingested into zone2, then because we have a bidirectional bucket sync policy for the unidirectional bucket, the object replication flow will be the following: zone2 → zone1 → archive.
We introduced the archive zone feature in part seven of this series. We shared a hands-on example of configuring an archive zone in a running Ceph Object multisite cluster. In the final post of this series, we will demonstrate how the Archive Zone feature can help you recover critical data from your production site.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the first part of this series, we explored the fundamentals of Ceph Object Storage and its policy-based archive to cloud/tape feature, which enables seamless data migration to remote S3-compatible storage classes. This feature is instrumental in offloading data to cost-efficient storage tiers, such as cloud or tape-based systems. However, in the past, the process has been unidirectional. Once objects are transitioned, retrieving them requires direct access to the cloud provider’s S3 endpoint. This limitation has introduced operational challenges, particularly when accessing archived or cold-tier data.
We are introducing policy-based data retrieval in the Ceph Object Storage ecosystem to address these gaps. This enhancement empowers administrators and operations teams to retrieve objects transitioned to cloud or tape tiers directly back into the Ceph cluster, aligning with operational efficiency and data accessibility needs.
Policy-based data retrieval transforms the usability of cloud-transitioned objects in Ceph. Whether the data resides in cost-efficient tape archives or high-latency/low-cost cloud tiers, this feature ensures that users can seamlessly access and manage their objects without relying on the external provider\'s S3 endpoints. This capability simplifies workflows and enhances compliance with operational policies and data lifecycle requirements.
This new functionality offers a dual approach to retrieving transitioned objects to remote cloud/tape S3-compatible endpoints:
S3 RestoreObject API Implementation: Similar to the AWS S3 RestoreObject
API, this feature allows users to retrieve objects manually using the S3 RestoreObject
API. The object restore operation can be permanent or temporary based on the retention period specified in the RestoreObject
API Call.
Read-Through Mode: By introducing a configurable --allow-read-through
capability in the cloud-tier storage class configuration, Ceph can serve read requests for transitioned objects. Upon receiving a GET
request, the system asynchronously retrieves the object from the cloud tier, stores it locally, and serves the data to the user. This eliminates the InvalidObjectState
error previously encountered for cloud-transitioned objects.
The restored data is treated as temporary and will exist in the Ceph cluster only for the duration specified during the restore request. Once the specified period expires, the restored data will be deleted, and the object will revert to a stub, preserving metadata and cloud transition configurations.
During the temporary restore period, the object is exempted from lifecycle rules that might otherwise move it to a different tier or delete it. This ensures uninterrupted access until the expiry date.
Restored objects are, by default, written to the STANDARD
storage class within the Ceph cluster. However, for temporary objects, the x-amz-storage-class
header will still return the original cloud-tier storage class. This is in line with AWS Glacier semantics, where restored objects’ storage class remains the same.
We are uploading an object called 2gb
to the on-prem Ceph cluster using a bucket called databucket
. In part one of this blog post series, we configured databucket
with a lifecycle policy that will tier/archive data into IBM COS after 30 days. We set up the AWS CLI client with a profile called tiering
to interact with Ceph Object Gateway S3 API endpoint.
aws --profile tiering --endpoint https://s3.cephlabs.com s3 cp 2gb s3://databucket upload: ./2gb to s3://databucket/2gb\\n
We can check the size of the uploaded object in the STANDARD
storage class within our on-prem Ceph cluster:
aws --profile tiering --endpoint https://s3.cephlabs.com s3api head-object --bucket databucket --key 2gb\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"LastModified\\": \\"2024-11-26T21:31:05+00:00\\",\\n \\"ContentLength\\": 2000000000,\\n \\"ETag\\": \\"\\\\\\"b459c232bfa8e920971972d508d82443-60\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {},\\n \\"PartsCount\\": 60\\n}\\n
After 30 days, the lifecycle transition kicks in, and the object is transitioned to the cloud tier. First, as an admin, we check with the radosgw-admin
command that lifecycle (LC) processing has completed, and then as a user, we use the S3 HeadObject
API call to query the status of the object:
# radosgw-admin lc list| jq .[1]\\n{\\n \\"bucket\\": \\":databucket:fcabdf4a-86f2-452f-a13f-e0902685c655.310403.1\\",\\n \\"shard\\": \\"lc.23\\",\\n \\"started\\": \\"Tue, 26 Nov 2024 21:32:15 GMT\\",\\n \\"status\\": \\"COMPLETE\\"\\n}\\n\\n# aws --profile tiering --endpoint https://s3.cephlabs.com s3api head-object --bucket databucket --key 2gb\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"LastModified\\": \\"2024-11-26T21:32:48+00:00\\",\\n \\"ContentLength\\": 0,\\n \\"ETag\\": \\"\\\\\\"b459c232bfa8e920971972d508d82443-60\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {},\\n \\"StorageClass\\": \\"ibm-cos\\"\\n}\\n
As an admin, we can use the radosgw-admin bucket stats command to check the space used. We can see that rgw.main
is empty, and our rgw.cloudtiered
placement is the only one with data stored.
# radosgw-admin bucket stats --bucket databucket | jq .usage\\n{\\n \\"rgw.main\\": {\\n \\"size\\": 0,\\n \\"size_actual\\": 0,\\n \\"size_utilized\\": 0,\\n \\"size_kb\\": 0,\\n \\"size_kb_actual\\": 0,\\n \\"size_kb_utilized\\": 0,\\n \\"num_objects\\": 0\\n },\\n \\"rgw.multimeta\\": {\\n \\"size\\": 0,\\n \\"size_actual\\": 0,\\n \\"size_utilized\\": 0,\\n \\"size_kb\\": 0,\\n \\"size_kb_actual\\": 0,\\n \\"size_kb_utilized\\": 0,\\n \\"num_objects\\": 0\\n },\\n \\"rgw.cloudtiered\\": {\\n \\"size\\": 1604857600,\\n \\"size_actual\\": 1604861952,\\n \\"size_utilized\\": 1604857600,\\n \\"size_kb\\": 1567244,\\n \\"size_kb_actual\\": 1567248,\\n \\"size_kb_utilized\\": 1567244,\\n \\"num_objects\\": 3\\n }\\n}\\n
Now that the object has transitioned to our IBM COS cloud tier, let's restore it to our Ceph cluster using the S3 RestoreObject
API call. In this example, we'll request a temporary restore and set the expiration to three days:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api restore-object --bucket databucket --key 2gb --restore-request Days=3\\n
If we attempt to get an object that is still being restored, we get an error message like this:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-object --bucket databucket --key 2gb /tmp/2gb\\nAn error occurred (RequestTimeout) when calling the GetObject operation (reached max retries: 2): restore is still in progress\\n
Using the S3 API, we can issue the HeadObject
call and check the status of the Restore
attribute. In this example, we can see that our restore from the IBM COS cloud endpoint to Ceph has finished, as ongoing-request
is set to false
. We have an expiry date for the object, as we used the RestoreObject
call with --restore-request Days=3
. Other things to check from this output: the occupied size of the object on our local Ceph cluster is 2GB, as expected once it is restored. Also, the storage class is ibm-cos
. As noted before for temporarily transitioned objects, even when using the STANDARD
Ceph RGW storage class, we still keep the ibm-cos
storage class. Now that the object has been restored, we can issue an S3 GET
API call from the client to access the object.
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api head-object --bucket databucket --key 2gb\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"Restore\\": \\"ongoing-request=\\\\\\"false\\\\\\", expiry-date=\\\\\\"Thu, 28 Nov 2024 08:46:36 GMT\\\\\\"\\",\\n \\"LastModified\\": \\"2024-11-27T08:36:39+00:00\\",\\n \\"ContentLength\\": 2000000000,\\n \\"ETag\\": \\"\\\\\\"\\\\\\"0c4b59490637f76144bb9179d1f1db16-382\\\\\\"\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {},\\n \\"StorageClass\\": \\"ibm-cos\\"\\n}\\n\\n# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-object --bucket databucket --key 2gb /tmp/2gb\\n
Restored data in a permanent restore will remain in the Ceph cluster indefinitely, making it accessible as a regular object. Unlike temporary restores, no expiration period is defined, and the object will not revert to a stub after retrieval. This is suitable for scenarios where long-term access to the object is required without additional re-restoration steps.
Once permanently restored, the object is treated as a regular object within the Ceph cluster. All lifecycle rules (such as transition to cloud storage or expiration policies) are reapplied, and the restored object is fully integrated into the bucket's data lifecycle workflows.
By default, permanently restored objects are written to the STANDARD
storage class within the Ceph cluster. Unlike temporary restores, the object’s x-amz-storage-class
header will reflect the STANDARD
storage class, indicating permanent residency in the cluster.
Restore the object permanently by not supplying a number of days to the --restore-request
argument:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api restore-object --bucket databucket --key hosts2 --restore-request {}\\n
Verify the restored object: it's part of the STANDARD
storage class, so the object is a first-class citizen of the Ceph cluster, ready for integration into broader operational workflows.
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api head-object --bucket databucket --key hosts2\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"LastModified\\": \\"2024-11-27T08:28:55+00:00\\",\\n \\"ContentLength\\": 304,\\n \\"ETag\\": \\"\\\\\\"01a72b8a9d073d6bcae565bd523a76c5\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {},\\n \\"StorageClass\\": \\"STANDARD\\"\\n}\\n
Objects accessed through the Read-Through Restore mechanism are restored temporarily into the Ceph cluster. When a GET
request is made for a cloud-transitioned object, the system retrieves the object from the cloud tier asynchronously and makes it available for a duration defined by the read_through_restore_days
value. After the expiry period, the restored data is deleted, and the object reverts to its stub state, retaining metadata and transition configurations.
Before enabling read-through mode, if we try to access a stub object in our local Ceph cluster that has been transitioned to a remote S3 endpoint via policy-based archival, we will get the following error message:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-object --bucket databucket --key 2gb6 /tmp/2gb6\\nAn error occurred (InvalidObjectState) when calling the GetObject operation: Read through is not enabled for this config\\n
So let's first enable read-through mode. As a Ceph admin, we need to modify our current ibm-cos
cloud-tier storage class and add two new tier-config parameters: --tier-config=allow_read_through=true,read_through_restore_days=3
:
# radosgw-admin zonegroup placement modify --rgw-zonegroup default \\\\\\n --placement-id default-placement --storage-class ibm-cos \\\\\\n --tier-config=allow_read_through=true,read_through_restore_days=3\\n
If you have not performed any previous multisite configuration, a default zone and zonegroup are created for you, and changes to the zone/zonegroup will not take effect until the Ceph Object Gateways (RGW daemons) are restarted. If you have created a realm for multisite, the zone/zonegroup changes will take effect once the changes are committed with radosgw-admin period update --commit
. In our case, it's enough to restart RGW daemons to apply changes:
# ceph orch restart rgw.default\\nScheduled to restart rgw.default.ceph02.fvqogr on host \'ceph02\'\\nScheduled to restart rgw.default.ceph03.ypphif on host \'ceph03\'\\nScheduled to restart rgw.default.ceph04.qinihj on host \'ceph04\'\\nScheduled to restart rgw.default.ceph06.rktjon on host \'ceph06\'\\n
Once read-through mode is enabled and the RGW services are restarted, when a GET
request is made for an object in the cloud tier, the object will automatically be restored to the Ceph cluster and served to the user.
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-object --bucket databucket --key 2gb6 /tmp/2gb6\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"Restore\\": \\"ongoing-request=\\\\\\"false\\\\\\", expiry-date=\\\\\\"Thu, 28 Nov 2024 08:46:36 GMT\\\\\\"\\",\\n \\"LastModified\\": \\"2024-11-27T08:36:39+00:00\\",\\n \\"ContentLength\\": 2000000000,\\n \\"ETag\\": \\"\\\\\\"\\\\\\"0c4b59490637f76144bb9179d1f1db16-382\\\\\\"\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {},\\n \\"StorageClass\\": \\"ibm-cos\\"\\n}\\n
Ceph developers are improving the policy-based data retrieval feature with upcoming enhancements that include:
Support for the RestoreObject API to fetch objects, instead of GET, from S3 endpoints that use the Glacier API.
Policy-based data retrieval for Ceph Storage is a crucial addition that enhances the current object storage tiering capabilities. Feel free to share your thoughts or questions about this new feature on the ceph-users mailing list. We’d love to hear how you plan to use it or if you’d like to see any aspects enhanced.
For more information about RGW placement targets and storage classes, visit this page
For a related take on directing data to multiple RGW storage classes, view this presentation
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the previous episode, we introduced Ceph Object Storage multisite features. We described the lab setup we will use in the following chapters to deploy and configure Ceph object multisite asynchronous replication.
Part two of this series will enumerate the steps to establish the initial multisite replication between our Ceph clusters, as depicted in the following diagram.
As part of the Quincy release, a new Manager module named rgw
was added to the Ceph orchestrator cephadm
. The rgw
manager module makes the configuration of multisite replication straightforward. This section will show you how to configure Ceph Object Storage multisite replication between two zones (each zone is an independent Ceph cluster) through the CLI using the new rgw
manager module.
We will start by creating an RGW module spec file for cluster1
.
We will use labels on our hosts to help define which nodes may host each service. In this case, for the replication RGW services, we set the rgwsync
label. Any host that has this label configured will start an RGW service with the specs defined in the file. The rest of the options included will take care of configuring the names for our realm, zonegroup and zone, and direct that the RGW services listen on port 8000/tcp.
[root@ceph-node-00 ~]# cat << EOF >> /root/rgw.spec\\nplacement:\\n label: rgwsync\\n count_per_host: 1\\nrgw_realm: multisite\\nrgw_zone: zone1\\nrgw_zonegroup: multizg\\nspec:\\n rgw_frontend_port: 8000\\nEOF\\n
In our first cluster, we want to run the sync RGW services on nodes ceph-node-00
and ceph-node-01
, so we need to label the corresponding nodes:
[root@ceph-node-00 ~]# ceph orch host label add ceph-node-00.cephlab.com rgwsync\\nAdded label rgwsync to host ceph-node-00.cephlab.com\\n[root@ceph-node-00 ~]# ceph orch host label add ceph-node-01.cephlab.com rgwsync\\nAdded label rgwsync to host ceph-node-01.cephlab.com\\n
Once the nodes have been labeled, we enable the RGW manager module and bootstrap the RGW multisite configuration. When bootstrapping the multisite config, the rgw
manager module will take care of the following steps:
[root@ceph-node-00 ~]# ceph mgr module enable rgw\\n[root@ceph-node-00 ~]# ceph rgw realm bootstrap -i rgw.spec\\nRealm(s) created correctly. Please use \'ceph rgw realm tokens\' to get the token.\\n
Let’s check the realm:
[root@ceph-node-00 ~]# radosgw-admin realm list\\n{\\n \\"default_info\\": \\"d85b6eef-2285-4072-8407-35e2ea7a17a2\\",\\n \\"realms\\": [\\n \\"multisite\\"\\n ]\\n}\\n
Multisite sync user:
[root@ceph01 ~]# radosgw-admin user list | grep sysuser\\n \\"Sysuser-multisite\\"\\n
Zone1 RGW RADOS pools:
[root@ceph01 ~]# ceph osd lspools | grep rgw\\n24 .rgw.root\\n25 zone1.rgw.log\\n26 zone1.rgw.control\\n27 zone1.rgw.meta\\n
Once we create the first bucket, the bucket index pool will be created automatically. Also, once we upload the first objects/data to a bucket in zone1
, the data pool will be created for us. By default, pools are created with a replication factor of 3, using the cluster's pre-defined CRUSH rule replicated_rule
. If we want to use Erasure Coding (EC) for the data pool, or customize, for example, the failure domain, we need to manually pre-create the pools with our customizations before we start uploading data into the first bucket.
NOTE: Don’t forget to double-check that your RGW pools have the right number of Placement Groups (PGs) to provide the required performance. We can choose to enable the PG autoscaler manager module with the bulk
flag set for each pool, or we can statically calculate the number of PGs our pools are going to need up front with the help of the PG calculator. We suggest a target of 200 PG replicas per OSD, the "PG ratio".
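As a sketch, with example pool name and values, either approach could be applied to the data pool once it exists:

# Hint the PG autoscaler that this pool will hold the bulk of the data
ceph osd pool set zone1.rgw.buckets.data bulk true
# Or set the PG count statically after working it out with the PG calculator
ceph osd pool set zone1.rgw.buckets.data pg_num 128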
NOTE: Only RGW data pools can be configured with erasure coding. The rest of the RGW pools must be configured with a replication scheme, by default with size=3
.
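A minimal sketch of pre-creating the zone1 data pool as erasure-coded, with example profile values and PG count, might look like this:

ceph osd erasure-code-profile set rgwec k=4 m=2 crush-failure-domain=host
ceph osd pool create zone1.rgw.buckets.data 128 128 erasure rgwec
ceph osd pool application enable zone1.rgw.buckets.data rgw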
The RGW services are up and running, serving the S3 endpoint on port 8000:
[root@ceph-node-00 ~]# curl http://ceph-node-00:8000\\n<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?><ListAllMyBucketsResult xmlns=\\"http://s3.amazonaws.com/doc/2006-03-01/\\"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>\\n
The RGW manager module creates a token with encoded information of our deployment. Other Ceph clusters that want to be added as a replicated zone to our multisite configuration can import this token into the RGW manager module and have replication configured and running with a single command.
We can check the contents of the token with the ceph rgw realm tokens
command and decode it with the base64
command. As you can see, it provides the required information for the secondary zone to connect to the primary zonegroup and pull the realm and zonegroup configuration.
[root@ceph-node-00 ~]# TOKEN=$(ceph rgw realm tokens | jq .[0].token | sed \'s/\\"//g\')\\n[root@ceph-node-00 ~]# echo $TOKEN | base64 -d\\n{\\n \\"realm_name\\": \\"multisite\\",\\n \\"realm_id\\": \\"d85b6eef-2285-4072-8407-35e2ea7a17a2\\",\\n \\"endpoint\\": \\"http://ceph-node-00.cephlab.com:8000\\",\\n \\"access_key\\": \\"RUB7U4C6CCOMG3EM9QGF\\",\\n \\"secret\\": \\"vg8XFPehb21Y8oUMB9RS0XXXXH2E1qIDIhZzpC\\"\\n}\\n
You can see from the prompt that we have switched to our second Ceph cluster, having copied the token from our first cluster and defined the rest of the parameters similarly to the first cluster.
[root@ceph-node-04 ~]# cat rgw2.spec\\nplacement:\\n label: rgwsync\\n count_per_host: 1\\nrgw_zone: zone2\\nrgw_realm_token: ewogICAgInJlYWxtX25hbWUiOiAibXVsdGlzaXRlIiwKICAgICJyZWFsbV9pZCI6ICIxNmM3OGJkMS0xOTIwLTRlMjMtOGM3Yi1lYmYxNWQ5ODI0NTgiLAogICAgImVuZHBvaW50IjogImh0dHA6Ly9jZXBoLW5vZGUtMDEuY2VwaGxhYi5jb206ODAwMCIsCiAgICAiYWNjZXNzX2tleSI6ICIwOFlXQ0NTNzEzUU9LN0pQQzFRUSIsCiAgICAic2VjcmV0IjogImZUZGlmTXpDUldaSXgwajI0ZEw4VGppRUFtOHpRdE01ZGNScXEyTjYiCn0=\\nspec:\\n rgw_frontend_port: 8000\\n
We label the hosts that will run the Ceph RGW sync services:
[root@ceph-node-04 ~]# ceph orch host label add ceph-node-04.cephlab.com rgwsync\\nAdded label rgwsync to host ceph-node-04.cephlab.com\\n[root@ceph-node-04 ~]# ceph orch host label add ceph-node-05.cephlab.com rgwsync\\nAdded label rgwsync to host ceph-node-05.cephlab.com\\n
Enable the module, and run the ceph rgw zone create
command with the spec file we created a moment ago:
[root@ceph02 ~]# ceph mgr module enable rgw\\n[root@ceph02 ~]# ceph rgw zone create -i rgw2.spec --start-radosgw\\nZones zone2 created successfully\\n
The rgw
manager module will take care of pulling the realm and zonegroup periods using the access and secret keys from the multisite sync user. Finally, it will create zone2
and do a final period update so all zones have the latest configuration changes in place with zone2
added to zonegroup multizg
. In the following output from the radosgw-admin zonegroup get
command, we can see the zonegroup endpoints. We can also see that zone1
is the master zone for our zonegroup and the corresponding endpoints for zone1
and zone2
.
[root@ceph-node-00 ~]# radosgw-admin zonegroup get\\n{\\n \\"id\\": \\"2761ad42-fd71-4170-87c6-74c20dd1e334\\",\\n \\"name\\": \\"multizg\\",\\n \\"api_name\\": \\"multizg\\",\\n \\"is_master\\": true,\\n \\"endpoints\\": [\\n \\"http://ceph-node-04.cephlab.com:8000\\",\\n \\"http://ceph-node-05.cephlab.com:8000\\"\\n ],\\n \\"hostnames\\": [],\\n \\"hostnames_s3website\\": [],\\n \\"master_zone\\": \\"66df8c0a-c67d-4bd7-9975-bc02a549f13e\\",\\n \\"zones\\": [\\n {\\n \\"id\\": \\"66df8c0a-c67d-4bd7-9975-bc02a549f13e\\",\\n \\"name\\": \\"zone1\\",\\n \\"endpoints\\": [\\n \\"http://ceph-node-00.cephlab.com:8000\\",\\n \\"http://ceph-node-01.cephlab.com:8000\\"\\n ],\\n \\"log_meta\\": false,\\n \\"log_data\\": true,\\n \\"bucket_index_max_shards\\": 11,\\n \\"read_only\\": false,\\n \\"tier_type\\": \\"\\",\\n \\"sync_from_all\\": true,\\n \\"sync_from\\": [],\\n \\"redirect_zone\\": \\"\\",\\n \\"supported_features\\": [\\n \\"compress-encrypted\\",\\n \\"resharding\\"\\n ]\\n },\\n {\\n \\"id\\": \\"7b9273a9-eb59-413d-a465-3029664c73d7\\",\\n \\"name\\": \\"zone2\\",\\n \\"endpoints\\": [\\n \\"http://ceph-node-04.cephlab.com:8000\\",\\n \\"http://ceph-node-05.cephlab.com:8000\\"\\n ],\\n \\"log_meta\\": false,\\n \\"log_data\\": true,\\n \\"bucket_index_max_shards\\": 11,\\n \\"read_only\\": false,\\n \\"tier_type\\": \\"\\",\\n \\"sync_from_all\\": true,\\n \\"sync_from\\": [],\\n \\"redirect_zone\\": \\"\\",\\n \\"supported_features\\": [\\n \\"compress-encrypted\\",\\n \\"resharding\\"\\n ]\\n }\\n ],\\n \\"placement_targets\\": [\\n {\\n \\"name\\": \\"default-placement\\",\\n \\"tags\\": [],\\n \\"storage_classes\\": [\\n \\"STANDARD\\"\\n ]\\n }\\n ],\\n \\"default_placement\\": \\"default-placement\\",\\n \\"realm_id\\": \\"beeea955-8341-41cc-a046-46de2d5ddeb9\\",\\n \\"sync_policy\\": {\\n \\"groups\\": []\\n },\\n \\"enabled_features\\": [\\n \\"resharding\\"\\n ]\\n}\\n
To verify that replication is working, let’s create a user and a bucket:
[root@ceph-node-00 ~]# radosgw-admin user create --uid=\'user1\' --display-name=\'First User\' --access-key=\'S3user1\' --secret-key=\'S3user1key\'\\n\\n[root@ceph-node-00 ~]# aws configure\\nAWS Access Key ID [None]: S3user1\\nAWS Secret Access Key [None]: S3user1key\\nDefault region name [None]: multizg\\nDefault output format [None]: json\\n[root@ceph-node-00 ~]# aws --endpoint http://s3.cephlab.com:80 s3 ls\\n[root@ceph-node-00 ~]# aws --endpoint http://s3.cephlab.com:80 s3 mb s3://firstbucket\\nmake_bucket: firstbucket\\n[root@ceph-node-00 ~]# aws --endpoint http://s3.cephlab.com:80 s3 cp /etc/hosts s3://firstbucket\\nupload: ../etc/hosts to s3://firstbucket/hosts\\n
If we check from our second Ceph cluster, zone2
, we can see that all metadata has been replicated, and that all users and buckets that we created in zone1
are now present in zone2
.
NOTE: In this example, we will use the radosgw-admin
command to check, but we could also use S3 API commands pointing the AWS client to the IP/hostname of an RGW within the second zone.
[root@ceph-node-04 ~]# radosgw-admin user list\\n[\\n \\"dashboard\\",\\n \\"user1\\",\\n \\"sysuser-multisite\\"\\n]\\n[root@ceph-node-04 ~]# radosgw-admin bucket stats --bucket testbucket | jq .bucket\\n\\"testbucket\\"\\n
To check replication status, we can use the radosgw-admin sync status
command. For example:
[root@ceph-node-00 ~]# radosgw-admin sync status\\n realm beeea955-8341-41cc-a046-46de2d5ddeb9 (multisite)\\n zonegroup 2761ad42-fd71-4170-87c6-74c20dd1e334 (multizg)\\n zone 66df8c0a-c67d-4bd7-9975-bc02a549f13e (zone1)\\n current time 2024-01-05T22:51:17Z\\nzonegroup features enabled: resharding\\n disabled: compress-encrypted\\n metadata sync no sync (zone is master)\\n data sync source: 7b9273a9-eb59-413d-a465-3029664c73d7 (zone2)\\n syncing\\n full sync: 0/128 shards\\n incremental sync: 128/128 shards\\n data is caught up with source\\n\\n
As a recap, in part two of this multisite series, we have gone through the steps of deploying Ceph Object Storage multisite replication between two sites/zones using the rgw
manager module. This is just our first building block, as our target is to have a full-blown deployment including the much-needed load balancers.
In part three of the series, we will continue fine-tuning our multisite replication setup by dedicating specific RGW services to each type of request: client-facing or multisite replication.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
Ceph offers object storage tiering capabilities to optimize cost and performance by seamlessly moving data between storage classes. These tiers can be configured locally within an on-premise infrastructure or extended to include cloud-based storage classes, providing a flexible and scalable solution for diverse workloads. With policy-based automation, administrators can define lifecycle policies to migrate data between high-performance storage and cost-effective archival tiers, ensuring the right balance of speed, durability, and cost-efficiency.
Local storage classes in Ceph allow organizations to tier data between fast NVMe or SAS/SATA SSD-based pools and economical HDD or QLC-based pools within their on-premises Ceph cluster. This is particularly beneficial for applications requiring varying performance levels or scenarios where data \\"ages out\\" of high-performance requirements and can be migrated to slower, more economical storage.
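As a rough sketch of how such a local tier is defined (the storage class name and data pool are examples, not part of this series), an additional storage class can be registered on the default placement target and mapped to its own data pool:

radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id default-placement --storage-class CHEAP_HDD
radosgw-admin zone placement add --rgw-zone default \
    --placement-id default-placement --storage-class CHEAP_HDD \
    --data-pool default.rgw.cheaphdd.data

A bucket lifecycle rule can then transition objects to CHEAP_HDD in the same way the cloud transition is configured later in this post.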
In addition to local tiering, Ceph offers policy-based data archival and retrieval capabilities that integrate with S3-compatible platforms for off-premises data management. Organizations can use this feature to archive data to cloud-based tiers such as IBM Cloud Object Storage, AWS S3, Azure Blob or S3 Tape Endpoints for long-term retention, disaster recovery, or cost-optimized cold storage. By leveraging policy-based automation, Ceph ensures that data is moved to the cloud or other destination based on predefined lifecycle rules, enhancing its value in hybrid cloud strategies.
Initially, Ceph's policy-based data archival (cloud sync) to S3-compatible platforms offered a uni-directional data flow, where data could only be archived from local storage pools to the designated cloud storage tier. While this allowed users to leverage cost-effective cloud platforms for cold storage or long-term data retention, the lack of retrieval capabilities limited the solution’s flexibility in data management. This meant that once data was archived to cloud storage, it could no longer be actively retrieved or re-integrated into local workflows directly through Ceph.
Ceph Squid introduced policy-based data retrieval, which marks a significant evolution in its capabilities and is now available as a Tech Preview. This enhancement enables users to retrieve S3 cloud or tape transitioned objects directly into their on-prem Ceph environment, eliminating the limitations of the previous uni-directional flow. Data can be restored as temporary or permanent objects.
This retrieval of objects can be done in two different ways:
An S3 RestoreObject API request
GET requests on transitioned objects, to restore them to the Ceph cluster transparently
In this release, we don't support object retrieval from S3 cloud/tape endpoints that use the distinct Glacier API, for example IBM Deep Archive. This feature enhancement is targeted for the Tentacle release of Ceph.
In this section, we will configure and set up the Policy-Based Data Archival feature of Ceph. We will discuss using data lifecycle policies to transition cold data to an offsite, cost-effective storage class by archiving it to IBM Cloud Object Storage (COS).
Note: The RGW Storage Classes we describe here should not be conflated with Kubernetes PV/PVC Storage Classes.
The table below provides a summary of the various lifecycle policies that the Ceph Object Gateway supports:
Policy Type | Description | Example Use Case |
---|---|---|
Expiration | Deletes objects after a specified duration | Removing temporary files automatically after 30 days |
Noncurrent Version Expiration | Deletes noncurrent versions of objects after a specified duration in versioned buckets | Managing storage costs by removing old versions of objects |
Abort Incomplete Multipart Upload | Cancels multipart uploads that are not completed within a specified duration | Free up storage by cleaning up incomplete uploads |
Transition Between Storage Classes | Moves objects between different storage classes within the same Ceph cluster after a duration | Moving data from SSD / replicated to HDD / EC storage after 90 days |
NewerNoncurrentVersions Filter | Filters noncurrent versions newer than a specified count for expiration or transition actions | Retaining only the last three noncurrent versions of an object |
ObjectSizeGreaterThan Filter | Applies the lifecycle rule only to objects larger than a specified size | Moving large video files to a lower-cost storage class |
ObjectSizeLessThan Filter | Applies the lifecycle rule only to objects smaller than a specified size | Archiving small log files after a certain period |
In addition to specifying policies, lifecycle rules can be filtered using tags or prefixes, allowing for more granular control over which objects are affected. Tags can identify specific subsets of objects based on per-object tagging, while prefixes help target objects based on their key names.
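For instance, a rule scoped by prefix and tag could look like the following sketch; the prefix, tag, day count, and target storage class are example values:

{
  "Rules": [
    {
      "ID": "Tier tagged log objects after 60 days",
      "Filter": {
        "And": {
          "Prefix": "logs/",
          "Tags": [ { "Key": "tier", "Value": "cold" } ]
        }
      },
      "Status": "Enabled",
      "Transitions": [ { "Days": 60, "StorageClass": "ibm-cos" } ]
    }
  ]
}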
First, we configure the remote S3 cloud service as the future destination of our on-prem transitioned objects. In our example, we will create an IBM COS bucket named ceph-s3-tier
.
It is important to note that we need to create a service credential for our bucket with HMAC keys enabled.
Create a new storage class on the default placement target within the default zonegroup; we use the radosgw-admin --tier-type=cloud-s3
parameter to configure the storage class against our previously configured bucket in COS S3.
# radosgw-admin zonegroup placement add --rgw-zonegroup=default --placement-id=default-placement --storage-class=ibm-cos --tier-type=cloud-s3\\n
Note: Ceph allows one to create storage classes with arbitrary names, but some clients and client libraries accept only AWS storage class names or behave uniquely when the storage class is, e.g., GLACIER
.
We can verify the available storage classes in the default zone group and placement target:
# radosgw-admin zonegroup get --rgw-zonegroup=default | jq .placement_targets[0].storage_classes\\n[\\n \\"STANDARD_IA\\",\\n \\"STANDARD\\",\\n \\"ibm-cos\\"\\n]\\n
Next, we use the radosgw-admin
command to configure the cloud-s3
storage class with specific parameters from our IBM COS bucket: endpoint, region, and account credentials:
# radosgw-admin zonegroup placement modify --rgw-zonegroup default --placement-id default-placement --storage-class ibm-cos --tier-config=endpoint=https://s3.eu-de.cloud-object-storage.appdomain.cloud,access_key=YOUR_ACCESS_KEY,secret=YOUR_SECRET_KEY,target_path=\\"ceph-s3-tier\\",multipart_sync_threshold=44432,multipart_min_part_size=44432,retain_head_object=true,region=eu-de\\n
Once the COS cloud-S3 storage class is in place, we will switch roles to a consumer of the Ceph Object Storage S3 API and configure a lifecycle policy through the RGW S3 API endpoint. Our user is named tiering
, and we have the AWS CLI pre-configured with the credentials for the tiering user.
# aws --profile tiering --endpoint https://s3.cephlabs.com s3 mb s3://databucket\\n# aws --profile tiering --endpoint https://s3.cephlabs.com s3 cp /etc/hosts s3://databucket\\n
We will attach a JSON lifecycle policy to the previously created bucket. For instance, the bucket databucket
will have the following policy, transitioning all objects older than 30 days to the COS storage class:
{\\n \\"Rules\\": [\\n {\\"ID\\": \\"Transition objects from Ceph to COS that are older than 30 days\\",\\n \\"Prefix\\": \\"\\",\\n \\"Status\\": \\"Enabled\\",\\n \\"Transitions\\": [\\n {\\n \\"Days\\": 30,\\n \\"StorageClass\\": \\"ibm-cos\\"\\n }\\n ]\\n }\\n ]\\n}\\n
As an S3 API consumer, we will use the AWS S3 CLI to apply the bucket lifecycle configuration we saved to a local file called ibm-cos-lc.json
:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api put-bucket-lifecycle-configuration --lifecycle-configuration file://ibm-cos-lc.json --bucket databucket\\n
Verify that the policy is applied:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-bucket-lifecycle-configuration --bucket databucket\\n
We can also check that Ceph / RGW have registered this new LC policy with the following radosgw-admin
command. The status is UNINITIAL
, as this LC has never been processed; once processed, it will move to the COMPLETED
state:
# radosgw-admin lc list | jq .[1]\\n{\\n \\"bucket\\": \\":databucket:fcabdf4a-86f2-452f-a13f-e0902685c655.310403.1\\",\\n \\"shard\\": \\"lc.23\\",\\n \\"started\\": \\"Thu, 01 Jan 1970 00:00:00 GMT\\",\\n \\"status\\": \\"UNINITIAL\\"\\n}\\n
We can get further details of the rule applied to the bucket with the following command:
# radosgw-admin lc get --bucket databucket\\n{\\n \\"prefix_map\\": {\\n \\"\\": {\\n \\"status\\": true,\\n \\"dm_expiration\\": false,\\n \\"expiration\\": 0,\\n \\"noncur_expiration\\": 0,\\n \\"mp_expiration\\": 0,\\n \\"transitions\\": {\\n \\"ibm-cos\\": {\\n \\"days\\": 30\\n }\\n },\\n }\\n }\\n}\\n
Important WARNING: changing this parameter is ONLY for LC testing purposes. Do not change it on a production Ceph cluster, and remember to reset as appropriate!
We can speed up the testing of lifecycle policies by enabling a debug interval for the lifecycle process. In this setting, each "day" in the bucket lifecycle configuration is equivalent to 60 seconds, so a three-day expiration period is effectively three minutes:
# ceph config set client.rgw rgw_lc_debug_interval 60\\n# ceph orch restart rgw.default\\n
If we now run radosgw-admin lc list
we should see the lifecycle policy for our transition bucket in a completed state:
[root@ceph01 ~]# radosgw-admin lc list| jq .[1]\\n{\\n \\"bucket\\": \\":databucket:fcabdf4a-86f2-452f-a13f-e0902685c655.310403.1\\",\\n \\"shard\\": \\"lc.23\\",\\n \\"started\\": \\"Mon, 25 Nov 2024 10:43:31 GMT\\",\\n \\"status\\": \\"COMPLETE\\"\\n}\\n
If we list the objects available in the transition bucket on our on-premise cluster, we can see that the object size is 0
. This is because they have been transitioned to the cloud. However, the metadata / head of the object is still available locally because we set the retain_head_object=true
parameter when creating the cloud storage class:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3 ls s3://databucket\\n2024-11-25 05:41:33 0 hosts\\n
If we check the object attributes using the s3api get-object-attributes
call we can see that the storage class for this object is now ibm-cos
, so this object has been successfully transitioned into the S3 cloud provider:
# aws --profile tiering --endpoint https://s3.cephlabs.com s3api get-object-attributes --object-attributes StorageClass ObjectSize --bucket databucket --key hosts\\n{\\n \\"LastModified\\": \\"2024-11-25T10:41:33+00:00\\",\\n \\"StorageClass\\": \\"ibm-cos\\",\\n \\"ObjectSize\\": 0\\n}\\n
If we check in IBM COS with the AWS CLI S3 client, using the endpoint and profile of the IBM COS user, we can see that the objects are available in the IBM COS bucket. Due to API limitations, the original object modification time and ETag cannot be preserved, but they are stored as metadata attributes on the destination objects.
aws --profile cos --endpoint https://s3.eu-de.cloud-object-storage.appdomain.cloud s3api head-object --bucket ceph-s3-tier --key databucket/hosts | jq .\\n{\\n \\"AcceptRanges\\": \\"bytes\\",\\n \\"LastModified\\": \\"2024-11-25T10:41:33+00:00\\",\\n \\"ContentLength\\": 304,\\n \\"ETag\\": \\"\\\\\\"01a72b8a9d073d6bcae565bd523a76c5\\\\\\"\\",\\n \\"ContentType\\": \\"binary/octet-stream\\",\\n \\"Metadata\\": {\\n \\"rgwx-source-mtime\\": \\"1732529733.944271939\\",\\n \\"rgwx-versioned-epoch\\": \\"0\\",\\n \\"rgwx-source\\": \\"rgw\\",\\n \\"rgwx-source-etag\\": \\"01a72b8a9d073d6bcae565bd523a76c5\\",\\n \\"rgwx-source-key\\": \\"hosts\\"\\n }\\n}\\n
To avoid collisions across buckets, the source bucket name is prepended to the target object name. If the object is versioned, the object version ID is appended to the end.
Below is the sample object name format:
s3://<target_path>/<source_bucket_name>/<source_object_name>(-<source_object_version_id>)\\n
Semantics similar to those of LifecycleExpiration apply to versioned and locked objects. If the object is current, after transitioning to the cloud it is made noncurrent, with a delete marker created. If the object is noncurrent and locked, its transition is skipped.
This blog covered transitioning cold data to a more cost-effective storage class using tiering and lifecycle policies and archiving it to IBM Cloud Object Storage (COS). In the next blog, we will explore how to restore archived data to the Ceph cluster when needed. We will introduce the key technical concepts and provide detailed configuration steps to help you implement cloud restore, ensuring your cold data remains accessible when required.
For more information about RGW placement targets and storage classes, visit this page
For a related take on directing data to multiple RGW storage classes, view this presentation
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
Throughout this series of articles, we will provide hands-on examples to help you set up and configure some of the most critical replication features of the Ceph Object Storage solution. This will include the new Object Storage multisite enhancements released in the Reef release.
At a high level, these are the topics we will cover in this series:
When discussing Replication, Disaster Recovery, Backup and Restore, we have multiple strategies available that provide us with different SLAs for data and application recovery (RTO / RPO). For instance, synchronous replication provides the lowest RPO, which means zero data loss. Ceph can provide synchronous replication between sites by stretching the Ceph cluster among the data centers. On the other hand, asynchronous replication will assume a non-zero RPO. In Ceph, async multisite replication involves replicating the data to another Ceph cluster. Each Ceph storage modality (object, block, and file) has its own asynchronous replication mechanism. This blog series will cover geo-dispersed object storage multisite asynchronous replication.
Before getting our hands wet with the deployment details, let\'s begin with a quick overview of what Ceph Object Storage (RGW) provides: enterprise grade, highly mature object geo-replication capabilities. The RGW multisite replication feature facilitates asynchronous object replication across single or multi-zone deployments. Ceph Object Storage operates efficiently over WAN connections using asynchronous replication with eventual consistency.
Ceph Object Storage Multisite Replication provides many benefits for businesses that must store and manage large amounts of data across multiple locations. Here are some of the key benefits of using Ceph Object Storage Multisite Replication:
Ceph Object Storage clusters can be geographically dispersed, which improves data availability and reduces the risk of data loss due to hardware failure, natural disasters or other events. There are no network latency requirements as we are doing eventually consistent async replication.
Replication is Active/Active for data (object) access. Multiple end users can simultaneously read/write from/to their closest RGW (S3) endpoint location. In other words, the replication is bidirectional. This enables users to access data more quickly and reduce downtime.
Notably, only the designated master zone in the zonegroup accepts metadata updates. For example, when creating users and buckets, all metadata modifications on non-master zones will be forwarded to the configured master. If the master fails, a manual master zone failover must be triggered.
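For context, a manual failover typically consists of promoting the surviving zone to master and committing the period; a hedged sketch, run against the surviving zone's cluster, looks like this:

# Promote the secondary zone (zone2 in our naming) to master and default
radosgw-admin zone modify --rgw-zone=zone2 --master --default
# Commit the updated period so the change propagates
radosgw-admin period update --commit
# Restart the RGW services so they pick up the new period (service name is a placeholder)
ceph orch restart rgw.<rgw_service_name>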
With multisite replication, businesses can quickly scale their storage infrastructure by adding new sites or clusters. This allows businesses to store and manage large amounts of data without worrying about running out of storage capacity or performance.
A Ceph Object Storage multisite cluster consists of realms, zonegroups, and zones:
A realm defines a global namespace across multiple Ceph storage clusters
Zonegroups can have one or more zones
Next, we have zones. These are the lowest level of the Ceph multisite configuration, and they’re represented by one or more object gateways within a single Ceph cluster.
As you can see in the following diagram, Ceph Object Storage multisite replication happens at the zone level. We have a single realm and two zonegroups. The realm's global object namespace ensures unique object IDs across zonegroups and zones.
Each bucket is owned by the zone group where it was created, and its object data will only be replicated to other zones in that zonegroup. Any request for data in that bucket sent to other zonegroups will be redirected to the zonegroup where the bucket resides.
Within a Ceph Object Storage cluster, you can have one or more realms. Each realm is an independent global object namespace, meaning each realm will have its own sets of users, buckets, and objects. For example, you can't have two buckets with the same name within a single realm. In Ceph Object Storage, there is also the concept of tenants to isolate S3 namespaces, but that discussion is out of our scope here. You can find more information on this page
The following diagram shows an example where we have two different Realms, thus two independent namespaces. Each realm has its zonegroup and replication zones.
Each zone represents a Ceph cluster, and you can have one or more zones in a zonegroup. Multisite replication, when configured, will happen between zones. In this series of blogs, we will configure only two zones in a zone group, but you can configure a larger number of replicated zones in a single zonegroup.
With the latest 6.1 release, Ceph Object Storage introduces the “Multisite Sync Policy” feature, which provides granular bucket-level replication, gives the user greater flexibility and reduced costs, and unlocks an array of valuable replication features:
Users can enable or disable sync per individual bucket, enabling precise control over replication workflows.
Full-zone replication while opting out to replicate specific buckets
Replicating a single source bucket with multi-destination buckets
Implementing symmetrical and directional data flow configurations per bucket
The following diagram shows an example of the sync policy feature in action.
As part of the Quincy release, a new Ceph Manager module called rgw
was added to the ceph orchestrator cephadm
. The rgw
manager module makes the configuration of multisite replication straightforward. This section will show you how to configure Ceph Object Storage multisite replication between two zones (each zone is an independent Ceph Cluster) through the CLI using the new rgw
manager module.
NOTE: In Reef and later releases, multisite configuration can also be performed using the Ceph UI/Dashboard. We don’t use the UI in this guide, but if you are interested, you can find more information here.
In our setup, we are going to configure our multisite replication with the following logical layout: we have a realm called multisite
, and this realm contains a single zonegroup called multizg
. Inside the zonegroup, we have two zones, named zone1
and zone2
. Each zone represents a Ceph cluster in a geographically distributed datacenter. The following diagram is a logical representation of our multisite configuration.
As this is a lab deployment, this is a downsized example. Each Ceph cluster comprises four nodes with six OSDs each. We configure four RGW services (one per node) for each cluster. Two RGWs will serve S3 client requests, and the remaining RGW services will be responsible for multisite replication operations. Ceph Object Storage multisite replication data is transmitted to the other site through the RGW services using the HTTP protocol. The advantage of this is that at the networking layer, we only need to enable/allow HTTP communication between the Ceph clusters (zones) for which we want to configure multisite.
The following diagram shows the final architecture we will be configuring step by step.
In our example, we will terminate the SSL connection from the client at the per-site load balancer level. The RGW services will use plain HTTP for all the involved endpoints.
When configuring TLS/SSL, we can terminate the encrypted connection from the client to the S3 endpoint at the load balancer level or at the RGW service level. It is possible to do both, re-encrypting the connection from the load balancer to the RGWs, though that scenario is not currently supported by the Ceph ingress service.
The second post will enumerate the steps to establish multisite replication between our Ceph clusters, as depicted in the following diagram.
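As a quick preview of what part two will walk through, the rgw manager module lets us bootstrap the realm, zonegroup, and primary zone from a small spec file. The following is only a rough sketch based on the layout described above; the file name, exact spec fields, and the realm-token exchange used to join zone2 depend on your release, so treat it as illustrative rather than definitive:

# On the first cluster (zone1): create realm/zonegroup/zone and deploy the RGWs
# cat << EOF > /root/rgw-zone1.yaml
rgw_realm: multisite
rgw_zonegroup: multizg
rgw_zone: zone1
placement:
  hosts:
    - ceph-node-00.cephlab.com
    - ceph-node-01.cephlab.com
spec:
  rgw_frontend_port: 8000
EOF
# ceph rgw realm bootstrap -i /root/rgw-zone1.yaml
# Print the realm token that the second cluster will use when creating zone2
# ceph rgw realm tokens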
But before starting the configuration of Ceph Object Storage multisite replication, we need to provide a bit more context regarding our initial state. We have two Ceph clusters deployed, the first cluster with nodes ceph-node-00
to ceph-node-03
and the second cluster with nodes from ceph-node-04
to ceph-node-07
.
[root@ceph-node-00 ~]# ceph orch host ls\\nHOST ADDR LABELS STATUS\\nceph-node-00.cephlab.com 192.168.122.12 _admin,osd,mon,mgr\\nceph-node-01.cephlab.com 192.168.122.179 osd,mon,mgr\\nceph-node-02.cephlab.com 192.168.122.94 osd,mon,mgr\\nceph-node-03.cephlab.com 192.168.122.180 osd\\n4 hosts in cluster\\n[root@ceph-node-04 ~]# ceph orch host ls\\nHOST ADDR LABELS STATUS\\nceph-node-04.cephlab.com 192.168.122.138 _admin,osd,mon,mgr\\nceph-node-05.cephlab.com 192.168.122.175 osd,mon,mgr\\nceph-node-06.cephlab.com 192.168.122.214 osd,mon,mgr\\nceph-node-07.cephlab.com 192.168.122.164 osd\\n4 hosts in cluster\\n
The core Ceph services have been deployed, plus Ceph's observability stack, but there is no RGW service deployed. Ceph services are running containerized on RHEL with the help of Podman.
[root@ceph-node-00 ~]# ceph orch ls\\nNAME PORTS RUNNING REFRESHED AGE PLACEMENT \\nalertmanager ?:9093,9094 1/1 6m ago 3w count:1 \\nceph-exporter 4/4 6m ago 3w * \\ncrash 4/4 6m ago 3w * \\ngrafana ?:3000 1/1 6m ago 3w count:1 \\nmgr 3/3 6m ago 3w label:mgr \\nmon 3/3 6m ago 3w label:mon \\nnode-exporter ?:9100 4/4 6m ago 3w * \\nosd.all-available-devices 4 6m ago 3w label:osd \\nprometheus ?:9095 1/1 6m ago 3w count:1\\n[root@ceph-node-00 ~]# ceph version\\nceph version 18.2.0-131.el9cp (d2f32f94f1c60fec91b161c8a1f200fca2bb8858) reef (stable)\\n[root@ceph-node-00 ~]# podman inspect cp.icr.io/cp/ibm-ceph/ceph-7-rhel9 | jq .[].Labels.summary\\n\\"Provides the latest IBM Storage Ceph 7 in a fully featured and supported base image.\\"\\n# cat /etc/redhat-release \\nRed Hat Enterprise Linux release 9.2 (Plow)\\n
As a recap, in part one of this multisite series, we have gone through an overview of Ceph Object Storage multisite replication features and architecture, setting the stage to start configuring the multisite replication in part two of the series.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the era of big data, managing vast amounts of storage efficiently and reliably is a critical challenge for enterprises. Ceph has become a leading software defined storage solution known for its flexibility, scalability, and robustness. Building on this foundation, Ceph elevates these capabilities, offering seamless integration with enterprise environments and advanced tools for efficiently managing petabytes of data.
This blog post series will delve into the automated deployment of Ceph clusters using Ceph's state-of-the-art orchestrator, cephadm
. Additionally, for those automating their infrastructure with Ansible, we will share an example using an Infrastructure-as-Code approach with the help of Jinja2 templates and Ansible.
Infrastructure as Code (IaC) revolutionizes infrastructure management by treating infrastructure setups as code. This allows us to apply software development practices such as version control, testing, and continuous integration to infrastructure management, reducing the risk of errors and speeding up deployment and scaling.
With Ceph, tools like Ansible and cephadm
are perfect examples of IaC in action. They allow administrators to define the desired state of their Ceph clusters in code, making it easier to deploy, manage, and scale these clusters across different environments.
As Ceph became more popular and clusters rapidly grew, the need for an effective orchestration tool became increasingly critical. Over the years, several tools have been developed to simplify and automate the deployment and management of Ceph clusters. Let’s take a brief look at them:
ceph-deploy
was one of the first tools introduced to ease the deployment of Ceph clusters. As a lightweight command-line utility, ceph-deploy
allowed administrators to quickly set up a basic Ceph cluster by automating many manual steps in configuring Ceph daemons like MONs, OSDs, and MGRs.
ceph-ansible
marked a significant step forward by integrating Ceph deployment with Ansible, a popular open-source automation tool. This approach embraced the principles of Infrastructure as Code (IaC), allowing administrators to define the entire Ceph cluster configuration in Ansible playbooks.
cephadm
The current bundled Ceph orchestrator, which we will cover in detail in the next section.
Unlike its predecessors, cephadm
deploys all Ceph daemons as containers using Docker or Podman. This containerized approach ensures consistency across different environments and simplifies the management of dependencies, making deploying, upgrading, and scaling Ceph clusters easier.
Cephadm's use of a declarative spec file to define the cluster's desired state marks a significant improvement in how Ceph clusters are managed. Administrators can now describe their entire cluster configuration in advance, and Cephadm continuously ensures that the cluster matches this desired state. This process is also known as convergence.
In addition to its powerful deployment capabilities, Cephadm integrates with the Ceph Dashboard, provides built-in monitoring and alerting, and supports automated upgrades, making it the most comprehensive and user-friendly orchestration tool in the Ceph ecosystem to date.
Modern IT environments increasingly require repeatedly deploying and scaling storage clusters across different environments: development, testing, and production. This is where Cephadm comes to the rescue. By automating the deployment and management of Ceph clusters, Cephadm eliminates the manual, error-prone processes traditionally involved in setting up distributed storage systems.
cephadm
’s use of a declarative service spec file allows administrators to define the entire configuration of a Ceph cluster in a single, reusable file that is amenable to revision control. This spec file can describe everything from the placement of OSDs, Monitors, and Managers to the setup and configuration of File, Block, and Object Services. By applying this spec file, cephadm
can automatically converge the cluster to match the desired state, ensuring consistency across multiple deployments.
It’s important to note that Cephadm covers the deployment and lifecycle of Ceph cluster services. Still, not all day-two operations of specific services, like creating a Ceph Object Storage (RGW) user, are currently covered by Cephadm.
cephadm
fits perfectly into an Infrastructure as Code (IaC) paradigm. IaC treats infrastructure configurations like software code, storing them in version control, automating their application, and enabling continuous delivery pipelines. With cephadm
, the spec file acts as the code that defines your storage infrastructure.
For example, you could store your cephadm
spec files in a version control system like Git with optional CI/CD pipelines. When changes are made to the cluster configuration, they are committed and pushed, triggering automated pipelines that deploy or update the Ceph cluster based on the updated spec file. This approach streamlines deployments and ensures that your storage infrastructure is always in sync with your application and service needs.
Note that specific cephadm
configuration changes require restarting the corresponding service, which must be coordinated with an automation tool for the changes to take effect once applied.
Below is an example of a Cephadm spec file that enables a complete Ceph cluster deployment during the bootstrap process. This basic example is designed to get you started; a production deployment would require further customization of the spec file.
service_type: host\\nhostname: ceph1\\naddr: 10.10.0.2\\nlocation:\\n root: default\\n datacenter: DC1\\nlabels:\\n- osd\\n- mon\\n- mgr\\n---\\nservice_type: host\\nhostname: ceph2\\naddr: 10.10.0.3\\nlocation:\\n datacenter: DC1\\nlabels:\\n- osd\\n- mon\\n- rgw\\n---\\nservice_type: host\\nhostname: ceph3\\naddr: 10.10.0.4\\nlocation:\\n datacenter: DC1\\nlabels:\\n- osd\\n- mds\\n- mon\\n---\\nservice_type: mon\\nplacement:\\n label: \\"mon\\"\\n---\\nservice_type: mgr\\nservice_name: mgr\\nplacement:\\n label: \\"mgr\\"\\n---\\nservice_type: osd\\nservice_id: all-available-devices\\nservice_name: osd.all-available-devices\\nspec:\\n data_devices:\\n all: true\\n limit: 1\\nplacement:\\n label: \\"osd\\"\\n---\\nservice_type: rgw\\nservice_id: objectgw\\nservice_name: rgw.objectgw\\nplacement:\\n count: 2\\n label: \\"rgw\\"\\nspec:\\n rgw_frontend_port: 8080\\n rgw_frontend_extra_args:\\n - \\"tcp_nodelay=1\\"\\n---\\nservice_type: ingress\\nservice_id: rgw.external-traffic\\nplacement:\\n label: \\"rgw\\"\\nspec:\\n backend_service: rgw.objectgw\\n virtual_ips_list:\\n - 172.18.8.191/24\\n - 172.18.8.192/24\\n frontend_port: 8080\\n monitor_port: 1967\\n
The first service_type
is host
, which enumerates all hosts in the cluster, including their hostnames and IP addresses. The location
field indicates the host's position within the Ceph CRUSH topology, a hierarchical structure that informs data placement and retrieval across the cluster. Check out this document for more info.
By setting specific labels on the host, Cephadm can efficiently schedule and deploy containerized Ceph services on desired nodes, with a given node sometimes having more than one label and thus hosting more than one Ceph service. This ensures resource isolation and reduces the number of nodes required, optimizing resource usage and cutting costs in production environments.
Note that the hosts we are adding to the cluster need a set of prerequisites configured to successfully join the Ceph cluster. This blog series will also cover automating the deployment of these prerequisites.
service_type: host\\nhostname: ceph1\\naddr: 10.10.0.2\\nlocation:\\n root: default\\n datacenter: DC1\\nlabels:\\n- osd\\n- mon\\n- mgr\\n
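As for the prerequisites mentioned above, the automation will be covered later in the series. As a rough, hypothetical sketch, preparing a new host and adding it to the cluster usually boils down to something like this (the package names are an assumption and vary per distribution):

# On the new host: container runtime, time sync, LVM, and Python
# dnf install -y podman chrony lvm2 python3
# systemctl enable --now chronyd

# From a node with the admin keyring: distribute the cluster SSH key, then add the host
# ceph cephadm get-pub-key > ~/ceph.pub
# ssh-copy-id -f -i ~/ceph.pub root@ceph2
# ceph orch host add ceph2 10.10.0.3
# ceph orch host label add ceph2 osd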
After that, we have the Monitor and Manager service deployments. We have a simple configuration for these, using only the placement parameter
. With the placement
parameter, we tell cephadm
that it can deploy the Monitor service on any host with the mon
label.
---\\nservice_type: mon\\nplacement:\\n label: \\"mon\\"\\n---\\nservice_type: mgr\\nservice_name: mgr\\nplacement:\\n label: \\"mgr\\"\\n
Next, we have the osd
service type. The cephadm
OSD service type is incredibly flexible: it allows you to define almost any OSD configuration you can imagine. For full details on the OSD service spec, check out this document
In our example, we take one of the most straightforward approaches possible: we tell Cephadm to use as OSDs all free/usable media devices available on the hosts, again using the placement parameter. It will only configure OSD devices on nodes that have the osd
label.
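For reference, this is the OSD section from the full spec file shown earlier; the limit filter caps how many matching devices are consumed per host:

service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
spec:
  data_devices:
    all: true
    limit: 1
placement:
  label: "osd"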
In this next section, we configure the cephfs
service to deploy the Ceph shared file system, including metadata service (MDS) daemons. For all service spec configuration options, check out this document
Finally, we populate the rgw
service type to set up Ceph Object Gateway (RGW) services. The RGW services provide an S3- and Swift-compatible HTTP RESTful endpoint for clients. In this example, in the placement section we set the count of RGW services to 2
. This means that the Cephadm scheduler will look to schedule two RGW daemons on two available hosts that have the rgw
label set. The tcp_nodelay=1
frontend option disables Nagle's algorithm, which can improve latency for RGW operations on small objects.
---\\nservice_type: mds\\nservice_id: cephfs\\nplacement:\\n count: 2\\n label: \\"mds\\"\\n---\\nservice_type: rgw\\nservice_id: objectgw\\nservice_name: rgw.objectgw\\nplacement:\\n count: 2\\n label: \\"rgw\\"\\nrgw_realm: \\nrgw_zone: \\nrgw_zonegroup: \\nspec:\\n rgw_frontend_port: 8080\\n rgw_frontend_extra_args:\\n - \\"tcp_nodelay=1\\"\\n
Ceph also provides an out-of-the-box load balancer based on haproxy
and keepalived
called the ingress service, a term that may be familiar to Kubernetes admins. In this example, we use the ingress service to balance client S3 requests among the RGW daemons running in our cluster, providing the object service with HA and load balancing. Detailed information is here
We use the rgw
label to colocate the haproxy
/keepalived
daemons with RGW services. We then set the list of floating Virtual IP addresses (VIPs) that the clients will use to access the S3 endpoint API with the virtual_ips_list
spec parameter.
---\\nservice_type: ingress\\nservice_id: rgw.external-traffic\\nplacement:\\n label: \\"rgw\\"\\nspec:\\n backend_service: rgw.objectgw\\n virtual_ips_list:\\n - 172.18.8.191/24\\n - 172.18.8.192/24\\n frontend_port: 8080\\n monitor_port: 1967\\n
Once we have all the service specs defined and ready, we need to pass the spec file to the cephadm
bootstrap command to get our cluster deployed and configured as we have described in our file. Here is an example of the bootstrap command using the --apply-spec
parameter to pass our cluster specification file:
# cephadm bootstrap \\\\\\n --registry-json /root/registry.json \\\\\\n --dashboard-password-noupdate \\\\\\n --ssh-user=cephadm \\\\\\n --mon-ip \\\\\\n --apply-spec /root/cluster-spec.yaml\\n
In the next installment of this series, we’ll explore how to leverage Jinja2 (J2) templating and Ansible in tandem with cephadm
service spec files. This approach will demonstrate how to build an Infrastructure as Code (IaC) framework for Ceph cluster deployments, facilitating a streamlined Continuous Delivery (CI/CD) pipeline with Git as the single source of truth for Ceph configuration management.
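As a small, hypothetical taste of that approach before the next post, the host entries from the spec above could be generated from an Ansible inventory with a Jinja2 template along these lines:

# cluster-spec.yaml.j2 (hypothetical template rendered by Ansible)
{% for host in ceph_hosts %}
---
service_type: host
hostname: {{ host.name }}
addr: {{ host.addr }}
labels:
{% for label in host.labels %}
- {{ label }}
{% endfor %}
{% endfor %}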
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the previous post in this series, we discussed Multisite Sync Policy and shared hands-on examples of granular bucket bi-directional replication. In today's blog, part six, we will configure additional multisite sync policies, including unidirectional replication with one source to many destination buckets.
In the previous article, we explored a bucket sync policy with a bidirectional configuration. Now let's explore an example of how to enable unidirectional sync between two buckets. Again, to give a bit of context, we currently have our zonegroup sync policy set to allowed
, and a bidirectional flow configured at the zonegroup
level. With the zonegroup sync policy allowing us to configure replication at per-bucket granularity, we can start with our unidirectional replication configuration.
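For context, that zonegroup-wide policy was created in the previous post with commands along these lines (reproduced only as a reminder; the group and flow IDs shown here are placeholders):

# radosgw-admin sync group create --group-id=group1 --status=allowed
# radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-mirror --flow-type=symmetrical --zones=zone1,zone2
# radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
# radosgw-admin period update --commit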
We create the unidirectional bucket, then create a sync group with the id unidirectiona-1
, then set the status to Enabled
. When we set the status of the sync group policy to enabled
, replication will begin once the pipe has been applied to the bucket.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 mb s3://unidirectional\\nmake_bucket: unidirectional\\n[root@ceph-node-00 ~]# radosgw-admin sync group create --bucket=unidirectional --group-id=unidirectiona-1 --status=enabled\\n
Once the sync group is in place, we need to create a pipe for our bucket. In this example, we specify the source and destination zones: the source will be zone1
and the destination zone2
. In this way, we create a unidirectional replication pipe for bucket unidirectional
with data replicated only in one direction: zone1
—> zone2
.
[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --bucket=unidirectional --group-id=unidirectiona-1 --pipe-id=test-pipe1 --source-zones=\'zone1\' --dest-zones=\'zone2\'\\n
With sync info
, we can check the flow of bucket replication. You can see that the sources field is empty as we are running the command from a node in zone1
, and we are not receiving data from an external source. After all, from the zone where we are running the command, we are doing unidirectional replication, so we are sending data to a destination. We can see that the source is zone1
, and the destination is zone2
for the unidirectional
bucket.
[root@ceph-node-00 ~]# radosgw-admin sync info --bucket unidirectional\\n{\\n \\"sources\\": [],\\n \\"dests\\": [\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:89c43fae-cd94-4f93-b21c-76cd1a64788d.34955.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"unidirectional:89c43fae-cd94-4f93-b21c-76cd1a64788d.34955.1\\"\\n….\\n}\\n
When we run the same command in zone2
, we see the same information, but now the sources field shows that we are receiving data from zone1
. The unidirectional bucket in zone2
is not sending out any replication data, which is why the destination field is empty in the output of the sync info
command.
[root@ceph-node-04 ~]# radosgw-admin sync info --bucket unidirectional\\n{\\n \\"sources\\": [\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n….\\n \\"dests\\": [],\\n
Once we have our configuration ready for action, let’s do some checking to see that everything is working as expected. Let’s PUT three files to zone1
:
[root@ceph-node-00 ~]# for i in {1..3}; do aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 cp /etc/hosts s3://unidirectional/fil${i}; done\\nupload: ../etc/hosts to s3://unidirectional/fil1\\nupload: ../etc/hosts to s3://unidirectional/fil2\\nupload: ../etc/hosts to s3://unidirectional/fil3\\n
We can check that they have been synced to zone2
:
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://unidirectional/\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n
Now let’s check what happens when we PUT an object to zone2
. We shouldn’t see the file replicated to zone1
, as our replication configuration for the bucket is unidirectional.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 cp /etc/hosts s3://unidirectional/fil4\\nupload: ../etc/hosts to s3://unidirectional/fil4\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://unidirectional/\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n2024-02-02 17:57:49 233 fil4\\n
Checking zone1 after a while, we can see that the file is not there, meaning it did not get replicated from zone2, as expected.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://unidirectional\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n
In this example we are going to modify the previous unidirectional sync policy by adding a new replication target bucket named backupbucket
. Once we set the sync policy, every object uploaded to bucket unidirectional
in zone1
will be replicated to buckets unidirectional
and backupbucket
in zone2
.
To get started, let's create the bucket backupbucket
:
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 mb s3://backupbucket\\nmake_bucket: backupbucket\\n
We will add to our existing sync group policy a new pipe that targets backupbucket
. We created the group sync policy in our previous unidirectional
example.
Again, we specify the source and destination zones, so our sync will be unidirectional. The main difference is that now we are specifying a destination bucket named backupbucket
with the --dest-bucket
parameter.
[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --bucket=unidirectional --group-id=unidirectiona-1 --pipe-id=test-pipe2 --source-zones=\'zone1\' --dest-zones=\'zone2\' --dest-bucket=backupbucket\\n
Again, let's check the sync info output, which shows us a representation of the replication flow we have configured. The sources field is empty because in zone1
we are not receiving data from any other source. In destinations we now have two different pipes
. The first test-pipe1
we created in our previous example. The second pipe has backupbucket
set as the replication destination in zone2
.
[root@ceph-node-00 ~]# radosgw-admin sync info --bucket unidirectional\\n{\\n \\"sources\\": [],\\n \\"dests\\": [\\n {\\n \\"id\\": \\"test-pipe1\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"user1\\"\\n }\\n },\\n {\\n \\"id\\": \\"test-pipe2\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"backupbucket\\"\\n },\\n \\"params\\": {\\n \\"source\\": {\\n \\"filter\\": {\\n \\"tags\\": []\\n }\\n },\\n \\"dest\\": {},\\n \\"priority\\": 0,\\n \\"mode\\": \\"system\\",\\n \\"user\\": \\"user1\\"\\n }\\n }\\n ],\\n \\"hints\\": {\\n \\"sources\\": [],\\n \\"dests\\": [\\n \\"backupbucket\\"\\n ]\\n },\\n
Let's check it out: from our previous example, we had zone1
with three files:
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://unidirectional/\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n
In zone2
we have four files; fil4
will not be replicated to zone1
because replication is unidirectional.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://unidirectional/\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n2024-02-02 17:57:49 233 fil4\\n
Let's add three more files to zone1
. We expect these to be replicated to the unidirectional
bucket and backupbucket
in zone2
:
[root@ceph-node-00 ~]# for i in {5..7}; do aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 cp /etc/hosts s3://unidirectional/fil${i}; done\\nupload: ../etc/hosts to s3://unidirectional/fil5\\nupload: ../etc/hosts to s3://unidirectional/fil6\\nupload: ../etc/hosts to s3://unidirectional/fil7\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://unidirectional\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n2024-02-02 18:03:51 233 fil5\\n2024-02-02 18:04:37 233 fil6\\n2024-02-02 18:09:08 233 fil7\\n[root@ceph-node-00 ~]# aws --endpoint http://object.s3.zone2.dan.ceph.blue:80 s3 ls s3://unidirectional\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n2024-02-02 17:57:49 233 fil4\\n2024-02-02 18:03:51 233 fil5\\n2024-02-02 18:04:37 233 fil6\\n2024-02-02 18:09:08 233 fil7\\n[root@ceph-node-00 ~]# aws --endpoint http://object.s3.zone2.dan.ceph.blue:80 s3 ls s3://backupbucket\\n2024-02-02 17:56:09 233 fil1\\n2024-02-02 17:56:10 233 fil2\\n2024-02-02 17:56:11 233 fil3\\n2024-02-02 18:03:51 233 fil5\\n2024-02-02 18:04:37 233 fil6\\n2024-02-02 18:09:08 233 fil7\\n
Excellent, everything is working as expected. We have all objects replicated to all buckets, except fil4
. This is expected as the file was uploaded to zone2
, and our replication is unidirectional, so there is no sync from zone2
to zone1
.
What will sync info
tell us if we query backupbucket
? This bucket is referenced only in another bucket policy, but bucket backupbucket
doesn't have a sync policy of its own:
[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin sync info --bucket backupbucket\\n{\\n \\"sources\\": [],\\n \\"dests\\": [],\\n \\"hints\\": {\\n \\"sources\\": [\\n \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n ],\\n \\"dests\\": []\\n },\\n \\"resolved-hints-1\\": {\\n \\"sources\\": [\\n {\\n \\"id\\": \\"test-pipe2\\",\\n \\"source\\": {\\n \\"zone\\": \\"zone1\\",\\n \\"bucket\\": \\"unidirectional:66df8c0a-c67d-4bd7-9975-bc02a549f13e.36430.1\\"\\n },\\n \\"dest\\": {\\n \\"zone\\": \\"zone2\\",\\n \\"bucket\\": \\"backupbucket\\"\\n },\\n
For this situation, RGW uses hints: even though backupbucket is not directly involved in the unidirectional
bucket's sync policy, it is referenced by a hint.
Note that in the output, we have resolved hints, which means that the bucket backupbucket
found out about bucket unidirectional
syncing to it indirectly, and not from its own policy: the policy for backupbucket
itself is empty.
One important consideration that can be a bit confusing is that metadata is always synced to the other zone independently of the bucket sync policy, so every user and bucket, even if not configured for replication, will show up in all the zones that belong to the zonegroup.
Just as an example, let's create a new bucket called newbucket
:
[root@ceph-node-00 ~]# aws --endpoint http://object.s3.zone2.dan.ceph.blue:80 s3 mb s3://newbucket\\nmake_bucket: newbucket\\n
We confirm that this bucket doesn’t have any replication configured:
[root@ceph-node-00 ~]# radosgw-admin bucket sync checkpoint --bucket newbucket\\nSync is disabled for bucket newbucket\\n
But all metadata syncs to the secondary zone so that the bucket will appear in zone2
. In any case, the data inside the bucket won’t be replicated.
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls | grep newbucket\\n2024-02-02 02:22:31 newbucket\\n
Another thing to notice is that objects uploaded before a sync policy is configured for a bucket won’t get synced to the other zone until we upload an object after enabling the bucket sync. In the following example, the bucket catches up once we upload a new object after the sync policy is in place:
[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://objectest1/\\n2024-02-02 04:03:47 233 file1\\n2024-02-02 04:03:50 233 file2\\n2024-02-02 04:03:53 233 file3\\n2024-02-02 04:27:19 233 file4\\n\\n[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin bucket sync checkpoint --bucket objectest1\\n2024-02-02T04:17:15.596-0500 7fc00c51f800 1 waiting to reach incremental sync..\\n2024-02-02T04:17:17.599-0500 7fc00c51f800 1 waiting to reach incremental sync..\\n2024-02-02T04:17:19.601-0500 7fc00c51f800 1 waiting to reach incremental sync..\\n2024-02-02T04:17:21.603-0500 7fc00c51f800 1 waiting to reach incremental sync..\\n\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 cp /etc/hosts s3://objectest1/file4\\nupload: ../etc/hosts to s3://objectest1/file4\\n[root@ceph-node-00 ~]# radosgw-admin bucket sync checkpoint --bucket objectest1\\n2024-02-02T04:27:29.975-0500 7fce4cf11800 1 bucket sync caught up with source:\\n local status: [00000000001.569.6, , 00000000001.47.6, , , , 00000000001.919.6, 00000000001.508.6, , , ]\\n remote markers: [00000000001.569.6, , 00000000001.47.6, , , , 00000000001.919.6, 00000000001.508.6, , , ]\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://objectest1\\n2024-02-02 04:03:47 233 file1\\n2024-02-02 04:03:50 233 file2\\n2024-02-02 04:03:53 233 file3\\n2024-02-02 04:27:19 233 file4\\n
Objects created, modified, or deleted when the bucket sync policy was in the allowed
or forbidden
states will not automatically sync when the policy is enabled again.
We need to run the bucket sync run
command to sync these objects and get the bucket in both zones in sync. For example, we disable sync for bucket objectest1
, and PUT a couple of objects in zone1
that aren't replicated to zone2
even after we enable the replication again.
[root@ceph-node-00 ~]# radosgw-admin sync group create --bucket=objectest1 --group-id=objectest1-1 --status=forbidden\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 cp /etc/hosts s3://objectest1/file5\\nupload: ../etc/hosts to s3://objectest1/file5\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 cp /etc/hosts s3://objectest1/file6\\nupload: ../etc/hosts to s3://objectest1/file6\\n[root@ceph-node-00 ~]# radosgw-admin sync group create --bucket=objectest1 --group-id=objectest1-1 --status=enabled\\n[root@ceph-node-00 ~]# aws --endpoint http://object.s3.zone2.dan.ceph.blue:80 s3 ls s3://objectest1\\n2024-02-02 04:03:47 233 file1\\n2024-02-02 04:03:50 233 file2\\n2024-02-02 04:03:53 233 file3\\n2024-02-02 04:27:19 233 file4\\n[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://objectest1/\\n2024-02-02 04:03:47 233 file1\\n2024-02-02 04:03:50 233 file2\\n2024-02-02 04:03:53 233 file3\\n2024-02-02 04:27:19 233 file4\\n2024-02-02 04:44:45 233 file5\\n2024-02-02 04:45:38 233 file6\\n
To get the buckets back in sync, we use the radosgw-admin bucket sync run
command from the destination zone.
[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin bucket sync run --source-zone zone1 --bucket objectest1\\n[root@ceph-node-00 ~]# aws --endpoint http://object.s3.zone2.dan.ceph.blue:80 s3 ls s3://objectest1\\n2024-02-02 04:03:47 233 file1\\n2024-02-02 04:03:50 233 file2\\n2024-02-02 04:03:53 233 file3\\n2024-02-02 04:27:19 233 file4\\n2024-02-02 04:44:45 233 file5\\n2024-02-02 04:45:38 233 file6\\n
We continued discussing Multisite Sync Policy in part six of this series. We shared some hands-on examples of configuring multisite sync policies, including unidirectional replication with one source to many destination buckets. In the final post of this series we will introduce the Archive Zone feature, which maintains an immutable copy of all versions of all the objects from our production zones.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In part eight of this Ceph Multisite series, we will continue exploring the Archive Zone. Using a hands-on example of recovering data from the archive zone, we will cover in detail how the archive zone works.
Let's start with a visual representation of the recovery workflow for the archive zone. Once covered, we will follow the same steps with a hands-on example.
The following diagram shows the behaviour of the Archive zone when a user PUTs an object into the production zone.
1. When the user PUTs object1 into the production zone, the object replicates into the archive zone as the current version.
2. When the user modifies object1, the modification is replicated to the archive zone. The modified object is now the current version, and the original object is still available in the archive zone thanks to S3 versioning.
3. If the user modifies object1 again, the same will happen as in step 2, and we will have three versions of the object available in the archive zone.

Continuing with the example depicted above, let's check how we could recover data from a logical failure.
1. The user deletes object1 in the production zone. The object is not deleted from the archive zone.
2. When the application tries to access object1, it fails. The application is down, and panic ensues.
3. We recover object1's latest version from the archive zone into the production cluster.
CLI tool for our testing. First, we create a specific user for our tests, so in our zone1
cluster, we run:
# radosgw-admin user create --uid=archuser --display-name=\\"S3 user to test the archive zone\\" --access-key=archuser --secret-key=archuser\\n{\\n \\"user_id\\": \\"archuser\\",\\n \\"display_name\\": \\"S3 user to test the archive zone\\",\\n \\"email\\": \\"\\",\\n \\"suspended\\": 0,\\n \\"max_buckets\\": 1000,\\n \\"subusers\\": [],\\n \\"keys\\": [\\n {\\n \\"user\\": \\"archuser\\",\\n \\"access_key\\": \\"archuser\\",\\n \\"secret_key\\": \\"archuser\\"\\n }\\n ],\\n \\"swift_keys\\": [],\\n \\"caps\\": [],\\n \\"op_mask\\": \\"read, write, delete\\",\\n \\"default_placement\\": \\"\\",\\n \\"default_storage_class\\": \\"\\",\\n \\"placement_tags\\": [],\\n \\"bucket_quota\\": {\\n
Now we configure the AWS client with this user:
# aws configure\\nAWS Access Key ID [None]: archuser\\nAWS Secret Access Key [None]: archuser\\nDefault region name [None]: multizg\\nDefault output format [None]: text\\n
We will also create a couple of aliases to make our life easier.
Aliases for zone1
and the archive
zone:
# alias s3apiarchive=\'aws --endpoint=https://object.s3.archive.dan.ceph.blue:443 s3api\'\\n# alias s3apizone1=\'aws --endpoint=https://object.s3.zone1.dan.ceph.blue:443 s3api\'\\n
We wish to use rclone
, so let's download and install the appropriate rclone
package:
# yum install https://downloads.rclone.org/v1.62.0/rclone-v1.62.0-linux-amd64.rpm -y\\n
Next we configure the rclone
client with our production zone endpoint and the archive zone endpoint. This way, with the use of rclone
we can recover data from the archive zone if required:
cat <<EOF >rclone.conf\\n[zone1]\\ntype = s3\\nprovider = Other\\naccess_key_id = archuser\\nsecret_access_key = archuser\\nendpoint = https://object.s3.zone1.dan.ceph.blue:443\\nlocation_constraint = multizg\\nacl = bucket-owner-full-control\\n[archive]\\ntype = s3\\nprovider = Ceph\\naccess_key_id = archuser\\nsecret_access_key = archuser\\nendpoint = https://object.s3.archive.dan.ceph.blue:443\\nlocation_constraint = multizg\\nacl = bucket-owner-full-control\\nEOF\\n\\n
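rclone reads ~/.config/rclone/rclone.conf by default, so either copy the file there or point to it explicitly with --config. A quick sanity check that both remotes answer could look like this:

# rclone --config ./rclone.conf lsd zone1:
# rclone --config ./rclone.conf lsd archive: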
Next we create some test files and capture their MD5 checksums so we can compare later:
# echo \\"This is file 1\\" > /tmp/test-file-1\\n# echo \\"This is file 2\\" > /tmp/test-file-2\\n# echo \\"This is file 3\\" > /tmp/test-file-3\\n# md5sum /tmp/test-file-1\\n88c16a56754e0f17a93d269ae74dde9b /tmp/test-file-1\\n# md5sum /tmp/test-file-2\\ndb06069ef1c9f40986ffa06db4fe8fd7 /tmp/test-file-2\\n# md5sum /tmp/test-file-3\\n95227e10e2c33771e1c1379b17330c86 /tmp/test-file-3\\n
We have our client ready; let’s check out the archive zone.
Create a new bucket and verify the bucket has been created in all RGW zones:
# s3apizone1 create-bucket --bucket my-bucket\\n# s3apizone1 list-buckets\\nBUCKETS 2023-03-15T12:03:54.315000+00:00 my-bucket\\nOWNER S3 user to test the archive zone archuser\\n# s3apiarchive list-buckets\\nBUCKETS 2023-03-15T12:03:54.315000+00:00 my-bucket\\nOWNER S3 user to test the archive zone archuser\\n
Verify that object versioning is not yet configured, as it is enabled lazily:
# s3apizone1 get-bucket-versioning --bucket my-bucket\\n# s3apiarchive get-bucket-versioning --bucket my-bucket\\n
Upload a new object to our bucket my-bucket
.
# rclone copy /tmp/test-file-1 zone1:my-bucket\\n
Verify that S3 versioning has been enabled in the archive zone but not in zone1
:
# s3apiarchive get-bucket-versioning --bucket my-bucket\\n{\\n \\"Status\\": \\"Enabled\\",\\n \\"MFADelete\\": \\"Disabled\\"\\n}\\n# s3apizone1 get-bucket-versioning --bucket my-bucket\\n
Verify that the object version ID is null in the master and secondary zones but not in the archive zone:
# s3apizone1 list-object-versions --bucket my-bucket\\n{\\n \\"Versions\\": [\\n {\\n \\"ETag\\": \\"\\\\\\"88c16a56754e0f17a93d269ae74dde9b\\\\\\"\\",\\n \\"Size\\": 15,\\n \\"StorageClass\\": \\"STANDARD\\",\\n \\"Key\\": \\"test-file-1\\",\\n \\"VersionId\\": \\"null\\",\\n \\"IsLatest\\": true,\\n \\"LastModified\\": \\"2023-03-15T12:07:12.914000+00:00\\",\\n \\"Owner\\": {\\n \\"DisplayName\\": \\"S3 user to test the archive zone\\",\\n \\"ID\\": \\"archuser\\"\\n }\\n }\\n ]\\n}\\n# s3apiarchive list-object-versions --bucket my-bucket\\n{\\n \\"Versions\\": [\\n {\\n \\"ETag\\": \\"\\\\\\"88c16a56754e0f17a93d269ae74dde9b\\\\\\"\\",\\n \\"Size\\": 15,\\n \\"StorageClass\\": \\"STANDARD\\",\\n \\"Key\\": \\"test-file-1\\",\\n \\"VersionId\\": \\"6DRlC7fKtpmkvHA9zknhFA87RjyilTV\\",\\n \\"IsLatest\\": true,\\n \\"LastModified\\": \\"2023-03-15T12:07:12.914000+00:00\\",\\n \\"Owner\\": {\\n \\"DisplayName\\": \\"S3 user to test the archive zone\\",\\n \\"ID\\": \\"archuser\\"\\n }\\n }\\n ]\\n}\\n
Modify the object in the master zone and verify that a new version is created in the RGW archive zone:
# rclone copyto /tmp/test-file-2 zone1:my-bucket/test-file-1\\n# rclone ls zone1:my-bucket\\n 15 test-file-1\\n
Verify a new version has been created in the RGW archive zone:
# s3apiarchive list-object-versions --bucket my-bucket\\n{\\n \\"Versions\\": [\\n {\\n \\"ETag\\": \\"\\\\\\"db06069ef1c9f40986ffa06db4fe8fd7\\\\\\"\\",\\n \\"Size\\": 15,\\n \\"StorageClass\\": \\"STANDARD\\",\\n \\"Key\\": \\"test-file-1\\",\\n \\"VersionId\\": \\"mXoINEnZsSCDNaWwCDELVysUbnMqNqx\\",\\n \\"IsLatest\\": true,\\n \\"LastModified\\": \\"2023-03-15T12:13:27.057000+00:00\\",\\n \\"Owner\\": {\\n \\"DisplayName\\": \\"S3 user to test the archive zone\\",\\n \\"ID\\": \\"archuser\\"\\n }\\n },\\n {\\n \\"ETag\\": \\"\\\\\\"88c16a56754e0f17a93d269ae74dde9b\\\\\\"\\",\\n \\"Size\\": 15,\\n \\"StorageClass\\": \\"STANDARD\\",\\n \\"Key\\": \\"test-file-1\\",\\n \\"VersionId\\": \\"6DRlC7fKtpmkvHA9zknhFA87RjyilTV\\",\\n \\"IsLatest\\": false,\\n \\"LastModified\\": \\"2023-03-15T12:07:12.914000+00:00\\",\\n \\"Owner\\": {\\n \\"DisplayName\\": \\"S3 user to test the archive zone\\",\\n \\"ID\\": \\"archuser\\"\\n }\\n }\\n ]\\n}\\n
We can check the ETag: it will match the MD5sum of the object. This is only the case when neither multipart upload nor object encryption is used.
# md5sum /tmp/test-file-2\\ndb06069ef1c9f40986ffa06db4fe8fd7 /tmp/test-file-2\\n# md5sum /tmp/test-file-1\\n88c16a56754e0f17a93d269ae74dde9b /tmp/test-file-1\\n
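If you prefer to read the ETag straight from the S3 API rather than from list-object-versions, head-object returns it as well. With the aliases defined earlier, something like the following works; the version ID is the first version shown above:

# Current version in zone1 (should report the md5sum of test-file-2)
# s3apizone1 head-object --bucket my-bucket --key test-file-1 --query ETag
# A specific version in the archive zone, selected by its VersionId
# s3apiarchive head-object --bucket my-bucket --key test-file-1 --version-id 6DRlC7fKtpmkvHA9zknhFA87RjyilTV --query ETag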
Let’s upload one more version of the object:
# rclone copyto /tmp/test-file-3 zone1:my-bucket/test-file-1\\n
In the primary zone we only have one version, the current version of the object:
# rclone --s3-versions lsl zone1:my-bucket\\n\\n 15 2023-03-15 07:59:10.779573336 test-file-1\\n
But in the Archive zone we have all three versions available:
# rclone --s3-versions lsl archive:my-bucket\\n 15 2023-03-15 07:59:10.779573336 test-file-1\\n 15 2023-03-15 07:59:03.782438991 test-file-1-v2023-03-15-121327-057\\n 15 2023-03-15 07:58:58.135330567 test-file-1-v2023-03-15-120712-914\\n
Now let’s delete test-file-1
from my-bucket
in zone1
, then recover the object from the archive zone:
# rclone delete zone1:my-bucket/test-file-1\\n# rclone --s3-versions lsl zone1:my-bucket\\n# rclone --s3-versions lsl archive:my-bucket\\n 15 2023-03-15 07:59:10.779573336 test-file-1\\n 15 2023-03-15 07:59:03.782438991 test-file-1-v2023-03-15-121327-057\\n 15 2023-03-15 07:58:58.135330567 test-file-1-v2023-03-15-120712-914\\n
The object has been deleted from zone1
, but all versions are still available in the archive zone. If we recover the latest version test-file-1
, it should match the MD5 checksum of our test-file-3
:
# rclone copyto archive:my-bucket/test-file-1 zone1:my-bucket/test-file-1\\n# rclone copyto zone1:my-bucket/test-file-1 /tmp/recovered-file1\\n# md5sum /tmp/recovered-file1\\n95227e10e2c33771e1c1379b17330c86 /tmp/recovered-file1\\n# md5sum /tmp/test-file-3\\n95227e10e2c33771e1c1379b17330c86 /tmp/test-file-3\\n
Now let's explore the case where we want to recover a specific version of the object, identified by its timestamp, for example 2023-03-15-121327-057
.
# rclone --s3-versions copyto archive:my-bucket/test-file-1-v2023-03-15-121327-057 zone1:my-bucket/test-file-1\\n# rclone copyto zone1:my-bucket/test-file-1 /tmp/recovered-file1\\n# md5sum /tmp/recovered-file1\\ndb06069ef1c9f40986ffa06db4fe8fd7 /tmp/recovered-file1\\n# md5sum /tmp/test-file-2\\ndb06069ef1c9f40986ffa06db4fe8fd7 /tmp/test-file-2\\n
This takes us to the end of a hands-on example of working with an archive zone and, with the help of rclone
, seamlessly recovering data.
We introduced the archive zone feature in part eight of this series. We shared a hands-on example of recovering data from an archive zone. This takes us to the end of this Ceph Object Storage Multisite series.
We hope this content has been helpful for your Ceph endeavours.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
Efficient multitenant environment management is critical in large-scale object storage systems. In the Ceph Squid release, we’ve introduced a transformative feature as a Tech Preview: Identity and Access Management (IAM) accounts.
This enhancement brings self-service resource management to Ceph Object Storage and significantly reduces administrative overhead for Ceph administrators by enabling hands-off multitenancy management.
IAM accounts allow tenants to independently manage their resources—users, groups, roles, policies, and buckets—using an API interface modeled after AWS IAM. For Ceph administrators, this means delegating day-to-day operational responsibilities to tenants while retaining system-wide control.
The IAM API is modeled after AWS IAM and is offered through the Object Gateway API endpoint. In this way, IAM account administrators (root account users) don't need access to or permissions on the Ceph internal radosgw-admin CLI or the Admin Ops API, enhancing responsibility delegation while maintaining security.
With the addition of IAM accounts, we have a new user persona that represents the tenant admin: the IAM Root Account User.
Although outside this post's scope, we provide an example RGW spec file that sets up the RGW services and the ingress service (load balancer) for the RGW endpoints.
# cat << EOF > /root/rgw-ha.spec\\n---\\nservice_type: ingress\\nservice_id: rgw.rgwsrv\\nservice_name: ingress.rgw.rgwsrv\\nplacement:\\n count: 2\\n hosts:\\n - ceph-node-02.cephlab.com\\n - ceph-node-03.cephlab.com\\nspec:\\n backend_service: rgw.rgwsrv\\n first_virtual_router_id: 50\\n frontend_port: 80\\n monitor_port: 1497\\n virtual_ip: 192.168.122.100\\n---\\nservice_type: rgw\\nservice_id: rgwsrv\\nservice_name: rgw.rgwsrv\\nplacement:\\n count: 3\\n hosts:\\n - ceph-node-03.cephlab.com\\n - ceph-node-00.cephlab.com\\n - ceph-node-01.cephlab.com\\nspec:\\n rgw_frontend_port: 8080\\n rgw_realm: realm1\\n rgw_zone: zone1\\n rgw_zonegroup: zonegroup1\\nEOF\\n \\n# ceph orch apply -i /root/rgw-ha.spec\\n
This config provides us with a working virtual IP endpoint at 192.168.122.100, which the hostname s3.zone1.cephlab.com resolves to.
Let’s walk through the steps to configure an IAM account, create users, and apply permissions.
We will use the radosgw-admin
CLI to create an IAM account for an analytics web application team; you could also use the Admin Ops API.
In the example, we first create the IAM account and then define the resources this specific IAM account will have available from the global RGW/Object Storage system.
# radosgw-admin account create --account-name=analytic_app\\n
This command creates an account named analytic_app
. The account is initialized with default quotas and limits, which may be adjusted afterward. When using IAM accounts, an RGW Account ID gets created that will be part of the principal ARN when we need to reference it, for example: arn:aws:iam::RGW00889737169837717:user/name
.
Example output:
{\\n \\"id\\": \\"RGW00889737169837717\\",\\n \\"tenant\\": \\"analytics\\",\\n \\"name\\": \\"analytic_app\\",\\n \\"max_users\\": 1000,\\n ...\\n}\\n
As the RGW admin, in this example, we adjust the maximum number of users for the account:
# radosgw-admin account modify --max-users 10 --account-name=analytic_app\\n
This ensures the IAM account can create up to ten users. Depending on your needs, you can also manage the maximum number of groups, keys, policies, buckets, and so on.
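For example, something along these lines adjusts other per-account limits; the exact flag names may vary slightly between releases, so check the radosgw-admin help output:

# radosgw-admin account modify --account-name=analytic_app --max-buckets=500 --max-groups=20 --max-access-keys=4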
As part of creating the IAM account, we can enable and define quotas for the account to control resource usage. In this example we configure the account's maximum storage usage to 20GB, and we can also configure other quotas related to the object count per bucket:
# radosgw-admin quota set --quota-scope=account --account-name=analytic_app --max-size=20G\\n# radosgw-admin quota enable --quota-scope=account --account-id=RGW00889737169837717\\n
Creating the Account Root User for our new IAM Account
Each IAM account is managed by a root user, who has default permissions over all resources within the account. Like normal users and roles, accounts and account root users must be created by an administrator using radosgw-admin or the Admin Ops API.
To create the account root user for the analytic_app
account, we run the following command:
# radosgw-admin user create --uid=root_analytics_web --display-name=root_analytics_web --account-id=RGW00889737169837717 --account-root --gen-secret --gen-access-key\\n
Example output:
{\\n \\"user_id\\": \\"root_analytics_web\\",\\n \\"access_key\\": \\"1EHAKZAXKPV6LU65QS2R\\",\\n \\"secret_key\\": \\"AgXK1BqPOP25pt0HvERDts2yZtFNfF4Mm8mCnoJX\\",\\n ...\\n}\\n
The root account user is now ready to create and manage users, groups, roles, and permissions within the IAM account. These resources are administered through the IAM API exposed by the RGW endpoint. At this point, the RGW admin may provide the credentials of the root user of the IAM account to the person responsible for the account. That person can then perform all administrative operations related to their account using the IAM API, which is entirely hands-off for the RGW administrator/operator.
Here is a list of some operations that the IAM root account can perform without the intervention of an RGW admin:
Create, modify, and delete IAM users and their access keys
Create IAM groups and manage group membership
Attach managed policies or embed inline policies on users, groups, and roles
Create IAM roles that users or services can assume through STS
Create and manage the S3 buckets and objects that belong to the account
Now we will configure the AWS CLI using the access and secret keys of the IAM root account generated in the previous step. The IAM API is available by default on the Ceph Object Gateway (RGW) endpoints. In this example, we have s3.zone1.cephlab.com
as the load-balanced endpoint providing access to the API.
# dnf install awscli -y\\n# aws configure\\nAWS Access Key ID [****************dmin]: 1EHAKZAXKPV6LU65QS2R\\nAWS Secret Access Key [****************dmin]: AgXK1BqPOP25pt0HvERDts2yZtFNfF4Mm8mCnoJX\\nDefault region name [multizg]: zonegroup1\\nDefault output format [json]: json\\n# aws configure set endpoint_url http://s3.zone1.cephlab.com\\n
Add a new IAM user named analytics_frontend
to the analytics IAM account:
# aws iam create-user --user-name analytics_frontend\\n
Assign an access key and secret key to the new user:
# aws iam create-access-key --user-name analytics_frontend\\n\\n
At this point, the user cannot access any S3 resources. In the next step, we will grant the user access. Here is an example of trying to access the S3 namespace as a newly created account user (via its configured AWS CLI profile) before any policy is attached:
# aws --profile analytics_backend s3 ls\\nargument of type \'NoneType\' is not iterable\\n# aws --profile analytics_backend s3 ls s3://staticfront/\\nargument of type \'NoneType\' is not iterable\\n
We have multiple ways to grant the new IAM user access to the various resources available in the account, for example, IAM, S3, and SNS resources:
Feature | Managed Policies | Inline Policies | Assume Role |
---|---|---|---|
Definition | Reusable policies that can be attached to multiple users, groups, or roles. | Policies created for and attached to a single user, group, or role. | Temporary access is granted to a user or service to perform specific tasks. |
Reusability | Can be shared across multiple accounts or identities within the RGW IAM system. | Specific to the identity they are attached to and cannot be reused. | Roles can be reused by multiple identities that need temporary access. |
Ease of Management | Easier to manage due to centralized policy definition. | Requires individual updates for each identity they are attached to. | It is easier to manage since roles are centrally defined and assumed as needed. |
Flexibility | Ideal for common permissions that apply to many users or groups. | Best for unique permissions tailored to specific use cases or users. | Highly flexible for scenarios requiring time-limited, task-specific access. |
Use Case | Example: Granting read-only or full access to S3 buckets across multiple users. | Example: Granting a specific user access to a unique S3 bucket. | Example: Allowing a service to temporarily assume access to an S3 bucket for processing. |
In this first example, we use a managed policy, policy/AmazonS3FullAccess
, to allow the analytics_frontend
user full access to IAM Account S3 resources:
# aws iam attach-user-policy --user-name analytics_frontend --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess\\n
Once we have attached the managed policy, we can create the S3 resources of the IAM account, for example:
# aws --profile analytics_frontend s3 mb s3://staticfront\\nmake_bucket: staticfront\\n
First, create an IAM group to manage permissions for users requiring similar access. In this case, we are creating a group for the frontend-monitoring team.
# aws iam create-group --group-name frontend-monitoring\\n
Attach a Policy to the Group: In this example, we will attach an S3 read-only access policy to the group so that all users inherit the permissions and can access the S3 resources in read-only mode. No modifications to the S3 dataset are allowed.
# aws iam attach-group-policy --group-name frontend-monitoring --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\\n
Check that the policy is successfully attached to the group:
# aws iam list-attached-group-policies --group-name frontend-monitoring\\n{\\n \\"AttachedPolicies\\": [\\n {\\n \\"PolicyName\\": \\"AmazonS3ReadOnlyAccess\\",\\n \\"PolicyArn\\": \\"arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\\"\\n }\\n ]\\n}\\n
Create individual IAM users with their keys who will be group members.
# aws iam create-user --user-name mon_user1\\n# aws iam create-user --user-name mon_user2\\n# aws iam create-access-key --user-name mon_user1\\n# aws iam create-access-key --user-name mon_user2\\n
Add the users created in the previous step to the frontend-monitoring
group so they inherit the permissions.
# aws iam add-user-to-group --group-name frontend-monitoring --user-name mon_user1\\n# aws iam add-user-to-group --group-name frontend-monitoring --user-name mon_user2\\n
Confirm that both users are part of the group:
# aws iam get-group --group-name frontend-monitoring\\n{\\n \\"Users\\": [\\n {\\n \\"Path\\": \\"/\\",\\n \\"UserName\\": \\"mon_user1\\",\\n \\"UserId\\": \\"fe09d373-08e8-4b61-bffa-6f65eaf11e56\\",\\n \\"Arn\\": \\"arn:aws:iam::RGW60952341557974488:user/mon_user1\\"\\n },\\n {\\n \\"Path\\": \\"/\\",\\n \\"UserName\\": \\"mon_user2\\",\\n \\"UserId\\": \\"29c57263-1293-4bdf-90e4-a784859f12ef\\",\\n \\"Arn\\": \\"arn:aws:iam::RGW60952341557974488:user/mon_user2\\"\\n }\\n ],\\n \\"Group\\": {\\n \\"Path\\": \\"/\\",\\n \\"GroupName\\": \\"frontend-monitoring\\",\\n \\"GroupId\\": \\"a453d5af-4e25-401c-be76-b4075419cc94\\",\\n \\"Arn\\": \\"arn:aws:iam::RGW60952341557974488:group/frontend-monitoring\\"\\n }\\n}\\n
This example demonstrates creating and attaching an inline policy to a specific user in IAM. Inline policies define permissions for a single user and are directly embedded into their identity. While this example focuses on the PutUserPolicy
operation, the same approach applies to groups (PutGroupPolicy) and roles (PutRolePolicy) if you need to manage permissions for those entities.
We begin by creating a user who will be assigned the custom inline policy.
# aws iam create-user --user-name static_ro\\n# aws iam create-access-key --user-name static_ro\\n
We create a JSON file containing the policy document to define the Custom Inline Policy. This policy permits users to perform read-only operations on a specific S3 bucket and its objects.
# cat << EOF > analytics_policy_web_ro.json\\n{\\n \\"Version\\": \\"2012-10-17\\",\\n \\"Statement\\": [\\n {\\n \\"Effect\\": \\"Allow\\",\\n \\"Action\\": [\\n \\"s3:GetObject\\",\\n \\"s3:ListBucket\\",\\n \\"s3:ListBucketMultipartUploads\\"\\n ],\\n \\"Resource\\": [\\n \\"arn:aws:s3:::staticfront/*\\", \\n \\"arn:aws:s3:::staticfront\\" \\n ]\\n }\\n ]\\n}\\nEOF\\n
Example policy overview: the policy allows the read-only actions s3:GetObject, s3:ListBucket, and s3:ListBucketMultipartUploads on the staticfront bucket and the objects inside it; any action not listed is implicitly denied.
Attach the policy to the user using the put-user-policy command.
# aws iam put-user-policy --user-name static_ro --policy-name analytics-static-ro --policy-document file://analytics_policy_web_ro.json\\n
List the inline policies attached to the user to confirm the policy was successfully applied.
# aws iam list-user-policies --user-name static_ro\\n{\\n \\"PolicyNames\\": [\\n \\"analytics-static-ro\\"\\n ]\\n}\\n
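To double-check the effect, assuming an AWS CLI profile named static_ro has been configured with the keys generated above, read operations against staticfront should succeed while writes should be rejected:

# Allowed by the inline policy (read-only actions on staticfront)
# aws --profile static_ro s3 ls s3://staticfront/
# Not allowed: PutObject is not in the policy, so this should fail with an AccessDenied error
# aws --profile static_ro s3 cp /etc/hosts s3://staticfront/should-fail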
Since this topic is too extensive for a single post, we won't cover IAM roles and the ability of IAM users to assume roles with the help of STS in this article. In our next post about the new IAM account feature in Squid, we will explore additional exciting features, including Roles with STS and cross-account access for sharing datasets between accounts.
For further details on IAM, explore the IAM API documentation and the Account documentation.
For further details on the Squid release, check Laura Flores' blog post.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
In the previous episode of the series, we went through an example of configuring Ceph Object Storage multisite replication with the help of the rgw
manager module.
We have set up two RGWs for each Ceph cluster. By default, the RGW services manage both client S3 requests and replication requests among the sites. The RGW services share their resources and processing time between both tasks. To improve this configuration, we can assign a specific subset of RGWs to manage client S3 requests and another subset of RGWs to manage multisite replication requests between the two Ceph clusters.
Using this approach is not mandatory but will provide the following benefits:
Because we have a dedicated set of resources for public and multisite replication, we can scale the subsets of client-facing and replication RGWs independently depending on where we need higher performance, like increased throughput or lower latency.
Segregated RGWs can avoid sync replication stalling because the RGWs are busy with client-facing tasks or vice-versa.
Improved troubleshooting: dedicated subsets of RGWs can improve the troubleshooting experience as we can target the RGWs to investigate depending on the specific issue. Also, when reading through the debug logs of the RGW services, replication messages don’t get mixed in with client messages and vice versa.
Because we are using different sets of RGWs, we could use different networks with different security levels, firewall rules, OS security, etc. For example:
The public-facing RGWs could be using Network A.
The replication RGWs could be using Network B.
When configuring a multisite deployment, it is common practice to dedicate specific RGW services to client operations and other RGW services to multisite replication.
By default, all RGWs participate in multisite replication. Two steps are needed to exclude an RGW from participating in the multisite replication sync.
Set this Ceph option for RGWs: ceph config set ${KEY_ID} rgw_run_sync_thread false
. When false, this option prevents the object store's gateways from transmitting multisite replication data.
The previous parameter only tells the RGW not to send replication data, but it can keep receiving. To also avoid receiving, we need to remove the RGWs from the zonegroup and zone replication endpoints.
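If the client-facing RGWs ever end up listed as replication endpoints, the zone endpoint list can be reset to only the replication-dedicated RGWs, roughly as follows (shown for zone1; repeat with the corresponding endpoints for zone2 and always commit the period afterwards):

# radosgw-admin zone modify --rgw-zone=zone1 --endpoints=http://ceph-node-00.cephlab.com:8000,http://ceph-node-01.cephlab.com:8000
# radosgw-admin period update --commit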
In the previous chapter, we configured two RGWs per Ceph cluster, which are currently serving both client S3 requests and replication request traffic. In the following steps, we will configure two additional RGWs per cluster for a total of four RGWs within each cluster. Out of these four RGWs, two will be dedicated to serving client requests and the other two will be dedicated to serving multisite replication. The diagram below illustrates what we are aiming to achieve.
We will use labels to control the scheduling and placement of RGW services. In this case, the label we will use for the public-facing RGWs is rgw
.
[root@ceph-node-00 ~]# ceph orch host label add ceph-node-02.cephlab.com rgw\\nAdded label rgw to host ceph-node-02.cephlab.com\\n[root@ceph-node-00 ~]# ceph orch host label add ceph-node-03.cephlab.com rgw\\nAdded label rgw to host ceph-node-03.cephlab.com\\n
We create an RGW spec file for the public-facing RGWs. In this example, we use the same CIDR network for all RGW services. We could, however, configure different network CIDRs for the different sets of RGWs we deploy if needed. We use the same realm, zonegroup and zone as the services we already have running, as we want all RGWs to belong to the same realm namespace.
[root@ceph-node-00 ~]# cat << EOF >> /root/rgw-client.spec
service_type: rgw
service_id: client-traffic
placement:
  label: rgw
  count_per_host: 1
networks:
- 192.168.122.0/24
spec:
  rgw_frontend_port: 8000
  rgw_realm: multisite
  rgw_zone: zone1
  rgw_zonegroup: multizg
EOF
We apply the spec file and check that we now have four RGW services running: two for multisite replication and two for client traffic.
[root@ceph-node-00 ~]# ceph orch apply -i spec-rgw.yaml
Scheduled rgw.rgw-client-traffic update…
[root@ceph-node-00 ~]# ceph orch ps | grep rgw
rgw.multisite.zone1.ceph-node-00.mwvvel ceph-node-00.cephlab.com *:8000 running (2h) 6m ago 2h 190M - 18.2.0-131.el9cp 463bf5538482 dda6f58469e9
rgw.multisite.zone1.ceph-node-01.fwqfcc ceph-node-01.cephlab.com *:8000 running (2h) 6m ago 2h 184M - 18.2.0-131.el9cp 463bf5538482 10a45a616c44
rgw.client-traffic.ceph-node-02.ozdapg ceph-node-02.cephlab.com 192.168.122.94:8000 running (84s) 79s ago 84s 81.1M - 18.2.0-131.el9cp 463bf5538482 0bc65ad993b1
rgw.client-traffic.ceph-node-03.udxlvd ceph-node-03.cephlab.com 192.168.122.180:8000 running (82s) 79s ago 82s 18.5M - 18.2.0-131.el9cp 463bf5538482 8fc7d6b06b54
As we mentioned at the start of this section, to disable replication traffic on an RGW, we need to ensure two things:
So the first thing to do is disable the rgw_run_sync_thread option using the ceph config command. We specify the service name client.rgw.client-traffic to apply the change to both of our client-facing RGWs simultaneously. We first check the current value of rgw_run_sync_thread and confirm that it is set to true by default.
[root@ceph-node-00 ~]# ceph config get client.rgw.client-traffic rgw_run_sync_thread
true
We now set the parameter to false so that the sync threads are disabled for this set of RGWs.
[root@ceph-node-00 ~]# ceph config set client.rgw.client-traffic rgw_run_sync_thread false
[root@ceph-node-00 ~]# ceph config get client.rgw.client-traffic rgw_run_sync_thread
false
The second step is ensuring the new RGWs we deployed are not listed as replication endpoints in the zonegroup configuration. We shouldn't see ceph-node-02 or ceph-node-03 listed as endpoints under zone1:
[root@ceph-node-00 ~]# radosgw-admin zonegroup get | jq '.zones[]|.name,.endpoints'
"zone1"
[
  "http://ceph-node-00.cephlab.com:8000",
  "http://ceph-node-01.cephlab.com:8000"
]
"zone2"
[
  "http://ceph-node-04.cephlab.com:8000",
  "http://ceph-node-05.cephlab.com:8000"
]
Note that the JSON parsing utility jq must be installed for this task.
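On the EL-based hosts used in this lab, jq can usually be installed with the distribution package manager, for example:

[root@ceph-node-00 ~]# dnf install -y jq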
After confirming this, we have finished this part of the configuration: the cluster now has dedicated services running for each type of request, client requests and replication requests.
You would need to repeat the same steps to apply the same configuration to our second cluster, zone2.
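As a rough sketch, and assuming hypothetical host names ceph-node-06 and ceph-node-07 for the additional client-facing RGW hosts in zone2, the equivalent steps on the second cluster would look like this:

# Label the hosts that will run the client-facing RGWs in zone2
ceph orch host label add ceph-node-06.cephlab.com rgw
ceph orch host label add ceph-node-07.cephlab.com rgw

# Apply an RGW spec like the one above, but with rgw_zone: zone2
ceph orch apply -i /root/rgw-client.spec

# Disable the sync threads for the zone2 client-facing RGWs
ceph config set client.rgw.client-traffic rgw_run_sync_thread false

# Verify that the new hosts are not listed as zone2 replication endpoints
radosgw-admin zonegroup get | jq '.zones[]|.name,.endpoints'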
The Reef release introduced an improvement in Object Storage Multisite Replication known as "Replication Sync Fairness". This improvement addresses the issue faced by earlier releases where replication work was not distributed optimally. In prior releases, one RGW would take the lock for replication operations and the other RGW services would find it difficult to obtain the lock. This resulted in multisite replication not scaling linearly when adding additional RGW services. To improve the distribution of replication work, significant improvements were made in the Quincy release. However, with Sync Fairness replication in Reef, replication data and metadata are evenly distributed among all RGW services, enabling them to collaborate more efficiently in replication tasks.
Thanks to the IBM Storage DFG team, which ran scale testing to highlight and verify the improvements introduced by the sync fairness feature. During the testing, the DFG team compared Ceph Reef with Quincy and Pacific when ingesting objects with multisite replication configured.
The results below, provided by DFG, compare the degree of participation by each syncing RGW in each test case. The graphs plot the avgcount (number of objects and bytes fetched by data sync) polled every fifteen minutes. An optimal result is one where all syncing RGWs evenly share the load.
In this example, note how one of the Pacific RGWs (the blue lines labeled RHCS 5.3) processed objects in the 13M range (18M for secondary sync) while the other two RGWs were handling 5 million and 1.5 million, resulting in longer sync times: more than 24 hours. The Reef RGWs (the green lines labeled RHCS 7), however, all stay within close range of each other. Each processes 5M to 7M objects, and syncing is achieved more quickly, well under 19 hours.
The closer the lines of the same color are in the graph, the better the sync participation is. As you can see, for Reef, the green lines are very close to each other, meaning that the replication workload is evenly distributed among the three sync RGWs configured for the test.
In the following graph, we show how much time it took for each release to sync the full workload (small objects) to the other zone: the less time, the better. We can see that Reef, here labeled 7, provides substantially improved sync times.
To summarize, in part three of this series, we discussed configuring dedicated RGW services for public and replication requests. Additionally, we have explored the performance enhancements the sync fairness feature offers. We will delve into load balancing our client-facing RGW endpoints in part four.
The authors would like to thank IBM for supporting the community by facilitating our time to create these posts.
This is the eighth, and expected to be the last, backport release in the Quincy series. We recommend all users update to this release.
v17.2.8 ships RPM packages built for CentOS 9 instead of CentOS 8.
v17.2.8 container images, now based on CentOS 9, may be incompatible with older kernels (e.g., Ubuntu 18.04) due to differences in thread creation methods. Users upgrading to v17.2.8 container images on older OS versions may encounter crashes during pthread_create. We recommend upgrading your OS to avoid this unsupported combination.
Users should expect the el8 rpm subdirectory to be empty, and "dnf" commands are expected to fail with 17.2.8. They can either use the 17.2.8 RPM packages for CentOS 8/el8 provided by CERN as a community member, or stay at 17.2.7 by following the instructions at https://docs.ceph.com/en/latest/install/get-packages/#rhel; in that case, the ceph.repo file should point to https://download.ceph.com/rpm-17.2.7/el8 instead of https://download.ceph.com/rpm-quincy/el8
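For reference, a minimal sketch of such a ceph.repo entry on an el8 host might look like the following; the exact layout and the gpgkey URL should be verified against the get-packages documentation linked above:

# Quoting 'EOF' keeps $basearch literal so yum/dnf can expand it
cat << 'EOF' > /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-17.2.7/el8/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF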
These CERN packages come with no warranty and have not been tested by the Ceph community; the software they contain has been tested by Ceph according to the supported [platforms](https://docs.ceph.com/en/latest/start/os-recommendations/#platforms). The repository for el8 builds is hosted by CERN on [Linux @ CERN](https://linuxsoft.cern.ch/repos/ceph-ext-quincy8el-stable/). The public part of the GPG key used to sign the packages is available at RPM-GPG-KEY-Ceph-Community.
The get_pool_is_selfmanaged_snaps_mode C++ API has been deprecated due to being prone to false negative results. Its safer replacement is pool_is_in_selfmanaged_snaps_mode.
When diffing against the beginning of time (fromsnapname == NULL) in fast-diff mode (whole_object == true with the fast-diff image feature enabled and valid), diff-iterate is now guaranteed to execute locally if an exclusive lock is available. This brings a dramatic performance improvement for QEMU live disk synchronization and backup use cases.
The --image-id option has been added to the rbd children CLI command, so it can be run for images in the trash.
The RBD_IMAGE_OPTION_CLONE_FORMAT option has been exposed in Python bindings via the clone_format optional parameter to the clone, deep_copy and migration_prepare methods.
The RBD_IMAGE_OPTION_FLATTEN option has been exposed in Python bindings via the flatten optional parameter to the deep_copy and migration_prepare methods.
.github: sync the list of paths for rbd label, expand tests label to qa/* (pr#57726, Ilya Dryomov)
[quincy] qa/multisite: stabilize multisite testing (pr#60479, Shilpa Jagannath, Casey Bodley)
[quincy] RGW backports (pr#51806, Soumya Koduri, Casey Bodley)
[rgw][lc][rgw_lifecycle_work_time] adjust timing if the configured end time is less than the start time (pr#54874, Oguzhan Ozmen)
Add Containerfile and build.sh to build it (pr#60230, Dan Mick)
admin/doc-requirements: bump Sphinx to 5.0.2 (pr#55204, Nizamudeen A)
batch backport of #50743, #55342, #48557 (pr#55593, John Mulligan, Afreen, Laura Flores)
blk/aio: fix long batch (64+K entries) submission (pr#58674, Igor Fedotov, Adam Kupczyk, Robin Geuze)
bluestore/bluestore_types: avoid heap-buffer-overflow in another way to keep code uniformity (pr#58818, Rongqi Sun)
bluestore/bluestore_types: check \'it\' valid before using (pr#56889, Rongqi Sun)
build: Make boost_url a list (pr#58316, Adam Emerson, Kefu Chai)
centos 9 related backports for RBD (pr#58565, Casey Bodley, Ilya Dryomov)
ceph-menv:fix typo in README (pr#55164, yu.wang)
ceph-node-proxy not present, not part of container (pr#60337, Dan Mick)
ceph-volume: add missing import (pr#56260, Guillaume Abrioux)
ceph-volume: create LVs when using partitions (pr#58221, Guillaume Abrioux)
ceph-volume: fix a bug in _check_generic_reject_reasons (pr#54706, Kim Minjong)
ceph-volume: fix a regression in raw list
(pr#54522, Guillaume Abrioux)
ceph-volume: Fix migration from WAL to data with no DB (pr#55496, Igor Fedotov)
ceph-volume: Fix unbound var in disk.get_devices() (pr#59651, Zack Cerza)
ceph-volume: fix zap_partitions() in devices.lvm.zap (pr#55480, Guillaume Abrioux)
ceph-volume: fixes fallback to stat in is_device and is_partition (pr#54630, Teoman ONAY)
ceph-volume: Revert \\"ceph-volume: fix raw list for lvm devices\\" (pr#54430, Matthew Booth, Guillaume Abrioux)
ceph-volume: use \'no workqueue\' options with dmcrypt (pr#55336, Guillaume Abrioux)
ceph-volume: use importlib from stdlib on Python 3.8 and up (pr#58006, Guillaume Abrioux, Kefu Chai)
ceph-volume: Use safe accessor to get TYPE info (pr#56322, Dillon Amburgey)
ceph.spec.in: add support for openEuler OS (pr#56366, liuqinfei)
ceph.spec.in: we need jsonnet for all distroes for make check (pr#60074, Kyr Shatskyy)
ceph_test_rados_api_misc: adjust LibRadosMiscConnectFailure.ConnectTimeout timeout (pr#58128, Lucian Petrut)
cephadm: add a --dry-run option to cephadm shell (pr#54221, John Mulligan)
cephadm: add tcmu-runner to logrotate config (pr#55966, Adam King)
cephadm: add timemaster to timesync services list (pr#56308, Florent Carli)
cephadm: Adding support to configure public_network cfg section (pr#55959, Redouane Kachach)
cephadm: allow ports to be opened in firewall during adoption, reconfig, redeploy (pr#55960, Adam King)
cephadm: disable ms_bind_ipv4 if we will enable ms_bind_ipv6 (pr#58760, Dan van der Ster, Joshua Blanch)
cephadm: fix host-maintenance command always exiting with a failure (pr#58755, John Mulligan)
cephadm: make custom_configs work for tcmu-runner container (pr#53425, Adam King)
cephadm: pin pyfakefs version for tox tests (pr#56763, Adam King)
cephadm: remove restriction for crush device classes (pr#56087, Seena Fallah)
cephadm: run tcmu-runner through script to do restart on failure (pr#55975, Adam King, Raimund Sacherer, Teoman ONAY, Ilya Dryomov)
cephadm: support for CA signed keys (pr#55965, Adam King)
cephadm: turn off cgroups_split setting when bootstrapping with --no-cgroups-split (pr#58761, Adam King)
cephadm: use importlib.metadata for querying ceph_iscsi\'s version (pr#58637, Kefu Chai)
cephfs-mirror: various fixes (pr#56702, Jos Collin)
cephfs: Fixed a bug in the readdir_cache_cb function that may have us… (pr#58806, Tod Chen)
cephfs: upgrade cephfs-shell\'s path wherever necessary (pr#54186, Rishabh Dave)
client, mds: update mtime and change attr for snapdir when snaps are created, deleted and renamed (issue#54501, pr#50730, Venky Shankar)
client/fuse: handle case of renameat2 with non-zero flags (pr#55010, Leonid Usov, Shachar Sharon)
client: always refresh mds feature bits on session open (issue#63188, pr#54244, Venky Shankar)
client: call _getattr() for -ENODATA returned _getvxattr() calls (pr#54405, Jos Collin)
client: disallow unprivileged users to escalate root privileges (pr#60314, Xiubo Li, Venky Shankar)
client: fix leak of file handles (pr#56121, Xavi Hernandez)
client: queue a delay cap flushing if there are ditry caps/snapcaps (pr#54465, Xiubo Li)
cloud sync: fix crash due to objs on cr stack (pr#51136, Yehuda Sadeh)
cls/cas/cls_cas_internal: Initialize \'hash\' value before decoding (pr#59236, Nitzan Mordechai)
cmake/modules/BuildRocksDB.cmake: inherit parent\'s CMAKE_CXX_FLAGS (pr#55501, Kefu Chai)
cmake/rgw: librgw tests depend on ALLOC_LIBS (pr#54796, Casey Bodley)
cmake: use or turn off liburing for rocksdb (pr#54123, Casey Bodley, Patrick Donnelly)
common/admin_socket: add a command to raise a signal (pr#54356, Leonid Usov)
common/dout: fix FTBFS on GCC 14 (pr#59057, Radoslaw Zarzynski)
common/Formatter: dump inf/nan as null (pr#60064, Md Mahamudur Rahaman Sajib)
common/StackStringStream: update pointer to newly allocated memory in overflow() (pr#57363, Rongqi Sun)
common/weighted_shuffle: don\'t feed std::discrete_distribution with all-zero weights (pr#55154, Radosław Zarzyński)
common: intrusive_lru destructor add (pr#54557, Ali Maredia)
common: fix compilation warnings in numa.cc (pr#58704, Radoslaw Zarzynski)
common: resolve config proxy deadlock using refcounted pointers (pr#54374, Patrick Donnelly)
Do not duplicate query-string in ops-log (pr#57132, Matt Benjamin)
do not evict clients if OSDs are laggy (pr#52271, Dhairya Parmar, Laura Flores)
doc/architecture.rst - fix typo (pr#55385, Zac Dover)
doc/architecture.rst: improve rados definition (pr#55344, Zac Dover)
doc/architecture: correct typo (pr#56013, Zac Dover)
doc/architecture: improve some paragraphs (pr#55400, Zac Dover)
doc/architecture: remove pleonasm (pr#55934, Zac Dover)
doc/ceph-volume: add spillover fix procedure (pr#59542, Zac Dover)
doc/ceph-volume: explain idempotence (pr#54234, Zac Dover)
doc/ceph-volume: improve front matter (pr#54236, Zac Dover)
doc/cephadm - edit t11ing (pr#55483, Zac Dover)
doc/cephadm/services: remove excess rendered indentation in osd.rst (pr#54324, Ville Ojamo)
doc/cephadm/upgrade: ceph-ci containers are hosted by quay.ceph.io (pr#58682, Casey Bodley)
doc/cephadm: add default monitor images (pr#57210, Zac Dover)
doc/cephadm: add malformed-JSON removal instructions (pr#59665, Zac Dover)
doc/cephadm: add note about ceph-exporter (Quincy) (pr#55520, Zac Dover)
doc/cephadm: correct nfs config pool name (pr#55604, Zac Dover)
doc/cephadm: edit \\"Using Custom Images\\" (pr#58942, Zac Dover)
doc/cephadm: edit troubleshooting.rst (1 of x) (pr#54284, Zac Dover)
doc/cephadm: edit troubleshooting.rst (2 of x) (pr#54321, Zac Dover)
doc/cephadm: explain different methods of cephadm delivery (pr#56176, Zac Dover)
doc/cephadm: fix typo in set ssh key command (pr#54389, Piotr Parczewski)
doc/cephadm: how to get exact size_spec from device (pr#59432, Zac Dover)
doc/cephadm: improve host-management.rst (pr#56112, Anthony D\'Atri)
doc/cephadm: Improve multiple files (pr#56134, Anthony D\'Atri)
doc/cephadm: Quincy default images procedure (pr#57239, Zac Dover)
doc/cephadm: remove downgrade reference from upgrade docs (pr#57087, Adam King)
doc/cephfs/client-auth.rst: correct ``fs authorize cephfs1 /dir1 clie… (pr#55247, 叶海丰)
doc/cephfs: add cache pressure information (pr#59150, Zac Dover)
doc/cephfs: add doc for disabling mgr/volumes plugin (pr#60498, Rishabh Dave)
doc/cephfs: disambiguate \\"Reporting Free Space\\" (pr#56873, Zac Dover)
doc/cephfs: disambiguate two sentences (pr#57705, Zac Dover)
doc/cephfs: edit \\"Cloning Snapshots\\" in fs-volumes.rst (pr#57667, Zac Dover)
doc/cephfs: edit \\"is mount helper present\\" (pr#58580, Zac Dover)
doc/cephfs: edit \\"Layout Fields\\" text (pr#59023, Zac Dover)
doc/cephfs: edit \\"Pinning Subvolumes...\\" (pr#57664, Zac Dover)
doc/cephfs: edit add-remove-mds (pr#55649, Zac Dover)
doc/cephfs: edit front matter in client-auth.rst (pr#57123, Zac Dover)
doc/cephfs: edit front matter in mantle.rst (pr#57793, Zac Dover)
doc/cephfs: edit fs-volumes.rst (1 of x) (pr#57419, Zac Dover)
doc/cephfs: edit fs-volumes.rst (1 of x) followup (pr#57428, Zac Dover)
doc/cephfs: edit fs-volumes.rst (2 of x) (pr#57544, Zac Dover)
doc/cephfs: edit mount-using-fuse.rst (pr#54354, Jaanus Torp)
doc/cephfs: edit vstart warning text (pr#57816, Zac Dover)
doc/cephfs: fix \\"file layouts\\" link (pr#58877, Zac Dover)
doc/cephfs: fix \\"OSD capabilities\\" link (pr#58894, Zac Dover)
doc/cephfs: fix architecture link to correct relative path (pr#56341, molpako)
doc/cephfs: improve \\"layout fields\\" text (pr#59252, Zac Dover)
doc/cephfs: improve cache-configuration.rst (pr#59216, Zac Dover)
doc/cephfs: improve ceph-fuse command (pr#56969, Zac Dover)
doc/cephfs: note regarding start time time zone (pr#53577, Milind Changire)
doc/cephfs: rearrange subvolume group information (pr#60437, Indira Sawant)
doc/cephfs: refine client-auth (1 of 3) (pr#56781, Zac Dover)
doc/cephfs: refine client-auth (2 of 3) (pr#56843, Zac Dover)
doc/cephfs: refine client-auth (3 of 3) (pr#56852, Zac Dover)
doc/cephfs: s/mountpoint/mount point/ (pr#59296, Zac Dover)
doc/cephfs: s/mountpoint/mount point/ (pr#59288, Zac Dover)
doc/cephfs: s/subvolumegroups/subvolume groups (pr#57744, Zac Dover)
doc/cephfs: separate commands into sections (pr#57670, Zac Dover)
doc/cephfs: streamline a paragraph (pr#58776, Zac Dover)
doc/cephfs: take Anthony\'s suggestion (pr#58361, Zac Dover)
doc/cephfs: update cephfs-shell link (pr#58372, Zac Dover)
doc/cephfs: Update disaster-recovery-experts.rst to mention Slack (pr#55045, Dhairya Parmar)
doc/cephfs: use \'p\' flag to set layouts or quotas (pr#60484, TruongSinh Tran-Nguyen)
doc/config: edit \\"ceph-conf.rst\\" (pr#54464, Zac Dover)
doc/dev/peering: Change acting set num (pr#59064, qn2060)
doc/dev/release-process.rst: note new \'project\' arguments (pr#57645, Dan Mick)
doc/dev: add \\"activate latest release\\" RTD step (pr#59656, Zac Dover)
doc/dev: add formatting to basic workflow (pr#58739, Zac Dover)
doc/dev: edit \\"Principles for format change\\" (pr#58577, Zac Dover)
doc/dev: edit internals.rst (pr#55853, Zac Dover)
doc/dev: fix spelling in crimson.rst (pr#55738, Zac Dover)
doc/dev: Fix typos in encoding.rst (pr#58306, N Balachandran)
doc/dev: improve basic-workflow.rst (pr#58939, Zac Dover)
doc/dev: link to ceph.io leads list (pr#58107, Zac Dover)
doc/dev: osd_internals/snaps.rst: add clone_overlap doc (pr#56524, Matan Breizman)
doc/dev: refine \\"Concepts\\" (pr#56661, Zac Dover)
doc/dev: refine \\"Concepts\\" 2 of 3 (pr#56726, Zac Dover)
doc/dev: refine \\"Concepts\\" 3 of 3 (pr#56730, Zac Dover)
doc/dev: refine \\"Concepts\\" 4 of 3 (pr#56741, Zac Dover)
doc/dev: remove \\"Stable Releases and Backports\\" (pr#60274, Zac Dover)
doc/dev: repair broken image (pr#57009, Zac Dover)
doc/dev: s/to asses/to assess/ (pr#57424, Zac Dover)
doc/dev: update leads list (pr#56604, Zac Dover)
doc/dev: update leads list (pr#56590, Zac Dover)
doc/dev_guide: add needs-upgrade-testing label info (pr#58731, Zac Dover)
doc/developer_guide: update doc about installing teuthology (pr#57751, Rishabh Dave)
doc/glossary.rst: add \\"Monitor Store\\" (pr#54744, Zac Dover)
doc/glossary.rst: add \\"OpenStack Swift\\" and \\"Swift\\" (pr#57943, Zac Dover)
doc/glossary: add \\"ceph-ansible\\" (pr#59009, Zac Dover)
doc/glossary: add \\"ceph-fuse\\" entry (pr#58945, Zac Dover)
doc/glossary: add \\"Crimson\\" entry (pr#56074, Zac Dover)
doc/glossary: add \\"librados\\" entry (pr#56236, Zac Dover)
doc/glossary: add \\"object storage\\" (pr#59426, Zac Dover)
doc/glossary: Add \\"OMAP\\" to glossary (pr#55750, Zac Dover)
doc/glossary: add \\"PLP\\" to glossary (pr#60505, Zac Dover)
doc/glossary: add \\"Prometheus\\" (pr#58979, Zac Dover)
doc/glossary: add \\"Quorum\\" to glossary (pr#54510, Zac Dover)
doc/glossary: Add \\"S3\\" (pr#57984, Zac Dover)
doc/glossary: Add link to CRUSH paper (pr#55558, Zac Dover)
doc/glossary: improve \\"BlueStore\\" entry (pr#54266, Zac Dover)
doc/glossary: improve \\"MDS\\" entry (pr#55850, Zac Dover)
doc/glossary: improve OSD definitions (pr#55614, Zac Dover)
doc/governance: add Zac Dover\'s updated email (pr#60136, Zac Dover)
doc/install: add manual RADOSGW install procedure (pr#55881, Zac Dover)
doc/install: fix typos in openEuler-installation doc (pr#56414, Rongqi Sun)
doc/install: Keep the name field of the created user consistent with … (pr#59758, hejindong)
doc/install: update \\"update submodules\\" (pr#54962, Zac Dover)
doc/man/8/mount.ceph.rst: add more mount options (pr#55755, Xiubo Li)
doc/man/8/radosgw-admin: add get lifecycle command (pr#57161, rkhudov)
doc/man: add missing long option switches (pr#57708, Patrick Donnelly)
doc/man: edit \\"manipulating the omap key\\" (pr#55636, Zac Dover)
doc/man: edit ceph-bluestore-tool.rst (pr#59684, Zac Dover)
doc/man: edit ceph-osd description (pr#54552, Zac Dover)
doc/man: supplant \\"wsync\\" with \\"nowsync\\" as the default (pr#60201, Zac Dover)
doc/mds: improve wording (pr#59587, Piotr Parczewski)
doc/mgr/dashboard: fix TLS typo (pr#59033, Mindy Preston)
doc/mgr: credit John Jasen for Zabbix 2 (pr#56685, Zac Dover)
doc/mgr: document lack of MSWin NFS 4.x support (pr#55033, Zac Dover)
doc/mgr: edit \\"Overview\\" in dashboard.rst (pr#57337, Zac Dover)
doc/mgr: edit \\"Resolve IP address to hostname before redirect\\" (pr#57297, Zac Dover)
doc/mgr: explain error message - dashboard.rst (pr#57110, Zac Dover)
doc/mgr: remove ceph-exporter (Quincy) (pr#55518, Zac Dover)
doc/mgr: remove Zabbix 1 information (pr#56799, Zac Dover)
doc/mgr: update zabbix information (pr#56632, Zac Dover)
doc/rados/configuration/bluestore-config-ref: Fix lowcase typo (pr#54695, Adam Kupczyk)
doc/rados/configuration/osd-config-ref: fix typo (pr#55679, Pierre Riteau)
doc/rados/operations: add EC overhead table to erasure-code.rst (pr#55245, Anthony D\'Atri)
doc/rados/operations: document ceph balancer status detail
(pr#55264, Laura Flores)
doc/rados/operations: Fix off-by-one errors in control.rst (pr#55232, tobydarling)
doc/rados/operations: Improve crush_location docs (pr#56595, Niklas Hambüchen)
doc/rados/operations: Improve health-checks.rst (pr#59584, Anthony D\'Atri)
doc/rados/operations: remove vanity cluster name reference from crush… (pr#58949, Anthony D\'Atri)
doc/rados/operations: rephrase OSDs peering (pr#57158, Piotr Parczewski)
doc/rados: add \\"change public network\\" procedure (pr#55800, Zac Dover)
doc/rados: add \\"pgs not deep scrubbed in time\\" info (pr#59735, Zac Dover)
doc/rados: add bucket rename command (pr#57028, Zac Dover)
doc/rados: add confval directives to health-checks (pr#59873, Zac Dover)
doc/rados: add link to messenger v2 info in mon-lookup-dns.rst (pr#59796, Zac Dover)
doc/rados: add link to pg blog post (pr#55612, Zac Dover)
doc/rados: add options to network config ref (pr#57917, Zac Dover)
doc/rados: add osd_deep_scrub_interval setting operation (pr#59804, Zac Dover)
doc/rados: add PG definition (pr#55631, Zac Dover)
doc/rados: add pg-states and pg-concepts to tree (pr#58051, Zac Dover)
doc/rados: add stop monitor command (pr#57852, Zac Dover)
doc/rados: add stretch_rule workaround (pr#58183, Zac Dover)
doc/rados: credit Prashant for a procedure (pr#58259, Zac Dover)
doc/rados: document manually passing search domain (pr#58433, Zac Dover)
doc/rados: document unfound object cache-tiering scenario (pr#59382, Zac Dover)
doc/rados: edit \\"client can\'t connect...\\" (pr#54655, Zac Dover)
doc/rados: edit \\"Everything Failed! Now What?\\" (pr#54666, Zac Dover)
doc/rados: edit \\"monitor store failures\\" (pr#54660, Zac Dover)
doc/rados: edit \\"Placement Groups Never Get Clean\\" (pr#60048, Zac Dover)
doc/rados: edit \\"recovering broken monmap\\" (pr#54602, Zac Dover)
doc/rados: edit \\"troubleshooting-mon\\" (pr#54503, Zac Dover)
doc/rados: edit \\"understanding mon_status\\" (pr#54580, Zac Dover)
doc/rados: edit \\"Using the Monitor\'s Admin Socket\\" (pr#54577, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (1 of x) (pr#54419, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (2 of x) (pr#54422, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (3 of x) (pr#54439, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (4 of x) (pr#54444, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (5 of x) (pr#54456, Zac Dover)
doc/rados: edit t-mon.rst text (pr#54350, Zac Dover)
doc/rados: edit t-shooting-mon.rst (pr#54428, Zac Dover)
doc/rados: edit troubleshooting-osd.rst (pr#58273, Zac Dover)
doc/rados: edit troubleshooting-pg.rst (pr#54229, Zac Dover)
doc/rados: explain replaceable parts of command (pr#58061, Zac Dover)
doc/rados: fix broken links (pr#55681, Zac Dover)
doc/rados: fix outdated value for ms_bind_port_max (pr#57049, Pierre Riteau)
doc/rados: followup to PR#58057 (pr#58163, Zac Dover)
doc/rados: format \\"initial troubleshooting\\" (pr#54478, Zac Dover)
doc/rados: format Q&A list in t-mon.rst (pr#54346, Zac Dover)
doc/rados: format Q&A list in tshooting-mon.rst (pr#54367, Zac Dover)
doc/rados: format sections in tshooting-mon.rst (pr#54639, Zac Dover)
doc/rados: improve \\"Ceph Subsystems\\" (pr#54703, Zac Dover)
doc/rados: improve \\"scrubbing\\" explanation (pr#54271, Zac Dover)
doc/rados: improve formatting of log-and-debug.rst (pr#54747, Zac Dover)
doc/rados: improve leader/peon monitor explanation (pr#57960, Zac Dover)
doc/rados: link to pg setting commands (pr#55937, Zac Dover)
doc/rados: ops/pgs: s/power of 2/power of two (pr#54701, Zac Dover)
doc/rados: parallelize t-mon headings (pr#54462, Zac Dover)
doc/rados: PR#57022 unfinished business (pr#57266, Zac Dover)
doc/rados: remove dual-stack docs (pr#57074, Zac Dover)
doc/rados: remove PGcalc from docs (pr#55902, Zac Dover)
doc/rados: remove redundant pg repair commands (pr#57041, Zac Dover)
doc/rados: repair stretch-mode.rst (pr#54763, Zac Dover)
doc/rados: restore PGcalc tool (pr#56058, Zac Dover)
doc/rados: revert \\"doc/rados/operations: document ceph balancer status detail
\\" (pr#55359, Laura Flores)
doc/rados: s/cepgsqlite/cephsqlite/ (pr#57248, Zac Dover)
doc/rados: standardize markup of \\"clean\\" (pr#60502, Zac Dover)
doc/rados: update \\"stretch mode\\" (pr#54757, Michael Collins)
doc/rados: update common.rst (pr#56269, Zac Dover)
doc/rados: update config for autoscaler (pr#55439, Zac Dover)
doc/rados: update how to install c++ header files (pr#58309, Pere Diaz Bou)
doc/rados: update PG guidance (pr#55461, Zac Dover)
doc/radosgw - edit admin.rst \\"set user rate limit\\" (pr#55151, Zac Dover)
doc/radosgw/admin.rst: use underscores in config var names (pr#54934, Ville Ojamo)
doc/radosgw/multisite: fix Configuring Secondary Zones -> Updating the Period (pr#60334, Casey Bodley)
doc/radosgw: add confval directives (pr#55485, Zac Dover)
doc/radosgw: add gateway starting command (pr#54834, Zac Dover)
doc/radosgw: admin.rst - edit \\"Create a Subuser\\" (pr#55021, Zac Dover)
doc/radosgw: admin.rst - edit \\"Create a User\\" (pr#55005, Zac Dover)
doc/radosgw: admin.rst - edit sections (pr#55018, Zac Dover)
doc/radosgw: disambiguate version-added remarks (pr#57142, Zac Dover)
doc/radosgw: edit \\"Add/Remove a Key\\" (pr#55056, Zac Dover)
doc/radosgw: edit \\"Enable/Disable Bucket Rate Limit\\" (pr#55261, Zac Dover)
doc/radosgw: edit \\"read/write global rate limit\\" admin.rst (pr#55272, Zac Dover)
doc/radosgw: edit \\"remove a subuser\\" (pr#55035, Zac Dover)
doc/radosgw: edit \\"Usage\\" admin.rst (pr#55322, Zac Dover)
doc/radosgw: edit admin.rst \\"Get Bucket Rate Limit\\" (pr#55254, Zac Dover)
doc/radosgw: edit admin.rst \\"get user rate limit\\" (pr#55158, Zac Dover)
doc/radosgw: edit admin.rst \\"set bucket rate limit\\" (pr#55243, Zac Dover)
doc/radosgw: edit admin.rst - quota (pr#55083, Zac Dover)
doc/radosgw: edit admin.rst 1 of x (pr#55001, Zac Dover)
doc/radosgw: edit compression.rst (pr#54986, Zac Dover)
doc/radosgw: edit front matter - role.rst (pr#54855, Zac Dover)
doc/radosgw: edit multisite.rst (pr#55672, Zac Dover)
doc/radosgw: edit sections (pr#55028, Zac Dover)
doc/radosgw: fix formatting (pr#54754, Zac Dover)
doc/radosgw: Fix JSON typo in Principal Tag example code snippet (pr#54643, Daniel Parkes)
doc/radosgw: fix verb disagreement - index.html (pr#55339, Zac Dover)
doc/radosgw: format \\"Create a Role\\" (pr#54887, Zac Dover)
doc/radosgw: format commands in role.rst (pr#54906, Zac Dover)
doc/radosgw: format POST statements (pr#54850, Zac Dover)
doc/radosgw: Improve dynamicresharding.rst (pr#54369, Anthony D\'Atri)
doc/radosgw: Revert \\"doc/rgw/lua: add info uploading a (pr#55526, Zac Dover)
doc/radosgw: update link in rgw-cache.rst (pr#54806, Zac Dover)
doc/radosgw: update S3 action list (pr#57366, Zac Dover)
doc/radosgw: use \'confval\' directive for reshard config options (pr#57025, Casey Bodley)
doc/radosrgw: edit admin.rst (pr#55074, Zac Dover)
doc/rbd/rbd-exclusive-locks: mention incompatibility with advisory locks (pr#58865, Ilya Dryomov)
doc/rbd: \\"rbd flatten\\" doesn\'t take encryption options in quincy (pr#56272, Ilya Dryomov)
doc/rbd: add namespace information for mirror commands (pr#60271, N Balachandran)
doc/rbd: minor changes to the rbd man page (pr#56257, N Balachandran)
doc/README.md - add ordered list (pr#59800, Zac Dover)
doc/README.md: create selectable commands (pr#59836, Zac Dover)
doc/README.md: edit \\"Build Prerequisites\\" (pr#59639, Zac Dover)
doc/README.md: improve formatting (pr#59702, Zac Dover)
doc/rgw/d3n: pass cache dir volume to extra_container_args (pr#59769, Mark Kogan)
doc/rgw/notification: persistent notification queue full behavior (pr#59235, Yuval Lifshitz)
doc/rgw/notifications: specify which event types are enabled by default (pr#54501, Yuval Lifshitz)
doc/rgw: edit admin.rst - rate limit management (pr#55129, Zac Dover)
doc/rgw: fix Attributes index in CreateTopic example (pr#55433, Casey Bodley)
doc/security: remove old GPG information (pr#56915, Zac Dover)
doc/security: update CVE list (pr#57019, Zac Dover)
doc/src: add inline literals (``) to variables (pr#57938, Zac Dover)
doc/src: invadvisable is not a word (pr#58191, Doug Whitfield)
doc/start: Add Beginner\'s Guide (pr#57823, Zac Dover)
doc/start: add links to Beginner\'s Guide (pr#58204, Zac Dover)
doc/start: add Slack invite link (pr#56042, Zac Dover)
doc/start: add vstart install guide (pr#60463, Zac Dover)
doc/start: Edit Beginner\'s Guide (pr#57846, Zac Dover)
doc/start: explain \\"OSD\\" (pr#54560, Zac Dover)
doc/start: fix typo in hardware-recommendations.rst (pr#54481, Anthony D\'Atri)
doc/start: fix wording & syntax (pr#58365, Piotr Parczewski)
doc/start: improve MDS explanation (pr#56467, Zac Dover)
doc/start: improve MDS explanation (pr#56427, Zac Dover)
doc/start: link to mon map command (pr#56411, Zac Dover)
doc/start: remove \\"intro.rst\\" (pr#57950, Zac Dover)
doc/start: remove mention of Centos 8 support (pr#58391, Zac Dover)
doc/start: s/http/https/ in links (pr#57872, Zac Dover)
doc/start: s/intro.rst/index.rst/ (pr#57904, Zac Dover)
doc/start: update mailing list links (pr#58685, Zac Dover)
doc/start: update release names (pr#54573, Zac Dover)
doc: add description of metric fields for cephfs-top (pr#55512, Neeraj Pratap Singh)
doc: add supported file types in cephfs-mirroring.rst (pr#54823, Jos Collin)
doc: Amend dev mailing list subscribe instructions (pr#58698, Paulo E. Castro)
doc: cephadm/services/osd: fix typo (pr#56231, Lorenz Bausch)
doc: clarify availability vs integrity (pr#58132, Gregory O\'Neill)
doc: clarify superuser note for ceph-fuse (pr#58616, Patrick Donnelly)
doc: clarify use of location: in host spec (pr#57648, Matthew Vernon)
doc: Correct link to \\"Device management\\" (pr#58490, Matthew Vernon)
doc: Correct link to Prometheus docs (pr#59561, Matthew Vernon)
doc: correct typo (pr#57885, Matthew Vernon)
doc: discuss the standard multi-tenant CephFS security model (pr#53559, Greg Farnum)
doc: Document the Windows CI job (pr#60035, Lucian Petrut)
doc: documenting the feature that scrub clear the entries from damage… (pr#59080, Neeraj Pratap Singh)
doc: explain the consequence of enabling mirroring through monitor co… (pr#60527, Jos Collin)
doc: fix email (pr#60235, Ernesto Puerta)
doc: fix typo (pr#59993, N Balachandran)
doc: Fixes two typos and grammatical errors. Signed-off-by: Sina Ahma… (pr#54776, Sina Ahmadi)
doc: Improve doc/radosgw/placement.rst (pr#58975, Anthony D\'Atri)
doc: specify correct fs type for mkfs (pr#55283, Vladislav Glagolev)
doc: SubmittingPatches-backports - remove backports team (pr#60299, Zac Dover)
doc: Update \\"Getting Started\\" to link to start not install (pr#59909, Matthew Vernon)
doc: Update dynamicresharding.rst (pr#54330, Aliaksei Makarau)
doc: update rgw admin api req params for get user info (pr#55072, Ali Maredia)
doc: update tests-integration-testing-teuthology-workflow.rst (pr#59550, Vallari Agrawal)
doc:start.rst fix typo in hw-recs (pr#55506, Eduardo Roldan)
doc:update e-mail addresses governance (pr#60086, Tobias Fischer)
docs/rados/operations/stretch-mode: warn device class is not supported (pr#59101, Kamoltat Sirivadhna)
docs/rados: remove incorrect ceph command (pr#56496, Taha Jahangir)
docs/radosgw: edit admin.rst \\"enable/disable user rate limit\\" (pr#55195, Zac Dover)
docs/rbd: fix typo in arg name (pr#56263, N Balachandran)
docs: Add information about OpenNebula integration (pr#54939, Daniel Clavijo)
docs: removed centos 8 and added squid to the build matrix (pr#58903, Yuri Weinstein)
global: Call getnam_r with a 64KiB buffer on the heap (pr#60124, Adam Emerson)
install-deps.sh, do_cmake.sh: almalinux is another el flavour (pr#58523, Dan van der Ster)
install-deps: save and restore user\'s XDG_CACHE_HOME (pr#56991, luo rixin)
kv/RocksDBStore: Configure compact-on-deletion for all CFs (pr#57404, Joshua Baergen)
librados: make querying pools for selfmanaged snaps reliable (pr#55025, Ilya Dryomov)
librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove (pr#59283, Chen Yuanrun)
librbd/crypto: fix issue when live-migrating from encrypted export (pr#59144, Ilya Dryomov)
librbd/migration: prune snapshot extents in RawFormat::list_snaps() (pr#59659, Ilya Dryomov)
librbd: account for discards that truncate in ObjectListSnapsRequest (pr#56212, Ilya Dryomov)
librbd: Append one journal event per image request (pr#54819, Ilya Dryomov, Joshua Baergen)
librbd: create rbd_trash object during pool initialization and namespace creation (pr#57604, Ramana Raja)
librbd: diff-iterate shouldn\'t crash on an empty byte range (pr#58210, Ilya Dryomov)
librbd: disallow group snap rollback if memberships don\'t match (pr#58208, Ilya Dryomov)
librbd: don\'t crash on a zero-length read if buffer is NULL (pr#57569, Ilya Dryomov)
librbd: don\'t report HOLE_UPDATED when diffing against a hole (pr#54950, Ilya Dryomov)
librbd: fix regressions in ObjectListSnapsRequest (pr#54861, Ilya Dryomov)
librbd: fix split() for SparseExtent and SparseBufferlistExtent (pr#55664, Ilya Dryomov)
librbd: improve rbd_diff_iterate2() performance in fast-diff mode (pr#55257, Ilya Dryomov)
librbd: make diff-iterate in fast-diff mode aware of encryption (pr#58342, Ilya Dryomov)
librbd: make group and group snapshot IDs more random (pr#57090, Ilya Dryomov)
librbd: return ENOENT from Snapshot::get_timestamp for nonexistent snap_id (pr#55473, John Agombar)
librgw: teach librgw about rgw_backend_store (pr#59315, Matt Benjamin)
log: Make log_max_recent have an effect again (pr#48310, Joshua Baergen)
make-dist: don\'t use --continue option for wget (pr#55092, Casey Bodley)
MClientRequest: properly handle ceph_mds_request_head_legacy for ext_num_retry, ext_num_fwd, owner_uid, owner_gid (pr#54411, Alexander Mikhalitsyn)
mds,qa: some balancer debug messages (<=5) not printed when debug_mds is >=5 (pr#53551, Patrick Donnelly)
mds/MDBalancer: ignore queued callbacks if MDS is not active (pr#54494, Leonid Usov)
mds/MDSRank: Add set_history_slow_op_size_and_threshold for op_tracker (pr#53358, Yite Gu)
mds: add a command to dump directory information (pr#55986, Jos Collin, Zhansong Gao)
mds: add debug logs during setxattr ceph.dir.subvolume (pr#56061, Milind Changire)
mds: adjust pre_segments_size for MDLog when trimming segments for st… (issue#59833, pr#54034, Venky Shankar)
mds: allow lock state to be LOCK_MIX_SYNC in replica for filelock (pr#56050, Xiubo Li)
mds: change priority of mds rss perf counter to useful (pr#55058, sp98)
mds: disable `defer_client_eviction_on_laggy_osds\' by default (issue#64685, pr#56195, Venky Shankar)
mds: do not simplify fragset (pr#54892, Milind Changire)
mds: do remove the cap when seqs equal or larger than last issue (pr#58296, Xiubo Li)
mds: dump locks when printing mutation ops (pr#52976, Patrick Donnelly)
mds: ensure next replay is queued on req drop (pr#54315, Patrick Donnelly)
mds: fix session/client evict command (issue#68132, pr#58724, Venky Shankar, Neeraj Pratap Singh)
mds: log message when exiting due to asok command (pr#53549, Patrick Donnelly)
mds: prevent scrubbing for standby-replay MDS (pr#58799, Neeraj Pratap Singh)
mds: replacing bootstrap session only if handle client session message (pr#53363, Mer Xuanyi)
mds: revert standby-replay trimming changes (pr#54717, Patrick Donnelly)
mds: set the correct WRLOCK flag always in wrlock_force() (pr#58773, Xiubo Li)
mds: set the loner to true for LOCK_EXCL_XSYN (pr#54910, Xiubo Li)
mds: try to choose a new batch head in request_clientup() (pr#58843, Xiubo Li)
mds: use variable g_ceph_context directly in MDSAuthCaps (pr#52820, Rishabh Dave)
MDSAuthCaps: print better error message for perm flag in MDS caps (pr#54946, Rishabh Dave)
mgr/BaseMgrModule: Optimize CPython Call in Finish Function (pr#57585, Nitzan Mordechai)
mgr/cephadm: Add \\"networks\\" parameter to orch apply rgw (pr#55318, Teoman ONAY)
mgr/cephadm: add \\"original_weight\\" parameter to OSD class (pr#59412, Adam King)
mgr/cephadm: add ability for haproxy, prometheus, grafana to bind on specific ip (pr#58753, Adam King)
mgr/cephadm: add is_host\\\\_functions to HostCache (pr#55964, Adam King)
mgr/cephadm: Adding extra arguments support for RGW frontend (pr#55963, Adam King, Redouane Kachach)
mgr/cephadm: allow draining host without removing conf/keyring files (pr#55973, Adam King)
mgr/cephadm: catch CancelledError in asyncio timeout handler (pr#56086, Adam King)
mgr/cephadm: ceph orch add fails when ipv6 address is surrounded by square brackets (pr#56079, Teoman ONAY)
mgr/cephadm: cleanup iscsi keyring upon daemon removal (pr#58757, Adam King)
mgr/cephadm: don\'t use image tag in orch upgrade ls (pr#55974, Adam King)
mgr/cephadm: fix flake8 test failures (pr#58077, Nizamudeen A)
mgr/cephadm: fix placement with label and host pattern (pr#56088, Adam King)
mgr/cephadm: fix reweighting of OSD when OSD removal is stopped (pr#56083, Adam King)
mgr/cephadm: Fix unfound progress events (pr#58758, Prashant D)
mgr/cephadm: fixups for asyncio based timeout (pr#55556, Adam King)
mgr/cephadm: make client-keyring deploying ceph.conf optional (pr#58754, Adam King)
mgr/cephadm: make setting --cgroups=split configurable for adopted daemons (pr#58759, Gilad Sid)
mgr/cephadm: pick correct IPs for ingress service based on VIP (pr#55970, Redouane Kachach, Adam King)
mgr/cephadm: refresh public_network for config checks before checking (pr#56492, Adam King)
mgr/cephadm: support for regex based host patterns (pr#56222, Adam King)
mgr/cephadm: support for removing host entry from crush map during host removal (pr#56081, Adam King)
mgr/cephadm: update timestamp on repeat daemon/service events (pr#56080, Adam King)
mgr/dashboard/frontend:Ceph dashboard supports multiple languages (pr#56360, TomNewChao)
mgr/dashboard: add Table Schema to grafonnet (pr#56737, Aashish Sharma)
mgr/dashboard: allow tls 1.2 with a config option (pr#53779, Nizamudeen A)
mgr/dashboard: change deprecated grafana URL in daemon logs (pr#55545, Nizamudeen A)
mgr/dashboard: Consider null values as zero in grafana panels (pr#54540, Aashish Sharma)
mgr/dashboard: debugging make check failure (pr#56128, Nizamudeen A)
mgr/dashboard: disable dashboard v3 in quincy (pr#54250, Nizamudeen A)
mgr/dashboard: exclude cloned-deleted RBD snaps (pr#57221, Ernesto Puerta)
mgr/dashboard: fix duplicate grafana panels when on mgr failover (pr#56930, Avan Thakkar)
mgr/dashboard: fix duplicate grafana panels when on mgr failover (pr#56270, Avan Thakkar)
mgr/dashboard: fix e2e failure related to landing page (pr#55123, Pedro Gonzalez Gomez)
mgr/dashboard: fix error while accessing roles tab when policy attached (pr#55516, Nizamudeen A, Afreen)
mgr/dashboard: fix rgw port manipulation error in dashboard (pr#54176, Nizamudeen A)
mgr/dashboard: fix the jsonschema issue in install-deps (pr#55543, Nizamudeen A)
mgr/dashboard: get rgw port from ssl_endpoint (pr#55248, Nizamudeen A)
mgr/dashboard: make ceph logo redirect to dashboard (pr#56558, Afreen)
mgr/dashboard: rbd image hide usage bar when disk usage is not provided (pr#53809, Pedro Gonzalez Gomez)
mgr/dashboard: remove green tick on old password field (pr#53385, Nizamudeen A)
mgr/dashboard: remove unnecessary failing hosts e2e (pr#53459, Pedro Gonzalez Gomez)
mgr/dashboard: replace deprecated table panel in grafana with a newer table panel (pr#56680, Aashish Sharma)
mgr/dashboard: replace piechart plugin charts with native pie chart panel (pr#56655, Aashish Sharma)
mgr/dashboard: rm warning/error threshold for cpu usage (pr#56441, Nizamudeen A)
mgr/dashboard: sanitize dashboard user creation (pr#56551, Pedro Gonzalez Gomez)
mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard (pr#54539, Aashish Sharma)
mgr/dashboard: upgrade from old \'graph\' type panels to the new \'timeseries\' panel (pr#56653, Aashish Sharma)
mgr/k8sevents: update V1Events to CoreV1Events (pr#57995, Nizamudeen A)
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state (pr#58512, Cory Snyder)
mgr/nfs: Don\'t crash ceph-mgr if NFS clusters are unavailable (pr#58284, Anoop C S, Ponnuvel Palaniyappan)
mgr/pg_autoscaler: add check for norecover flag (pr#57568, Aishwarya Mathuria)
mgr/prometheus: s/pkg_resources.packaging/packaging/ (pr#58627, Adam King, Kefu Chai)
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock (pr#54290, Ramana Raja)
mgr/rest: Trim requests array and limit size (pr#59370, Nitzan Mordechai)
mgr/snap_schedule: add support for monthly snapshots (pr#54894, Milind Changire)
mgr/snap_schedule: make fs argument mandatory if more than one filesystem exists (pr#54090, Milind Changire)
mgr/snap_schedule: restore yearly spec to lowercase y (pr#57445, Milind Changire)
mgr/snap_schedule: support subvol and group arguments (pr#55210, Milind Changire)
mgr/stats: initialize mx_last_updated in FSPerfStats (pr#57442, Jos Collin)
mgr/vol: handle case where clone index entry goes missing (pr#58558, Rishabh Dave)
mgr/volumes: fix subvolume group rm
error message (pr#54206, neeraj pratap singh, Neeraj Pratap Singh)
mgr: add throttle policy for DaemonServer (pr#54012, ericqzhao)
mgr: don\'t dump global config holding gil (pr#50193, Mykola Golub)
mgr: fix a race condition in DaemonServer::handle_report() (pr#54555, Radoslaw Zarzynski)
mgr: remove out&down osd from mgr daemons (pr#54534, shimin)
mon/ConfigMonitor: Show localized name in \\"config dump --format json\\" output (pr#53886, Sridhar Seshasayee)
mon/ConnectionTracker.cc: disregard connection scores from mon_rank = -1 (pr#55166, Kamoltat)
mon/LogMonitor: Use generic cluster log level config (pr#57521, Prashant D)
mon/MonClient: handle ms_handle_fast_authentication return (pr#59308, Patrick Donnelly)
mon/Monitor: during shutdown don\'t accept new authentication and crea… (pr#55597, Nitzan Mordechai)
mon/OSDMonitor: Add force-remove-snap mon command (pr#59403, Matan Breizman)
mon/OSDMonitor: fix get_min_last_epoch_clean() (pr#55868, Matan Breizman, Adam C. Emerson)
mon/OSDMonitor: fix rmsnap command (pr#56430, Matan Breizman)
mon: add exception handling to ceph health mute (pr#55117, Daniel Radjenovic)
mon: add proxy to cache tier options (pr#50551, tan changzhi)
mon: fix health store size growing infinitely (pr#55549, Wei Wang)
mon: fix inconsistencies in class param (pr#59278, Victoria Mackie)
mon: fix mds metadata lost in one case (pr#54317, shimin)
mon: stuck peering since warning is misleading (pr#57407, shreyanshjain7174)
msg/async: Encode message once features are set (pr#59442, Aishwarya Mathuria)
msg/AsyncMessenger: re-evaluate the stop condition when woken up in \'wait()\' (pr#53718, Leonid Usov)
msg: update MOSDOp() to use ceph_tid_t instead of long (pr#55425, Lucian Petrut)
nofail option in fstab not supported (pr#52986, Leonid Usov)
os/bluestore: allow use BtreeAllocator (pr#59498, tan changzhi)
os/bluestore: enable async manual compactions (pr#58742, Igor Fedotov)
os/bluestore: expand BlueFS log if available space is insufficient (pr#57243, Pere Diaz Bou)
os/bluestore: fix crash caused by dividing by 0 (pr#57198, Jrchyang Yu)
os/bluestore: fix free space update after bdev-expand in NCB mode (pr#55776, Igor Fedotov)
os/bluestore: fix the problem of l_bluefs_log_compactions double recording (pr#57196, Wang Linke)
os/bluestore: get rid off resulting lba alignment in allocators (pr#54877, Igor Fedotov)
os/bluestore: set rocksdb iterator bounds for Bluestore::_collection_list() (pr#57622, Cory Snyder)
os/bluestore: Warning added for slow operations and stalled read (pr#59468, Md Mahamudur Rahaman Sajib)
os/store_test: Retune tests to current code (pr#56138, Adam Kupczyk)
os: introduce ObjectStore::refresh_perf_counters() method (pr#55133, Igor Fedotov)
osd/ECTransaction: Remove incorrect asserts in generate_transactions (pr#59132, Mark Nelson)
osd/OSD: introduce reset_purged_snaps_last (pr#53973, Matan Breizman)
osd/OSDMap: Check for uneven weights & != 2 buckets post stretch mode (pr#52458, Kamoltat)
osd/scrub: increasing max_osd_scrubs to 3 (pr#55174, Ronen Friedman)
osd/SnapMapper: fix _lookup_purged_snap (pr#56815, Matan Breizman)
osd/TrackedOp: Fix TrackedOp event order (pr#59109, YiteGu)
osd: always send returnvec-on-errors for client\'s retry (pr#59378, Radoslaw Zarzynski)
osd: avoid watcher remains after \\"rados watch\\" is interrupted (pr#58845, weixinwei)
osd: bring the missed fmt::formatter for snapid_t to address FTBFS (pr#54175, Radosław Zarzyński)
osd: CEPH_OSD_OP_FLAG_BYPASS_CLEAN_CACHE flag is passed from ECBackend (pr#57620, Md Mahamudur Rahaman Sajib)
osd: do not assert on fast shutdown timeout (pr#55134, Igor Fedotov)
osd: don\'t require RWEXCL lock for stat+write ops (pr#54594, Alice Zhao)
osd: ensure async recovery does not drop a pg below min_size (pr#54549, Samuel Just)
osd: fix for segmentation fault on OSD fast shutdown (pr#57614, Md Mahamudur Rahaman Sajib)
osd: fix use-after-move in build_incremental_map_msg() (pr#54269, Ronen Friedman)
osd: improve OSD robustness (pr#54785, Igor Fedotov)
osd: log the number of extents for sparse read (pr#54605, Xiubo Li)
osd: make _set_cache_sizes ratio aware of cache_kv_onode_ratio (pr#55235, Raimund Sacherer)
osd: Report health error if OSD public address is not within subnet (pr#55698, Prashant D)
override client features (pr#58227, Patrick Donnelly)
pybind/mgr/devicehealth: replace SMART data if exists for same DATETIME (pr#54880, Patrick Donnelly)
pybind/mgr/devicehealth: skip legacy objects that cannot be loaded (pr#56480, Patrick Donnelly)
pybind/mgr/mirroring: drop mon_host from peer_list (pr#55238, Jos Collin)
pybind/mgr/pg_autoscaler: Cut back osdmap.get_pools calls (pr#54904, Kamoltat)
pybind/mgr/volumes: log mutex locks to help debug deadlocks (pr#53917, Kotresh HR)
pybind/mgr: disable sqlite3/python autocommit (pr#57199, Patrick Donnelly)
pybind/mgr: reopen database handle on blocklist (pr#52461, Patrick Donnelly)
pybind/rbd: don\'t produce info on errors in aio_mirror_image_get_info() (pr#54054, Ilya Dryomov)
pybind/rbd: expose CLONE_FORMAT and FLATTEN image options (pr#57308, Ilya Dryomov)
python-common/drive_group: handle fields outside of \'spec\' even when \'spec\' is provided (pr#55962, Adam King)
python-common/drive_selection: fix limit with existing devices (pr#56085, Adam King)
python-common/drive_selection: lower log level of limit policy message (pr#55961, Adam King)
python-common: fix osdspec_affinity check (pr#56084, Guillaume Abrioux)
python-common: handle \\"anonymous_access: false\\" in to_json of Grafana spec (pr#58756, Adam King)
qa/cephadm: testing for extra daemon/container features (pr#55958, Adam King)
qa/cephfs: add mgr debugging (pr#56417, Patrick Donnelly)
qa/cephfs: add probabilistic ignorelist for pg_health (pr#56667, Patrick Donnelly)
qa/cephfs: CephFSTestCase.create_client() must keyring (pr#56837, Rishabh Dave)
qa/cephfs: fix build failure for mdtest project (pr#53826, Rishabh Dave)
qa/cephfs: fix ior project build failure (pr#53824, Rishabh Dave)
qa/cephfs: handle non-numeric values for json.loads() (pr#54187, Rishabh Dave)
qa/cephfs: ignorelist clog of MDS_UP_LESS_THAN_MAX (pr#56404, Patrick Donnelly)
qa/cephfs: no reliance on centos (pr#59037, Venky Shankar)
qa/cephfs: switch to python3 for centos stream 9 (pr#53626, Xiubo Li)
qa/distros: backport update from rhel 8.4 -> 8.6 (pr#54902, David Galloway)
qa/distros: replace centos 8 references with centos 9 in the rados suite (pr#58520, Laura Flores)
qa/orch: drop centos 8 and rhel 8.6 for orch suite tests (pr#58769, Adam King, Laura Flores, Guillaume Abrioux, Casey Bodley)
qa/rgw: adapt tests to centos 9 (pr#58601, Mark Kogan, Casey Bodley, Ali Maredia, Yuval Lifshitz)
qa/rgw: barbican uses branch stable/2023.1 (pr#56818, Casey Bodley)
qa/suites/fs/nfs: use standard health ignorelist (pr#56393, Patrick Donnelly)
qa/suites/fs: skip check-counters for iogen workload (pr#58278, Ramana Raja)
qa/suites/krbd: drop pre-single-major and move \\"layering only\\" coverage (pr#57463, Ilya Dryomov)
qa/suites/krbd: stress test for recovering from watch errors for -o exclusive (pr#58855, Ilya Dryomov)
qa/suites/rados/singleton: add POOL_APP_NOT_ENABLED to ignorelist (pr#57488, Laura Flores)
qa/suites/rbd/iscsi: enable all supported container hosts (pr#60087, Ilya Dryomov)
qa/suites/rbd: add test to check rbd_support module recovery (pr#54292, Ramana Raja)
qa/suites/rbd: override extra_system_packages directly on install task (pr#57764, Ilya Dryomov)
qa/suites/upgrade/quincy-p2p: run librbd python API tests from quincy tip (pr#55554, Yuri Weinstein)
qa/suites: add \\"mon down\\" log variations to ignorelist (pr#58762, Laura Flores)
qa/suites: drop --show-reachable=yes from fs:valgrind tests (pr#59067, Jos Collin)
qa/tasks/ceph_manager.py: Rewrite test_pool_min_size (pr#55882, Kamoltat)
qa/tasks/cephfs/test_misc: switch duration to timeout (pr#55745, Xiubo Li)
qa/tasks/qemu: Fix OS version comparison (pr#58169, Zack Cerza)
qa/test_nfs: fix test failure when cluster does not exist (pr#56753, John Mulligan)
qa/tests: added client-upgrade-quincy-squid tests (pr#58445, Yuri Weinstein)
qa/workunits/rados: enable crb and install generic package for c9 (pr#59330, Laura Flores)
qa/workunits/rbd/cli_generic.sh: narrow race window when checking that rbd_support module command fails after blocklisting the module\'s client (pr#54770, Ramana Raja)
qa/workunits/rbd: avoid caching effects in luks-encryption.sh (pr#58852, Ilya Dryomov, Or Ozeri)
qa/workunits: fix test_dashboard_e2e.sh: no spec files found (pr#53857, Nizamudeen A)
qa: account for rbd_trash object in krbd_data_pool.sh + related ceph{,adm} task fixes (pr#58539, Ilya Dryomov)
qa: add a YAML to ignore MGR_DOWN warning (pr#57564, Dhairya Parmar)
qa: add diff-continuous and compare-mirror-image tests to rbd and krbd suites respectively (pr#55929, Ramana Raja)
qa: Add tests to validate synced images on rbd-mirror (pr#55763, Ilya Dryomov, Ramana Raja)
qa: adjust expected io_opt in krbd_discard_granularity.t (pr#59230, Ilya Dryomov)
qa: assign file system affinity for replaced MDS (issue#61764, pr#54038, Venky Shankar)
qa: barbican: restrict python packages with upper-constraints (pr#59325, Tobias Urdin)
qa: bump up scrub status command timeout (pr#55916, Milind Changire)
qa: cleanup snapshots before subvolume delete (pr#58333, Milind Changire)
qa: correct usage of DEBUGFS_META_DIR in dedent (pr#56166, Venky Shankar)
qa: fix error reporting string in assert_cluster_log (pr#55392, Dhairya Parmar)
qa: Fix fs/full suite (pr#55828, Kotresh HR)
qa: fix krbd_msgr_segments and krbd_rxbounce failing on 8.stream (pr#57029, Ilya Dryomov)
qa: fix rank_asok() to handle errors from asok commands (pr#55301, Neeraj Pratap Singh)
qa: ignore container checkpoint/restore related selinux denials for c… (issue#67119, issue#66640, pr#58807, Venky Shankar)
qa: increase the http postBuffer size and disable sslVerify (pr#53629, Xiubo Li)
qa: lengthen shutdown timeout for thrashed MDS (pr#53554, Patrick Donnelly)
qa: move nfs (mgr/nfs) related tests to fs suite (pr#53907, Dhairya Parmar, Venky Shankar)
qa: remove error string checks and check w/ return value (pr#55944, Venky Shankar)
qa: remove vstart runner from radosgw_admin task (pr#55098, Ali Maredia)
qa: run kernel_untar_build with newer tarball (pr#54712, Milind Changire)
qa: set mds config with config set
for a particular test (issue#57087, pr#56168, Venky Shankar)
qa: unmount clients before damaging the fs (pr#57526, Patrick Donnelly)
qa: Wait for purge to complete (pr#53911, Kotresh HR)
rados: Set snappy as default value in ms_osd_compression_algorithm (pr#57406, shreyanshjain7174)
RadosGW API: incorrect bucket quota in response to HEAD /{bucket}/?usage (pr#53438, shreyanshjain7174)
radosgw-admin: don\'t crash on --placement-id without --storage-class (pr#53473, Casey Bodley)
radosgw-admin: fix segfault on pipe modify without source/dest zone specified (pr#51257, caisan)
rbd-mirror: clean up stale pool replayers and callouts better (pr#57305, Ilya Dryomov)
rbd-mirror: use correct ioctx for namespace (pr#59774, N Balachandran)
rbd-nbd: fix resize of images mapped using netlink (pr#55317, Ramana Raja)
rbd-nbd: fix stuck with disable request (pr#54255, Prasanna Kumar Kalever)
rbd: \\"rbd bench\\" always writes the same byte (pr#59500, Ilya Dryomov)
rbd: amend \\"rbd {group,} rename\\" and \\"rbd mirror pool\\" command descriptions (pr#59600, Ilya Dryomov)
Revert \\"exporter: user only counter dump/schema commands for extacting counters\\" (pr#54169, Casey Bodley)
Revert \\"quincy: ceph_fs.h: add separate owner\\\\_{u,g}id fields\\" (pr#54108, Venky Shankar)
RGW - Get quota on OPs with a bucket (pr#52935, Daniel Gryniewicz)
rgw : fix add initialization for RGWGC::process() (pr#59338, caolei)
rgw/admin/notifications: support admin operations on topics with tenants (pr#59322, Yuval Lifshitz)
rgw/amqp: store CA location string in connection object (pr#54170, Yuval Lifshitz)
rgw/auth/s3: validate x-amz-content-sha256 for empty payloads (pr#59359, Casey Bodley)
rgw/auth: Add service token support for Keystone auth (pr#54445, Tobias Urdin)
rgw/auth: Fix the return code returned by AuthStrategy (pr#54795, Pritha Srivastava)
rgw/auth: ignoring signatures for HTTP OPTIONS calls (pr#60458, Tobias Urdin)
rgw/beast: Enable SSL session-id reuse speedup mechanism (pr#56119, Mark Kogan)
rgw/crypt: apply rgw_crypt_default_encryption_key by default (pr#52795, Casey Bodley)
rgw/iam: admin/system users ignore iam policy parsing errors (pr#54842, Casey Bodley)
rgw/kafka/amqp: fix race conditionn in async completion handlers (pr#54737, Yuval Lifshitz)
rgw/kafka: remove potential race condition between creation and deletion of endpoint (pr#51797, Yuval Lifshitz)
rgw/kafka: set message timeout to 5 seconds (pr#56163, Yuval Lifshitz)
rgw/keystone: EC2Engine uses reject() for ERR_SIGNATURE_NO_MATCH (pr#53763, Casey Bodley)
rgw/keystone: use secret key from EC2 for sigv4 streaming mode (pr#57899, Casey Bodley)
rgw/lua: add lib64 to the package search path (pr#59342, Yuval Lifshitz)
rgw/lua: fix CopyFrom crash (pr#59336, Yuval Lifshitz)
rgw/multisite: fix sync_error_trim command (pr#59347, Shilpa Jagannath)
rgw/notification: Kafka persistent notifications not retried and removed even when the broker is down (pr#56145, kchheda3)
rgw/notification: remove non x-amz-meta-* attributes from bucket notifications (pr#53374, Juan Zhu)
rgw/notifications/test: fix rabbitmq and kafka issues in centos9 (pr#58313, Yuval Lifshitz)
rgw/notifications: cleanup all coroutines after sending the notification (pr#59353, Yuval Lifshitz)
rgw/putobj: RadosWriter uses part head object for multipart parts (pr#55622, Casey Bodley)
rgw/rest: fix url decode of post params for iam/sts/sns (pr#55357, Casey Bodley)
rgw/rgw-gap-list: refactoring and adding more error checking (pr#59320, Michael J. Kidd)
rgw/rgw-orphan-list: refactor and add more checks to the tool (pr#59321, Michael J. Kidd)
rgw/s3: DeleteObjects response uses correct delete_marker flag (pr#54165, Casey Bodley)
rgw/s3: ListObjectsV2 returns correct object owners (pr#54162, Casey Bodley)
rgw/sts: AssumeRole no longer writes to user metadata (pr#52049, Casey Bodley)
rgw/sts: changing identity to boost::none, when role policy (pr#59345, Pritha Srivastava)
rgw/sts: modify max_session_duration using update role REST API/ radosgw-admin command (pr#48082, Pritha Srivastava)
RGW/STS: when generating keys, take the trailing null character into account (pr#54128, Oguzhan Ozmen)
rgw/swift: preserve dashes/underscores in swift user metadata names (pr#56616, Juan Zhu, Ali Maredia)
rgw: 'bucket check' deletes index of multipart meta when its pending_map is nonempty (pr#54017, Huber-ming)
rgw: add crypt attrs for iam policy to PostObj and Init/CompleteMultipart (pr#59344, Casey Bodley)
rgw: add headers to guide cache update in 304 response (pr#55095, Casey Bodley, Ilsoo Byun)
rgw: Add missing empty checks to the split string in is_string_in_set() (pr#56348, Matt Benjamin)
rgw: add versioning info to radosgw-admin bucket stats output (pr#54190, J. Eric Ivancich, Cory Snyder)
rgw: address crash and race in RGWIndexCompletionManager (pr#50538, J. Eric Ivancich)
RGW: allow user disabling presigned urls in rgw configuration (pr#56447, Marc Singer)
rgw: avoid use-after-move in RGWDataSyncSingleEntryCR ctor (pr#59319, Casey Bodley)
rgw: beast frontend checks for local_endpoint() errors (pr#54166, Casey Bodley)
rgw: catches nobjects_begin() exceptions (pr#59360, lichaochao)
rgw: cmake configure error on fedora-37/rawhide (pr#59313, Kaleb S. KEITHLEY)
rgw: CopyObject works with x-amz-copy-source-if-* headers (pr#50519, Wang Hao)
rgw: d3n: fix valgrind reported leak related to libaio worker threads (pr#54851, Mark Kogan)
rgw: disable RGWDataChangesLog::add_entry() when log_data is off (pr#59314, Casey Bodley)
rgw: do not copy olh attributes in versioning suspended bucket (pr#55607, Juan Zhu)
rgw: Drain async_processor request queue during shutdown (pr#53471, Soumya Koduri)
rgw: Erase old storage class attr when the object is rewrited using r… (pr#50520, zhiming zhang)
rgw: Fix Browser POST content-length-range min value (pr#52937, Robin H. Johnson)
rgw: fix issue with concurrent versioned deletes leaving behind olh entries (pr#59357, Cory Snyder)
rgw: fix ListOpenIDConnectProviders XML format (pr#57131, caolei)
rgw: fix multipart upload object leaks due to re-upload (pr#51976, J. Eric Ivancich, Yixin Jin, Matt Benjamin, Daniel Gryniewicz)
rgw: fix rgw cache invalidation after unregister_watch() error (pr#54015, lichaochao)
rgw: Get canonical storage class when storage class is empty in (pr#59317, zhiming zhang)
rgw: handle old clients with transfer-encoding: chunked (pr#57133, Marcus Watts)
rgw: invalidate and retry keystone admin token (pr#59076, Tobias Urdin)
rgw: make incomplete multipart upload part of bucket check efficient (pr#57405, J. Eric Ivancich)
rgw: modify string match_wildcards with fnmatch (pr#57907, zhipeng li, Adam Emerson)
rgw: multisite data log flag not used (pr#52054, J. Eric Ivancich)
rgw: object lock avoids 32-bit truncation of RetainUntilDate (pr#54675, Casey Bodley)
rgw: remove potentially conflicting definition of dout_subsys (pr#53462, J. Eric Ivancich)
rgw: RGWSI_SysObj_Cache::remove() invalidates after successful delete (pr#55718, Casey Bodley)
rgw: s3 object lock avoids overflow in retention date (pr#52606, Casey Bodley)
rgw: set requestPayment in slave zone (pr#57149, Huber-ming)
rgw: SignatureDoesNotMatch for certain RGW Admin Ops endpoints w/v4 auth (pr#54792, David.Hall)
RGW: Solving the issue of not populating etag in Multipart upload result (pr#51446, Ali Masarwa)
rgw: swift: tempurl fixes for ceph (pr#59355, Casey Bodley, Adam Emerson, Marcus Watts)
rgw: Update \\"CEPH_RGW_DIR_SUGGEST_LOG_OP\\" for remove entries (pr#50539, Soumya Koduri)
rgw: update options yaml file so LDAP uri isn\'t an invalid example (pr#56722, J. Eric Ivancich)
rgw: Use STANDARD storage class in objects appending operation when the (pr#59316, zhiming zhang)
rgw: use unique_ptr for flat_map emplace in BucketTrimWatche (pr#52995, Vedansh Bhartia)
rgw: when there are a large number of multiparts, the unorder list result may miss objects (pr#59337, J. Eric Ivancich)
rgwfile: fix lock_guard decl (pr#59350, Matt Benjamin)
rgwlc: fix compat-decoding of cls_rgw_lc_get_entry_ret (pr#59312, Matt Benjamin)
rgwlc: permit lifecycle to reduce data conditionally in archive zone (pr#54873, Matt Benjamin)
run-make-check: use get_processors in run-make-check script (pr#58871, John Mulligan)
src/ceph-volume/ceph_volume/devices/lvm/listing.py : lvm list filters with vg name (pr#58999, Pierre Lemay)
src/common/options: Correct typo in rgw.yaml.in (pr#55446, Anthony D'Atri)
src/mon/Monitor: Fix set_elector_disallowed_leaders (pr#54004, Kamoltat)
src/mount: kernel mount command returning misleading error message (pr#55299, Neeraj Pratap Singh)
test/cls_lock: expired lock before unlock and start check (pr#59272, Nitzan Mordechai)
test/lazy-omap-stats: Convert to boost::regex (pr#59523, Brad Hubbard)
test/librbd: clean up unused TEST_COOKIE variable (pr#58548, Rongqi Sun)
test/pybind: replace nose with pytest (pr#55060, Casey Bodley)
test/rgw/notifications: fix kafka consumer shutdown issue (pr#59340, Yuval Lifshitz)
test/rgw: increase timeouts in unittest_rgw_dmclock_scheduler (pr#55789, Casey Bodley)
test/store_test: enforce sync compactions for spillover tests (pr#59532, Igor Fedotov)
test/store_test: fix deferred writing test cases (pr#55779, Igor Fedotov)
test/store_test: fix DeferredWrite test when prefer_deferred_size=0 (pr#56201, Igor Fedotov)
test/store_test: get rid off assert_death (pr#55775, Igor Fedotov)
test/store_test: refactor spillover tests (pr#55216, Igor Fedotov)
test: Create ParallelPGMapper object before start threadpool (pr#58921, Mohit Agrawal)
Test: osd-recovery-space.sh extends the wait time for "recovery toofull" (pr#59042, Nitzan Mordechai)
tools/ceph_objectstore_tool: action_on_all_objects_in_pg to skip pgmeta (pr#54692, Matan Breizman)
tools/ceph_objectstore_tool: Support get/set/superblock (pr#55014, Matan Breizman)
Tools/rados: Improve Error Messaging for Object Name Resolution (pr#55598, Nitzan Mordechai)
tools/rbd: make 'children' command support --image-id (pr#55618, Mykola Golub)
win32_deps_build.sh: change Boost URL (pr#55085, Lucian Petrut)
Many clients are transforming their IT infrastructure to enterprise platforms because their mission critical applications are demanding a cloud native experience on premises with the following:
As IT leaders build out their enterprise platforms, they have the following requirements for the underlying enterprise storage platform:
Clients have issued requests for proposals (RFPs) to vendors for enterprise storage platforms and evaluated the responses. IT leaders have learned that only Ceph can meet their requirements for multiprotocol, software-defined enterprise storage platforms. None of the other alternatives can deliver all three protocols (block via NVMe-oF, file via NFS and SMB, and object via S3) from a single software-defined platform.
Clients that have implemented enterprise storage platforms on Ceph have reported 50% lower total cost of ownership (TCO) and 67% faster deployment times.
A global IBM client was struggling with their legacy HDFS environment due to the tight coupling of compute and storage, along with limits around scalability, erasure coding support, hardware alternatives, and security. The client replaced HDFS with IBM Storage Ceph, using the open-source S3A interface, erasure coding, and encryption at rest and in flight, all running on open-compute-style hardware of their choice. The client had a parallel effort to modernize their analytics environment, so IBM Storage Ceph support for Iceberg, Parquet, Trino, and Apache Spark was also a benefit. In the end, the transition from HDFS to IBM Storage Ceph reduced their TCO by 50%.
Another government agency IBM client was seeking to modernize their legacy infrastructure and applications with a new enterprise storage platform. The client was struggling with a legacy storage platform that was difficult to expand, difficult to secure, difficult to manage and expensive to maintain. As the client containerized their cloud applications that serve 35 million users, they needed S3 object storage to store large amounts of unstructured data. The client turned to IBM Storage Ceph as their new enterprise storage platform. The open standard S3 APIs made it much easier to onboard new applications and services ultimately reducing deployment times by 67%.
A third global IBM client is planning to eventually migrate all their workloads to NVMe over TCP starting with their block workloads running on VMware. The client wants to move away from proprietary initiators that lock them in and toward more open alternatives where they have the flexibility to change vendors and improve business agility. The client also wants to significantly improve the security compared to legacy block solutions by using mutual challenge handshake authentication protocol (CHAP), transport layer security (TLS) inflight encryption, and host IP tables. The improved agility and security along with lower TCO are the compelling reasons this client is building an enterprise storage platform with IBM Storage Ceph.
IBM employees and clients continue to make large contributions to the Ceph community to help mature the technology to maintain Ceph as the leading enterprise storage platform. Please join us in the community.
The potential of agentic AI to transform customer and employee experiences has never been greater. AI agents are fast becoming indispensable in the modern enterprise landscape. Yet, as compelling as the use cases are for agentic AI, they rely on one critical factor: access to vast quantities of accurate, timely data.
For AI applications to generate meaningful insights and usable results, they need access to data on a scale that has never been required before. This data feeds into model training, enhances retrieval-augmented generation (RAG), and enables high-quality inference in production environments. To support this, enterprises need a data architecture that is not only scalable and affordable but also seamlessly integrated to get data where it needs to be without friction.
Enterprise data is typically scattered across a range of locations, from on-premises servers and edge storage to data lakes, warehouses, and cloud storage environments. Each of these storage environments serves a unique function, and each has its own set of protocols and constraints. Moving data from these various storage locations to the cloud, where it can be effectively processed and leveraged by AI models, is a complex and often resource-intensive process. To maximize the value of this data for AI applications, it must be transported reliably, efficiently, and automatically.
Vultr, in collaboration with NetApp, is transforming this process by introducing an innovative cloud storage solution designed specifically to handle the demands of AI data workflows. By seamlessly migrating data to the cloud and routing it precisely where it needs to go, Vultr’s new solution simplifies the data pipeline for AI applications. In this new setup, data can move automatically from storage to processing environments, making it ready for use by AI models in a more streamlined and efficient way than ever before.
At the core of Vultr's solution is Kubernetes-compatible NVMe storage, designed to support high-throughput data movement and delivery. This state-of-the-art storage infrastructure enables users to quickly and affordably transfer data where it’s needed, whether for training or inference, while also providing a straightforward control panel to manage data flows.
For AI deployments, Vultr’s NVMe storage integrates seamlessly with Kubernetes-based applications, feeding data directly to containerized models running on Vultr’s cloud GPU training clusters. This architecture enables fine-tuning of open-source AI models in a controlled and efficient manner. Once fine-tuned, models can then be deployed for inference using Vultr’s Serverless Inference, allowing for real-time data processing and instant output generation.
With this infrastructure, organizations can pull data from its source, use it to fine-tune models, and infer results in production – all within a unified environment. This streamlined process saves time, reduces costs, and simplifies operations, enabling companies to focus on improving their AI applications without worrying about complex data logistics.
In addition to providing flexibility and speed, Vultr’s new cloud storage solution is also designed to address one of the most pressing concerns in today’s data-driven world: data residency. With regulatory requirements becoming increasingly stringent, companies must ensure that data does not leave its region of origin without proper oversight. This is especially important for AI applications, as any breach of data residency rules can lead to serious legal and compliance issues.
Vultr’s solution incorporates Vultr Managed Apache Kafka and a managed vector database, enabling regional data to be pulled directly into localized data stores. This approach keeps data within its origin region while still making it accessible for AI applications, ensuring compliance with residency rules. As a result, organizations can tailor their AI deployments for specific regions, maintaining compliance without compromising on the quality or accuracy of their AI outputs.
To fully harness the power of AI, organizations must rethink their approach to data lifecycle management. In the past, data was something that enterprises recorded, secured, and stored for later use. The primary concerns were compliance, security, and long-term preservation. However, in the AI-driven landscape, data lifecycle management must prioritize data availability, accessibility, and flow. AI applications require real-time data, continuously updated and instantly accessible, to deliver efficient results and improved user experiences.
The new data lifecycle is about creating a constant flow from storage to fine-tuning clusters to inference models. Vultr’s solution accomplishes this by enabling fast, rules-based data migration across various environments. This flexibility ensures that data is always available where it’s needed, supporting continuous training, adaptation, and deployment of AI models. Vultr simplifies AI adoption and deployment, making it easier for organizations to embrace AI without overhauling their existing infrastructure.
Automated Data Migration: Vultr’s solution automatically migrates data to its required location, whether for training, fine-tuning, or inference. This reduces manual intervention and minimizes latency, allowing AI models to operate more effectively.
High-Throughput NVMe Storage: With Kubernetes-compatible NVMe storage, data flows seamlessly into Vultr’s cloud GPU training clusters and serverless inference environments, enabling high-speed, high-volume processing for AI applications.
Enhanced Compliance with Data Residency: By leveraging Vultr Managed Apache Kafka and localized vector databases, the solution ensures data residency compliance, allowing enterprises to operate regionally without fear of regulatory breaches.
Scalability and Affordability: Vultr’s architecture is built for scalability, making it suitable for both small-scale and enterprise-level AI applications. The solution provides an affordable way to manage data flow without compromising on performance, helping organizations adopt AI cost-effectively.
Simple Control Panel: The intuitive control panel allows users to manage data flows with ease, setting rules for automated migration and ensuring that data is always where it needs to be for AI processing.
Vultr’s new cloud storage solution represents a shift toward data architectures that fully support AI applications, making data accessible, compliant, and ready to power new digital experiences. As AI continues to evolve, so too will the infrastructure that supports it. With Vultr Cloud Storage, enterprises can pioneer a new Cloud Storage and data lifecycle model to fuel AI innovation.
In the fast-evolving world of object storage, seamless data replication across clusters is crucial for ensuring data availability, redundancy, and disaster recovery. In Ceph, this is achieved through the RADOS Gateway (RGW) multisite replication feature. However, setting up and managing RGW multisite configurations through the command line can be a time-consuming process that involves executing a long series of complex commands—sometimes as many as 20 to 25.
To simplify this workflow, we've developed a user-friendly 4-step wizard, now accessible through a single button in the Ceph dashboard’s RGW multisite page. This new wizard significantly reduces the setup time for RGW multisite replication to just a few steps while ensuring that users can configure realms, zonegroups, and zones efficiently and with minimal effort.
The command-line interface (CLI) is a powerful tool, but when it comes to RGW multisite replication, its complexity can become a roadblock. Configuring realms, zonegroups, and zones involves running a multitude of commands, each of which needs to be executed in a precise order. From creating realms and defining zonegroups with their respective endpoints to setting up zones and configuring system users, every step has to be done carefully. Any misstep can lead to replication failures or misconfigurations.
With the new wizard, we’ve drastically reduced the setup complexity. The wizard takes care of these steps for you in an intuitive, guided process. What previously required up to 25 CLI commands can now be achieved in just 3 to 4 steps—saving both time and effort while also lowering the risk of misconfiguration.
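For context, the manual path on the primary cluster looks roughly like the following. This is a condensed, hedged sketch only; the realm, zonegroup, zone, user names, and endpoints are placeholders, and a real setup involves additional steps on the secondary zone:
radosgw-admin realm create --rgw-realm=myrealm --default
radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://rgw1:80 --master --default
radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --endpoints=http://rgw1:80 --master --default
radosgw-admin user create --uid=sync-user --display-name="Synchronization User" --system
radosgw-admin zone modify --rgw-zone=us-east --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin period update --commit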
The RGW multisite wizard in the Ceph dashboard is designed to streamline the entire configuration process. Here’s how it works:
In the first step, you’re prompted to enter the realm name, zonegroup name, and zonegroup endpoints. These are the fundamental elements of any multisite setup. The zonegroup endpoints refer to the cluster addresses that will serve as part of the replication ecosystem.
Figure: Step 1 - Entering realm and zonegroup information.
Next, you’ll define the zone name and its corresponding endpoints, as well as create the system user that will operate within this zone. The system user is crucial for managing access and permissions in the replication process.
Figure: Step 2 - Configuring the zone and creating a system user.
If you’ve added another cluster in the multi-cluster setup, the third step presents an option to select that cluster. This step allows you to replicate the configuration automatically to the secondary cluster. If no additional cluster has been added or you do not wish to select a cluster at the moment, you can skip this step.
Figure: Step 3 - Selecting a replication cluster and entering replication zone name
Figure: Secondary cluster added in multi-cluster setup
The final step serves as a review page where you can verify all the values entered in the previous steps. If no additional cluster is added for replication, submitting this step generates a token. This token contains the realm name, access keys, and endpoints, and it can be manually imported into the secondary cluster using the realm pull command.
However, if a secondary cluster is already present in the multi-cluster setup, you can select the cluster from the list, and the wizard will automatically import the realm token into the secondary cluster, completing the process seamlessly.
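If you take the manual route with the generated token values, pulling the realm on the secondary cluster typically looks something like this (a hedged sketch; the URL and system user keys come from the primary zone):
radosgw-admin realm pull --url=http://<primary-rgw>:80 --access-key=<system-access-key> --secret=<system-secret-key>
radosgw-admin period pull --url=http://<primary-rgw>:80 --access-key=<system-access-key> --secret=<system-secret-key>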
Figure: Step 4 - Reviewing the configuration
You can see step-by-step progress while the wizard is being submitted.
On completion of the wizard, you can verify the configuration in the secondary cluster, which in our case should look something like this:
Figure: Configuration in the secondary cluster
To verify the sync status, you can visit the Objects > Overview page.
Figure: Sync status in the primary cluster
This new wizard simplifies RGW multisite configuration in several critical use cases:
By replicating data across geographically distant clusters, organizations can ensure that they are protected from data loss in case of a regional failure. The wizard makes it easy to set up DR scenarios in just a few clicks.
Enterprises that need to maintain multiple copies of their data across different locations can now do so without spending hours on the command line. The wizard enables easy setup of data redundancy policies between clusters.
For users managing multiple Ceph clusters, this tool provides an efficient way to replicate configurations and data without manual intervention or the complexity of multi-step CLI commands. With the wizard’s ability to handle automatic realm imports between clusters, the entire process becomes frictionless.
When deploying new Ceph clusters and configuring RGW multisite replication, time is of the essence. The wizard cuts down the time required to configure a new multi-site deployment, making it ideal for administrators who need to get their systems up and running quickly.
The introduction of the RGW multisite replication wizard marks a significant improvement in the way Ceph users can manage multisite configurations. By reducing the complexity of a process that previously required up to 25 commands into a simple, intuitive 4-step wizard, we’ve made it easier than ever to set up and manage multisite replication in Ceph. Whether you’re setting up a disaster recovery plan, ensuring data redundancy, or managing multiple clusters, this tool empowers users with a streamlined, error-free process that gets the job done in a fraction of the time.
We encourage you to explore this new feature in the Ceph dashboard and experience firsthand how it can transform your RGW multisite management workflows.
Squid is the 19th stable release of Ceph.
This is the first stable release of Ceph Squid.
ATTENTION:
iSCSI users are advised that the upstream developers of Ceph encountered a bug during an upgrade from Ceph 19.1.1 to Ceph 19.2.0. Read Tracker Issue 68215 before attempting an upgrade to 19.2.0.
Contents:
RADOS
Dashboard
CephFS
RBD
RGW
Crimson/Seastore
ceph: a new --daemon-output-file switch is available for ceph tell commands to dump output to a file local to the daemon. For commands which produce large amounts of output, this avoids a potential spike in memory usage on the daemon, allows for faster streaming writes to a file local to the daemon, and reduces time holding any locks required to execute the command. For analysis, it is necessary to manually retrieve the file from the host running the daemon. Currently, only --format=json|json-pretty are supported.
cls_cxx_gather is marked as deprecated.
Tracing: The blkin tracing feature (see https://docs.ceph.com/en/reef/dev/blkin/) is now deprecated in favor of Opentracing (https://docs.ceph.com/en/reef/dev/developer_guide/jaegertracing/) and will be removed in a later release.
PG dump: The default output of ceph pg dump --format json has changed. The default JSON format produces a rather massive output in large clusters and isn't scalable, so we have removed the 'network_ping_times' section from the output. Details in the tracker: https://tracker.ceph.com/issues/57460
CephFS: it is now possible to pause write I/O and metadata mutations on a tree in the file system using a new suite of subvolume quiesce commands. This is implemented to support crash-consistent snapshots for distributed applications. Please see the relevant section in the documentation on CephFS subvolumes for more information.
CephFS: The MDS now evicts clients which are not advancing their request tids, since this causes a large buildup of session metadata, resulting in the MDS going read-only due to the RADOS operation exceeding the size threshold. The mds_session_metadata_threshold config controls the maximum size that the (encoded) session metadata can grow to.
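To inspect or adjust the threshold, the standard config commands apply; the value below is purely illustrative, not a recommendation:
ceph config get mds mds_session_metadata_threshold
ceph config set mds mds_session_metadata_threshold 16777216   # example: 16 MiB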
CephFS: A new "mds last-seen" command is available for querying the last time an MDS was in the FSMap, subject to a pruning threshold.
CephFS: For clusters with multiple CephFS file systems, all the snap-schedule commands now expect the '--fs' argument.
CephFS: The period specifier m now implies minutes and the period specifier M now implies months. This has been made consistent with the rest of the system.
CephFS: Running the command "ceph fs authorize" for an existing entity now upgrades the entity's capabilities instead of printing an error. It can now also change read/write permissions in a capability that the entity already holds. If the capability passed by the user is the same as one of the capabilities that the entity already holds, idempotency is maintained.
CephFS: Two FS names can now be swapped, optionally along with their IDs, using the "ceph fs swap" command. The function of this API is to facilitate file system swaps for disaster recovery. In particular, it avoids situations where a named file system is temporarily missing, which would prompt a higher-level storage operator (like Rook) to recreate the missing file system. See https://docs.ceph.com/en/latest/cephfs/administration/#file-systems docs for more information.
CephFS: Before running the command "ceph fs rename", the file system to be renamed must be offline and the config "refuse_client_session" must be set for it. The config "refuse_client_session" can be removed/unset and the file system can be brought back online after the rename operation is complete.
CephFS: Disallow delegating preallocated inode ranges to clients. Config mds_client_delegate_inos_pct defaults to 0 which disables async dirops in the kclient.
CephFS: MDS log trimming is now driven by a separate thread which tries to trim the log every second (mds_log_trim_upkeep_interval config). Also, a couple of configs govern how much time the MDS spends in trimming its logs. These configs are mds_log_trim_threshold and mds_log_trim_decay_rate.
CephFS: Full support for subvolumes and subvolume groups is now available for snap_schedule Manager module.
CephFS: The subvolume snapshot clone command now depends on the config option snapshot_clone_no_wait which is used to reject the clone operation when all the cloner threads are busy. This config option is enabled by default, which means that if no cloner threads are free, the clone request errors out with EAGAIN. The value of the config option can be fetched by using ceph config get mgr mgr/volumes/snapshot_clone_no_wait and it can be disabled by using ceph config set mgr mgr/volumes/snapshot_clone_no_wait false.
CephFS: Commands ceph mds fail and ceph fs fail now require a confirmation flag when some MDSs exhibit health warning MDS_TRIM or MDS_CACHE_OVERSIZED. This is to prevent accidental MDS failover causing further delays in recovery.
CephFS: fixes to the implementation of the root_squash mechanism enabled via cephx mds caps on a client credential require a new client feature bit, client_mds_auth_caps. Clients using credentials with root_squash without this feature will trigger the MDS to raise a HEALTH_ERR on the cluster, MDS_CLIENTS_BROKEN_ROOTSQUASH. See the documentation on this warning and the new feature bit for more information.
CephFS: Expanded removexattr support for CephFS virtual extended attributes. Previously one had to use setxattr to restore the default in order to "remove". You may now properly use removexattr to remove. You can also now remove the layout on the root inode, which will restore the layout to the default.
CephFS: cephfs-journal-tool is guarded against running on an online file system. The 'cephfs-journal-tool --rank <fs_name>:<mds_rank> journal reset' and 'cephfs-journal-tool --rank <fs_name>:<mds_rank> journal reset --force' commands require '--yes-i-really-really-mean-it'.
CephFS: \\"ceph fs clone status\\" command will now print statistics about clone progress in terms of how much data has been cloned (in both percentage as well as bytes) and how many files have been cloned.
CephFS: \\"ceph status\\" command will now print a progress bar when cloning is ongoing. If clone jobs are more than the cloner threads, it will print one more progress bar that shows total amount of progress made by both ongoing as well as pending clones. Both progress are accompanied by messages that show number of clone jobs in the respective categories and the amount of progress made by each of them.
cephfs-shell: The cephfs-shell utility is now packaged for RHEL 9 / CentOS 9 as required python dependencies are now available in EPEL9.
The CephFS automatic metadata load (sometimes called "default") balancer is now disabled by default. The new file system flag balance_automate can be used to toggle it on or off. It can be enabled or disabled via ceph fs set <fs_name> balance_automate <bool>.
It is now possible to rotate an entity's key with the new ceph auth rotate command. Previously, this was only possible by deleting and then recreating the key.
Dashboard: Rearranged Navigation Layout: The navigation layout has been reorganized for improved usability and easier access to key features.
Dashboard: CephFS Improvements
Support for managing CephFS snapshots and clones, as well as snapshot schedule management
Manage authorization capabilities for CephFS resources
Helpers on mounting a CephFS volume
Dashboard: RGW Improvements
Support for managing bucket policies
Add/Remove bucket tags
ACL Management
Several UI/UX Improvements to the bucket form
MGR/REST: The REST manager module will trim requests based on the 'max_requests' option. Without this feature, and in the absence of manual deletion of old requests, the accumulation of requests in the array can lead to Out Of Memory (OOM) issues, resulting in the Manager crashing.
MGR: An OpTracker to help debug mgr module issues is now available.
Monitoring: Grafana dashboards are now loaded into the container at runtime rather than building a grafana image with the grafana dashboards. Official Ceph grafana images can be found in quay.io/ceph/grafana
Monitoring: RGW S3 Analytics: A new Grafana dashboard is now available, enabling you to visualize per bucket and user analytics data, including total GETs, PUTs, Deletes, Copies, and list metrics.
The mon_cluster_log_file_level and mon_cluster_log_to_syslog_level options have been removed. Henceforth, users should use the new generic option mon_cluster_log_level to control the cluster log level verbosity for the cluster log file as well as for all external entities.
RADOS: A POOL_APP_NOT_ENABLED health warning will now be reported if the application is not enabled for the pool, irrespective of whether the pool is in use or not. Always tag a pool with an application using the ceph osd pool application enable command to avoid the POOL_APP_NOT_ENABLED health warning being reported for that pool. The user might temporarily mute this warning using ceph health mute POOL_APP_NOT_ENABLED.
RADOS: The get_pool_is_selfmanaged_snaps_mode C++ API has been deprecated due to being prone to false negative results. Its safer replacement is pool_is_in_selfmanaged_snaps_mode.
RADOS: For bug 62338 (https://tracker.ceph.com/issues/62338), we did not choose to condition the fix on a server flag in order to simplify backporting. As a result, in rare cases it may be possible for a PG to flip between two acting sets while an upgrade to a version with the fix is in progress. If you observe this behavior, you should be able to work around it by completing the upgrade or by disabling async recovery by setting osd_async_recovery_min_cost to a very large value on all OSDs until the upgrade is complete: ceph config set osd osd_async_recovery_min_cost 1099511627776
RADOS: A detailed version of the balancer status CLI command in the balancer module is now available. Users may run ceph balancer status detail to see more details about which PGs were updated in the balancer's last optimization. See https://docs.ceph.com/en/latest/rados/operations/balancer/ for more information.
RADOS: Read balancing may now be managed automatically via the balancer manager module. Users may choose between two new modes: upmap-read, which offers upmap and read optimization simultaneously, or read, which may be used to only optimize reads. For more detailed information see https://docs.ceph.com/en/latest/rados/operations/read-balancer/#online-optimization.
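For example, switching the balancer to the combined mode and reviewing its last optimization might look like this (a minimal sketch using the commands named above):
ceph balancer on
ceph balancer mode upmap-read
ceph balancer status detail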
RADOS: BlueStore has been optimized for better performance in snapshot-intensive workloads.
RADOS: BlueStore RocksDB LZ4 compression is now enabled by default to improve average performance and "fast device" space usage.
RADOS: A new CRUSH rule type, MSR (Multi-Step Retry), allows for more flexible EC configurations.
RADOS: Scrub scheduling behavior has been improved.
RBD: When diffing against the beginning of time (fromsnapname == NULL) in fast-diff mode (whole_object == true with the fast-diff image feature enabled and valid), diff-iterate is now guaranteed to execute locally if exclusive lock is available. This brings a dramatic performance improvement for QEMU live disk synchronization and backup use cases.
RBD: The try-netlink mapping option for rbd-nbd has become the default and is now deprecated. If the NBD netlink interface is not supported by the kernel, then the mapping is retried using the legacy ioctl interface.
RBD: The option --image-id has been added to the rbd children CLI command, so it can be run for images in the trash.
RBD: The Image::access_timestamp and Image::modify_timestamp Python APIs now return timestamps in UTC.
RBD: Support for cloning from non-user type snapshots is added. This is intended primarily as a building block for cloning new groups from group snapshots created with the rbd group snap create command, but has also been exposed via the new --snap-id option for the rbd clone command.
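A hedged sketch of that workflow; the pool, group, and image names are placeholders, and the snapshot ID has to be looked up first:
rbd group snap create mypool/mygroup@groupsnap1
rbd snap ls --all mypool/img1                      # note the ID of the group snapshot entry
rbd clone --snap-id <id> mypool/img1 mypool/img1-clone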
RBD: The output of the rbd snap ls --all command now includes the original type for trashed snapshots.
RBD: The RBD_IMAGE_OPTION_CLONE_FORMAT option has been exposed in Python bindings via the clone_format optional parameter to the clone, deep_copy and migration_prepare methods.
RBD: The RBD_IMAGE_OPTION_FLATTEN option has been exposed in Python bindings via the flatten optional parameter to the deep_copy and migration_prepare methods.
RBD: The rbd-wnbd driver has gained the ability to multiplex image mappings. Previously, each image mapping spawned its own rbd-wnbd daemon, which led to an excessive amount of TCP sessions and other resources being consumed, eventually exceeding Windows limits. With this change, a single rbd-wnbd daemon is spawned per host and most OS resources are shared between image mappings. Additionally, the ceph-rbd service starts much faster.
RGW: GetObject and HeadObject requests now return an x-rgw-replicated-at header for replicated objects. This timestamp can be compared against the Last-Modified header to determine how long the object took to replicate.
RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in multi-site. Previously, the replicas of such objects were corrupted on decryption. A new tool, radosgw-admin bucket resync encrypted multipart, can be used to identify these original multipart uploads. The LastModified timestamp of any identified object is incremented by 1ns to cause peer zones to replicate it again. For multi-site deployments that make any use of Server-Side Encryption, we recommend running this command against every bucket in every zone after all zones have upgraded.
RGW: Introducing a new data layout for the Topic metadata associated with S3 Bucket Notifications, where each Topic is stored as a separate RADOS object and the bucket notification configuration is stored in a bucket attribute. This new representation supports multisite replication via metadata sync and can scale to many topics. This is on by default for new deployments, but is not enabled by default on upgrade. Once all radosgws have upgraded (on all zones in a multisite configuration), the notification_v2 zone feature can be enabled to migrate to the new format. See https://docs.ceph.com/en/squid/radosgw/zone-features for details. The "v1" format is now considered deprecated and may be removed after 2 major releases.
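Once every zone has been upgraded, enabling the feature might look something like the following (hedged; the zonegroup name is a placeholder, and the exact procedure is described in the zone-features documentation linked above):
radosgw-admin zonegroup modify --rgw-zonegroup=default --enable-feature=notification_v2
radosgw-admin period update --commit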
RGW: New tools have been added to radosgw-admin for identifying and correcting issues with versioned bucket indexes. Historical bugs with the versioned bucket index transaction workflow made it possible for the index to accumulate extraneous "book-keeping" olh entries and plain placeholder entries. In some specific scenarios where clients made concurrent requests referencing the same object key, it was likely that a lot of extra index entries would accumulate. When a significant number of these entries are present in a single bucket index shard, they can cause high bucket listing latencies and lifecycle processing failures. To check whether a versioned bucket has unnecessary olh entries, users can now run radosgw-admin bucket check olh. If the --fix flag is used, the extra entries will be safely removed. A distinct issue from the one described thus far, it is also possible that some versioned buckets are maintaining extra unlinked objects that are not listable from the S3/Swift APIs. These extra objects are typically a result of PUT requests that exited abnormally, in the middle of a bucket index transaction - so the client would not have received a successful response. Bugs in prior releases made these unlinked objects easy to reproduce with any PUT request that was made on a bucket that was actively resharding. Besides the extra space that these hidden, unlinked objects consume, there can be another side effect in certain scenarios, caused by the nature of the failure mode that produced them, where a client of a bucket that was a victim of this bug may find the object associated with the key to be in an inconsistent state. To check whether a versioned bucket has unlinked entries, users can now run radosgw-admin bucket check unlinked. If the --fix flag is used, the unlinked objects will be safely removed. Finally, a third issue made it possible for versioned bucket index stats to be accounted inaccurately. The tooling for recalculating versioned bucket stats also had a bug, and was not previously capable of fixing these inaccuracies. This release resolves those issues and users can now expect that the existing radosgw-admin bucket check command will produce correct results. We recommend that users with versioned buckets, especially those that existed on prior releases, use these new tools to check whether their buckets are affected and to clean them up accordingly.
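For example, checking and cleaning up a single bucket might look like this (hedged; the bucket name is a placeholder):
radosgw-admin bucket check olh --bucket=mybucket --fix
radosgw-admin bucket check unlinked --bucket=mybucket --fix
radosgw-admin bucket check --bucket=mybucket --fix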
RGW: The User Accounts feature unlocks several new AWS-compatible IAM APIs for the self-service management of users, keys, groups, roles, policy and more. Existing users can be adopted into new accounts. This process is optional but irreversible. See https://docs.ceph.com/en/squid/radosgw/account and https://docs.ceph.com/en/squid/radosgw/iam for details.
RGW: On startup, radosgw and radosgw-admin now validate the rgw_realm config option. Previously, they would ignore invalid or missing realms and go on to load a zone/zonegroup in a different realm. If startup fails with a "failed to load realm" error, fix or remove the rgw_realm option.
RGW: The radosgw-admin commands realm create and realm pull no longer set the default realm without --default.
RGW: Fixed an S3 Object Lock bug with PutObjectRetention requests that specify a RetainUntilDate after the year 2106. This date was truncated to 32 bits when stored, so a much earlier date was used for object lock enforcement. This does not affect PutBucketObjectLockConfiguration, where a duration is given in Days. The RetainUntilDate encoding is fixed for new PutObjectRetention requests, but cannot repair the dates of existing object locks. Such objects can be identified with a HeadObject request based on the x-amz-object-lock-retain-until-date response header.
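One way to inspect the stored retention date is a HeadObject call, for example via the AWS CLI (a hedged sketch; the bucket, key, and endpoint are placeholders). The ObjectLockRetainUntilDate field in the response corresponds to the x-amz-object-lock-retain-until-date header mentioned above:
aws --endpoint-url http://rgw.example.com s3api head-object --bucket mybucket --key mykey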
S3 Get/HeadObject now supports the query parameter partNumber to read a specific part of a completed multipart upload.
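For example, with the AWS CLI (hedged; the names and endpoint are placeholders):
aws --endpoint-url http://rgw.example.com s3api head-object --bucket mybucket --key bigobject --part-number 2
aws --endpoint-url http://rgw.example.com s3api get-object --bucket mybucket --key bigobject --part-number 2 part2.bin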
RGW: The SNS CreateTopic API now enforces the same topic naming requirements as AWS: Topic names must be made up of only uppercase and lowercase ASCII letters, numbers, underscores, and hyphens, and must be between 1 and 256 characters long.
RGW: Notification topics are now owned by the user that created them. By default, only the owner can read/write their topics. Topic policy documents are now supported to grant these permissions to other users. Preexisting topics are treated as if they have no owner, and any user can read/write them using the SNS API. If such a topic is recreated with CreateTopic, the issuing user becomes the new owner. For backward compatibility, all users still have permission to publish bucket notifications to topics owned by other users. A new configuration parameter, rgw_topic_require_publish_policy, can be enabled to deny sns:Publish permissions unless explicitly granted by topic policy.
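As a hedged example, a topic owner could attach a policy document that allows another user to publish to the topic, using the standard SNS SetTopicAttributes call (the endpoint, topic ARN, and policy file are placeholders; the policy file would contain a statement allowing the sns:Publish action for the desired principal):
aws --endpoint-url http://rgw.example.com sns set-topic-attributes --topic-arn arn:aws:sns:default::mytopic --attribute-name Policy --attribute-value file://topic-policy.json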
RGW: Fixed an issue with persistent notifications so that changes made to a topic while notifications are already queued are reflected when those notifications are delivered. If a user set up a topic with an incorrect configuration (for example, password or SSL settings) that caused delivery to the broker to fail, the incorrect topic attribute can now be modified, and the new configuration will be used on the retry attempt.
RGW: in bucket notifications, the principalId inside ownerIdentity now contains the complete user ID, prefixed with the tenant ID.
The basic channel in telemetry now captures pool flags, which allows us to better understand feature adoption, such as Crimson. To opt in to telemetry, run ceph telemetry on.
Before starting, make sure your cluster is stable and healthy (no down or recovering OSDs). (This is optional, but recommended.) You can disable the autoscaler for all pools during the upgrade using the noautoscale flag.
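For example, to pause the PG autoscaler for all pools for the duration of the upgrade and re-enable it afterwards:
ceph osd pool set noautoscale
ceph osd pool unset noautoscale   # run after the upgrade completes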
Note:
You can monitor the progress of your upgrade at each stage with the ceph versions command, which will tell you what Ceph version(s) are running for each type of daemon.
If your cluster is deployed with cephadm (first introduced in Octopus), then the upgrade process is entirely automated. To initiate the upgrade,
ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.0
The same process is used to upgrade to future minor releases.
Upgrade progress can be monitored with
ceph orch upgrade status
Upgrade progress can also be monitored with ceph -s (which provides a simple progress bar) or more verbosely with
ceph -W cephadm
The upgrade can be paused or resumed with
ceph orch upgrade pause   # to pause
ceph orch upgrade resume  # to resume
or canceled with
ceph orch upgrade stop
Note that canceling the upgrade simply stops the process; there is no ability to downgrade back to Quincy or Reef.
Note:
If your cluster is running Quincy (17.2.x) or later, you might choose to first convert it to use cephadm so that the upgrade to Squid is automated (see above). For more information, see https://docs.ceph.com/en/squid/cephadm/adoption/.
If your cluster is running Quincy (17.2.x) or later, systemd unit file names have changed to include the cluster fsid. To find the correct systemd unit file name for your cluster, run the following command:
systemctl -l | grep <daemon type>
Example:
$ systemctl -l | grep mon | grep active
ceph-6ce0347c-314a-11ee-9b52-000af7995d6c@mon.f28-h21-000-r630.service loaded active running Ceph mon.f28-h21-000-r630 for 6ce0347c-314a-11ee-9b52-000af7995d6c
Set the noout flag for the duration of the upgrade. (Optional, but recommended.)
ceph osd set noout
Upgrade monitors by installing the new packages and restarting the monitor daemons. For example, on each monitor host
systemctl restart ceph-mon.target
Once all monitors are up, verify that the monitor upgrade is complete by looking for the squid string in the mon map. The command
ceph mon dump | grep min_mon_release
should report:
min_mon_release 19 (squid)
If it does not, that implies that one or more monitors haven't been upgraded and restarted, and/or the quorum does not include all monitors.
Upgrade ceph-mgr daemons by installing the new packages and restarting all manager daemons. For example, on each manager host,
systemctl restart ceph-mgr.target
Verify the ceph-mgr daemons are running by checking ceph -s:
ceph -s

...
services:
  mon: 3 daemons, quorum foo,bar,baz
  mgr: foo(active), standbys: bar, baz
...
Upgrade all OSDs by installing the new packages and restarting the ceph-osd daemons on all OSD hosts
systemctl restart ceph-osd.target
Upgrade all CephFS MDS daemons. For each CephFS file system,
Disable standby_replay:
ceph fs set <fs_name> allow_standby_replay false
Reduce the number of ranks to 1. (Make note of the original number of MDS daemons first if you plan to restore it later.)
ceph status # ceph fs set <fs_name> max_mds 1
Wait for the cluster to deactivate any non-zero ranks by periodically checking the status
ceph status
Take all standby MDS daemons offline on the appropriate hosts with
systemctl stop ceph-mds@<daemon_name>
Confirm that only one MDS is online and is rank 0 for your FS
ceph status
Upgrade the last remaining MDS daemon by installing the new packages and restarting the daemon
systemctl restart ceph-mds.target
Restart all standby MDS daemons that were taken offline
systemctl start ceph-mds.target
Restore the original value of max_mds for the volume
ceph fs set <fs_name> max_mds <original_max_mds>
Upgrade all radosgw daemons by upgrading packages and restarting daemons on all hosts
systemctl restart ceph-radosgw.target
Complete the upgrade by disallowing pre-Squid OSDs and enabling all new Squid-only functionality
ceph osd require-osd-release squid
If you set noout at the beginning, be sure to clear it with
ceph osd unset noout
Consider transitioning your cluster to use the cephadm deployment and orchestration framework to simplify cluster management and future upgrades. For more information on converting an existing cluster to cephadm, see https://docs.ceph.com/en/squid/cephadm/adoption/.
Verify the cluster is healthy with ceph health. If your cluster is running Filestore, and you are upgrading directly from Quincy to Squid, a deprecation warning is expected. This warning can be temporarily muted using the following command
ceph health mute OSD_FILESTORE
Consider enabling the telemetry module to send anonymized usage statistics and crash information to the Ceph upstream developers. To see what would be reported (without actually sending any information to anyone),
ceph telemetry preview-all
If you are comfortable with the data that is reported, you can opt-in to automatically report the high-level cluster metadata with
ceph telemetry on
The public dashboard that aggregates Ceph telemetry can be found at https://telemetry-public.ceph.com/.
You must first upgrade to Quincy (17.2.z) or Reef (18.2.z) before upgrading to Squid.
We express our gratitude to all members of the Ceph community who contributed by proposing pull requests, testing this release, providing feedback, and offering valuable suggestions.
If you are interested in helping test the next release, Tentacle, please join us at the #ceph-at-scale Slack channel.
The Squid release would not be possible without the contributions of the community:
Aashish Sharma ▪ Abhishek Lekshmanan ▪ Adam C. Emerson ▪ Adam King ▪ Adam Kupczyk ▪ Afreen Misbah ▪ Aishwarya Mathuria ▪ Alexander Indenbaum ▪ Alexander Mikhalitsyn ▪ Alexander Proschek ▪ Alex Wojno ▪ Aliaksei Makarau ▪ Alice Zhao ▪ Ali Maredia ▪ Ali Masarwa ▪ Alvin Owyong ▪ Andreas Schwab ▪ Ankush Behl ▪ Anoop C S ▪ Anthony D Atri ▪ Anton Turetckii ▪ Aravind Ramesh ▪ Arjun Sharma ▪ Arun Kumar Mohan ▪ Athos Ribeiro ▪ Avan Thakkar ▪ barakda ▪ Bernard Landon ▪ Bill Scales ▪ Brad Hubbard ▪ caisan ▪ Casey Bodley ▪ chentao.2022 ▪ Chen Xu Qiang ▪ Chen Yuanrun ▪ Christian Rohmann ▪ Christian Theune ▪ Christopher Hoffman ▪ Christoph Grüninger ▪ Chunmei Liu ▪ cloudbehl ▪ Cole Mitchell ▪ Conrad Hoffmann ▪ Cory Snyder ▪ cuiming_yewu ▪ Cyril Duval ▪ daegon.yang ▪ daijufang ▪ Daniel Clavijo Coca ▪ Daniel Gryniewicz ▪ Daniel Parkes ▪ Daniel Persson ▪ Dan Mick ▪ Dan van der Ster ▪ David.Hall ▪ Deepika Upadhyay ▪ Dhairya Parmar ▪ Didier Gazen ▪ Dillon Amburgey ▪ Divyansh Kamboj ▪ Dmitry Kvashnin ▪ Dnyaneshwari ▪ Dongsheng Yang ▪ Doug Whitfield ▪ dpandit ▪ Eduardo Roldan ▪ ericqzhao ▪ Ernesto Puerta ▪ ethanwu ▪ Feng Hualong ▪ Florent Carli ▪ Florian Weimer ▪ Francesco Pantano ▪ Frank Filz ▪ Gabriel Adrian Samfira ▪ Gabriel BenHanokh ▪ Gal Salomon ▪ Gilad Sid ▪ Gil Bregman ▪ gitkenan ▪ Gregory O\'Neill ▪ Guido Santella ▪ Guillaume Abrioux ▪ gukaifeng ▪ haoyixing ▪ hejindong ▪ Himura Kazuto ▪ hosomn ▪ hualong feng ▪ HuangWei ▪ igomon ▪ Igor Fedotov ▪ Ilsoo Byun ▪ Ilya Dryomov ▪ imtzw ▪ Ionut Balutoiu ▪ ivan ▪ Ivo Almeida ▪ Jaanus Torp ▪ jagombar ▪ Jakob Haufe ▪ James Lakin ▪ Jane Zhu ▪ Javier ▪ Jayanth Reddy ▪ J. Eric Ivancich ▪ Jiffin Tony Thottan ▪ Jimyeong Lee ▪ Jinkyu Yi ▪ John Mulligan ▪ Jos Collin ▪ Jose J Palacios-Perez ▪ Josh Durgin ▪ Josh Salomon ▪ Josh Soref ▪ Joshua Baergen ▪ jrchyang ▪ Juan Miguel Olmo Martínez ▪ junxiang Mu ▪ Justin Caratzas ▪ Kalpesh Pandya ▪ Kamoltat Sirivadhna ▪ kchheda3 ▪ Kefu Chai ▪ Ken Dreyer ▪ Kim Minjong ▪ Konstantin Monakhov ▪ Konstantin Shalygin ▪ Kotresh Hiremath Ravishankar ▪ Kritik Sachdeva ▪ Laura Flores ▪ Lei Cao ▪ Leonid Usov ▪ lichaochao ▪ lightmelodies ▪ limingze ▪ liubingrun ▪ LiuBingrun ▪ liuhong ▪ Liu Miaomiao ▪ liuqinfei ▪ Lorenz Bausch ▪ Lucian Petrut ▪ Luis Domingues ▪ Luís Henriques ▪ luo rixin ▪ Manish M Yathnalli ▪ Marcio Roberto Starke ▪ Marc Singer ▪ Marcus Watts ▪ Mark Kogan ▪ Mark Nelson ▪ Matan Breizman ▪ Mathew Utter ▪ Matt Benjamin ▪ Matthew Booth ▪ Matthew Vernon ▪ mengxiangrui ▪ Mer Xuanyi ▪ Michaela Lang ▪ Michael Fritch ▪ Michael J. Kidd ▪ Michael Schmaltz ▪ Michal Nasiadka ▪ Mike Perez ▪ Milind Changire ▪ Mindy Preston ▪ Mingyuan Liang ▪ Mitsumasa KONDO ▪ Mohamed Awnallah ▪ Mohan Sharma ▪ Mohit Agrawal ▪ molpako ▪ Mouratidis Theofilos ▪ Mykola Golub ▪ Myoungwon Oh ▪ Naman Munet ▪ Neeraj Pratap Singh ▪ Neha Ojha ▪ Nico Wang ▪ Niklas Hambüchen ▪ Nithya Balachandran ▪ Nitzan Mordechai ▪ Nizamudeen A ▪ Nobuto Murata ▪ Oguzhan Ozmen ▪ Omri Zeneva ▪ Or Friedmann ▪ Orit Wasserman ▪ Or Ozeri ▪ Parth Arora ▪ Patrick Donnelly ▪ Patty8122 ▪ Paul Cuzner ▪ Paulo E. 
Castro ▪ Paul Reece ▪ PC-Admin ▪ Pedro Gonzalez Gomez ▪ Pere Diaz Bou ▪ Pete Zaitcev ▪ Philip de Nier ▪ Philipp Hufnagl ▪ Pierre Riteau ▪ pilem94 ▪ Pinghao Wu ▪ Piotr Parczewski ▪ Ponnuvel Palaniyappan ▪ Prasanna Kumar Kalever ▪ Prashant D ▪ Pritha Srivastava ▪ QinWei ▪ qn2060 ▪ Radoslaw Zarzynski ▪ Raimund Sacherer ▪ Ramana Raja ▪ Redouane Kachach ▪ RickyMaRui ▪ Rishabh Dave ▪ rkhudov ▪ Ronen Friedman ▪ Rongqi Sun ▪ Roy Sahar ▪ Sachin Punadikar ▪ Sage Weil ▪ Sainithin Artham ▪ sajibreadd ▪ samarah ▪ Samarah ▪ Samuel Just ▪ Sascha Lucas ▪ sayantani11 ▪ Seena Fallah ▪ Shachar Sharon ▪ Shilpa Jagannath ▪ shimin ▪ ShimTanny ▪ Shreyansh Sancheti ▪ sinashan ▪ Soumya Koduri ▪ sp98 ▪ spdfnet ▪ Sridhar Seshasayee ▪ Sungmin Lee ▪ sunlan ▪ Super User ▪ Suyashd999 ▪ Suyash Dongre ▪ Taha Jahangir ▪ tanchangzhi ▪ Teng Jie ▪ tengjie5 ▪ Teoman Onay ▪ tgfree ▪ Theofilos Mouratidis ▪ Thiago Arrais ▪ Thomas Lamprecht ▪ Tim Serong ▪ Tobias Urdin ▪ tobydarling ▪ Tom Coldrick ▪ TomNewChao ▪ Tongliang Deng ▪ tridao ▪ Vallari Agrawal ▪ Vedansh Bhartia ▪ Venky Shankar ▪ Ville Ojamo ▪ Volker Theile ▪ wanglinke ▪ wangwenjuan ▪ wanwencong ▪ Wei Wang ▪ weixinwei ▪ Xavi Hernandez ▪ Xinyu Huang ▪ Xiubo Li ▪ Xuehan Xu ▪ XueYu Bai ▪ xuxuehan ▪ Yaarit Hatuka ▪ Yantao xue ▪ Yehuda Sadeh ▪ Yingxin Cheng ▪ yite gu ▪ Yonatan Zaken ▪ Yongseok Oh ▪ Yuri Weinstein ▪ Yuval Lifshitz ▪ yu.wang ▪ Zac Dover ▪ Zack Cerza ▪ zhangjianwei ▪ Zhang Song ▪ Zhansong Gao ▪ Zhelong Zhao ▪ Zhipeng Li ▪ Zhiwei Huang ▪ 叶海丰 ▪ 胡玮文
The Cephalocon Conference t-shirt is a perennial favorite and is literally worn as a badge of honor around the world. And the design on the shirt is what makes it so special!
How would you like to be honored as the creator of the design adorning this year’s objet d’art, and receive a complimentary registration to this year’s event at CERN in Geneva, Switzerland this December, in recognition?
You don’t need to be an artist or a graphic designer, as we are looking for simple conceptual renderings of your design: scan in a hand-drawn image or sketch with your favorite tool. All we ask is that it be original art (we need to avoid licensing issues). Also, please limit the design to black and white if possible, or at most one additional color, to be budget friendly.
To submit your idea for consideration, please email your drawing file (PDF or JPG) to cephalocon24@ceph.io. All submissions must be received no later than Friday, August 16th - so get those creative juices flowing!!
The Conference planning team will review and announce the winner when the Conference Schedule is announced in September.
2023’s Image for reference, in case you need inspiration
This is the fourth backport release in the Reef series. We recommend that all users update to this release.
An early build of this release was accidentally exposed and packaged as 18.2.3 by the Debian project in April. That 18.2.3 release should not be used. The official release was re-tagged as v18.2.4 to avoid further confusion.
v18.2.4 container images, now based on CentOS 9, may be incompatible on older kernels (e.g., Ubuntu 18.04) due to differences in thread creation methods. Users upgrading to v18.2.4 container images on older OS versions may encounter crashes during pthread_create. For workarounds, refer to the related tracker. However, we recommend upgrading your OS to avoid this unsupported combination. Related tracker: https://tracker.ceph.com/issues/66989
RBD: When diffing against the beginning of time (fromsnapname == NULL) in fast-diff mode (whole_object == true with the fast-diff image feature enabled and valid), diff-iterate is now guaranteed to execute locally if exclusive lock is available. This brings a dramatic performance improvement for QEMU live disk synchronization and backup use cases.
RADOS: The get_pool_is_selfmanaged_snaps_mode C++ API has been deprecated due to being prone to false negative results. Its safer replacement is pool_is_in_selfmanaged_snaps_mode.
RBD: The option --image-id has been added to the rbd children CLI command, so it can be run for images in the trash.
(reef) node-proxy: improve http error handling in fetch_oob_details (pr#55538, Guillaume Abrioux)
[rgw][lc][rgw_lifecycle_work_time] adjust timing if the configured end time is less than the start time (pr#54866, Oguzhan Ozmen)
add checking for rgw frontend init (pr#54844, zhipeng li)
admin/doc-requirements: bump Sphinx to 5.0.2 (pr#55191, Nizamudeen A)
backport of fixes for 63678 and 63694 (pr#55104, Redouane Kachach)
backport rook/mgr recent changes (pr#55706, Redouane Kachach)
ceph-menv:fix typo in README (pr#55163, yu.wang)
ceph-volume: add missing import (pr#56259, Guillaume Abrioux)
ceph-volume: fix a bug in _check_generic_reject_reasons (pr#54705, Kim Minjong)
ceph-volume: Fix migration from WAL to data with no DB (pr#55497, Igor Fedotov)
ceph-volume: fix mpath device support (pr#53539, Guillaume Abrioux)
ceph-volume: fix zap_partitions() in devices.lvm.zap (pr#55477, Guillaume Abrioux)
ceph-volume: fixes fallback to stat in is_device and is_partition (pr#54629, Teoman ONAY)
ceph-volume: update functional testing (pr#56857, Guillaume Abrioux)
ceph-volume: use 'no workqueue' options with dmcrypt (pr#55335, Guillaume Abrioux)
ceph-volume: Use safe accessor to get TYPE info (pr#56323, Dillon Amburgey)
ceph.spec.in: add support for openEuler OS (pr#56361, liuqinfei)
ceph.spec.in: remove command-with-macro line (pr#57357, John Mulligan)
cephadm/nvmeof: scrape nvmeof prometheus endpoint (pr#56108, Avan Thakkar)
cephadm: Add mount for nvmeof log location (pr#55819, Roy Sahar)
cephadm: Add nvmeof to autotuner calculation (pr#56100, Paul Cuzner)
cephadm: add timemaster to timesync services list (pr#56307, Florent Carli)
cephadm: adjust the ingress ha proxy health check interval (pr#56286, Jiffin Tony Thottan)
cephadm: create ceph-exporter sock dir if it\'s not present (pr#56102, Adam King)
cephadm: fix get_version for nvmeof (pr#56099, Adam King)
cephadm: improve cephadm pull usage message (pr#56292, Adam King)
cephadm: remove restriction for crush device classes (pr#56106, Seena Fallah)
cephadm: rm podman-auth.json if removing last cluster (pr#56105, Adam King)
cephfs-shell: remove distutils Version classes because they\'re deprecated (pr#54119, Venky Shankar, Jos Collin)
cephfs-top: include the missing fields in --dump output (pr#54520, Jos Collin)
client/fuse: handle case of renameat2 with non-zero flags (pr#55002, Leonid Usov, Shachar Sharon)
client: append to buffer list to save all result from wildcard command (pr#53893, Rishabh Dave, Jinmyeong Lee, Jimyeong Lee)
client: call _getattr() for -ENODATA returned _getvxattr() calls (pr#54404, Jos Collin)
client: fix leak of file handles (pr#56122, Xavi Hernandez)
client: Fix return in removexattr for xattrs from system.namespace (pr#55803, Anoop C S)
client: queue a delay cap flushing if there are ditry caps/snapcaps (pr#54466, Xiubo Li)
client: readdir_r_cb: get rstat for dir only if using rbytes for size (pr#53359, Pinghao Wu)
cmake/arrow: don\'t treat warnings as errors (pr#57375, Casey Bodley)
cmake/modules/BuildRocksDB.cmake: inherit parent\'s CMAKE_CXX_FLAGS (pr#55502, Kefu Chai)
cmake: use or turn off liburing for rocksdb (pr#54122, Casey Bodley, Patrick Donnelly)
common/options: Set LZ4 compression for bluestore RocksDB (pr#55197, Mark Nelson)
common/weighted_shuffle: don\'t feed std::discrete_distribution with all-zero weights (pr#55153, Radosław Zarzyński)
common: resolve config proxy deadlock using refcounted pointers (pr#54373, Patrick Donnelly)
DaemonServer.cc: fix config show command for RGW daemons (pr#55077, Aishwarya Mathuria)
debian: add ceph-exporter package (pr#56541, Shinya Hayashi)
debian: add missing bcrypt to ceph-mgr .requires to fix resulting package dependencies (pr#54662, Thomas Lamprecht)
doc/architecture.rst - fix typo (pr#55384, Zac Dover)
doc/architecture.rst: improve rados definition (pr#55343, Zac Dover)
doc/architecture: correct typo (pr#56012, Zac Dover)
doc/architecture: improve some paragraphs (pr#55399, Zac Dover)
doc/architecture: remove pleonasm (pr#55933, Zac Dover)
doc/cephadm - edit t11ing (pr#55482, Zac Dover)
doc/cephadm/services: Improve monitoring.rst (pr#56290, Anthony D\'Atri)
doc/cephadm: correct nfs config pool name (pr#55603, Zac Dover)
doc/cephadm: improve host-management.rst (pr#56111, Anthony D\'Atri)
doc/cephadm: Improve multiple files (pr#56130, Anthony D\'Atri)
doc/cephfs/client-auth.rst: correct ``fs authorize cephfs1 /dir1 clie… (pr#55246, 叶海丰)
doc/cephfs: edit add-remove-mds (pr#55648, Zac Dover)
doc/cephfs: fix architecture link to correct relative path (pr#56340, molpako)
doc/cephfs: Update disaster-recovery-experts.rst to mention Slack (pr#55044, Dhairya Parmar)
doc/crimson: cleanup duplicate seastore description (pr#55730, Rongqi Sun)
doc/dev: backport zipapp docs to reef (pr#56161, Zac Dover)
doc/dev: edit internals.rst (pr#55852, Zac Dover)
doc/dev: edit teuthology workflow (pr#56002, Zac Dover)
doc/dev: fix spelling in crimson.rst (pr#55737, Zac Dover)
doc/dev: osd_internals/snaps.rst: add clone_overlap doc (pr#56523, Matan Breizman)
doc/dev: refine \\"Concepts\\" (pr#56660, Zac Dover)
doc/dev: refine \\"Concepts\\" 2 of 3 (pr#56725, Zac Dover)
doc/dev: refine \\"Concepts\\" 3 of 3 (pr#56729, Zac Dover)
doc/dev: refine \\"Concepts\\" 4 of 3 (pr#56740, Zac Dover)
doc/dev: update leads list (pr#56603, Zac Dover)
doc/dev: update leads list (pr#56589, Zac Dover)
doc/glossary.rst: add \\"Monitor Store\\" (pr#54743, Zac Dover)
doc/glossary: add \\"Crimson\\" entry (pr#56073, Zac Dover)
doc/glossary: add \\"librados\\" entry (pr#56235, Zac Dover)
doc/glossary: Add \\"OMAP\\" to glossary (pr#55749, Zac Dover)
doc/glossary: Add link to CRUSH paper (pr#55557, Zac Dover)
doc/glossary: improve \\"MDS\\" entry (pr#55849, Zac Dover)
doc/glossary: improve OSD definitions (pr#55613, Zac Dover)
doc/install: add manual RADOSGW install procedure (pr#55880, Zac Dover)
doc/install: update \\"update submodules\\" (pr#54961, Zac Dover)
doc/man/8/mount.ceph.rst: add more mount options (pr#55754, Xiubo Li)
doc/man: edit \\"manipulating the omap key\\" (pr#55635, Zac Dover)
doc/man: edit ceph-osd description (pr#54551, Zac Dover)
doc/mgr: credit John Jasen for Zabbix 2 (pr#56684, Zac Dover)
doc/mgr: document lack of MSWin NFS 4.x support (pr#55032, Zac Dover)
doc/mgr: update zabbix information (pr#56631, Zac Dover)
doc/rados/configuration/bluestore-config-ref: Fix lowcase typo (pr#54694, Adam Kupczyk)
doc/rados/configuration/osd-config-ref: fix typo (pr#55678, Pierre Riteau)
doc/rados/operations: add EC overhead table to erasure-code.rst (pr#55244, Anthony D\'Atri)
doc/rados/operations: Fix off-by-one errors in control.rst (pr#55231, tobydarling)
doc/rados/operations: Improve crush_location docs (pr#56594, Niklas Hambüchen)
doc/rados: add \\"change public network\\" procedure (pr#55799, Zac Dover)
doc/rados: add link to pg blog post (pr#55611, Zac Dover)
doc/rados: add PG definition (pr#55630, Zac Dover)
doc/rados: edit \\"client can\'t connect...\\" (pr#54654, Zac Dover)
doc/rados: edit \\"Everything Failed! Now What?\\" (pr#54665, Zac Dover)
doc/rados: edit \\"monitor store failures\\" (pr#54659, Zac Dover)
doc/rados: edit \\"recovering broken monmap\\" (pr#54601, Zac Dover)
doc/rados: edit \\"understanding mon_status\\" (pr#54579, Zac Dover)
doc/rados: edit \\"Using the Monitor\'s Admin Socket\\" (pr#54576, Zac Dover)
doc/rados: fix broken links (pr#55680, Zac Dover)
doc/rados: format sections in tshooting-mon.rst (pr#54638, Zac Dover)
doc/rados: improve \\"Ceph Subsystems\\" (pr#54702, Zac Dover)
doc/rados: improve formatting of log-and-debug.rst (pr#54746, Zac Dover)
doc/rados: link to pg setting commands (pr#55936, Zac Dover)
doc/rados: ops/pgs: s/power of 2/power of two (pr#54700, Zac Dover)
doc/rados: remove PGcalc from docs (pr#55901, Zac Dover)
doc/rados: repair stretch-mode.rst (pr#54762, Zac Dover)
doc/rados: restore PGcalc tool (pr#56057, Zac Dover)
doc/rados: update \\"stretch mode\\" (pr#54756, Michael Collins)
doc/rados: update common.rst (pr#56268, Zac Dover)
doc/rados: update config for autoscaler (pr#55438, Zac Dover)
doc/rados: update PG guidance (pr#55460, Zac Dover)
doc/radosgw - edit admin.rst \\"set user rate limit\\" (pr#55150, Zac Dover)
doc/radosgw/admin.rst: use underscores in config var names (pr#54933, Ville Ojamo)
doc/radosgw: add confval directives (pr#55484, Zac Dover)
doc/radosgw: add gateway starting command (pr#54833, Zac Dover)
doc/radosgw: admin.rst - edit \\"Create a Subuser\\" (pr#55020, Zac Dover)
doc/radosgw: admin.rst - edit \\"Create a User\\" (pr#55004, Zac Dover)
doc/radosgw: admin.rst - edit sections (pr#55017, Zac Dover)
doc/radosgw: edit \\"Add/Remove a Key\\" (pr#55055, Zac Dover)
doc/radosgw: edit \\"Enable/Disable Bucket Rate Limit\\" (pr#55260, Zac Dover)
doc/radosgw: edit \\"read/write global rate limit\\" admin.rst (pr#55271, Zac Dover)
doc/radosgw: edit \\"remove a subuser\\" (pr#55034, Zac Dover)
doc/radosgw: edit \\"Usage\\" admin.rst (pr#55321, Zac Dover)
doc/radosgw: edit admin.rst \\"Get Bucket Rate Limit\\" (pr#55253, Zac Dover)
doc/radosgw: edit admin.rst \\"get user rate limit\\" (pr#55157, Zac Dover)
doc/radosgw: edit admin.rst \\"set bucket rate limit\\" (pr#55242, Zac Dover)
doc/radosgw: edit admin.rst - quota (pr#55082, Zac Dover)
doc/radosgw: edit admin.rst 1 of x (pr#55000, Zac Dover)
doc/radosgw: edit compression.rst (pr#54985, Zac Dover)
doc/radosgw: edit front matter - role.rst (pr#54854, Zac Dover)
doc/radosgw: edit multisite.rst (pr#55671, Zac Dover)
doc/radosgw: edit sections (pr#55027, Zac Dover)
doc/radosgw: fix formatting (pr#54753, Zac Dover)
doc/radosgw: Fix JSON typo in Principal Tag example code snippet (pr#54642, Daniel Parkes)
doc/radosgw: fix verb disagreement - index.html (pr#55338, Zac Dover)
doc/radosgw: format \\"Create a Role\\" (pr#54886, Zac Dover)
doc/radosgw: format commands in role.rst (pr#54905, Zac Dover)
doc/radosgw: format POST statements (pr#54849, Zac Dover)
doc/radosgw: list supported plugins-compression.rst (pr#54995, Zac Dover)
doc/radosgw: update link in rgw-cache.rst (pr#54805, Zac Dover)
doc/radosrgw: edit admin.rst (pr#55073, Zac Dover)
doc/rbd: add clone mapping command (pr#56208, Zac Dover)
doc/rbd: add map information for clone images to rbd-encryption.rst (pr#56186, N Balachandran)
doc/rbd: minor changes to the rbd man page (pr#56256, N Balachandran)
doc/rbd: repair ordered list (pr#55732, Zac Dover)
doc/releases: edit reef.rst (pr#55064, Zac Dover)
doc/releases: specify dashboard improvements (pr#55049, Laura Flores, Zac Dover)
doc/rgw: edit admin.rst - rate limit management (pr#55128, Zac Dover)
doc/rgw: fix Attributes index in CreateTopic example (pr#55432, Casey Bodley)
doc/start: add Slack invite link (pr#56041, Zac Dover)
doc/start: explain \\"OSD\\" (pr#54559, Zac Dover)
doc/start: improve MDS explanation (pr#56466, Zac Dover)
doc/start: improve MDS explanation (pr#56426, Zac Dover)
doc/start: link to mon map command (pr#56410, Zac Dover)
doc/start: update release names (pr#54572, Zac Dover)
doc: add description of metric fields for cephfs-top (pr#55511, Neeraj Pratap Singh)
doc: Add NVMe-oF gateway documentation (pr#55724, Orit Wasserman)
doc: add supported file types in cephfs-mirroring.rst (pr#54822, Jos Collin)
doc: adding documentation for secure monitoring stack configuration (pr#56104, Redouane Kachach)
doc: cephadm/services/osd: fix typo (pr#56230, Lorenz Bausch)
doc: Fixes two typos and grammatical errors. Signed-off-by: Sina Ahma… (pr#54775, Sina Ahmadi)
doc: fixing doc/cephfs/fs-volumes (pr#56648, Neeraj Pratap Singh)
doc: remove releases docs (pr#56567, Patrick Donnelly)
doc: specify correct fs type for mkfs (pr#55282, Vladislav Glagolev)
doc: update rgw admin api req params for get user info (pr#55071, Ali Maredia)
doc:start.rst fix typo in hw-recs (pr#55505, Eduardo Roldan)
docs/rados: remove incorrect ceph command (pr#56495, Taha Jahangir)
docs/radosgw: edit admin.rst \\"enable/disable user rate limit\\" (pr#55194, Zac Dover)
docs/rbd: fix typo in arg name (pr#56262, N Balachandran)
docs: Add information about OpenNebula integration (pr#54938, Daniel Clavijo)
librados: make querying pools for selfmanaged snaps reliable (pr#55026, Ilya Dryomov)
librbd: account for discards that truncate in ObjectListSnapsRequest (pr#56213, Ilya Dryomov)
librbd: Append one journal event per image request (pr#54818, Ilya Dryomov, Joshua Baergen)
librbd: don\'t report HOLE_UPDATED when diffing against a hole (pr#54951, Ilya Dryomov)
librbd: fix regressions in ObjectListSnapsRequest (pr#54862, Ilya Dryomov)
librbd: fix split() for SparseExtent and SparseBufferlistExtent (pr#55665, Ilya Dryomov)
librbd: improve rbd_diff_iterate2() performance in fast-diff mode (pr#55427, Ilya Dryomov)
librbd: return ENOENT from Snapshot::get_timestamp for nonexistent snap_id (pr#55474, John Agombar)
make-dist: don\'t use --continue option for wget (pr#55091, Casey Bodley)
MClientRequest: properly handle ceph_mds_request_head_legacy for ext_num_retry, ext_num_fwd, owner_uid, owner_gid (pr#54407, Alexander Mikhalitsyn)
mds,cephfs_mirror: add labelled per-client and replication metrics (issue#63945, pr#55640, Venky Shankar, Jos Collin)
mds/client: check the cephx mds auth access in client side (pr#54468, Xiubo Li, Ramana Raja)
mds/MDBalancer: ignore queued callbacks if MDS is not active (pr#54493, Leonid Usov)
mds/MDSRank: Add set_history_slow_op_size_and_threshold for op_tracker (pr#53357, Yite Gu)
mds: accept human readable values for quotas (issue#55940, pr#53333, Venky Shankar, Dhairya Parmar, dparmar18)
mds: add a command to dump directory information (pr#55987, Jos Collin, Zhansong Gao)
mds: add balance_automate fs setting (pr#54952, Patrick Donnelly)
mds: add debug logs during setxattr ceph.dir.subvolume (pr#56062, Milind Changire)
mds: allow all types of mds caps (pr#52581, Rishabh Dave)
mds: allow lock state to be LOCK_MIX_SYNC in replica for filelock (pr#56049, Xiubo Li)
mds: change priority of mds rss perf counter to useful (pr#55057, sp98)
mds: check file layout in mknod (pr#56031, Xue Yantao)
mds: check relevant caps for fs include root_squash (pr#57343, Patrick Donnelly)
mds: disable `defer_client_eviction_on_laggy_osds\' by default (issue#64685, pr#56196, Venky Shankar)
mds: do not evict clients if OSDs are laggy (pr#52268, Dhairya Parmar, Laura Flores)
mds: do not simplify fragset (pr#54895, Milind Changire)
mds: ensure next replay is queued on req drop (pr#54313, Patrick Donnelly)
mds: ensure snapclient is synced before corruption check (pr#56398, Patrick Donnelly)
mds: fix issuing redundant reintegrate/migrate_stray requests (pr#54467, Xiubo Li)
mds: just wait the client flushes the snap and dirty buffer (pr#55743, Xiubo Li)
mds: optionally forbid to use standby for another fs as last resort (pr#53340, Venky Shankar, Mykola Golub, Luís Henriques)
mds: relax certain asserts in mdlog replay thread (issue#57048, pr#56016, Venky Shankar)
mds: reverse MDSMap encoding of max_xattr_size/bal_rank_mask (pr#55669, Patrick Donnelly)
mds: revert standby-replay trimming changes (pr#54716, Patrick Donnelly)
mds: scrub repair does not clear earlier damage health status (pr#54899, Neeraj Pratap Singh)
mds: set the loner to true for LOCK_EXCL_XSYN (pr#54911, Xiubo Li)
mds: skip sr moves when target is an unlinked dir (pr#56672, Patrick Donnelly, Dan van der Ster)
mds: use explicitly sized types for network and disk encoding (pr#55742, Xiubo Li)
MDSAuthCaps: minor improvements (pr#54185, Rishabh Dave)
MDSAuthCaps: print better error message for perm flag in MDS caps (pr#54945, Rishabh Dave)
mgr/(object_format && nfs/export): enhance nfs export update failure response (pr#55395, Dhairya Parmar, John Mulligan)
mgr/.dashboard: batch backport of cephfs snapshot schedule management (pr#55581, Ivo Almeida)
mgr/cephadm is not defining haproxy tcp healthchecks for Ganesha (pr#56101, avanthakkar)
mgr/cephadm: allow grafana and prometheus to only bind to specific network (pr#56302, Adam King)
mgr/cephadm: Allow idmap overrides in nfs-ganesha configuration (pr#56029, Teoman ONAY)
mgr/cephadm: catch CancelledError in asyncio timeout handler (pr#56103, Adam King)
mgr/cephadm: discovery service (port 8765) fails on ipv6 only clusters (pr#56093, Theofilos Mouratidis)
mgr/cephadm: fix placement with label and host pattern (pr#56107, Adam King)
mgr/cephadm: fix reweighting of OSD when OSD removal is stopped (pr#56094, Adam King)
mgr/cephadm: fixups for asyncio based timeout (pr#55555, Adam King)
mgr/cephadm: make jaeger-collector a dep for jaeger-agent (pr#56089, Adam King)
mgr/cephadm: refresh public_network for config checks before checking (pr#56325, Adam King)
mgr/cephadm: support for regex based host patterns (pr#56221, Adam King)
mgr/cephadm: support for removing host entry from crush map during host removal (pr#56092, Adam King)
mgr/cephadm: update timestamp on repeat daemon/service events (pr#56090, Adam King)
mgr/dashboard/frontend:Ceph dashboard supports multiple languages (pr#56359, TomNewChao)
mgr/dashboard: Add advanced fieldset component (pr#56692, Afreen)
mgr/dashboard: add frontend unit tests for rgw multisite sync status card (pr#55222, Aashish Sharma)
mgr/dashboard: add snap schedule M, Y frequencies (pr#56059, Ivo Almeida)
mgr/dashboard: add support for editing and deleting rgw roles (pr#55541, Nizamudeen A)
mgr/dashboard: add system users to rgw user form (pr#56471, Pedro Gonzalez Gomez)
mgr/dashboard: add Table Schema to grafonnet (pr#56736, Aashish Sharma)
mgr/dashboard: Allow the user to add the access/secret key on zone edit and not on zone creation (pr#56472, Aashish Sharma)
mgr/dashboard: ceph authenticate user from fs (pr#56254, Pedro Gonzalez Gomez)
mgr/dashboard: change deprecated grafana URL in daemon logs (pr#55544, Nizamudeen A)
mgr/dashboard: chartjs and ng2-charts version upgrade (pr#55224, Pedro Gonzalez Gomez)
mgr/dashboard: Consider null values as zero in grafana panels (pr#54541, Aashish Sharma)
mgr/dashboard: create cephfs snapshot clone (pr#55489, Nizamudeen A)
mgr/dashboard: Create realm sets to default (pr#55221, Aashish Sharma)
mgr/dashboard: Create subvol of same name in different group (pr#55369, Afreen)
mgr/dashboard: dashboard area chart unit test (pr#55517, Pedro Gonzalez Gomez)
mgr/dashboard: debugging make check failure (pr#56127, Nizamudeen A)
mgr/dashboard: disable applitools e2e (pr#56215, Nizamudeen A)
mgr/dashboard: fix cephfs name validation (pr#56501, Nizamudeen A)
mgr/dashboard: fix clone unique validator for name validation (pr#56550, Nizamudeen A)
mgr/dashboard: fix e2e failure related to landing page (pr#55124, Pedro Gonzalez Gomez)
mgr/dashboard: fix empty tags (pr#56439, Pedro Gonzalez Gomez)
mgr/dashboard: fix error while accessing roles tab when policy attached (pr#55515, Afreen)
mgr/dashboard: Fix inconsistency in capitalisation of \\"Multi-site\\" (pr#55311, Afreen)
mgr/dashboard: fix M retention frequency display (pr#56363, Ivo Almeida)
mgr/dashboard: fix retention add for subvolume (pr#56370, Ivo Almeida)
mgr/dashboard: fix rgw display name validation (pr#56548, Nizamudeen A)
mgr/dashboard: fix roles page for roles without policies (pr#55827, Nizamudeen A)
mgr/dashboard: fix snap schedule date format (pr#55815, Ivo Almeida)
mgr/dashboard: fix snap schedule list toggle cols (pr#56115, Ivo Almeida)
mgr/dashboard: fix snap schedule time format (pr#56154, Ivo Almeida)
mgr/dashboard: fix subvolume group edit (pr#55811, Ivo Almeida)
mgr/dashboard: fix subvolume group edit size (pr#56385, Ivo Almeida)
mgr/dashboard: fix the jsonschema issue in install-deps (pr#55542, Nizamudeen A)
mgr/dashboard: fix volume creation with multiple hosts (pr#55786, Pedro Gonzalez Gomez)
mgr/dashboard: fixed cephfs mount command (pr#55993, Ivo Almeida)
mgr/dashboard: fixed nfs attach command (pr#56387, Ivo Almeida)
mgr/dashboard: Fixes multisite topology page breadcrumb (pr#55212, Afreen Misbah)
mgr/dashboard: get object bucket policies for a bucket (pr#55361, Nizamudeen A)
mgr/dashboard: get rgw port from ssl_endpoint (pr#54764, Nizamudeen A)
mgr/dashboard: Handle errors for /api/osd/settings (pr#55704, Afreen)
mgr/dashboard: increase the number of plottable graphs in charts (pr#55571, Afreen, Aashish Sharma)
mgr/dashboard: Locking improvements in bucket create form (pr#56560, Afreen)
mgr/dashboard: make ceph logo redirect to dashboard (pr#56557, Afreen)
mgr/dashboard: Mark placement targets as non-required (pr#56621, Afreen)
mgr/dashboard: replace deprecated table panel in grafana with a newer table panel (pr#56682, Aashish Sharma)
mgr/dashboard: replace piechart plugin charts with native pie chart panel (pr#56654, Aashish Sharma)
mgr/dashboard: rgw bucket features (pr#55575, Pedro Gonzalez Gomez)
mgr/dashboard: rm warning/error threshold for cpu usage (pr#56443, Nizamudeen A)
mgr/dashboard: s/active_mds/active_nfs in fs attach form (pr#56546, Nizamudeen A)
mgr/dashboard: sanitize dashboard user creation (pr#56452, Pedro Gonzalez Gomez)
mgr/dashboard: Show the OSDs Out and Down panels as red whenever an OSD is in Out or Down state in Ceph Cluster grafana dashboard (pr#54538, Aashish Sharma)
mgr/dashboard: Simplify authentication protocol (pr#55689, Daniel Persson)
mgr/dashboard: subvolume snapshot management (pr#55186, Nizamudeen A)
mgr/dashboard: update fedora link for dashboard-cephadm-e2e test (pr#54718, Adam King)
mgr/dashboard: upgrade from old \'graph\' type panels to the new \'timeseries\' panel (pr#56652, Aashish Sharma)
mgr/dashboard:Update encryption and tags in bucket form (pr#56707, Afreen)
mgr/dashboard:Use advanced fieldset for rbd image (pr#56710, Afreen)
mgr/nfs: include pseudo in JSON output when nfs export apply -i fails (pr#55394, Dhairya Parmar)
mgr/node-proxy: handle \'None\' statuses returned by RedFish (pr#55999, Guillaume Abrioux)
mgr/pg_autoscaler: add check for norecover flag (pr#55078, Aishwarya Mathuria)
mgr/snap_schedule: add support for monthly snapshots (pr#55208, Milind Changire)
mgr/snap_schedule: exceptions management and subvol support (pr#52751, Milind Changire)
mgr/volumes: fix subvolume group rm error message (pr#54207, neeraj pratap singh, Neeraj Pratap Singh)
mgr/volumes: support to reject CephFS clones if cloner threads are not available (pr#55692, Rishabh Dave, Venky Shankar, Neeraj Pratap Singh)
mgr: pin pytest to version 7.4.4 (pr#55362, Laura Flores)
mon, doc: overriding ec profile requires --yes-i-really-mean-it (pr#56435, Radoslaw Zarzynski)
mon, osd, *: expose upmap-primary in OSDMap::get_features() (pr#57794, rzarzynski)
mon/ConfigMonitor: Show localized name in \\"config dump --format json\\" output (pr#53888, Sridhar Seshasayee)
mon/ConnectionTracker.cc: disregard connection scores from mon_rank = -1 (pr#55167, Kamoltat)
mon/OSDMonitor: fix get_min_last_epoch_clean() (pr#55867, Matan Breizman)
mon: fix health store size growing infinitely (pr#55548, Wei Wang)
mon: fix mds metadata lost in one case (pr#54316, shimin)
msg: update MOSDOp() to use ceph_tid_t instead of long (pr#55424, Lucian Petrut)
node-proxy: fix RedFishClient.logout() method (pr#56252, Guillaume Abrioux)
node-proxy: refactor entrypoint (backport) (pr#55454, Guillaume Abrioux)
orch: implement hardware monitoring (pr#55405, Guillaume Abrioux, Adam King, Redouane Kachach)
orchestrator: Add summary line to orch device ls output (pr#56098, Paul Cuzner)
orchestrator: Fix representation of CPU threads in host ls --detail command (pr#56097, Paul Cuzner)
os/bluestore: add bluestore fragmentation micros to prometheus (pr#54258, Yite Gu)
os/bluestore: fix free space update after bdev-expand in NCB mode (pr#55777, Igor Fedotov)
os/bluestore: get rid off resulting lba alignment in allocators (pr#54772, Igor Fedotov)
os/kv_test: Fix estimate functions (pr#56197, Adam Kupczyk)
osd/OSD: introduce reset_purged_snaps_last (pr#53972, Matan Breizman)
osd/scrub: increasing max_osd_scrubs to 3 (pr#55173, Ronen Friedman)
osd: Apply randomly selected scheduler type across all OSD shards (pr#54981, Sridhar Seshasayee)
osd: don\'t require RWEXCL lock for stat+write ops (pr#54595, Alice Zhao)
osd: fix Incremental decode for new/old_pg_upmap_primary (pr#55046, Laura Flores)
osd: improve OSD robustness (pr#54783, Igor Fedotov)
osd: log the number of extents for sparse read (pr#54606, Xiubo Li)
osd: Tune snap trim item cost to reflect a PGs\' average object size for mClock scheduler (pr#55040, Sridhar Seshasayee)
pybind/mgr/devicehealth: replace SMART data if exists for same DATETIME (pr#54879, Patrick Donnelly)
pybind/mgr/devicehealth: skip legacy objects that cannot be loaded (pr#56479, Patrick Donnelly)
pybind/mgr/mirroring: drop mon_host from peer_list (pr#55237, Jos Collin)
pybind/rbd: fix compilation with cython3 (pr#54807, Mykola Golub)
python-common/drive_selection: fix limit with existing devices (pr#56096, Adam King)
python-common: fix osdspec_affinity check (pr#56095, Guillaume Abrioux)
qa/cephadm: testing for extra daemon/container features (pr#55957, Adam King)
qa/cephfs: improvements for name generators in test_volumes.py (pr#54729, Rishabh Dave)
qa/distros: remove centos 8 from supported distros (pr#57932, Guillaume Abrioux, Casey Bodley, Adam King, Laura Flores)
qa/suites/fs/nfs: use standard health ignorelist (pr#56392, Patrick Donnelly)
qa/suites/fs/workload: enable snap_schedule early (pr#56424, Patrick Donnelly)
qa/tasks/cephfs/test_misc: switch duration to timeout (pr#55746, Xiubo Li)
qa/tests: added the initial reef-p2p suite (pr#55714, Yuri Weinstein)
qa/workunits/rbd/cli_generic.sh: narrow race window when checking that rbd_support module command fails after blocklisting the module\'s client (pr#54769, Ramana Raja)
qa: fs volume rename requires fs fail and refuse_client_session set (issue#64174, pr#56171, Venky Shankar)
qa: Add benign cluster warning from ec-inconsistent-hinfo test to ignorelist (pr#56151, Sridhar Seshasayee)
qa: add centos_latest (9.stream) and ubuntu_20.04 yamls to supported-all-distro (pr#54677, Venky Shankar)
qa: add diff-continuous and compare-mirror-image tests to rbd and krbd suites respectively (pr#55928, Ramana Raja)
qa: Add tests to validate synced images on rbd-mirror (pr#55762, Ilya Dryomov, Ramana Raja)
qa: bump up scrub status command timeout (pr#55915, Milind Changire)
qa: change log-whitelist to log-ignorelist (pr#56396, Patrick Donnelly)
qa: correct usage of DEBUGFS_META_DIR in dedent (pr#56167, Venky Shankar)
qa: do upgrades from quincy and older reef minor releases (pr#55590, Patrick Donnelly)
qa: enhance labeled perf counters test for cephfs-mirror (pr#56211, Jos Collin)
qa: Fix fs/full suite (pr#55829, Kotresh HR)
qa: fix incorrectly using the wait_for_health() helper (issue#57985, pr#54237, Venky Shankar)
qa: fix rank_asok() to handle errors from asok commands (pr#55302, Neeraj Pratap Singh)
qa: ignore container checkpoint/restore related selinux denials for centos9 (issue#64616, pr#56019, Venky Shankar)
qa: remove error string checks and check w/ return value (pr#55943, Venky Shankar)
qa: remove vstart runner from radosgw_admin task (pr#55097, Ali Maredia)
qa: run kernel_untar_build with newer tarball (pr#54711, Milind Changire)
qa: set mds config with config set for a particular test (issue#57087, pr#56169, Venky Shankar)
qa: use correct imports to resolve fuse_mount and kernel_mount (pr#54714, Milind Changire)
qa: use exisitng ignorelist override list for fs:mirror[-ha] (issue#62482, pr#54766, Venky Shankar)
radosgw-admin: \'zone set\' won\'t overwrite existing default-placement (pr#55061, Casey Bodley)
rbd-nbd: fix resize of images mapped using netlink (pr#55316, Ramana Raja)
reef backport: rook e2e testing related PRs (pr#55375, Redouane Kachach)
RGW - Swift retarget needs bucket set on object (pr#56004, Daniel Gryniewicz)
rgw/auth: Fix the return code returned by AuthStrategy (pr#54794, Pritha Srivastava)
rgw/beast: Enable SSL session-id reuse speedup mechanism (pr#56120, Mark Kogan)
rgw/datalog: RGWDataChangesLog::add_entry() uses null_yield (pr#55655, Casey Bodley)
rgw/iam: admin/system users ignore iam policy parsing errors (pr#54843, Casey Bodley)
rgw/kafka/amqp: fix race conditionn in async completion handlers (pr#54736, Yuval Lifshitz)
rgw/lc: do not add datalog/bilog for some lc actions (pr#55289, Juan Zhu)
rgw/lua: fix CopyFrom crash (pr#54296, Yuval Lifshitz)
rgw/notification: Kafka persistent notifications not retried and removed even when the broker is down (pr#56140, kchheda3)
rgw/putobj: RadosWriter uses part head object for multipart parts (pr#55621, Casey Bodley)
rgw/rest: fix url decode of post params for iam/sts/sns (pr#55356, Casey Bodley)
rgw/S3select: remove assert from csv-parser, adding updates (pr#55969, Gal Salomon)
RGW/STS: when generating keys, take the trailing null character into account (pr#54127, Oguzhan Ozmen)
rgw: add headers to guide cache update in 304 response (pr#55094, Casey Bodley, Ilsoo Byun)
rgw: Add missing empty checks to the split string in is_string_in_set() (pr#56347, Matt Benjamin)
rgw: d3n: fix valgrind reported leak related to libaio worker threads (pr#54852, Mark Kogan)
rgw: do not copy olh attributes in versioning suspended bucket (pr#55606, Juan Zhu)
rgw: fix cloud-sync multi-tenancy scenario (pr#54328, Ionut Balutoiu)
rgw: object lock avoids 32-bit truncation of RetainUntilDate (pr#54674, Casey Bodley)
rgw: only buckets with reshardable layouts need to be considered for resharding (pr#54129, J. Eric Ivancich)
RGW: pubsub publish commit with etag populated (pr#56453, Ali Masarwa)
rgw: RGWSI_SysObj_Cache::remove() invalidates after successful delete (pr#55716, Casey Bodley)
rgw: SignatureDoesNotMatch for certain RGW Admin Ops endpoints w/v4 auth (pr#54791, David.Hall)
Snapshot schedule show subvolume path (pr#56419, Ivo Almeida)
src/common/options: Correct typo in rgw.yaml.in (pr#55445, Anthony D\'Atri)
src/mount: kernel mount command returning misleading error message (pr#55300, Neeraj Pratap Singh)
test/libcephfs: skip flaky timestamp assertion on Windows (pr#54614, Lucian Petrut)
test/rgw: increase timeouts in unittest_rgw_dmclock_scheduler (pr#55790, Casey Bodley)
test: explicitly link to ceph-common for some libcephfs tests (issue#57206, pr#53635, Venky Shankar)
tools/ceph_objectstore_tool: action_on_all_objects_in_pg to skip pgmeta (pr#54693, Matan Breizman)
Tools/rados: Improve Error Messaging for Object Name Resolution (pr#55112, Nitzan Mordechai)
tools/rbd: make \'children\' command support --image-id (pr#55617, Mykola Golub)
use raw_cluster_cmd instead of run_ceph_cmd (pr#55836, Venky Shankar)
win32_deps_build.sh: change Boost URL (pr#55084, Lucian Petrut)
From time to time, our friends over at the Pawsey Supercomputing Research Centre in Australia provide us with the opportunity to test Ceph on hardware that developers normally can’t access.
The Pawsey Supercomputing Research Centre provides integrated research solutions, expertise and computing infrastructure to Australian and international researchers across numerous scientific domains. Pawsey uses Ceph predominantly for object storage, with their Acacia service providing tens of petabytes of Ceph object storage. Acacia consists of two production Ceph clusters: an 11PB cluster supporting general research and a 27PB cluster dedicated to radio astronomy.
This time, one of the goals was to evaluate a different deployment architecture for OSD hosts: single socket 1U servers, connected to 60-drive external SAS enclosures.
We started off by defining some initial questions to help guide our testing efforts.
As you can see, these questions cover a wide variety of topics, so in the interests of avoiding reader fatigue, our observations will be split into 3 blog posts covering:
Short on time? Here’s a teaser of the environment and some key performance results.
Note that prior to 18.2.1, ceph-volume was not able to consume the multipath devices presented by the external enclosure. Check your Ceph version and your use of device-mapper-multipath if you're having issues!
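If you need to check, something like the following will confirm the Ceph release in use and whether the multipath devices are visible to the orchestrator (a quick sketch; device names will obviously differ on your hardware):

# Confirm the Ceph release running on every daemon
ceph versions

# List the multipath devices presented by the enclosure
sudo multipath -ll

# Check whether the orchestrator / ceph-volume can see them
ceph orch device ls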
Our first challenge was OSD configuration. The defaults for the block.db allocation are based on the capacity of the OSD - in our case 22TB. However, each OSD node only had 4 x 1.6TB NVMe drives - so the defaults couldn’t be used. Instead we used a spec file that explicitly defined the space for block.db and the ratio of OSDs to NVMe drives.
service_type: osd
service_id: aio-hdds
placement:
  host_pattern: storage-13-09012
spec:
  block_db_size: 60G
  db_slots: 15
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
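For reference, a spec like this is normally applied and verified through the orchestrator CLI; a minimal sketch (the file name is ours):

# Apply the OSD service spec
ceph orch apply -i osd-aio-hdds.yaml

# Watch the OSD service and daemons get created
ceph orch ls osd
ceph orch ps --daemon-type osd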
Problem solved, but then our first hiccup. We found that OSD deployment to dense nodes can hit timeouts in cephadm. In our case it only happened twice, but it can be annoying! There is a fix for this in Reef which enables the mgr/cephadm/default_cephadm_command_timeout setting to be user defined, but this isn’t planned to ‘land’ until 18.2.3. Bear this in mind if you’re planning to use dense enclosures.
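Once that change lands, the timeout should become a normal mgr config option; something along these lines ought to raise it for slow, dense OSD deployments (the 3600-second value is just an illustration):

# Raise the cephadm command timeout (in seconds)
ceph config set mgr mgr/cephadm/default_cephadm_command_timeout 3600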
With the cluster deployed, our focus switched to management of the cluster with the CLI and GUI. These were our focus areas:
The good news is that, in general, the UI coped well with this number of hosts and OSDs, and the LED controls worked fine! However there were a couple of stumbling blocks.
A number of UX enhancements were also raised:
| Tracker | Component | Description | Status |
|---|---|---|---|
| 64171 | UI | Includes expand/collapse in the crush map view | Backlog |
| 63864 | CLI | Show devices per node summary | Merged |
| 63865 | UI | CPU thread count is incorrectly reported | Merged |
As mentioned earlier, this hardware is based on 1U servers connected over SAS to a 60 Drive enclosure. Whilst SAS connectivity with Linux is not an issue, we did find some related issues that would be good to bear in mind.
Finally, during our data analysis we hit an issue where our monitoring was reporting far higher PUT ops than those reported by the warp tool, making it difficult to reconcile Ceph data with warp results. Tracker 65131 was raised to explore what was going on, and the culprit was identified: multipart upload. For large objects, the warp client uses multipart for parallelism, as you’d expect, and Ceph reports each ‘part upload’ as a separate PUT op, which it technically is. The problem is that there isn’t a corresponding counter from RGW that represents the overall completion of the PUT op, so you can’t easily reconcile the client and RGW views.
Since the OSD nodes are single socket machines, our starting strategy involved using virtual machines to host the RGW daemons, leaving as much CPU as possible for the 60 OSD daemons. Validating this design decision was our next ‘stop’.
Given that PUT workloads are generally more demanding, we used a simple 4MB object workload as a litmus test. The chart below shows our initial, disappointing results. Failing to even deliver 2 GiB/s of throughput shows that there was a major bottleneck in the design!
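For readers who want to reproduce this kind of litmus test, a warp PUT run along these lines is one way to do it (the endpoint, credentials and concurrency below are placeholders, and exact flag spellings may vary between warp versions):

warp put \
  --host rgw.example.internal:8080 \
  --access-key WARPUSER --secret-key WARPSECRET \
  --obj.size 4MiB \
  --concurrent 64 \
  --duration 5m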
The Ceph performance statistics revealed no issues at all, which was hardly surprising given the hardware. The culprit wasn't Ceph, but the CPU and network constraints on the hypervisors that were hosting the RGW daemons!
Looking at the OSD CPU usage, we found that even though there were 60 daemons per host, the OSD hosts were not under any CPU pressure at all. Next up: a prompt switch to an RGW collocation strategy, relocating the RGW daemons to the OSD hosts. We also took the opportunity to collocate the workload generators on the Ceph Index nodes. Now the same test delivered a much more appropriate starting point.
As you can see, adopting a collocation strategy delivered a 7x improvement in PUT throughput!
This was our first win, and will influence Pawsey’s production deployment strategy.
In our next post we may set a new record for the number of charts in a Ceph blog, as we dive deeper into the performance of the cluster across a variety of object sizes and Erasure Code (EC) profiles.
This is a hotfix release that resolves several flaws, including Prometheus crashes, and includes an encoder fix.
Auto-tiering Ceph Object Storage - PART 2
In this article we’re getting into the fun part of writing a simple Lua script to start automatically tiering (organizing) S3 objects into the right pools by dynamically changing the Storage Class setting on the fly as objects are being uploaded (S3 PUTs).
If you haven’t read PART 1, you’ll want to check that out over here, as it lays the groundwork for what we’re doing here in PART 2.
Why Auto-Tiering Matters
Having different tiers (Storage Classes) is really important, not just for cost savings but also for performance. As we discussed in PART 1, if you’re uploading millions of small 1K objects it is generally a bad idea to write those objects into an erasure-coded data pool: it’ll be slow, you’ll waste a lot of space due to object padding, and you’ll have unhappy users. But the key is to have the objects assigned to a suitable storage class automatically, as users often won’t make the effort to categorize them themselves.
Revisiting our diagram from PART 1 we’ll use this example again in PART 2 as we write up a Lua script.
Before we jump into the Lua though, a big THANK YOU to Yuval Lifshitz and team for implementing the feature we’re discussing here today. That feature of course is the ability to inject Lua scripts into the CephRGW (CephRGW = Ceph Rados Gateway = S3 protocol gateway) so we can do fun stuff like auto-tiering.
I’d also like to highlight the Lua scripting work and talk by Anthony D’Atri and Curt Bruns, which you can find on YouTube here and which gave me the idea for this series. In that video you’ll see how they developed a Lua script to auto-tier between TLC- and QLC-based NVMe storage; highly recommended.
A Basic Lua Example
We’re going to start with a basic example and borrow from Curt & Anthony’s Lua script. In this script we’ll assign objects to three different Storage Classes we defined in the example from PART 1. Those Storage Classes are STANDARD (for our objects greater than 1MB), MEDIUM_OBJ (for objects between 16K and 1MB), and SMALL_OBJ for everything less than 16K.
-- Lua script to auto-tier S3 object PUT requests

-- exit script quickly if it is not a PUT request
if Request == nil or Request.RGWOp ~= "put_obj"
then
  return
end

-- apply StorageClass only if user hasn't already assigned a storage-class
if Request.HTTP.StorageClass == nil or Request.HTTP.StorageClass == '' then
  if Request.ContentLength < 16384 then
    Request.HTTP.StorageClass = "SMALL_OBJ"
  elseif Request.ContentLength < 1048576 then
    Request.HTTP.StorageClass = "MEDIUM_OBJ"
  else
    Request.HTTP.StorageClass = "STANDARD"
  end
  RGWDebugLog("applied '" .. Request.HTTP.StorageClass .. "' to object '" .. Request.Object.Name .. "'")
end
Installing Lua Scripts into CephRGW
Next you’ll save the above Lua script to a file like autotier.lua and then you can install it into the CephRGW gateways like so:
radosgw-admin script put --infile=./autotier.lua --context=preRequest
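You can confirm what’s installed, or remove it again later, with the matching subcommands:

# Show the currently installed preRequest script
radosgw-admin script get --context=preRequest

# Remove it again if needed
radosgw-admin script rm --context=preRequest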
There’s no need to restart your CephRGW instances; the script becomes active immediately for all RGW instances in the zone. Note, though, that if you’re doing more advanced scripting and are adding a new Lua package, the CephRGW instances will need to be restarted once on all the nodes in the zone, like so:
# sudo systemctl restart ceph-radosgw@radosgw.*
As illustrated by the diagram, information about each request, including our S3 PUT requests, is sent to our autotier.lua script. In there we’re able to dynamically update the value of Request.HTTP.StorageClass to get our objects into the optimal data pools. Now simply upload some objects to any bucket and you’ll see that they’re getting routed to different data pools based on the dynamically assigned StorageClass value applied by the autotier.lua script.
Debugging
To debug the script and see what’s going on via the RGWDebugLog messages, we’ll want to enable debug mode in the CephRGW. That’s done by editing ceph.conf, adding ‘debug rgw = 20’ to the RGW section, and then restarting your CephRGW. Here’s what my radosgw section looks like; you can ignore all of it except for the debug option added to the end.
[client.radosgw.smu-80]
admin_socket = /var/run/ceph/ceph-client.radosgw.smu-80.asok
host = smu-80
keyring = /etc/ceph/ceph.client.radosgw.keyring
log file = /var/log/radosgw/client.radosgw.smu-80.log
rgw dns name = smu-80.osnexus.net
rgw frontends = beast endpoint=10.0.8.80:7480
rgw print continue = false
rgw zone = us-east-1
debug rgw = 20
Now we restart the CephRGW to apply the debug settings we added to ceph.conf. There are ways to dynamically enable debug mode without changing ceph.conf (for example, ceph config set client.radosgw.smu-80 debug_rgw 20), but it’s generally best to update ceph.conf so your debug mode setting is saved between restarts.
# sudo systemctl restart ceph-radosgw@radosgw.*
Last, let’s look at the log. On RHEL you’ll find the log under /var/log/ceph/FSID/, but I’ve got my log file set to /var/log/radosgw/client.radosgw.smu-80.log, so I use this to view the Lua debug messages:
# tail -f /var/log/radosgw/client.radosgw.smu-80.log | grep Lua
Now I can see all the messages on how the objects are getting tagged as I upload objects into the object store.
2024-02-14T06:08:11.257+0000 7f16a8dc1700 20 Lua INFO: applied 'MEDIUM_OBJ' to object 'security_features_2023.pdf'
2024-02-14T06:08:12.345+0000 7f171569a700 20 Lua INFO: applied 'MEDIUM_OBJ' to object 'security_features_2023.pdf'
2024-02-14T06:08:13.389+0000 7f16a55ba700 20 Lua INFO: applied 'MEDIUM_OBJ' to object 'security_features_2023.pdf'
2024-02-14T06:08:14.465+0000 7f1742ef5700 20 Lua INFO: applied 'MEDIUM_OBJ' to object 'security_features_2023.pdf'
2024-02-14T06:08:42.928+0000 7f16d7e1f700 20 Lua INFO: applied 'SMALL_OBJ' to object 'whitepaper.docx'
2024-02-14T06:08:44.012+0000 7f1680570700 20 Lua INFO: applied 'SMALL_OBJ' to object 'whitepaper.docx'
2024-02-14T06:08:45.056+0000 7f1674558700 20 Lua INFO: applied 'SMALL_OBJ' to object 'whitepaper.docx'
2024-02-14T06:08:46.092+0000 7f1664d39700 20 Lua INFO: applied 'SMALL_OBJ' to object 'whitepaper.docx'
Testing
To test this out you’ll want to upload some objects of various sizes, and you’ll see the storage-class tag get applied to them dynamically. Note that if you assign a tag like “PERFORMANCE” to a PUT request but you haven’t configured that storage class, your data will just get routed into the pool associated with the “STANDARD” storage class, typically default.rgw.buckets.data if you have a default config.
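One easy way to check is to upload something and then look at it from the RGW side; a minimal sketch with made-up bucket, object and endpoint names (assuming your S3 credentials are already configured):

# Upload a small file; the Lua script should assign SMALL_OBJ automatically
aws --endpoint-url http://rgw.example.internal:8080 \
    s3api put-object --bucket testbucket --key notes.txt --body ./notes.txt

# Inspect the object's manifest - the assigned storage class and placement show up here
radosgw-admin object stat --bucket=testbucket --object=notes.txt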
Summary
Hope you enjoyed this tutorial on auto-tiering Ceph object storage with Lua. In the last part, PART 3, we’re going to deep-dive into setting up a Ceph object cluster with three Storage Classes from scratch using QuantaStor 6. We’ll have a companion video on YouTube where we’ll go through setting everything up, and then we’ll get into more CephRGW Lua scripting where we’ll organize objects not just by size but by matching specific file name extensions. Last, thanks to Anthony D’Atri and Yuval Lifshitz for their help in reviewing and proofreading these articles.
S3-compatible object storage systems generally have the ability to store objects in different tiers with different characteristics, so you can get the best combination of cost and performance to match the needs of any given application workload. Storage tiers are referred to as ‘Storage Classes’ in S3 parlance, with example storage classes at AWS including “STANDARD” for general purpose use and lower-cost storage classes like “DEEP_ARCHIVE” and “GLACIER” for backups and archive use cases.
Ceph’s S3-compatible storage capabilities also include the ability to create your own Storage Classes, and by default a single storage class called “STANDARD” is created automatically to match the default tier offered by AWS.
In this 3-part blog post we’re going to dive into auto-tiering object storage with Ceph and explore some basic Lua scripting along the way, which I think you’ll find approachable even if you’ve never used or even heard of Lua before:
Ceph Object Storage Basics
Ceph object storage clusters consist of two primary storage pools, one for metadata and one for data.
The metadata pool stores the index of all the objects for every bucket and contains “rgw.buckets.index” in the name. Essentially, the bucket index pool is a collection of databases, one for each bucket, which contains the list of every object in that bucket and information on the location of each chunk of data (RADOS object) that makes up each S3 object.
Data pools typically contain “rgw.buckets.data” in their name and they store all the actual data blocks (RADOS objects) that make up each S3 object in your cluster.
The metadata in the bucket index pool needs to be on fast storage that’s great for small reads and writes (IOPS), as it is essentially a collection of databases. As such (and for various technical reasons beyond this article) this pool must be configured with a replica layout and ideally should be stored on all-flash storage media. Flash storage for the bucket index pool is also important because buckets must periodically resize their bucket index databases (RocksDB based) to make room for more object metadata as a bucket grows. This process is called “resharding” and it all happens automatically behind the scenes, but resharding can greatly impact cluster performance if the bucket index pool is on HDD media rather than flash media.
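If you want to keep an eye on that resharding activity, the RGW admin tooling exposes it directly; for example (the bucket name here is just an example):

# Show objects-per-shard fill levels for each bucket
radosgw-admin bucket limit check

# See which buckets are queued for (or undergoing) dynamic resharding
radosgw-admin reshard list

# Reshard a specific bucket by hand if you need to
radosgw-admin bucket reshard --bucket=mybucket --num-shards=101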
In contrast, the data pool (eg default.rgw.buckets.data) is typically storing large chunks of data that can be written efficiently to HDDs. This is where erasure coding layouts shine and provide one with a big boost in usable capacity (usually 66% or more vs 33% usable with replica=3). Erasure coding also has great write performance when you’re working with objects that are large enough (generally anything 4MB and larger but ideally 64MB and larger) as there’s much less network write amplification when using erasure coding (~125% of client N/S traffic) vs a replica based layout (300% of client N/S traffic).
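To make those percentages concrete, the arithmetic for a k+m erasure-code profile versus replica=3 works out as follows (the 8+2 profile below is just an example that lines up with the ~125% figure above):

\[
\text{usable fraction} = \frac{k}{k+m}, \qquad \text{write traffic} \approx \frac{k+m}{k} \times \text{client traffic}
\]

Plugging in: EC 4+2 gives 4/6 ≈ 67% usable and 6/4 = 150% write traffic, EC 8+2 gives 80% usable and 125% write traffic, while replica=3 gives 1/3 ≈ 33% usable and 300% write traffic.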
Object Storage Zones and Zone Groups
A zone contains a complete copy of all of your S3 objects, and those can be mirrored in whole or in part to other zones. When you want to mirror everything to another zone, you put the zones you want mirrored together into a Zone Group. Typically the name of the Zone Group is also the name of the S3 realm, like “us” or “eu”, and the zone will have a name like “us-east-1” or “us-west-1”, to borrow from some common AWS zone names. When setting up your Ceph object storage for use with products like Veritas NetBackup we recommend using an AWS zone name like “us-east-1” for compatibility, as some products specifically look for known AWS zone and realm names. A cluster setup with a zone “us-east-1” and zone group (and realm) of “us” will look like this.
Object Storage Classes
Storage classes provide us with a way to tag objects to go into the data pool of our choosing. When you set up a cluster with a single data pool like you see above you’ll have a single storage class mapped to it called “STANDARD” and your cluster will look like this.
Auto-tiering via Multiple Storage Classes
So now we get to the heart of this article. What if all your data isn’t composed of large objects? What if you have millions or billions of small objects mixed in with large objects? You want to use erasure coding for the large objects, but that’ll be wasteful and expensive for the small objects (eg. 1K to 64K). And if you use replica=3 as the layout for the data pool you’ll only get 33% usable capacity, so you’ll run out of space and need three times more storage. This is where multiple data pools come to the rescue. Without buying any additional storage we can share the underlying media (OSDs) with the existing pools and make new data pools to give ourselves additional layout options. Here’s an example where we add two additional data pools and associated storage classes: SMALL_OBJ for objects < 16K and MEDIUM_OBJ for everything from 16K to 1MB:
So now we have a storage class “SMALL_OBJ” that’s only going to use a few gigabytes for every million small objects and will be able to read and write those efficiently. We also have a HDD based “MEDIUM_OBJ” storage class that is also using a replica=3 layout like “SMALL_OBJ” but this pool is on HDD media so it’s less costly and allows us to reasonably store roughly one million 1MB objects in just 1TB of space. For everything else we’ll route it to our erasure-coded default “STANDARD” storage class. Note also that some applications written for AWS S3 won’t accept custom Storage Class names like “SMALL_OBJ” so if you run into compatibility issues, try choosing from pre-defined Storage Class names used by AWS.
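For reference, extra storage classes like these are registered on the zonegroup’s placement target and then mapped to a data pool in the zone; a hedged sketch using the zone names from this series (the pool name is ours):

# Add the storage class to the zonegroup's default placement target
radosgw-admin zonegroup placement add --rgw-zonegroup us \
    --placement-id default-placement --storage-class SMALL_OBJ

# Map it to a replica-3, flash-backed data pool in the zone
radosgw-admin zone placement add --rgw-zone us-east-1 \
    --placement-id default-placement --storage-class SMALL_OBJ \
    --data-pool us-east-1.rgw.buckets.data.small

# Commit the period so the gateways pick up the change
radosgw-admin period update --commit

After the commit, uploads that request SMALL_OBJ land in that pool.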
Users Don’t Aim
Ok, so you’ve done all the above, you’ve got an optimally configured object storage cluster but now your users are calling and saying it’s slow. So you look into it and you find that your users are not making any effort to categorize their objects into the right Storage Class categories (i.e. by setting the S3 X-Amz-Storage-Class header) when they upload objects. It’s like trying to get everyone to organize and separate their recycling. But in this case we have a secret weapon, that’s Lua, and in the next article we’re going to use a few lines of scripting to put our objects into the right Storage Class every time so that users won’t need to do a thing.
(Following the recycle bin analogy, we’ll be able to just throw the object in the general direction of the bins and it will always land in the correct bin. No aiming required, just like Shane from Stuff Made Here! 😀 )
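For comparison, here’s what a ‘well-aimed’ upload looks like when the client does set the storage class explicitly; a minimal sketch with made-up bucket and endpoint names (note that some S3 clients validate the storage class against the AWS-defined list, so a custom name like MEDIUM_OBJ may need a client that passes the header through verbatim):

# Explicitly requesting a storage class on upload - the thing users rarely bother to do
aws --endpoint-url http://rgw.example.internal:8080 \
    s3api put-object --bucket testbucket --key report.pdf --body ./report.pdf \
    --storage-class MEDIUM_OBJ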
This is the fifteenth, and expected to be the last, backport release in the Pacific series.
ceph config dump --format <json|xml> output will display the localized option names instead of their normalized version. For example, "mgr/prometheus/x/server_port" will be displayed instead of "mgr/prometheus/server_port". This matches the output of the non pretty-print formatted version of the command.
CephFS: MDS evicts clients who are not advancing their request tids, which causes a large buildup of session metadata, resulting in the MDS going read-only due to the RADOS operation exceeding the size threshold. The mds_session_metadata_threshold config controls the maximum size that an (encoded) session metadata can grow.
RADOS: The get_pool_is_selfmanaged_snaps_mode C++ API has been deprecated due to its susceptibility to false negative results. Its safer replacement is pool_is_in_selfmanaged_snaps_mode.
RBD: When diffing against the beginning of time (fromsnapname == NULL) in fast-diff mode (whole_object == true with fast-diff image feature enabled and valid), diff-iterate is now guaranteed to execute locally if exclusive lock is available. This brings a dramatic performance improvement for QEMU live disk synchronization and backup use cases.
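For the curious, the CLI equivalent of this code path is a whole-object diff against the beginning of time; a minimal sketch (the pool/image name is a placeholder):

# Full diff since image creation (no --from-snap), computed in whole-object mode;
# with fast-diff enabled this now runs locally when the client holds the exclusive lock
rbd diff --whole-object rbd/vm-disk-1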
[CVE-2023-43040] rgw: Fix bucket validation against POST policies (pr#53758, Joshua Baergen)
admin/doc-requirements: bump Sphinx to 5.0.2 (pr#55258, Nizamudeen A)
blk/kernel: Add O_EXCL for block devices (pr#53567, Adam Kupczyk)
Bluestore: fix bluestore collection_list latency perf counter (pr#52949, Wangwenjuan)
bluestore: Fix problem with volume selector (pr#53587, Adam Kupczyk)
ceph-volume,python-common: Data allocate fraction (pr#53581, Jonas Pfefferle)
ceph-volume: add --osd-id option to raw prepare (pr#52928, Guillaume Abrioux)
ceph-volume: fix a bug in _check_generic_reject_reasons (pr#54707, Kim Minjong, Guillaume Abrioux, Michael English)
ceph-volume: fix raw list for lvm devices (pr#52981, Guillaume Abrioux)
ceph-volume: fix zap_partitions() in devices.lvm.zap (pr#55658, Guillaume Abrioux)
ceph-volume: fix zap_partitions() in devices.lvm.zap (pr#55481, Guillaume Abrioux)
ceph-volume: fixes fallback to stat in is_device and is_partition (pr#54709, Guillaume Abrioux, Teoman ONAY)
ceph: allow xlock state to be LOCK_PREXLOCK when putting it (pr#53662, Xiubo Li)
cephadm: add tcmu-runner to logrotate config (pr#53975, Adam King)
cephadm: Adding support to configure public_network cfg section (pr#52411, Redouane Kachach)
cephadm: allow ports to be opened in firewall during adoption, reconfig, redeploy (pr#52083, Adam King)
cephadm: make custom_configs work for tcmu-runner container (pr#53469, Adam King)
cephadm: run tcmu-runner through script to do restart on failure (pr#53977, Adam King, Raimund Sacherer)
cephfs-journal-tool: disambiguate usage of all keyword (in tool help) (pr#53645, Manish M Yathnalli)
cephfs-mirror: do not run concurrent C_RestartMirroring context (issue#62072, pr#53640, Venky Shankar)
cephfs-top: include the missing fields in --dump output (pr#53453, Jos Collin)
cephfs: upgrade cephfs-shell\'s path wherever necessary (pr#54144, Rishabh Dave)
cephfs_mirror: correctly set top level dir permissions (pr#53270, Milind Changire)
client: always refresh mds feature bits on session open (issue#63188, pr#54245, Venky Shankar)
client: fix sync fs to force flush mdlog for all sessions (pr#53981, Xiubo Li)
client: issue a cap release immediately if no cap exists (pr#52852, Xiubo Li)
client: queue a delay cap flushing if there are ditry caps/snapcaps (pr#54472, Xiubo Li)
cmake/modules/BuildRocksDB.cmake: inherit parent\'s CMAKE_CXX_FLAGS (pr#55500, Kefu Chai)
common/weighted_shuffle: don\'t feed std::discrete_distribution with all-zero weights (pr#55155, Radosław Zarzyński)
common: intrusive_lru destructor add (pr#54558, Ali Maredia)
doc/cephfs: note regarding start time time zone (pr#53576, Milind Changire)
doc/cephfs: write cephfs commands fully in docs (pr#53403, Rishabh Dave)
doc/rados/configuration/bluestore-config-ref: Fix lowcase typo (pr#54696, Adam Kupczyk)
doc/rados: update config for autoscaler (pr#55440, Zac Dover)
doc: clarify use of rados rm command (pr#51260, J. Eric Ivancich)
doc: discuss the standard multi-tenant CephFS security model (pr#53560, Greg Farnum)
Fixing example of BlueStore resharding (pr#54474, Adam Kupczyk)
isa-l: incorporate fix for aarch64 text relocation (pr#51314, luo rixin)
libcephsqlite: fill 0s in unread portion of buffer (pr#53103, Patrick Donnelly)
librados: make querying pools for selfmanaged snaps reliable (pr#55024, Ilya Dryomov)
librbd: Append one journal event per image request (pr#54820, Joshua Baergen)
librbd: don\'t report HOLE_UPDATED when diffing against a hole (pr#54949, Ilya Dryomov)
librbd: fix regressions in ObjectListSnapsRequest (pr#54860, Ilya Dryomov)
librbd: improve rbd_diff_iterate2() performance in fast-diff mode (pr#55256, Ilya Dryomov)
librbd: kick ExclusiveLock state machine on client being blocklisted when waiting for lock (pr#53295, Ramana Raja)
librbd: make CreatePrimaryRequest remove any unlinked mirror snapshots (pr#53274, Ilya Dryomov)
log: fix the formatting when dumping thread IDs (pr#53465, Radoslaw Zarzynski)
log: Make log_max_recent have an effect again (pr#48311, Joshua Baergen)
make-dist: don\'t use --continue option for wget (pr#55090, Casey Bodley)
make-dist: download liburing from kernel.io instead of github (pr#53197, Laura Flores)
MClientRequest: properly handle ceph_mds_request_head_legacy for ext_num_retry, ext_num_fwd, owner_uid, owner_gid (pr#54410, Alexander Mikhalitsyn)
mds,qa: some balancer debug messages (<=5) not printed when debug_mds is >=5 (pr#53552, Patrick Donnelly)
mds/Server: mark a cap acquisition throttle event in the request (pr#53169, Leonid Usov)
mds: acquire inode snaplock in open (pr#53185, Patrick Donnelly)
mds: add event for batching getattr/lookup (pr#53556, Patrick Donnelly)
mds: adjust pre_segments_size for MDLog when trimming segments for st… (issue#59833, pr#54033, Venky Shankar)
mds: blocklist clients with \\"bloated\\" session metadata (issue#61947, issue#62873, pr#53634, Venky Shankar)
mds: drop locks and retry when lock set changes (pr#53243, Patrick Donnelly)
mds: ensure next replay is queued on req drop (pr#54314, Patrick Donnelly)
mds: fix deadlock between unlinking and linkmerge (pr#53495, Xiubo Li)
mds: fix issuing redundant reintegrate/migrate_stray requests (pr#54517, Xiubo Li)
mds: log message when exiting due to asok command (pr#53550, Patrick Donnelly)
mds: replacing bootstrap session only if handle client session message (pr#53362, Mer Xuanyi)
mds: report clients laggy due laggy OSDs only after checking any OSD is laggy (pr#54120, Dhairya Parmar)
mds: set the loner to true for LOCK_EXCL_XSYN (pr#54912, Xiubo Li)
mds: use variable g_ceph_context directly in MDSAuthCaps (pr#52821, Rishabh Dave)
mgr/BaseMgrModule: Optimize CPython Call in Finish Function (pr#55109, Nitzan Mordechai)
mgr/cephadm: Add \\"networks\\" parameter to orch apply rgw (pr#53974, Teoman ONAY)
mgr/cephadm: ceph orch add fails when ipv6 address is surrounded by square brackets (pr#53978, Teoman ONAY)
mgr/dashboard: add \'omit_usage\' query param to dashboard api \'get rbd\' endpoint (pr#54192, Cory Snyder)
mgr/dashboard: allow tls 1.2 with a config option (pr#53781, Nizamudeen A)
mgr/dashboard: Consider null values as zero in grafana panels (pr#54542, Aashish Sharma)
mgr/dashboard: fix CephPGImbalance alert (pr#49478, Aashish Sharma)
mgr/dashboard: Fix CephPoolGrowthWarning alert (pr#49477, Aashish Sharma)
mgr/dashboard: fix constraints.txt (pr#54652, Ernesto Puerta)
mgr/dashboard: fix rgw page issues when hostname not resolvable (pr#53215, Nizamudeen A)
mgr/dashboard: set CORS header for unauthorized access (pr#53202, Nizamudeen A)
mgr/prometheus: avoid duplicates and deleted entries for rbd_stats_pools (pr#48524, Avan Thakkar)
mgr/prometheus: change pg_repaired_objects name to pool_repaired_objects (pr#48439, Pere Diaz Bou)
mgr/prometheus: fix pool_objects_repaired and daemon_health_metrics format (pr#51692, banuchka)
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock (pr#54293, Ramana Raja)
mgr/snap-schedule: use the right way to check the result returned by… (pr#53355, Mer Xuanyi)
mgr/snap_schedule: allow retention spec \'n\' to be user defined (pr#52750, Milind Changire, Jakob Haufe)
mgr/volumes: Fix pending_subvolume_deletions in volume info (pr#53574, Kotresh HR)
mgr: Add one finisher thread per module (pr#51045, Kotresh HR, Patrick Donnelly)
mgr: add throttle policy for DaemonServer (pr#54013, ericqzhao)
mgr: don\'t dump global config holding gil (pr#50194, Mykola Golub)
mgr: fix a race condition in DaemonServer::handle_report() (pr#52993, Radoslaw Zarzynski)
mgr: register OSDs in ms_handle_accept (pr#53189, Patrick Donnelly)
mgr: remove out&down osd from mgr daemons (pr#54553, shimin)
mon/ConfigMonitor: Show localized name in "config dump --format json" output (pr#53984, Sridhar Seshasayee)
mon/MonClient: resurrect original client_mount_timeout handling (pr#52533, Ilya Dryomov)
mon/Monitor.cc: exit function if !osdmon()->is_writeable() && mon/OSDMonitor: Added extra check before mon.go_recovery_stretch_mode() (pr#51414, Kamoltat)
mon/Monitor: during shutdown don't accept new authentication and crea… (pr#55113, Nitzan Mordechai)
mon: add exception handling to ceph health mute (pr#55118, Daniel Radjenovic)
mon: add proxy to cache tier options (pr#50552, tan changzhi)
mon: fix health store size growing infinitely (pr#55472, Wei Wang)
mon: fix iterator mishandling in PGMap::apply_incremental (pr#52555, Oliver Schmidt)
mon: fix mds metadata lost in one case (pr#54318, shimin)
msg/async: initialize worker in RDMAStack::create_worker() and drop Stack::num_workers (pr#55443, Kefu Chai)
msg/AsyncMessenger: re-evaluate the stop condition when woken up in 'wait()' (pr#53716, Leonid Usov)
nofail option in fstab not supported (pr#52987, Leonid Usov)
os/bluestore: don't require bluestore_db_block_size when attaching new (pr#52948, Igor Fedotov)
os/bluestore: get rid off resulting lba alignment in allocators (pr#54434, Igor Fedotov)
osd,bluestore: gracefully handle a failure during meta collection load (pr#53135, Igor Fedotov)
osd/OpRequest: Add detail description for delayed op in osd log file (pr#53693, Yite Gu)
osd/OSD: introduce reset_purged_snaps_last (pr#53970, Matan Breizman)
osd/OSDMap: Check for uneven weights & != 2 buckets post stretch mode (pr#52459, Kamoltat)
osd/scrub: Fix scrub starts messages spamming the cluster log (pr#53430, Prashant D)
osd: don't require RWEXCL lock for stat+write ops (pr#54593, Alice Zhao)
osd: ensure async recovery does not drop a pg below min_size (pr#54548, Samuel Just)
osd: fix shard-threads cannot wakeup bug (pr#51262, Jianwei Zhang)
osd: fix use-after-move in build_incremental_map_msg() (pr#54268, Ronen Friedman)
osd: log the number of extents for sparse read (pr#54604, Xiubo Li)
pacifc: Revert "mgr/dashboard: unselect rows in datatables" (pr#55415, Nizamudeen A)
pybind/mgr/autoscaler: Donot show NEW PG_NUM value if autoscaler is not on (pr#53464, Prashant D)
pybind/mgr/mgr_util: fix to_pretty_timedelta() (pr#51243, Sage Weil)
pybind/mgr/volumes: log mutex locks to help debug deadlocks (pr#53916, Kotresh HR)
pybind/mgr: ceph osd status crash with ZeroDivisionError (pr#46696, Nitzan Mordechai, Kefu Chai)
pybind/rados: don't close watch in dealloc if already closed (pr#51259, Tim Serong)
pybind/rados: fix missed changes for PEP484 style type annotations (pr#54361, Igor Fedotov)
pybind/rbd: don't produce info on errors in aio_mirror_image_get_info() (pr#54053, Ilya Dryomov)
python-common/drive_group: handle fields outside of 'spec' even when 'spec' is provided (pr#52413, Adam King)
python-common/drive_selection: lower log level of limit policy message (pr#52412, Adam King)
qa/distros: backport update from rhel 8.4 -> 8.6 (pr#54901, Casey Bodley, David Galloway)
qa/suites/krbd: stress test for recovering from watch errors (pr#53784, Ilya Dryomov)
qa/suites/orch: whitelist warnings that are expected in test environments (pr#55523, Laura Flores)
qa/suites/rbd: add test to check rbd_support module recovery (pr#54294, Ramana Raja)
qa/suites/upgrade/pacific-p2p: run librbd python API tests from pacific tip (pr#55418, Yuri Weinstein)
qa/suites/upgrade/pacific-p2p: skip TestClsRbd.mirror_snapshot test (pr#53204, Ilya Dryomov)
qa/suites: added more whitelisting + fix typo (pr#55717, Kamoltat)
qa/tasks/cephadm: enable mon_cluster_log_to_file (pr#55429, Dan van der Ster)
qa/upgrade: disable a failing ceph_test_cls_cmpomap test case (pr#55519, Casey Bodley)
qa/upgrade: use ragweed branch for starting ceph release (pr#55382, Casey Bodley)
qa/workunits/rbd/cli_generic.sh: narrow race window when checking that rbd_support module command fails after blocklisting the module's client (pr#54771, Ramana Raja)
qa: assign file system affinity for replaced MDS (issue#61764, pr#54039, Venky Shankar)
qa: ignore expected cluster warning from damage tests (pr#53486, Patrick Donnelly)
qa: lengthen shutdown timeout for thrashed MDS (pr#53555, Patrick Donnelly)
qa: pass arg as list to fix test case failure (pr#52763, Dhairya Parmar)
qa: remove duplicate import (pr#53447, Patrick Donnelly)
qa: run kernel_untar_build with newer tarball (pr#54713, Milind Changire)
qa: wait for file to have correct size (pr#52744, Patrick Donnelly)
rados: build minimally when "WITH_MGR" is off (pr#51250, J. Eric Ivancich)
rados: increase osd_max_write_op_reply_len default to 64 bytes (pr#53470, Matt Benjamin)
RadosGW API: incorrect bucket quota in response to HEAD /{bucket}/?usage (pr#53439, shreyanshjain7174)
radosgw-admin: allow 'bi purge' to delete index if entrypoint doesn't exist (pr#54010, Casey Bodley)
radosgw-admin: don't crash on --placement-id without --storage-class (pr#53474, Casey Bodley)
radosgw-admin: fix segfault on pipe modify without source/dest zone specified (pr#51256, caisan)
rbd-nbd: fix stuck with disable request (pr#54256, Prasanna Kumar Kalever)
rgw - Fix NoSuchTagSet error (pr#50533, Daniel Gryniewicz)
rgw/auth: ignoring signatures for HTTP OPTIONS calls (pr#55550, Tobias Urdin)
rgw/beast: add max_header_size option with 16k default, up from 4k (pr#52113, Casey Bodley)
rgw/keystone: EC2Engine uses reject() for ERR_SIGNATURE_NO_MATCH (pr#53764, Casey Bodley)
rgw/notification: remove non x-amz-meta-* attributes from bucket notifications (pr#53376, Juan Zhu)
rgw/putobj: RadosWriter uses part head object for multipart parts (pr#55586, Casey Bodley)
rgw/s3: ListObjectsV2 returns correct object owners (pr#54160, Casey Bodley)
rgw/sts: AssumeRole no longer writes to user metadata (pr#52051, Casey Bodley)
rgw/sts: code for returning an error when an IAM policy (pr#44462, Pritha Srivastava)
rgw/sts: code to fetch certs using .well-known/openid-configuration URL (pr#44464, Pritha Srivastava)
rgw/sts: createbucket op should take session_policies into account (pr#44476, Pritha Srivastava)
rgw/sts: fix read_obj_policy permission evaluation (pr#44471, Pritha Srivastava)
rgw/sts: fixes getsessiontoken authenticated with LDAP (pr#44463, Pritha Srivastava)
rgw/swift: check position of first slash in slo manifest files (pr#51600, Marcio Roberto Starke)
rgw/sync-policy: Correct "sync status" & "sync group" commands (pr#53410, Soumya Koduri)
rgw: 'bucket check' deletes index of multipart meta when its pending_map is nonempty (pr#54016, Huber-ming)
rgw: add radosgw-admin bucket check olh/unlinked commands (pr#53808, Cory Snyder)
rgw: Avoid segfault when OPA authz is enabled (pr#46106, Benoît Knecht)
rgw: beast frontend checks for local_endpoint() errors (pr#54167, Casey Bodley)
rgw: Drain async_processor request queue during shutdown (pr#53472, Soumya Koduri)
rgw: fix 2 null versionID after convert_plain_entry_to_versioned (pr#53400, rui ma, zhuo li)
rgw: Fix Browser POST content-length-range min value (pr#52936, Robin H. Johnson)
rgw: fix FP error when calculating enteries per bi shard (pr#53593, J. Eric Ivancich)
rgw: fix rgw cache invalidation after unregister_watch() error (pr#54014, lichaochao)
rgw: fix SignatureDoesNotMatch when extra headers start with 'x-amz' (pr#53772, rui ma)
rgw: Fix truncated ListBuckets response (pr#49526, Joshua Baergen)
rgw: fix unwatch crash at radosgw startup (pr#53759, lichaochao)
rgw: fix UploadPartCopy error code when src object not exist and src bucket not exist (pr#53356, yuliyang)
rgw: handle http options CORS with v4 auth (pr#53416, Tobias Urdin)
rgw: improve buffer list utilization in the chunkupload scenario (pr#53775, liubingrun)
rgw: multisite data log flag not used (pr#52055, J. Eric Ivancich)
rgw: pick http_date in case of http_x_amz_date absence (pr#53443, Seena Fallah, Mohamed Awnallah)
rgw: prevent spurious/lost notifications in the index completion thread (pr#49093, Casey Bodley, Yuval Lifshitz)
rgw: retry metadata cache notifications with INVALIDATE_OBJ (pr#52797, Casey Bodley)
rgw: s3 object lock avoids overflow in retention date (pr#52605, Casey Bodley)
rgw: s3website doesn't prefetch for web_dir() check (pr#53769, Casey Bodley)
rgw: set keys from from master zone on admin api user create (pr#51602, Ali Maredia)
rgw: Solving the issue of not populating etag in Multipart upload result (pr#51445, Ali Masarwa)
rgw: swift : check for valid key in POST forms (pr#52729, Abhishek Lekshmanan)
rgw: Update "CEPH_RGW_DIR_SUGGEST_LOG_OP" for remove entries (pr#50540, Soumya Koduri)
rgw: use unique_ptr for flat_map emplace in BucketTrimWatche (pr#52996, Vedansh Bhartia)
rgwlc: prevent lc for one bucket from exceeding time budget (pr#53562, Matt Benjamin)
test/lazy-omap-stats: Various enhancements (pr#50518, Brad Hubbard)
test/librbd: avoid config-related crashes in DiscardWithPruneWriteOverlap (pr#54859, Ilya Dryomov)
test/store_test: adjust physical extents to inject error against (pr#54782, Igor Fedotov)
tools/ceph_objectstore_tool: action_on_all_objects_in_pg to skip pgmeta (pr#54691, Matan Breizman)
tools/ceph_objectstore_tool: Support get/set/superblock (pr#55013, Matan Breizman)
tools/osdmaptool: fix possible segfaults when there are down osds (pr#52203, Mykola Golub)
Tools/rados: Improve Error Messaging for Object Name Resolution (pr#55111, Nitzan Mordechai)
vstart_runner: maintain log level when --debug is passed (pr#52977, Rishabh Dave)
vstart_runner: use FileNotFoundError when os.stat() fails (pr#52978, Rishabh Dave)
win32_deps_build.sh: change Boost URL (pr#55086, Lucian Petrut)
There is no silver bullet regarding RocksDB performance. Now that I have your attention: you might be "lucky" if you are using upstream Ceph Ubuntu packages.
Mark Nelson found out that, before the relevant pull request (PR) was merged, the build process did not properly propagate the CMAKE_BUILD_TYPE option to external projects built by Ceph - in this case, RocksDB. As a result, packages were not built with the RelWithDebInfo setting needed to produce a "performance" release package. While it has not been verified, it is possible that upstream Ceph Ubuntu packages have suffered from this since Luminous.
Thanks to Mark Nelson for finding and fixing this issue. Thanks to Kefu Chai for providing a fix for the build system. Thanks to Casey Bodley for taking care of creating the backport trackers. Thanks to my employer BIT for having me work on Ceph, and to Els de Jong and Anthony D'Atri for editing.
RocksDB performance is sub-optimal when built without RelWithDebInfo. This can be mitigated by installing "performance" package builds. The actual performance increase depends on the cluster, but RocksDB compaction time is reduced by a factor of three. In some cases random 4K write performance is doubled. See these links 1 and 2.
Install a version where this problem is resolved for the release you are running: Pacific, Quincy, or Reef.
If you are running an EOL version of Ceph, you can build it yourself. See the documentation, or follow the short version below:
git clone https://github.com/ceph/ceph.git
cd ceph
git checkout vYour_Release_Version
# add "extraopts += -DCMAKE_BUILD_TYPE=RelWithDebInfo" to the debian/rules file
./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo
dpkg-buildpackage -us -uc -j$DOUBLE_NUMBER_OF_CORES_BUILD_HOST 2>&1 | tee ../dpkg-buildpackage.log
Note: add the "-b" option to dpkg-buildpackage if you only want binary packages and no source packages. Make sure you have enough file space available, and enough memory, especially when building with a lot of threads. I used a VM with 256 GB of RAM, 64 cores, and 300 GB of disk space; a full build including source packages took around 1 hour and 7 minutes.
Make sure you check dpkg-buildpackage.log for DCMAKE_BUILD_TYPE=RelWithDebInfo, like below:
cd /home/stefan/ceph/obj-x86_64-linux-gnu/src/rocksdb && /usr/bin/cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DWITH_GFLAGS=OFF -DCMAKE_PREFIX_PATH= -DCMAKE_CXX_COMPILER=/usr/bin/c++ -DWITH_SNAPPY=TRUE -DWITH_LZ4=TRUE -Dlz4_INCLUDE_DIRS=/usr/include -Dlz4_LIBRARIES=/usr/lib/x86_64-linux-gnu/liblz4.so -DWITH_ZLIB=TRUE -DPORTABLE=ON -DCMAKE_AR=/usr/bin/ar -DCMAKE_BUILD_TYPE=RelWithDebInfo -DFAIL_ON_WARNINGS=OFF -DUSE_RTTI=1 "-GUnix Makefiles" -DCMAKE_C_FLAGS=-Wno-stringop-truncation "-DCMAKE_CXX_FLAGS='-Wno-deprecated-copy -Wno-pessimizing-move'" "-GUnix Makefiles" /home/stefan/ceph/src/rocksdb
We installed the rebuilt packages (of which ceph-osd is the most important) on our storage nodes, and the results are striking:
And to zoom in on one specific OSD:
If you ever find yourself needing to compact your OSDs, you are in for a pleasant surprise. Compacting OSDs brings about three major benefits.
Some graphs from server metrics illustrate this. Our procedure for compacting OSDs involves a staggered shutdown of the OSDs. Once all OSDs are shut down, compaction is performed in parallel (df | grep "/var/lib/ceph/osd" | awk '{print $6}' | cut -d '-' -f 2 | sort -n | xargs -n 1 -P 10 -I OSD ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-OSD compact).
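For readers who want to try this on a single OSD first, here is a minimal sketch of the same offline compaction, assuming a systemd-managed OSD with id 0 and the default data path; adjust the id and path for your deployment.

# Stop one OSD, compact its BlueStore/RocksDB metadata offline, then start it again.
systemctl stop ceph-osd@0
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
systemctl start ceph-osd@0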
Debug package disk IOPS (before)
Performance package disk IOPS (after)
This is a node with SATA SSDs. A node with NVMe disks is even faster. Compaction time is between 3-5 minutes.
Debug package disk IOPS (before)
Performance package disk IOPS (after)
Please note the increasing difference in OSD compaction time between SATA SSDs and NVMe drives. Previously, this gap was not as big due to the performance issues with RocksDB. However, with the introduction of faster and lower-latency disks, this difference has become more pronounced. This suggests that the most significant performance improvements can be seen in clusters equipped with faster disks.
In this specific cluster, the performance packages have reduced the time needed to compact all OSDs by approximately one-third. Previously taking nine and a half hours, the process now completes in six hours.
While we hoped to see performance double, this was not the case. However, we still saw some significant improvements. To detect gray failure, we have virtual machines running on our cloud that continuously (at 5-minute intervals) monitor performance as experienced from within the virtual machine. One of these tests is a FIO 4K random write test (single threaded, queue depth of 1).
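For reference, a minimal sketch of such a test, assuming fio is installed in the guest and /mnt/test is a mounted filesystem backed by the Ceph cluster (the path, file size, and runtime are illustrative):

# Single-threaded 4K random write test at queue depth 1, run against a Ceph-backed mount.
fio --name=4k-randwrite-qd1 --filename=/mnt/test/fio.dat --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --ioengine=libaio --direct=1 --time_based --runtime=60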
4K random write on CephFS (before)
4K random write on CephFS (after)
We have also performed other fio benchmarks and the main benefits are that standard deviation is lower, and tail latencies have decreased.
Okay, that might be way too much in practice, but to catch issues like this it's good to have a couple of different tests performed.
Over time, conducting these tests can lead to more standardized metrics, such as an "IO/Watt" or "throughput/Watt" ratio, which would allow for easier comparison across different tests. Perhaps we could develop Ceph-specific tests for use with tools like the Phoronix test-suite?
Although not an issue in this particular case, it is worth mentioning that there could be performance regressions related to specific CPU architectures. For example, having both an ARM64 and an x86-64 performance cluster could reveal discrepancies tied to specific build options. This approach helps catch such regressions early on.
Thank you for reading, and if you have any questions or would like to talk more about Ceph, please feel free to reach out.
I can't believe they figured it out first. That was the thought going through my head back in mid-December after several weeks of 12-hour days debugging why this cluster was slow. This was probably the most intense performance analysis I'd done since Inktank. Half-forgotten superstitions from the 90s about appeasing SCSI gods flitted through my consciousness. The 90s? Man, I'm getting old. We were about two-thirds of the way through the work that would let us start over at the beginning. Speaking of which, I'll start over at the beginning.
Back in 2023 (I almost said earlier this year until I remembered we're in 2024), Clyso was approached by a fairly hip and cutting edge company that wanted to transition their HDD backed Ceph cluster to a 10 petabyte NVMe deployment. They were immediately interesting. They had no specific need for RBD, RGW, or CephFS. They had put together their own hardware design, but to my delight approached us for feedback before actually purchasing anything. They had slightly unusual requirements. The cluster had to be spread across 17 racks with 4U of space available in each. Power, cooling, density, and vendor preference were all factors. The new nodes needed to be migrated into the existing cluster with no service interruption. The network however was already built, and it's a beast. It's one of the fastest Ethernet setups I've ever seen. I knew from the beginning that I wanted to help them build this cluster. I also knew we'd need to do a pre-production burn-in and that it would be the perfect opportunity to showcase what Ceph can do on a system like this. What follows is the story of how we built and tested that cluster and how far we were able to push it.
I would first like to thank our amazing customer who made all of this possible. You were a pleasure to work with! Thank you as well for allowing us here at Clyso to share this experience with the Ceph community. It is through this sharing of knowledge that we make the world a better place. Thank you to IBM/Red Hat and Samsung for providing the Ceph community with the hardware used for comparison testing. It was invaluable to be able to evaluate the numbers we were getting against previous tests from the lab. Thank you to all of the Ceph contributors who have worked tirelessly to make Ceph great! Finally, thank you especially to Anthony D'Atri and Lee-Ann Pullar for their amazing copyediting skills!
When the customer first approached Clyso, they proposed a configuration utilizing 34 dual-socket 2U nodes spread across 17 racks. We provided a couple of alternative configurations from multiple vendors with a focus on smaller nodes. Ultimately they decided to go with a Dell architecture we designed, which quoted at roughly 13% cheaper than the original configuration despite having several key advantages. The new configuration has less memory per OSD (still comfortably 12GiB each), but faster memory throughput. It also provides more aggregate CPU resources, significantly more aggregate network throughput, a simpler single-socket configuration, and utilizes the newest generation of AMD processors and DDR5 RAM. By employing smaller nodes, we halved the impact of a node failure on cluster recovery.
The customer indicated they would like to limit the added per-rack power consumption to around 1000-1500 watts. With 4 of these nodes per rack, the aggregate TDP is estimated to be at least 1120 Watts plus base power usage, CPU overage peaks, and power supply inefficiency. IE it's likely we're pushing it a bit under load, but we don't expect significant deviation beyond the acceptable range. If worse came to worst, we estimated we could shave off roughly 100 watts per rack by lowering the processor cTDP.
Specs for the system are shown below:
Nodes | 68 x Dell PowerEdge R6615 |
---|---|
CPU | 1 x AMD EPYC 9454P 48C/96T |
Memory | 192GiB DDR5 |
Network | 2 x 100GbE Mellanox ConnectX-6 |
NVMe | 10 x Dell 15.36TB Enterprise NVMe Read Intensive AG |
OS Version | Ubuntu 20.04.6 (Focal) |
Ceph Version | Quincy v17.2.7 (Upstream Deb Packages) |
An additional benefit of utilizing 1U Dell servers is that they are essentially a newer refresh of the systems David Galloway and I designed for the upstream Ceph performance lab. These systems have been tested in a variety of articles over the past couple of years. It turns out that there was a major performance-impacting issue that came out during testing that did not affect the previous generation of hardware in the upstream lab but did affect this new hardware. We'll talk about that more later.
Without getting into too many details, I will reiterate that the customer's network configuration is very well-designed and quite fast. It easily has enough aggregate throughput across all 17 racks to let a cluster of this scale really stretch its legs.
To do the burn-in testing, ephemeral Ceph clusters were deployed and FIO tests were launched using CBT. CBT was configured to deploy Ceph with several modified settings. OSDs were assigned an 8GB osd_memory_target. In production, a higher osd_memory_target should be acceptable. The customer had no need to test block or S3 workloads, so one might assume that RADOS bench would be the natural benchmark choice. In my experience, testing at a large scale with RADOS bench is tricky. It's tough to determine how many instances are needed to saturate the cluster at given thread counts. I've run into issues in the past where multiple concurrent pools were needed to scale performance. I also didn't have any preexisting RADOS bench tests handy to compare against. Instead, we opted to do burn-in testing using the same librbd backed FIO testing we've used in the upstream lab. This allowed us to partition the cluster into smaller chunks and compare results with previously published results. FIO is also very well known and well-trusted.
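Expressed as a plain Ceph command rather than CBT configuration, the memory target override described above corresponds to roughly the following sketch (CBT injects this through its own config; the value is simply 8 GiB in bytes):

ceph config set osd osd_memory_target 8589934592   # 8 GiB per OSD for the burn-in; production would typically use more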
A major benefit of the librbd engine in FIO (versus utilizing FIO with kernel RBD) is that there are no issues with stale mount points potentially requiring system reboots. We did not have IPMI access to this cluster and we were under a tight deadline to complete tests. For that reason, we ultimately skipped kernel RBD tests. Based on previous testing, however, we expect the aggregate performance to be roughly similar given sufficient clients. We were able, however, to test both 3X replication and 6+2 erasure coding. We also tested msgr V2 in both unencrypted and secure mode using the following Ceph options:
ms_client_mode = secure
ms_cluster_mode = secure
ms_service_mode = secure
ms_mon_client_mode = secure
ms_mon_cluster_mode = secure
ms_mon_service_mode = secure
OSDs were allowed to use all cores on the nodes. FIO was configured to first pre-fill RBD volume(s) with large writes, followed by 4MB and 4KB IO tests for 300 seconds each (60 seconds during debugging runs). Certain background processes, such as scrub, deep scrub, PG autoscaling, and PG balancing were disabled.
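A hedged sketch of what one of these librbd-backed FIO jobs and the quiesced background work can look like from the command line (the pool and image names are hypothetical; CBT generates its own job files and settings):

# Quiesce background work for the duration of the benchmark, as described above.
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph balancer off
ceph osd pool set rbdbench pg_autoscale_mode off   # hypothetical pool name

# One 4MB random-read job against a pre-created RBD image via librbd (no kernel mount needed).
fio --name=4m-randread --ioengine=rbd --clientname=admin \
    --pool=rbdbench --rbdname=fio_image_0 \
    --rw=randread --bs=4M --iodepth=128 --numjobs=1 \
    --direct=1 --time_based --runtime=300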
Later in this article, you'll see some eye-popping PG counts being tested. This is intentional. We know from previous upstream lab testing that the PG count can have a dramatic effect on performance. Some of this is due to clumpiness in random distributions at low sample (PG) counts. This potentially can be mitigated in part through additional balancing. Less commonly discussed is PG lock contention inside the OSD. We've observed that on very fast clusters, PG lock contention can play a significant role in overall performance. This unfortunately is less easily mitigated without increasing PG counts. How much does PG count actually matter?
With just 60 OSDs, random read performance scales all the way up to 16384 PGs on an RBD pool using 3X replication. Writes top out much earlier, but still benefit from up to 2048 PGs.
Let me be clear: You shouldn't go out and blindly configure a production Ceph cluster to use PG counts as high as we are testing here. That's especially true given some of the other defaults in Ceph for things like PG log lengths and PG stat updates. I do, however, want to encourage the community to start thinking about whether the conventional wisdom of 100 PGs per OSD continues to make sense. I would like us to rethink what we need to do to achieve higher PG counts per OSD while keeping overhead and memory usage in check. I dream about a future where 1000 PGs per OSD isn't out of the ordinary, PG logs are auto-scaled on a per-pool basis, and PG autoscaling is a far more seldom-used operation.
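Purely as an illustration of the kind of configuration being benchmarked (not a recommendation), creating a replicated RBD test pool with an explicit, unusually high PG count might look like the following sketch; the pool name and values are hypothetical, and mon_max_pg_per_osd may need to be raised first:

ceph config set global mon_max_pg_per_osd 1000        # allow far more PGs per OSD than the default
ceph osd pool create rbdtest 16384 16384 replicated   # explicit pg_num / pgp_num
ceph osd pool set rbdtest pg_autoscale_mode off       # keep the autoscaler from undoing it
ceph osd pool application enable rbdtest rbd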
We were first able to log into the new hardware the week after Thanksgiving in the US. The plan was to spend a week or two doing burn-in validation tests and then integrate the new hardware into the existing cluster. We hoped to finish the migration in time for the new year if everything went to plan. Sadly, we ran into trouble right at the start. The initial low-level performance tests looked good. Iperf network testing showed us hitting just under 200Gb/s per node. Random sampling of a couple of the nodes showed reasonable baseline performance from the NVMe drives. One issue we immediately observed was that the operating system on all 68 nodes was accidentally deployed on 2 of the OSD drives instead of the internal Dell BOSS m.2 boot drives. We had planned to compare results for a 30 OSD configuration (3 nodes, 10 OSDs per node) against the results from the upstream lab (5 nodes, 6 OSDs per node). Instead, we ended up testing 8 NVMe drives per node. The first Ceph results were far lower than what we had hoped to see, even given the reduced OSD count.
The only result that was even close to being tolerable was for random reads, and that still wasn't great. Clearly, something was going on. We stopped running 3-node tests and started looking at single-node, and even single OSD configurations.
That's when things started to get weird.
As we ran different combinations of 8-OSD and 1-OSD tests on individual nodes in the cluster, we saw wildly different behavior, but it took several days of testing to really understand the pattern of what we were seeing. Systems that initially performed well in single-OSD tests stopped performing well after multi-OSD tests, only to start working well again hours later. 8-OSD tests would occasionally show signs of performing well, but then perform terribly for all subsequent tests until the system was rebooted. We were eventually able to discern a pattern on fresh boot that we could roughly repeat across different nodes in the cluster:
Step | OSDS | 4MB Randread (MB/s) | 4MB Randwrite (MB/s) |
---|---|---|---|
Boot | |||
1 | 1 OSD | 5716 | 3998 |
2 | 8 OSDs | 3190 | 2494 |
3 | 1 OSD | 523 | 3794 |
4 | 8 OSDs | 2319 | 2931 |
5 | 1 OSD | 551 | 3796 |
20-30 minute pause | |||
6 | 1 OSD | 637 | 3724 |
20-30 minute pause | |||
7 | 1 OSD | 609 | 3860 |
20-30 minute pause | |||
8 | 1 OSD | 362 | 3972 |
20-30 minute pause | |||
9 | 1 OSD | 6581 | 3998 |
20-30 minute pause | |||
10 | 1 OSD | 6350 | 3999 |
20-30 minute pause | |||
11 | 1 OSD | 6536 | 4001 |
The initial single-OSD test looked fantastic for large reads and writes and showed nearly the same throughput we saw when running FIO tests directly against the drives. As soon as we ran the 8-OSD test, however, we observed a performance drop. Subsequent single-OSD tests continued to perform poorly until several hours later when they recovered. So long as a multi-OSD test was not introduced, performance remained high.
Confusingly, we were unable to invoke the same behavior when running FIO tests directly against the drives. Just as confusing, we saw that during the 8 OSD test, a single OSD would use significantly more CPU than the others:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 511067 root      20   0 9360000   7.2g  33792 S  1180   3.8  15:24.32 ceph-osd
 515664 root      20   0 9357488   7.2g  34560 S 523.6   3.8  13:43.86 ceph-osd
 513323 root      20   0 9145820   6.4g  34560 S 460.0   3.4  13:01.12 ceph-osd
 514147 root      20   0 9026592   6.6g  33792 S 378.7   3.5   9:56.59 ceph-osd
 516488 root      20   0 9188244   6.8g  34560 S 378.4   3.6  10:29.23 ceph-osd
 518236 root      20   0 9390772   6.9g  33792 S 361.0   3.7   9:45.85 ceph-osd
 511779 root      20   0 8329696   6.1g  33024 S 331.1   3.3  10:07.18 ceph-osd
 516974 root      20   0 8984584   6.7g  34560 S 301.6   3.6   9:26.60 ceph-osd
A wallclock profile of the OSD under load showed significant time spent in io_submit, which is what we typically see when the kernel starts blocking because a drive's queue becomes full.
+ 31.00% BlueStore::readv(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, g...
 + 31.00% BlueStore::_do_readv(BlueStore::Collection*, boost::intrusive_ptr<Blu...
  + 24.00% KernelDevice::aio_submit(IOContext*)
  |+ 24.00% aio_queue_t::submit_batch(std::_List_iterator<aio_t>, std::_List_it...
  | + 24.00% io_submit
  |  + 24.00% syscall
Why would running an 8 OSD test cause the kernel to start blocking in io_submit during future single OSD tests? It didn't make very much sense. Initially, we suspected throttling. We saw that with the default cooling profile in the bios, several of the core complexes on the CPU were reaching up to 96 degrees Celsius. We theorized that perhaps we were hitting thermal limits on either the CPU or the NVMe drives during the 8-OSD tests. Perhaps that left the system in a degraded state for a period of time before recovering. Unfortunately, that theory didn't pan out. AMD/Dell confirmed that we shouldn't be hitting throttling even at those temperatures, and we were able to disprove the theory by running the systems with the fans running at 100% and a lower cTDP for the processor. Those changes kept them consistently around 70 degrees Celsius under load without fixing the problem.
For over a week, we looked at everything from bios settings, NVMe multipath, low-level NVMe debugging, changing kernel/Ubuntu versions, and checking every single kernel, OS, and Ceph setting we could think of. None of these things fully resolved the issue.
We even performed blktrace and iowatcher analysis during "good" and "bad" single OSD tests, and could directly observe the slow IO completion behavior:
Timestamp (good) | Offset+Length (good) | Timestamp (bad) | Offset+Length (bad) |
---|---|---|---|
10.00002043 | 1067699792 + 256 [0] | 10.0013855 | 1206277696 + 512 [0] |
10.00002109 | 1153233168 + 136 [0] | 10.00138801 | 1033429056 + 1896 [0] |
10.00016955 | 984818880 + 8 [0] | 10.00209283 | 1031056448 + 1536 [0] |
10.00018827 | 1164427968 + 1936 [0] | 10.00327372 | 1220466752 + 2048 [0] |
10.0003024 | 1084064456 + 1928 [0] | 10.00328869 | 1060912704 + 2048 [0] |
10.00044238 | 1067699280 + 512 [0] | 10.01285746 | 1003849920 + 2048 [0] |
10.00046659 | 1040160848 + 128 [0] | 10.0128617 | 1096765888 + 768 [0] |
10.00053302 | 1153233312 + 1712 [0] | 10.01286317 | 1060914752 + 720 [0] |
10.00056482 | 1153229312 + 2000 [0] | 10.01287147 | 1188736704 + 512 [0] |
10.00058707 | 1067694160 + 64 [0] | 10.01287216 | 1220468800 + 1152 [0] |
10.00080624 | 1067698000 + 336 [0] | 10.01287812 | 1188735936 + 128 [0] |
10.00111046 | 1145660112 + 2048 [0] | 10.01287894 | 1188735168 + 256 [0] |
10.00118455 | 1067698344 + 424 [0] | 10.0128807 | 1188737984 + 256 [0] |
10.00121413 | 984815728 + 208 [0] | 10.01288286 | 1217374144 + 1152 [0] |
At this point, we started getting the hardware vendors involved. Ultimately it turned out to be unnecessary. There were one minor and two major fixes that got things back on track.
The first fix was an easy one, but only got us a modest 10-20% performance gain. Many years ago it was discovered (either by Nick Fisk or Stephen Blinick, if I recall) that Ceph is incredibly sensitive to latency introduced by CPU c-state transitions. A quick check of the bios on these nodes showed that they weren't running in maximum performance mode, which disables c-states. This was a nice win but not enough to get the results where we wanted them.
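The fix in this case was a BIOS profile change, but for completeness, here is a hedged sketch of how C-states can be inspected and limited from the OS with standard tools (package names and exact behavior vary by distribution):

cpupower idle-info                      # show which idle (C-) states the CPUs may enter
cpupower idle-set -D 0                  # disable idle states with exit latency above 0us
tuned-adm profile latency-performance   # or apply a tuned profile that keeps CPUs out of deep C-states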
By the time I was digging into the blktrace results shown above, I was about 95% sure that we were either seeing an issue with the NVMe drives or something related to the PCIe root complex since these systems don't have PCIe switches in them. I was busy digging into technical manuals and trying to find ways to debug/profile the hardware. A very clever engineer working for the customer offered to help out. I set up a test environment for him so he could repeat some of the same testing on an alternate set of nodes and he hit a home run.
While I had focused primarily on wallclock profiles and was now digging into trying to debug the hardware, he wanted to understand if there was anything interesting happening kernel side (which in retrospect was the obvious next move!). He ran a perf profile during a bad run and made a very astute discovery:
77.37% tp_osd_tp [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
       |
       ---native_queued_spin_lock_slowpath
          |
          --77.36%--_raw_spin_lock_irqsave
                    |
                    |--61.10%--alloc_iova
                    |          alloc_iova_fast
                    |          iommu_dma_alloc_iova.isra.0
                    |          iommu_dma_map_sg
                    |          __dma_map_sg_attrs
                    |          dma_map_sg_attrs
                    |          nvme_map_data
                    |          nvme_queue_rq
                    |          __blk_mq_try_issue_directly
                    |          blk_mq_request_issue_directly
                    |          blk_mq_try_issue_list_directly
                    |          blk_mq_sched_insert_requests
                    |          blk_mq_flush_plug_list
                    |          blk_flush_plug_list
                    |          |
                    |          |--56.54%--blk_mq_submit_bio
A huge amount of time is spent in the kernel contending on a spin lock while updating the IOMMU mappings. He disabled IOMMU in the kernel and immediately saw a huge increase in performance during the 8-node tests. We repeated those tests multiple times and repeatedly saw much better 4MB read/write performance. Score one for the customer. There was however still an issue with 4KB random writes.
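For reference, a hedged sketch of what disabling the IOMMU via kernel boot parameters can look like on an Ubuntu system with AMD CPUs; the exact parameter, and whether disabling the IOMMU is acceptable at all, depend on your platform and security requirements:

# In /etc/default/grub, append the parameter to the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="... amd_iommu=off"
sudo update-grub
sudo reboot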
After being beaten to the punch by the customer on the IOMMU issue, I was almost grateful that we had an additional problem to solve. 4K random write performance had improved with the first two fixes but was still significantly worse than the upstream lab (even given the reduced node/drive counts). I also noticed that compaction was far slower than expected in RocksDB. There have previously been two significant cases that presented similarly and appeared to be relevant.
Historically this customer used the upstream Ceph Ubuntu packages and we were still using them here (rather than self-compiling or using cephadm with containers). I verified that TCMalloc was compiled in. That ruled out the first issue. Next, I dug out the upstream build logs for the 17.2.7 Ubuntu packages. That's when I noticed that we were not, in fact, building RocksDB with the correct compile flags. It's not clear how long that's been going on, but we've had general build performance issues going back as far as 2018.
It turns out that Canonical fixed this for their own builds as did Gentoo after seeing the note I wrote in do_cmake.sh over 6 years ago. It's quite unfortunate that our upstream Deb builds have suffered with this for as long as they have, however, it at least doesn't appear to affect anyone using cephadm on Debian/Ubuntu with the upstream containers. With the issue understood, we built custom 17.2.7 packages with a fix in place. Compaction time dropped by around 3X and 4K random write performance doubled (Though it's a bit tough to make out in the graph):
4KB random write performance was still lower than I wanted it to be, but at least now we were in roughly the right ballpark given that we had fewer OSDs, only 3/5 the number of nodes, and fewer (though faster) cores per OSD. At this point, we were nearing winter break. The customer wanted to redeploy the OS to the correct boot drives and update the deployment with all of the fixes and tunings we had discovered. The plan was to take the holiday break off and then spend the first week of the new year finishing the burn-in tests. Hopefully, we could start migrating the cluster the following week.
On the morning of January 2nd, I logged into Slack and was greeted by a scene I'll describe as moderately controlled chaos. A completely different cluster we are involved in was having a major outage. Without getting too into the details, it took 3 days to pull that cluster back from the brink and get it into a stable and relatively healthy state. It wasn't until Friday that I was able to get back to performance testing. I was able to secure an extra day for testing on Monday, but this meant I was under a huge time crunch to showcase that the cluster could perform well under load before we started the data migration process.
I worked all day on Friday to re-deploy CBT and recreate the tests we ran previously. This time I was able to use all 10 of the drives in each node. I also bumped up the number of clients to maintain an average of roughly 1 FIO client with an io_depth of 128 per OSD. The first 3 node test looked good. With 10 OSDs per node, we were achieving roughly proportional (IE higher) performance relative to the previous tests. I knew I wasn't going to have much time to do proper scaling tests, so I immediately bumped up from 3 nodes to 10 nodes. I also scaled the PG count at the same time and used CBT to deploy a new cluster. At 3 nodes I saw 63GiB/s for 4MB random reads. At 10 nodes, I saw 213.5GiB/s. That's almost linear scaling at 98.4%. It was at this point that I knew that things were finally taking a turn for the better. Of the 68 nodes for this cluster, only 63 were up at that time. The rest were down for maintenance to fix various issues. I split the cluster roughly in half, with 32 nodes (320 OSDs) in one half, and 31 client nodes running 10 FIO processes each in the other half. I watched as CBT built the cluster over roughly a 7-8 minute period. The initial write prefill looked really good. My heart soared. We were reading data at 635 GiB/s. We broke 15 million 4k random read IOPS. While this may not seem impressive compared to the individual NVMe drives, these were the highest numbers I had ever seen for a ~300 OSD Ceph cluster.
I also plotted both average and tail latency for the scaling tests. Both looked consistent. This was likely due to scaling the PG count and the FIO client count at the same time as OSDs. These tests are very IO-heavy however. We have so much client traffic that we are likely well into the inflection point where performance doesn't increase while latency continues to grow as more IO is added.
I showed these results to my colleague Dan van der Ster who previously had built the Ceph infrastructure at CERN. He bet me a beer (Better be a good one Dan!) if I could hit 1 TiB/s. I told him that had been my plan since the beginning.
I had no additional client nodes to test the cluster with at full capacity, so the only real option was to co-locate FIO processes on the same nodes as the OSDs. On one hand, this provides a very slight network advantage. Clients will be able to communicate with local OSDs 1/63rd of the time. On the other hand, we know from previous testing that co-locating FIO clients on OSD nodes isn't free. There's often a performance hit, and it wasn't remotely clear to me how much of a hit a cluster of this scale would take.
I built a new CBT configuration targeting the 63 nodes I had available. Deploying the cluster with CBT took about 15 minutes to stand up all 630 OSDs and build the pool. I waited with bated breath and watched the results as they came in.
Around 950GiB/s. So very very close. It was late on Friday night at this point, so I wrapped up and turned in for the night. On Saturday morning I logged in and threw a couple of tuning options at the cluster: lowering OSD shards and async messenger threads while also applying the Reef RocksDB tunings. As you can see, we actually hurt read performance a little while helping write performance. In fact, random write performance improved by nearly 20%. After further testing, it looked like the Reef tunings were benign, though only helping a little bit in the write tests. The bigger effect seemed to be coming from the shard/thread changes. At this point, I had to take a break and wasn't able to get back to working on the cluster again until Sunday night. I tried to go to bed, but I knew that I was down to the last 24 hours before we needed to wrap this up. At around midnight I gave up on sleep and got back to work.
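The knobs being varied in these runs are, roughly, the OSD shard/thread counts and the async messenger thread count. A hedged sketch of how they are set is shown below; the values are Ceph's defaults for SSD-backed OSDs (as noted later in the text), not the exact values tried in each run, and changing them typically requires restarting the OSDs:

ceph config set osd osd_op_num_shards 8              # OSD op queue shards (SSD default)
ceph config set osd osd_op_num_threads_per_shard 2   # worker threads per shard
ceph config set global ms_async_op_threads 3         # async messenger worker threads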
I mentioned earlier that we know that the PG count can affect performance. I decided to keep the "tuned" configuration from earlier but doubled the number of PGs. In the first set of tests, I had dropped the ratio of clients to OSDs down given that we were co-locating them on the OSD nodes. Now I tried scaling them up again. 4MB random read performance improved slightly as the number of clients grew, while small random read IOPS degraded. Once we hit 8 FIO processes per node (504 total), sequential write performance dropped through the floor.
To understand what happened, I reran the write test and watched “ceph -s” output:
  services:
    mon: 3 daemons, quorum a,b,c (age 42m)
    mgr: a(active, since 42m)
    osd: 630 osds: 630 up (since 24m), 630 in (since 25m)
         flags noscrub,nodeep-scrub

  data:
    pools:   2 pools, 131073 pgs
    objects: 4.13M objects, 16 TiB
    usage:   48 TiB used, 8.2 PiB / 8.2 PiB avail
    pgs:     129422 active+clean
             1651   active+clean+laggy

  io:
    client:   0 B/s rd, 1.7 GiB/s wr, 1 op/s rd, 446 op/s wr
As soon as I threw 504 FIO processes doing 4MB writes at the cluster, some of the PGs started going active+clean+laggy. Performance tanked and the cluster didn't recover from that state until the workload was completed. What's worse, more PGs went laggy over time even though the throughput was only a small fraction of what the cluster was capable of. Since then, we've found a couple of reports of laggy PGs on the mailing list along with a couple of suggestions that might fix them. It's not clear if those ideas would have helped here. We do know that IO will temporarily be paused when PGs go into a laggy state and that this happens because a replica hasn't acknowledged new leases from the primary in time. After discussing the issue with other Ceph developers, we think this could possibly be an issue with locking in the OSD or having lease messages competing with work in the same async msgr threads.
Despite being distracted by the laggy PG issue, I wanted to refocus on hitting 1.0TiB/s. Lack of sleep was finally catching up with me. At some point I had doubled the PG count again to 256K, just to see if it had any effect at all on the laggy PG issue. That put us solidly toward the upper end of the curve we showed earlier, though frankly, I don't think it actually mattered much. I decided to switch back to the default OSD shard counts and continue testing with 504 FIO client processes. I did however scale the number of async messenger threads. There were two big takeaways. The first is that dropping down to 1 async messenger allowed us to avoid PGs going laggy and achieve “OK” write throughput with 504 clients. It also dramatically hurt the performance of 4MB reads. The second: Ceph's defaults were actually ideal for 4MB reads. With 8 shards, 2 threads per shard, and 3 msgr threads, we finally broke 1TiB/s. Here's the view I had at around 4 AM Monday morning as the final set of tests for the night ran:
  services:
    mon: 3 daemons, quorum a,b,c (age 30m)
    mgr: a(active, since 30m)
    osd: 630 osds: 630 up (since 12m), 630 in (since 12m)
         flags noscrub,nodeep-scrub

  data:
    pools:   2 pools, 262145 pgs
    objects: 4.13M objects, 16 TiB
    usage:   48 TiB used, 8.2 PiB / 8.2 PiB avail
    pgs:     262145 active+clean

  io:
    client:   1.0 TiB/s rd, 6.1 KiB/s wr, 266.15k op/s rd, 6 op/s wr
and the graphs from the FIO results:
After finally seeing the magical "1.0 TiB/s" screen I had been waiting weeks to see, I finally went to sleep. Nevertheless, I got up several hours later. There was still work to be done. All of the testing we had done so far was with 3X replication, but the customer would be migrating this hardware into an existing cluster deployed with 6+2 erasure coding. We needed to get some idea of what this cluster was capable of in the configuration they would be using.
I reconfigured the cluster again and ran through new tests. I picked PG/shard/client values from the earlier tests that appeared to work well. Performance was good, but I saw that the async messenger threads were working very hard. I decided to try increasing them beyond the defaults to see if they might help given the added network traffic.
We could achieve well over 500GiB/s for reads and nearly 400GiB/s for writes with 4-5 async msgr threads. But why are the read results so much slower with EC than with replication? With replication, the primary OSD for a PG only has to read local data and send it to the client. The network overhead is essentially 1X. With 6+2 erasure coding, the primary must read 5 of the 6 chunks from replicas before it can then send the constructed object to the client. The overall network overhead for the request is roughly (1 + 5/6)X*. That's why we see slightly better than half the performance of 3X replication for reads. We have the opposite situation for writes. With 3X replication, the client sends the object to the primary, which then further sends copies over the network to two secondaries. This results in an aggregate network overhead of 3X. In the EC case, we only need to send 7/8 chunks to the secondaries (almost, but not quite, the same as the read case). For large writes, performance is actually faster.
* Originally this article stated that 7/8 chunks had to be fetched for reads. The correct value is 5/6 chunks, unless fast reads are enabled. In that case it would be 7/6 chunks. Thanks to Joshua Baergen for catching this!
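A quick back-of-the-envelope check of that read-overhead argument, using awk for the arithmetic; the measured figures are taken from the summary table near the end of this article:

awk 'BEGIN {
  repl_factor = 1.0;              # 3X replication: primary reads locally, ~1X on the wire
  ec_factor   = 1.0 + 5.0/6.0;    # 6+2 EC: primary also fetches 5 of 6 chunks, ~1.83X
  printf "expected EC : replication read ratio  %.2f\n", repl_factor / ec_factor;   # ~0.55
  printf "measured ratio (547 / 1025 GiB/s)     %.2f\n", 547.0 / 1025.0;            # ~0.53
}'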
IOPS, however, are another story. For very small reads and writes, Ceph will contact all participating OSDs in a PG for that object even when the data they store isn't relevant for the operation. For instance, if you are doing 4K reads and the data you are interested in is entirely stored in a single chunk on one of the OSDs, Ceph will still fetch data from all OSDs participating in the stripe. In the summer of 2023, Clyso resurrected a PR from Xiaofei Cui that implements partial stripe reads for erasure coding to avoid this extra work. The effect is dramatic:
It's not clear yet if we will be able to get this merged for Squid, though Radoslaw Zarzynski, core lead for the Ceph project, has offered to help try to get this over the finish line.
Finally, we wanted to provide the customer with a rough idea of how much msgr-level encryption would impact their cluster if they decided to use it. The adrenaline of the previous night had long faded and I was dead tired at this point. I managed to run through both 3X replication and 6+2 erasure coding tests with msgr v2 encryption enabled and compared it against our previous test results.
The biggest hit is to large reads. They drop from ~1 TiB/s to around 750 GiB/s. Everything else sees a more modest, though consistent hit. At this point, I had to stop. I really wanted to do PG scaling tests and even kernel RBD tests. It was time, though, to hand the systems back to the customer for re-imaging and then to one of my excellent colleagues at Clyso for integration.
So what's happened with this cluster since the end of the testing? All hardware was re-imaged and the new OSDs were deployed into the customer's existing HDD cluster. Dan's upmap-remapped script is being used to control the migration process and we've migrated around 80% of the existing data to the NVMe backed OSDs. By next week, the cluster should be fully migrated to the new NVMe based nodes. We've opted not to employ all of the tuning we've done here, at least not at first. Initially, we'll make sure the cluster behaves well under the existing, mostly default, configuration. We now have a mountain of data we can use to tune the system further if the customer hits any performance issues.
Since there was a ton of data and charts here, I want to recap some of the highlights. Here's an outline of the best numbers we were able to achieve on this cluster:
30 OSDs (3x) | 100 OSDs (3x) | 320 OSDs (3x) | 630 OSDs (3x) | 630 OSDs (EC62) | |
---|---|---|---|---|---|
Co-Located Fio | No | No | No | Yes | Yes |
4MB Read | 63 GiB/s | 214 GiB/s | 635 GiB/s | 1025 GiB/s | 547 GiB/s |
4MB Write | 15 GiB/s | 46 GiB/s | 133 GiB/s | 270 GiB/s | 387 GiB/s |
4KB Rand Read | 1.9M IOPS | 5.8M IOPS | 16.6M IOPS | 25.5M IOPS | 3.4M IOPS |
4KB Rand Write | 248K IOPS | 745K IOPS | 2.4M IOPS | 4.9M IOPS | 936K IOPS |
What's next? We need to figure out how to fix the laggy PG issue during writes. We can't have Ceph falling apart when the write workload scales up. Beyond that, we learned through this exercise that Ceph is perfectly capable of saturating 2x 100GbE NICs. To push the throughput envelope further we will need 200GbE+ when using 10 NVMe drives per node or more. IOPS is more nuanced. We know that PG count can have a big effect. We also know that the general OSD threading model is playing a big role. We consistently hit a wall at around 400-600K random read IOPS per node and we've seen it in multiple deployments. Part of this may be how the async msgr interfaces with the kernel and part of this may be how OSD threads wake up when new work is put into the shard queues. I've modified the OSD code in the past to achieve better results under heavy load, but at the expense of low-load latency. Ultimately, I suspect improving IOPS will take a multi-pronged approach and a rewrite of some of the OSD threading code.
To my knowledge, these are the fastest single-cluster Ceph results ever published and the first time a Ceph cluster has achieved 1 TiB/s. I think Ceph is capable of quite a bit more. If you have a faster cluster out there, I encourage you to publish your results! Thank you for reading, and if you have any questions or would like to talk more about Ceph performance, please feel free to reach out.
This is the first backport release in the Reef series, and the first with Debian packages, for Debian Bookworm. We recommend all users update to this release.
RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in a multi-site deployment. Previously, the replicas of such objects were corrupted on decryption. A new command, radosgw-admin bucket resync encrypted multipart, can be used to identify these original multipart uploads. The LastModified timestamp of any identified object is incremented by 1ns to cause peer zones to replicate it again. For multi-site deployments that make any use of Server-Side Encryption, we recommend running this command against every bucket in every zone after all zones have upgraded.
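A hedged usage sketch of the command named above (the bucket name is a placeholder); it is run per bucket, per zone:

radosgw-admin bucket resync encrypted multipart --bucket=mybucket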
CEPHFS: MDS now evicts clients which are not advancing their request tids (transaction IDs), which causes a large buildup of session metadata, resulting in the MDS going read-only due to the RADOS operation exceeding the size threshold. The mds_session_metadata_threshold config controls the maximum size that the (encoded) session metadata can grow to.
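The threshold can be adjusted like any other MDS config option; the value below is purely illustrative, not the default:

ceph config set mds mds_session_metadata_threshold 16777216   # example: 16 MiB of encoded session metadata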
RGW: New tools have been added to radosgw-admin for identifying and correcting issues with versioned bucket indexes. Historical bugs with the versioned bucket index transaction workflow made it possible for the index to accumulate extraneous "book-keeping" olh (object logical head) entries and plain placeholder entries. In some specific scenarios where clients made concurrent requests referencing the same object key, it was likely that a lot of extra index entries would accumulate. When a significant number of these entries are present in a single bucket index shard, they can cause high bucket listing latencies and lifecycle processing failures. To check whether a versioned bucket has unnecessary olh entries, users can now run radosgw-admin bucket check olh. If the --fix flag is used, the extra entries will be safely removed. A distinct issue from the one described thus far is that some versioned buckets may be maintaining extra unlinked objects that are not listable from the S3/Swift APIs. These extra objects are typically a result of PUT requests that exited abnormally, in the middle of a bucket index transaction, so the client would not have received a successful response. Bugs in prior releases made these unlinked objects easy to reproduce with any PUT request that was made on a bucket that was actively resharding. Besides the extra space that these hidden, unlinked objects consume, there can be another side effect in certain scenarios, caused by the nature of the failure mode that produced them, where a client of a bucket that was a victim of this bug may find the object associated with the key to be in an inconsistent state. To check whether a versioned bucket has unlinked entries, users can now run radosgw-admin bucket check unlinked. If the --fix flag is used, the unlinked objects will be safely removed. Finally, a third issue made it possible for versioned bucket index stats to be accounted inaccurately. The tooling for recalculating versioned bucket stats also had a bug, and was not previously capable of fixing these inaccuracies. This release resolves those issues, and users can now expect that the existing radosgw-admin bucket check command will produce correct results. We recommend that users with versioned buckets, especially those that existed on prior releases, use these new tools to check whether their buckets are affected and to clean them up accordingly.
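Hedged usage sketches of the commands mentioned above (the bucket name is a placeholder); running without --fix first reports problems without changing anything:

radosgw-admin bucket check olh --bucket=mybucket             # report extraneous olh entries
radosgw-admin bucket check olh --bucket=mybucket --fix       # remove them
radosgw-admin bucket check unlinked --bucket=mybucket --fix  # remove leftover unlinked objects
radosgw-admin bucket check --bucket=mybucket                 # recheck versioned bucket index stats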
mgr/snap-schedule: For clusters with multiple CephFS file systems, all the snap-schedule commands now expect the '--fs' argument.
RADOS: A POOL_APP_NOT_ENABLED health warning will now be reported if the application is not enabled for the pool, irrespective of whether the pool is in use or not. Always tag a pool with an application using the ceph osd pool application enable command to avoid the POOL_APP_NOT_ENABLED health warning being reported for that pool. The user might temporarily mute this warning using ceph health mute POOL_APP_NOT_ENABLED.
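The two commands referenced above, with placeholder pool and application names:

ceph osd pool application enable mypool rbd     # tag the pool with the application that uses it
ceph health mute POOL_APP_NOT_ENABLED           # or temporarily silence the warning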
Dashboard: An overview page for RGW to show the overall status of RGW components.
Dashboard: Added management support for RGW Multi-site and CephFS Subvolumes and groups.
Dashboard: Fixed a few bugs and issues around the new dashboard page, including the broken layout and some metrics giving wrong values, and introduced a popover to display details when there are HEALTH_WARN or HEALTH_ERR alerts.
Dashboard: Fixed several issues in Ceph dashboard on Rook-backed clusters, and improved the user experience on the Rook environment.
.github: Clarify checklist details (pr#54130, Anthony D'Atri)
[CVE-2023-43040] rgw: Fix bucket validation against POST policies (pr#53756, Joshua Baergen)
Adding rollback mechanism to handle bootstrap failures (pr#53864, Adam King, Redouane Kachach)
backport of rook orchestrator fixes and e2e automated testing (pr#54224, Redouane Kachach)
Bluestore: fix bluestore collection_list latency perf counter (pr#52950, Wangwenjuan)
build: Remove ceph-libboost* packages in install-deps (pr#52769, Adam Emerson)
ceph-volume/cephadm: support lv devices in inventory (pr#53286, Guillaume Abrioux)
ceph-volume: add --osd-id option to raw prepare (pr#52927, Guillaume Abrioux)
ceph-volume: fix a regression in raw list (pr#54521, Guillaume Abrioux)
ceph-volume: fix mpath device support (pr#53539, Guillaume Abrioux)
ceph-volume: fix raw list for lvm devices (pr#52619, Guillaume Abrioux)
ceph-volume: fix raw list for lvm devices (pr#52980, Guillaume Abrioux)
ceph-volume: Revert \\"ceph-volume: fix raw list for lvm devices\\" (pr#54429, Matthew Booth, Guillaume Abrioux)
ceph: allow xlock state to be LOCK_PREXLOCK when putting it (pr#53661, Xiubo Li)
ceph_fs.h: add separate owner_{u,g}id fields (pr#53138, Alexander Mikhalitsyn)
ceph_volume: support encrypted volumes for lvm new-db/new-wal/migrate commands (pr#52875, Igor Fedotov)
cephadm batch backport Aug 23 (pr#53124, Adam King, Luis Domingues, John Mulligan, Redouane Kachach)
cephadm: add a --dry-run option to cephadm shell (pr#54220, John Mulligan)
cephadm: add tcmu-runner to logrotate config (pr#53122, Adam King)
cephadm: Adding support to configure public_network cfg section (pr#53110, Redouane Kachach)
cephadm: delete /tmp/cephadm-when removing the cluster (pr#53109, Redouane Kachach)
cephadm: Fix extra_container_args for iSCSI (pr#53010, Raimund Sacherer)
cephadm: fix haproxy version with certain containers (pr#53751, Adam King)
cephadm: make custom_configs work for tcmu-runner container (pr#53404, Adam King)
cephadm: run tcmu-runner through script to do restart on failure (pr#53866, Adam King)
cephadm: support for CA signed keys (pr#53121, Adam King)
cephfs-journal-tool: disambiguate usage of all keyword (in tool help) (pr#53646, Manish M Yathnalli)
cephfs-mirror: do not run concurrent C_RestartMirroring context (issue#62072, pr#53638, Venky Shankar)
cephfs: implement snapdiff (pr#53229, Igor Fedotov, Lucian Petrut, Denis Barahtanov)
cephfs_mirror: correctly set top level dir permissions (pr#53271, Milind Changire)
client: always refresh mds feature bits on session open (issue#63188, pr#54146, Venky Shankar)
client: correct quota check in Client::_rename() (pr#52578, Rishabh Dave)
client: do not send metrics until the MDS rank is ready (pr#52501, Xiubo Li)
client: force sending cap revoke ack always (pr#52507, Xiubo Li)
client: issue a cap release immediately if no cap exists (pr#52850, Xiubo Li)
client: move the Inode to new auth mds session when changing auth cap (pr#53666, Xiubo Li)
client: trigger to flush the buffer when making snapshot (pr#52497, Xiubo Li)
client: wait rename to finish (pr#52504, Xiubo Li)
cmake: ensure fmtlib is at least 8.1.1 (pr#52970, Abhishek Lekshmanan)
Consider setting "bulk" autoscale pool flag when automatically creating a data pool for CephFS (pr#52899, Leonid Usov)
crimson/admin/admin_socket: remove path file if it exists (pr#53964, Matan Breizman)
crimson/ertr: assert on invocability of func provided to safe_then() (pr#53958, Radosław Zarzyński)
crimson/mgr: Fix config show command (pr#53954, Aishwarya Mathuria)
crimson/net: consolidate messenger implementations and enable multi-shard UTs (pr#54095, Yingxin Cheng)
crimson/net: set TCP_NODELAY according to ms_tcp_nodelay (pr#54063, Xuehan Xu)
crimson/net: support connections in multiple shards (pr#53949, Yingxin Cheng)
crimson/os/object_data_handler: splitting right side doesn't mean splitting only one extent (pr#54061, Xuehan Xu)
crimson/os/seastore/backref_manager: scan backref entries by journal seq (pr#53939, Zhang Song)
crimson/os/seastore/btree: should add left's size when merging levels… (pr#53946, Xuehan Xu)
crimson/os/seastore/cache: don't add EXIST_CLEAN extents to lru (pr#54098, Xuehan Xu)
crimson/os/seastore/cached_extent: add prepare_commit interface (pr#53941, Xuehan Xu)
crimson/os/seastore/cbj: fix a potential overflow bug on segment_seq (pr#53968, Myoungwon Oh)
crimson/os/seastore/collection_manager: fill CollectionNode::decoded on clean reads (pr#53956, Xuehan Xu)
crimson/os/seastore/journal/cbj: generalize scan_valid_records() (pr#53961, Myoungwon Oh, Yingxin Cheng)
crimson/os/seastore/omap_manager: correct editor settings (pr#53947, Zhang Song)
crimson/os/seastore/omap_manager: fix the entry leak issue in BtreeOMapManager::omap_list() (pr#53962, Xuehan Xu)
crimson/os/seastore/onode_manager: populate value recorders of onodes to be erased (pr#53966, Xuehan Xu)
crimson/os/seastore/rbm: make rbm support multiple shards (pr#53952, Myoungwon Oh)
crimson/os/seastore/transaction_manager: data loss issues (pr#53955, Xuehan Xu)
crimson/os/seastore/transaction_manager: move intermediate_key by "remap_offset" when remapping the "back" half of the original pin (pr#54140, Xuehan Xu)
crimson/os/seastore/zbd: zbdsegmentmanager write path fixes (pr#54062, Aravind Ramesh)
crimson/os/seastore: add metrics about total invalidated transactions (pr#53953, Zhang Song)
crimson/os/seastore: create page aligned bufferptr in copy ctor of CachedExtent (pr#54097, Zhang Song)
crimson/os/seastore: enable SMR HDD (pr#53935, Aravind Ramesh)
crimson/os/seastore: fix ceph_assert in segment_manager.h (pr#53938, Aravind Ramesh)
crimson/os/seastore: fix dangling reference of oid in SeaStore::Shard::stat() (pr#53960, Xuehan Xu)
crimson/os/seastore: fix in check_node (pr#53945, Xinyu Huang)
crimson/os/seastore: OP_CLONE in seastore (pr#54092, xuxuehan, Xuehan Xu)
crimson/os/seastore: realize lazy read in split overwrite with overwrite refactor (pr#53951, Xinyu Huang)
crimson/os/seastore: retire_extent_addr clean up (pr#53959, Xinyu Huang)
crimson/osd/heartbeat: Improve maybe_share_osdmap behavior (pr#53940, Samuel Just)
crimson/osd/lsan_suppressions.cc: Add MallocExtension::Initialize() (pr#54057, Mark Nelson, Matan Breizman)
crimson/osd/lsan_suppressions: add MallocExtension::Register (pr#54139, Matan Breizman)
crimson/osd/object_context: consider clones found as long as they're in SnapSet::clones (pr#53965, Xuehan Xu)
crimson/osd/osd_operations: add pipeline to LogMissingRequest to sync it (pr#53957, Xuehan Xu)
crimson/osd/osd_operations: consistent naming to pipeline users (pr#54060, Matan Breizman)
crimson/osd/pg: check if backfill_state exists when judging objects' (pr#53963, Xuehan Xu)
crimson/osd/watch: Add logs around Watch/Notify (pr#53950, Matan Breizman)
crimson/osd: add embedded suppression ruleset for LSan (pr#53937, Radoslaw Zarzynski)
crimson/osd: cleanup and drop OSD::ShardDispatcher (pr#54138, Yingxin Cheng)
Crimson/osd: Disable concurrent MOSDMap handling (pr#53944, Matan Breizman)
crimson/osd: don't ignore start_pg_operation returned future (pr#53948, Matan Breizman)
crimson/osd: fix ENOENT on accessing RadosGW user's index of buckets (pr#53942, Radoslaw Zarzynski)
crimson/osd: fix Notify life-time mismanagement in Watch::notify_ack (pr#53943, Radoslaw Zarzynski)
crimson/osd: fixes and cleanups around multi-core OSD (pr#54091, Yingxin Cheng)
Crimson/osd: support multicore osd (pr#54058, chunmei)
crimson/tools/perf_crimson_msgr: integrate multi-core msgr with various improvements (pr#54059, Yingxin Cheng)
crimson/tools/perf_crimson_msgr: randomize client nonce (pr#54093, Yingxin Cheng)
crimson/tools/perf_staged_fltree: fix compile error (pr#54096, Myoungwon Oh)
crimson/vstart: default seastore_device_size will be out of space f… (pr#53969, chunmei)
crimson: Enable tcmalloc when using seastar (pr#54105, Mark Nelson, Matan Breizman)
debian/control: add docker-ce as recommends for cephadm package (pr#52908, Adam King)
Debian: update to dh compat 12, fix more serious packaging errors, correct copyright syntax (pr#53654, Matthew Vernon)
doc/architecture.rst - edit a sentence (pr#53372, Zac Dover)
doc/architecture.rst - edit up to "Cluster Map" (pr#53366, Zac Dover)
doc/architecture: "Edit HA Auth" (pr#53619, Zac Dover)
doc/architecture: "Edit HA Auth" (one of several) (pr#53585, Zac Dover)
doc/architecture: "Edit HA Auth" (one of several) (pr#53491, Zac Dover)
doc/architecture: edit "Calculating PG IDs" (pr#53748, Zac Dover)
doc/architecture: edit "Cluster Map" (pr#53434, Zac Dover)
doc/architecture: edit "Data Scrubbing" (pr#53730, Zac Dover)
doc/architecture: Edit "HA Auth" (pr#53488, Zac Dover)
doc/architecture: edit "HA Authentication" (pr#53632, Zac Dover)
doc/architecture: edit "High Avail. Monitors" (pr#53451, Zac Dover)
doc/architecture: edit "OSD Membership and Status" (pr#53727, Zac Dover)
doc/architecture: edit "OSDs service clients directly" (pr#53686, Zac Dover)
doc/architecture: edit "Peering and Sets" (pr#53871, Zac Dover)
doc/architecture: edit "Replication" (pr#53738, Zac Dover)
doc/architecture: edit "SDEH" (pr#53659, Zac Dover)
doc/architecture: edit several sections (pr#53742, Zac Dover)
doc/architecture: repair RBD sentence (pr#53877, Zac Dover)
doc/ceph-volume: explain idempotence (pr#54233, Zac Dover)
doc/ceph-volume: improve front matter (pr#54235, Zac Dover)
doc/cephadm/services: remove excess rendered indentation in osd.rst (pr#54323, Ville Ojamo)
doc/cephadm: add ssh note to install.rst (pr#53199, Zac Dover)
doc/cephadm: edit \\"Adding Hosts\\" in install.rst (pr#53224, Zac Dover)
doc/cephadm: edit sentence in mgr.rst (pr#53164, Zac Dover)
doc/cephadm: edit troubleshooting.rst (1 of x) (pr#54283, Zac Dover)
doc/cephadm: edit troubleshooting.rst (2 of x) (pr#54320, Zac Dover)
doc/cephadm: fix typo in cephadm initial crush location section (pr#52887, John Mulligan)
doc/cephadm: fix typo in set ssh key command (pr#54388, Piotr Parczewski)
doc/cephadm: update cephadm reef version (pr#53162, Rongqi Sun)
doc/cephfs: edit mount-using-fuse.rst (pr#54353, Jaanus Torp)
doc/cephfs: write cephfs commands fully in docs (pr#53402, Rishabh Dave)
doc/config: edit "ceph-conf.rst" (pr#54463, Zac Dover)
doc/configuration: edit "bg" in mon-config-ref.rst (pr#53347, Zac Dover)
doc/dev/release-checklist: check telemetry validation (pr#52805, Yaarit Hatuka)
doc/dev: Fix typos in files cephfs-mirroring.rst and deduplication.rst (pr#53519, Daniel Parkes)
doc/dev: remove cache-pool (pr#54007, Zac Dover)
doc/glossary: add "primary affinity" to glossary (pr#53427, Zac Dover)
doc/glossary: add "Quorum" to glossary (pr#54509, Zac Dover)
doc/glossary: improve "BlueStore" entry (pr#54265, Zac Dover)
doc/man/8/ceph-monstore-tool: add documentation (pr#52872, Matan Breizman)
doc/man/8: improve radosgw-admin.rst (pr#53267, Anthony D'Atri)
doc/man: edit ceph-monstore-tool.rst (pr#53476, Zac Dover)
doc/man: radosgw-admin.rst typo (pr#53315, Zac Dover)
doc/man: remove docs about support for unix domain sockets (pr#53312, Zac Dover)
doc/man: s/kvstore-tool/monstore-tool/ (pr#53536, Zac Dover)
doc/rados/configuration: Avoid repeating "support" in msgr2.rst (pr#52998, Ville Ojamo)
doc/rados: add bulk flag to pools.rst (pr#53317, Zac Dover)
doc/rados: edit \\"troubleshooting-mon\\" (pr#54502, Zac Dover)
doc/rados: edit memory-profiling.rst (pr#53932, Zac Dover)
doc/rados: edit operations/add-or-rm-mons (1 of x) (pr#52889, Zac Dover)
doc/rados: edit operations/add-or-rm-mons (2 of x) (pr#52825, Zac Dover)
doc/rados: edit ops/control.rst (1 of x) (pr#53811, zdover23, Zac Dover)
doc/rados: edit ops/control.rst (2 of x) (pr#53815, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (1 of x) (pr#54418, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (2 of x) (pr#54421, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (3 of x) (pr#54438, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (4 of x) (pr#54443, Zac Dover)
doc/rados: edit t-mon \\"common issues\\" (5 of x) (pr#54455, Zac Dover)
doc/rados: edit t-mon.rst text (pr#54349, Zac Dover)
doc/rados: edit t-shooting-mon.rst (pr#54427, Zac Dover)
doc/rados: edit troubleshooting-mon.rst (2 of x) (pr#52839, Zac Dover)
doc/rados: edit troubleshooting-mon.rst (3 of x) (pr#53879, Zac Dover)
doc/rados: edit troubleshooting-mon.rst (4 of x) (pr#53897, Zac Dover)
doc/rados: edit troubleshooting-osd (1 of x) (pr#53982, Zac Dover)
doc/rados: Edit troubleshooting-osd (2 of x) (pr#54000, Zac Dover)
doc/rados: Edit troubleshooting-osd (3 of x) (pr#54026, Zac Dover)
doc/rados: edit troubleshooting-pg (2 of x) (pr#54114, Zac Dover)
doc/rados: edit troubleshooting-pg.rst (pr#54228, Zac Dover)
doc/rados: edit troubleshooting-pg.rst (1 of x) (pr#54073, Zac Dover)
doc/rados: edit troubleshooting.rst (pr#53837, Zac Dover)
doc/rados: edit troubleshooting/community.rst (pr#53881, Zac Dover)
doc/rados: format \\"initial troubleshooting\\" (pr#54477, Zac Dover)
doc/rados: format Q&A list in t-mon.rst (pr#54345, Zac Dover)
doc/rados: format Q&A list in tshooting-mon.rst (pr#54366, Zac Dover)
doc/rados: improve \\"scrubbing\\" explanation (pr#54270, Zac Dover)
doc/rados: parallelize t-mon headings (pr#54461, Zac Dover)
doc/rados: remove cache-tiering-related keys (pr#54227, Zac Dover)
doc/rados: remove FileStore material (in Reef) (pr#54008, Zac Dover)
doc/rados: remove HitSet-related key information (pr#54217, Zac Dover)
doc/rados: update monitoring-osd-pg.rst (pr#52958, Zac Dover)
doc/radosgw: Improve dynamicresharding.rst (pr#54368, Anthony D'Atri)
doc/radosgw: Improve language and formatting in config-ref.rst (pr#52835, Ville Ojamo)
doc/radosgw: multisite - edit \\"migrating a single-site\\" (pr#53261, Qi Tao)
doc/radosgw: update rate limit management (pr#52910, Zac Dover)
doc/README.md - edit "Building Ceph" (pr#53057, Zac Dover)
doc/README.md - improve "Running a test cluster" (pr#53258, Zac Dover)
doc/rgw: correct statement about default zone features (pr#52833, Casey Bodley)
doc/rgw: pubsub capabilities reference was removed from docs (pr#54137, Yuval Lifshitz)
doc/rgw: several response headers are supported (pr#52803, Casey Bodley)
doc/start: correct ABC test chart (pr#53256, Dmitry Kvashnin)
doc/start: edit os-recommendations.rst (pr#53179, Zac Dover)
doc/start: fix typo in hardware-recommendations.rst (pr#54480, Anthony D'Atri)
doc/start: Modernize and clarify hardware-recommendations.rst (pr#54071, Anthony D'Atri)
doc/start: refactor ABC test chart (pr#53094, Zac Dover)
doc/start: update \\"platforms\\" table (pr#53075, Zac Dover)
doc/start: update linking conventions (pr#52912, Zac Dover)
doc/start: update linking conventions (pr#52841, Zac Dover)
doc/troubleshooting: edit cpu-profiling.rst (pr#53059, Zac Dover)
doc: Add a note on possible deadlock on volume deletion (pr#52946, Kotresh HR)
doc: add note for removing (automatic) partitioning policy (pr#53569, Venky Shankar)
doc: Add Reef 18.2.0 release notes (pr#52905, Zac Dover)
doc: Add warning on manual CRUSH rule removal (pr#53420, Alvin Owyong)
doc: clarify upmap balancer documentation (pr#53004, Laura Flores)
doc: correct option name (pr#53128, Patrick Donnelly)
doc: do not recommend pulling cephadm from git (pr#52997, John Mulligan)
doc: Documentation about main Ceph metrics (pr#54111, Juan Miguel Olmo Martínez)
doc: edit README.md - contributing code (pr#53049, Zac Dover)
doc: expand and consolidate mds placement (pr#53146, Patrick Donnelly)
doc: Fix doc for mds cap acquisition throttle (pr#53024, Kotresh HR)
doc: improve submodule update command - README.md (pr#53000, Zac Dover)
doc: make instructions to get an updated cephadm common (pr#53260, John Mulligan)
doc: remove egg fragment from dev/developer_guide/running-tests-locally (pr#53853, Dhairya Parmar)
doc: Update dynamicresharding.rst (pr#54329, Aliaksei Makarau)
doc: Update mClock QOS documentation to discard osd_mclock_cost_per_* (pr#54079, tanchangzhi)
doc: update rados.cc (pr#52967, Zac Dover)
doc: update test cluster commands in README.md (pr#53349, Zac Dover)
exporter: add ceph_daemon labels to labeled counters as well (pr#53695, avanthakkar)
exposed the open api and telemetry links in details card (pr#53142, cloudbehl, dpandit)
libcephsqlite: fill 0s in unread portion of buffer (pr#53101, Patrick Donnelly)
librbd: kick ExclusiveLock state machine on client being blocklisted when waiting for lock (pr#53293, Ramana Raja)
librbd: kick ExclusiveLock state machine stalled waiting for lock from reacquire_lock() (pr#53919, Ramana Raja)
librbd: make CreatePrimaryRequest remove any unlinked mirror snapshots (pr#53276, Ilya Dryomov)
MClientRequest: properly handle ceph_mds_request_head_legacy for ext_num_retry, ext_num_fwd, owner_uid, owner_gid (pr#54407, Alexander Mikhalitsyn)
MDS imported_inodes metric is not updated (pr#51698, Yongseok Oh)
mds/FSMap: allow upgrades if no up mds (pr#53851, Patrick Donnelly)
mds/Server: mark a cap acquisition throttle event in the request (pr#53168, Leonid Usov)
mds: acquire inode snaplock in open (pr#53183, Patrick Donnelly)
mds: add event for batching getattr/lookup (pr#53558, Patrick Donnelly)
mds: adjust pre_segments_size for MDLog when trimming segments for st… (issue#59833, pr#54035, Venky Shankar)
mds: blocklist clients with "bloated" session metadata (issue#62873, issue#61947, pr#53329, Venky Shankar)
mds: do not send split_realms for CEPH_SNAP_OP_UPDATE msg (pr#52847, Xiubo Li)
mds: drop locks and retry when lock set changes (pr#53241, Patrick Donnelly)
mds: dump locks when printing mutation ops (pr#52975, Patrick Donnelly)
mds: fix deadlock between unlinking and linkmerge (pr#53497, Xiubo Li)
mds: fix stray evaluation using scrub and introduce new option (pr#50813, Dhairya Parmar)
mds: Fix the linkmerge assert check (pr#52724, Kotresh HR)
mds: log message when exiting due to asok command (pr#53548, Patrick Donnelly)
mds: MDLog::_recovery_thread: handle the errors gracefully (pr#52512, Jos Collin)
mds: session ls command appears twice in command listing (pr#52515, Neeraj Pratap Singh)
mds: skip forwarding request if the session were removed (pr#52846, Xiubo Li)
mds: update mdlog perf counters during replay (pr#52681, Patrick Donnelly)
mds: use variable g_ceph_context directly in MDSAuthCaps (pr#52819, Rishabh Dave)
mgr/cephadm: Add \\"networks\\" parameter to orch apply rgw (pr#53120, Teoman ONAY)
mgr/cephadm: add ability to zap OSDs\' devices while draining host (pr#53869, Adam King)
mgr/cephadm: add is_host\\\\_functions to HostCache (pr#53118, Adam King)
mgr/cephadm: Adding sort-by support for ceph orch ps (pr#53867, Redouane Kachach)
mgr/cephadm: allow draining host without removing conf/keyring files (pr#53123, Adam King)
mgr/cephadm: also don't write client files/tuned profiles to maintenance hosts (pr#53111, Adam King)
mgr/cephadm: ceph orch add fails when ipv6 address is surrounded by square brackets (pr#53870, Teoman ONAY)
mgr/cephadm: don't use image tag in orch upgrade ls (pr#53865, Adam King)
mgr/cephadm: fix default image base in reef (pr#53922, Adam King)
mgr/cephadm: fix REFRESHED column of orch ps being unpopulated (pr#53741, Adam King)
mgr/cephadm: fix upgrades with nvmeof (pr#53924, Adam King)
mgr/cephadm: removing double quotes from the generated nvmeof config (pr#53868, Redouane Kachach)
mgr/cephadm: show meaningful messages when failing to execute cmds (pr#53106, Redouane Kachach)
mgr/cephadm: storing prometheus/alertmanager credentials in monstore (pr#53119, Redouane Kachach)
mgr/cephadm: validate host label before removing (pr#53112, Redouane Kachach)
mgr/dashboard: add e2e tests for cephfs management (pr#53190, Nizamudeen A)
mgr/dashboard: Add more decimals in latency graph (pr#52727, Pedro Gonzalez Gomez)
mgr/dashboard: add port and zone endpoints to import realm token form in rgw multisite (pr#54118, Aashish Sharma)
mgr/dashboard: add validator for size field in the forms (pr#53378, Nizamudeen A)
mgr/dashboard: align charts of landing page (pr#53543, Pedro Gonzalez Gomez)
mgr/dashboard: allow PUT in CORS (pr#52705, Nizamudeen A)
mgr/dashboard: allow tls 1.2 with a config option (pr#53780, Nizamudeen A)
mgr/dashboard: Block Ui fails in angular with target es2022 (pr#54260, Aashish Sharma)
mgr/dashboard: cephfs volume and subvolume management (pr#53017, Pedro Gonzalez Gomez, Nizamudeen A, Pere Diaz Bou)
mgr/dashboard: cephfs volume rm and rename (pr#53026, avanthakkar)
mgr/dashboard: cleanup rbd-mirror process in dashboard e2e (pr#53220, Nizamudeen A)
mgr/dashboard: cluster upgrade management (batch backport) (pr#53016, avanthakkar, Nizamudeen A)
mgr/dashboard: Dashboard RGW multisite configuration (pr#52922, Aashish Sharma, Pedro Gonzalez Gomez, Avan Thakkar, avanthakkar)
mgr/dashboard: disable hosts field while editing the filesystem (pr#54069, Nizamudeen A)
mgr/dashboard: disable promote on mirroring not enabled (pr#52536, Pedro Gonzalez Gomez)
mgr/dashboard: disable protect if layering is not enabled on the image (pr#53173, avanthakkar)
mgr/dashboard: display the groups in cephfs subvolume tab (pr#53394, Pedro Gonzalez Gomez)
mgr/dashboard: empty grafana panels for performance of daemons (pr#52774, Avan Thakkar, avanthakkar)
mgr/dashboard: enable protect option if layering enabled (pr#53795, avanthakkar)
mgr/dashboard: fix cephfs create form validator (pr#53219, Nizamudeen A)
mgr/dashboard: fix cephfs form validator (pr#53778, Nizamudeen A)
mgr/dashboard: fix cephfs forms validations (pr#53831, Nizamudeen A)
mgr/dashboard: fix image columns naming (pr#53254, Pedro Gonzalez Gomez)
mgr/dashboard: fix progress bar color visibility (pr#53209, Nizamudeen A)
mgr/dashboard: fix prometheus queries subscriptions (pr#53669, Pedro Gonzalez Gomez)
mgr/dashboard: fix rgw multi-site import form helper (pr#54395, Aashish Sharma)
mgr/dashboard: fix rgw multisite error when no rgw entity is present (pr#54261, Aashish Sharma)
mgr/dashboard: fix rgw page issues when hostname not resolvable (pr#53214, Nizamudeen A)
mgr/dashboard: fix rgw port manipulation error in dashboard (pr#53392, Nizamudeen A)
mgr/dashboard: fix the landing page layout issues (issue#62961, pr#53835, Nizamudeen A)
mgr/dashboard: Fix user/bucket count in rgw overview dashboard (pr#53818, Aashish Sharma)
mgr/dashboard: fixed edit user quota form error (pr#54223, Ivo Almeida)
mgr/dashboard: images -> edit -> disable checkboxes for layering and deep-flatten (pr#53388, avanthakkar)
mgr/dashboard: minor usability improvements (pr#53143, cloudbehl)
mgr/dashboard: n/a entries behind primary snapshot mode (pr#53223, Pere Diaz Bou)
mgr/dashboard: Object gateway inventory card incorrect Buckets and user count (pr#53382, Aashish Sharma)
mgr/dashboard: Object gateway sync status cards keeps loading when multisite is not configured (pr#53381, Aashish Sharma)
mgr/dashboard: paginate hosts (pr#52918, Pere Diaz Bou)
mgr/dashboard: rbd image hide usage bar when disk usage is not provided (pr#53810, Pedro Gonzalez Gomez)
mgr/dashboard: remove empty popover when there are no health warns (pr#53652, Nizamudeen A)
mgr/dashboard: remove green tick on old password field (pr#53386, Nizamudeen A)
mgr/dashboard: remove unnecessary failing hosts e2e (pr#53458, Pedro Gonzalez Gomez)
mgr/dashboard: remove used and total used columns in favor of usage bar (pr#53304, Pedro Gonzalez Gomez)
mgr/dashboard: replace sync progress bar with last synced timestamp in rgw multisite sync status card (pr#53379, Aashish Sharma)
mgr/dashboard: RGW Details card cleanup (pr#53020, Nizamudeen A, cloudbehl)
mgr/dashboard: Rgw Multi-site naming improvements (pr#53806, Aashish Sharma)
mgr/dashboard: rgw multisite topology view shows blank table for multisite entities (pr#53380, Aashish Sharma)
mgr/dashboard: set CORS header for unauthorized access (pr#53201, Nizamudeen A)
mgr/dashboard: show a message to restart the rgw daemons after moving from single-site to multi-site (pr#53805, Aashish Sharma)
mgr/dashboard: subvolume rm with snapshots (pr#53233, Pedro Gonzalez Gomez)
mgr/dashboard: update rgw multisite import form helper info (pr#54253, Aashish Sharma)
mgr/dashboard: upgrade angular v14 and v15 (pr#52662, Nizamudeen A)
mgr/rbd_support: fix recursive locking on CreateSnapshotRequests lock (pr#54289, Ramana Raja)
mgr/snap_schedule: allow retention spec 'n' to be user defined (pr#52748, Milind Changire, Jakob Haufe)
mgr/snap_schedule: make fs argument mandatory if more than one filesystem exists (pr#54094, Milind Changire)
mgr/volumes: Fix pending_subvolume_deletions in volume info (pr#53572, Kotresh HR)
mgr: register OSDs in ms_handle_accept (pr#53187, Patrick Donnelly)
mon, qa: issue pool application warning even if pool is empty (pr#53041, Prashant D)
mon/ConfigMonitor: update crush_location from osd entity (pr#52466, Didier Gazen)
mon/MDSMonitor: plug paxos when maybe manipulating osdmap (pr#52246, Patrick Donnelly)
mon/MonClient: resurrect original client_mount_timeout handling (pr#52535, Ilya Dryomov)
mon/OSDMonitor: do not propose on error in prepare_update (pr#53186, Patrick Donnelly)
mon: fix iterator mishandling in PGMap::apply_incremental (pr#52554, Oliver Schmidt)
msgr: AsyncMessenger add faulted connections metrics (pr#53033, Pere Diaz Bou)
os/bluestore: don't require bluestore_db_block_size when attaching new (pr#52942, Igor Fedotov)
os/bluestore: get rid of resulting lba alignment in allocators (pr#54772, Igor Fedotov)
osd/OpRequest: Add detail description for delayed op in osd log file (pr#53688, Yite Gu)
osd/OSDMap: Check for uneven weights & != 2 buckets post stretch mode (pr#52457, Kamoltat)
osd/scheduler/mClockScheduler: Use same profile and client ids for all clients to ensure allocated QoS limit consumption (pr#53093, Sridhar Seshasayee)
osd: fix logic in check_pg_upmaps (pr#54276, Laura Flores)
osd: fix read balancer logic to avoid redundant primary assignment (pr#53820, Laura Flores)
osd: fix use-after-move in build_incremental_map_msg() (pr#54267, Ronen Friedman)
osd: fix: slow scheduling when item_cost is large (pr#53861, Jrchyang Yu)
Overview graph improvements (pr#53090, cloudbehl)
pybind/mgr/devicehealth: do not crash if db not ready (pr#52213, Patrick Donnelly)
pybind/mgr/pg_autoscaler: Cut back osdmap.get_pools calls (pr#52767, Kamoltat)
pybind/mgr/pg_autoscaler: fix warn when not too few pgs (pr#53674, Kamoltat)
pybind/mgr/pg_autoscaler: noautoscale flag retains individual pool configs (pr#53658, Kamoltat)
pybind/mgr/pg_autoscaler: Reordered if statement for the func: _maybe_adjust (pr#53429, Kamoltat)
pybind/mgr/pg_autoscaler: Use bytes_used for actual_raw_used (pr#53534, Kamoltat)
pybind/mgr/volumes: log mutex locks to help debug deadlocks (pr#53918, Kotresh HR)
pybind/mgr: reopen database handle on blocklist (pr#52460, Patrick Donnelly)
pybind/rbd: don't produce info on errors in aio_mirror_image_get_info() (pr#54055, Ilya Dryomov)
python-common/drive_group: handle fields outside of 'spec' even when 'spec' is provided (pr#53115, Adam King)
python-common/drive_selection: lower log level of limit policy message (pr#53114, Adam King)
python-common: drive_selection: fix KeyError when osdspec_affinity is not set (pr#53159, Guillaume Abrioux)
qa/cephfs: fix build failure for mdtest project (pr#53827, Rishabh Dave)
qa/cephfs: fix ior project build failure (pr#53825, Rishabh Dave)
qa/cephfs: switch to python3 for centos stream 9 (pr#53624, Xiubo Li)
qa/rgw: add new POOL_APP_NOT_ENABLED failures to log-ignorelist (pr#53896, Casey Bodley)
qa/smoke,orch,perf-basic: add POOL_APP_NOT_ENABLED to ignorelist (pr#54376, Prashant D)
qa/standalone/osd/divergent-prior.sh: Divergent test 3 with pg_autoscale_mode on pick divergent osd (pr#52721, Nitzan Mordechai)
qa/suites/crimson-rados: add centos9 to supported distros (pr#54020, Matan Breizman)
qa/suites/crimson-rados: bring backfill testing (pr#54021, Radoslaw Zarzynski, Matan Breizman)
qa/suites/crimson-rados: Use centos8 for testing (pr#54019, Matan Breizman)
qa/suites/krbd: stress test for recovering from watch errors (pr#53786, Ilya Dryomov)
qa/suites/rbd: add test to check rbd_support module recovery (pr#54291, Ramana Raja)
qa/suites/rbd: drop cache tiering workload tests (pr#53996, Ilya Dryomov)
qa/suites/upgrade: enable default RBD image features (pr#53352, Ilya Dryomov)
qa/suites/upgrade: fix env indentation in stress-split upgrade tests (pr#53921, Laura Flores)
qa/suites/{rbd,krbd}: disable POOL_APP_NOT_ENABLED health check (pr#53599, Ilya Dryomov)
qa/tests: added - (POOL_APP_NOT_ENABLED) to the ignore list (pr#54436, Yuri Weinstein)
qa: add centos_latest (9.stream) and ubuntu_20.04 yamls to supported-all-distro (pr#54677, Venky Shankar)
qa: add POOL_APP_NOT_ENABLED to ignorelist for cephfs tests (issue#62482, issue#62508, pr#54380, Venky Shankar, Patrick Donnelly)
qa: assign file system affinity for replaced MDS (issue#61764, pr#54037, Venky Shankar)
qa: decrease pgbench scale factor to 32 for postgresql database test (pr#53627, Xiubo Li)
qa: fix cephfs-mirror unwinding and 'fs volume create/rm' order (pr#52656, Jos Collin)
qa: fix keystone in rgw/crypt/barbican.yaml (pr#53412, Ali Maredia)
qa: ignore expected cluster warning from damage tests (pr#53484, Patrick Donnelly)
qa: lengthen shutdown timeout for thrashed MDS (pr#53553, Patrick Donnelly)
qa: move nfs (mgr/nfs) related tests to fs suite (pr#53906, Dhairya Parmar, Venky Shankar)
qa: wait for file to have correct size (pr#52742, Patrick Donnelly)
qa: wait for MDSMonitor tick to replace daemons (pr#52235, Patrick Donnelly)
RadosGW API: incorrect bucket quota in response to HEAD /{bucket}/?usage (pr#53437, shreyanshjain7174)
rbd-mirror: fix image replayer shut down description on force promote (pr#52880, Prasanna Kumar Kalever)
rbd-mirror: fix race preventing local image deletion (pr#52627, N Balachandran)
rbd-nbd: fix stuck with disable request (pr#54254, Prasanna Kumar Kalever)
read balancer documentation (pr#52777, Laura Flores)
Rgw overview dashboard backport (pr#53065, Aashish Sharma)
rgw/amqp: remove possible race conditions with the amqp connections (pr#53516, Yuval Lifshitz)
rgw/amqp: skip idleness tests since it needs to sleep longer than 30s (pr#53506, Yuval Lifshitz)
rgw/crypt: apply rgw_crypt_default_encryption_key by default (pr#52796, Casey Bodley)
rgw/crypt: don't deref null manifest_bl (pr#53590, Casey Bodley)
rgw/kafka: failed to reconnect to broker after idle timeout (pr#53513, Yuval Lifshitz)
rgw/kafka: make sure that destroy is called after connection is removed (pr#53515, Yuval Lifshitz)
rgw/keystone: EC2Engine uses reject() for ERR_SIGNATURE_NO_MATCH (pr#53762, Casey Bodley)
rgw/multisite[archive zone]: fix storing of bucket instance info in the new bucket entrypoint (pr#53466, Shilpa Jagannath)
rgw/notification: pass in bytes_transferred to populate object_size in sync notification (pr#53377, Juan Zhu)
rgw/notification: remove non x-amz-meta-* attributes from bucket notifications (pr#53375, Juan Zhu)
rgw/notifications: allow cross tenant notification management (pr#53510, Yuval Lifshitz)
rgw/s3: ListObjectsV2 returns correct object owners (pr#54161, Casey Bodley)
rgw/s3select: fix per QE defect (pr#54163, galsalomon66)
rgw/s3select: s3select fixes related to Trino/TPCDS benchmark and QE tests (pr#53034, galsalomon66)
rgw/sal: get_placement_target_names() returns void (pr#53584, Casey Bodley)
rgw/sync-policy: Correct "sync status" & "sync group" commands (pr#53395, Soumya Koduri)
rgw/upgrade: point upgrade suites to ragweed ceph-reef branch (pr#53797, Shilpa Jagannath)
RGW: add admin interfaces to get and delete notifications by bucket (pr#53509, Ali Masarwa)
rgw: add radosgw-admin bucket check olh/unlinked commands (pr#53823, Cory Snyder)
rgw: add versioning info to radosgw-admin bucket stats output (pr#54191, Cory Snyder)
RGW: bucket notification - hide auto generated topics when listing topics (pr#53507, Ali Masarwa)
rgw: don't dereference nullopt in DeleteMultiObj (pr#54124, Casey Bodley)
rgw: fetch_remote_obj() preserves original part lengths for BlockDecrypt (pr#52816, Casey Bodley)
rgw: fetch_remote_obj() uses uncompressed size for encrypted objects (pr#54371, Casey Bodley)
rgw: fix 2 null versionID after convert_plain_entry_to_versioned (pr#53398, rui ma, zhuo li)
rgw: fix multipart upload object leaks due to re-upload (pr#52615, J. Eric Ivancich)
rgw: fix rgw rate limiting RGWRateLimitInfo class decode_json max_rea… (pr#53765, xiangrui meng)
rgw: fix SignatureDoesNotMatch when extra headers start with 'x-amz' (pr#53770, rui ma)
rgw: fix unwatch crash at radosgw startup (pr#53760, lichaochao)
rgw: handle http options CORS with v4 auth (pr#53413, Tobias Urdin)
rgw: improve buffer list utilization in the chunkupload scenario (pr#53773, liubingrun)
rgw: pick http_date in case of http_x_amz_date absence (pr#53440, Seena Fallah, Mohamed Awnallah)
rgw: retry metadata cache notifications with INVALIDATE_OBJ (pr#52798, Casey Bodley)
rgw: s3 object lock avoids overflow in retention date (pr#52604, Casey Bodley)
rgw: s3website doesn't prefetch for web_dir() check (pr#53767, Casey Bodley)
RGW: Solving the issue of not populating etag in Multipart upload result (pr#51447, Ali Masarwa)
RGW:notifications: persistent topics are not deleted via radosgw-admin (pr#53514, Ali Masarwa)
src/mon/Monitor: Fix set_elector_disallowed_leaders (pr#54003, Kamoltat)
test/crimson/seastore/rbm: add sub-tests regarding RBM to the existing tests (pr#53967, Myoungwon Oh)
test/TestOSDMap: don't use the deprecated std::random_shuffle method (pr#52737, Leonid Usov)
valgrind: UninitCondition under __run_exit_handlers suppression (pr#53681, Mark Kogan)
xfstests_dev: install extra packages from powertools repo for xfsprogs (pr#52843, Xiubo Li)
Hello Ceph community! Over the past year or so we've been hearing from more and more people who are interested in using encryption with Ceph but don't know what kind of performance impact to expect. Today we'll look at both Ceph's on-disk and over-the-wire encryption performance under a couple of different workloads. For our readers who may not be familiar with what these terms mean, let's review:
On-Disk encryption: This is also sometimes called encryption-at-rest. Data is encrypted when it is written to persistent storage. In Ceph, this is done using LUKS and dm-crypt to fully encrypt the underlying block device(s) that BlueStore uses to store data. This fully encrypts all data stored in Ceph regardless of whether it's block, object, or file data. (A brief deployment example follows these definitions.)
Over-the-wire encryption: Data is encrypted when it is sent over the network. In Ceph, this is done by optionally enabling the "secure" ms mode for messenger version 2 clients. As of Ceph Reef v18.2.0, ms secure mode utilizes 128-bit AES encryption.
Encryption can also be performed at higher levels. For instance, RGW can encrypt objects itself before sending them to the OSDs. For the purposes of this article, however, we'll be focusing on RBD block performance and will utilize the two options listed above.
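As a brief aside for readers who want to reproduce the on-disk configuration: OSDs typically end up on LUKS/dm-crypt either by passing --dmcrypt to ceph-volume or by setting encrypted: true in a cephadm OSD service spec. A minimal sketch of the manual route, assuming an illustrative device path:

# Prepare and activate an encrypted OSD on a single device
sudo ceph-volume lvm create --data /dev/nvme0n1 --dmcrypt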
First, thank you to Clyso for funding this work to benefit the Ceph community. Thank you as well to IBM/Red Hat and Samsung for providing the upstream Ceph community with the hardware used for this testing. Thank you as well to all of the Ceph developers who have worked tirelessly to make Ceph great! Finally, a special thank you to Lee-Ann Pullar for reviewing this article!
Nodes | 10 x Dell PowerEdge R6515 |
---|---|
CPU | 1 x AMD EPYC 7742 64C/128T |
Memory | 128GiB DDR4 |
Network | 1 x 100GbE Mellanox ConnectX-6 |
NVMe | 6 x 4TB Samsung PM983 |
OS Version | CentOS Stream release 8 |
Ceph Version | Reef v18.2.0 (built from source) |
Five of the nodes were configured to host OSDs and the other 5 were configured as client nodes. All nodes are located on the same Juniper QFX5200 switch and connected with a single 100GbE QSFP28 link. Ceph was deployed and FIO tests were launched using CBT. An important OS-level optimization on Intel systems is setting the TuneD profile to either "latency-performance" or "network-latency". This primarily helps by avoiding latency spikes associated with CPU C/P state transitions. AMD Rome-based systems do not appear to be as sensitive in this regard, and I have not confirmed that TuneD actually restricts C/P state transitions on AMD processors. The TuneD profile was nevertheless set to "network-latency" for these tests.
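Applying and verifying the profile is a one-liner per node (a sketch, assuming the tuned package is installed and the daemon is running):

# Switch to the network-latency profile and confirm it took effect
sudo tuned-adm profile network-latency
tuned-adm active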
CBT was configured to deploy Ceph with several modified settings. OSDs were assigned a 16GB osd_memory_target to guarantee a high onode hit rate and eliminate confounding performance impacts. RBD cache was disabled as it can hurt rather than help performance with OSDs backed by fast NVMe drives. Secure mode for msgr V2 was enabled for associated tests using the following configuration options:
ms_client_mode = secure
ms_cluster_mode = secure
ms_service_mode = secure
ms_mon_client_mode = secure
ms_mon_cluster_mode = secure
ms_mon_service_mode = secure
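The other overrides described above were applied through the CBT cluster YAML; on a running cluster the rough equivalent would be the following (the 16GB memory target is shown in bytes):

# Large OSD memory target to keep the onode hit rate high
ceph config set osd osd_memory_target 17179869184

# Disable the client-side RBD cache
ceph config set client rbd_cache false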
Three sets of tests were performed as follows with the cluster rebuilt between each set of tests:
Test Set | Client Processes | io_depth |
---|---|---|
Multi-client | 10 fio processes per node, 5 client nodes | 128 per process |
Single-client | 1 fio process on 1 client node | 128 |
Single-client sync | 1 fio process on 1 client node | 1 |
OSDs were allowed to use all cores on the nodes. FIO was configured to first pre-fill the RBD volume(s) with large writes, followed by 4MB and 4KB IO tests for 300 seconds each. Certain background processes, such as scrub, deep scrub, PG autoscaling, and PG balancing, were disabled. Finally, an RBD pool with a static 16384 PGs (higher than typically recommended) and 3x replication was used.
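To make the client workload concrete, a roughly equivalent standalone fio invocation using the librbd ioengine might look like the following; CBT generates its own job files, and the client, pool, and image names here are illustrative placeholders:

fio --name=rbd-4m-randwrite --ioengine=rbd --clientname=admin --pool=rbdpool \
    --rbdname=cbt-rbd-0 --rw=randwrite --bs=4M --iodepth=128 \
    --time_based --runtime=300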
Ceph utilizes LUKS (via dm-crypt) to encrypt the block device(s) that BlueStore writes data to. There are several tuning options available that may help improve performance. Before diving into the full set of tests, let's look at a couple of those options; an example invocation follows the table below.
Option | Description |
---|---|
--perf-submit_from_crypt_cpus | Disable offloading writes to a separate thread after encryption. There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly. The default is to offload write bios to the same thread. This option is only relevant for open action. NOTE: This option is available only for low-level dm-crypt performance tuning, use only if you need a change to default dm-crypt behavior. Needs kernel 4.0 or later. |
--sector-size <bytes> | Set sector size for use with disk encryption. It must be a power of two and in the range 512 - 4096 bytes. The default is 512-byte sectors. This option is available only in LUKS2 mode. |
--perf-no_read_workqueue --perf-no_write_workqueue | Bypass dm-crypt internal workqueue and process read or write requests synchronously. This option is only relevant for open action. NOTE: These options are available only for low-level dm-crypt performance tuning, use only if you need a change to default dm-crypt behavior. Needs kernel 5.9 or later. |
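For illustration only (ceph-volume normally formats and opens the LUKS device itself), here is roughly how these flags are passed to cryptsetup by hand, against an illustrative device path:

# Format with LUKS2 and 4 KiB sectors
sudo cryptsetup luksFormat --type luks2 --sector-size 4096 /dev/nvme0n1

# Open the device without offloading write bios to a separate thread
sudo cryptsetup open --perf-submit_from_crypt_cpus /dev/nvme0n1 osd-block-crypt

# On kernel 5.9 or later the internal workqueues can also be bypassed
sudo cryptsetup open --perf-no_read_workqueue --perf-no_write_workqueue /dev/nvme0n1 osd-block-crypt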
Unfortunately, the kernel available in CentOS Stream 8 does not support disabling the read and write workqueues, so we were not able to test those options. We were, however, able to test the others and did so with an initial focus on the multi-client test configuration.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4MB | Unencrypted | 100.00% | 100.00% |
4MB | LUKS | 98.31% | 54.72% |
4MB | LUKS, 4k sectors | 98.56% | 52.67% |
4MB | LUKS, submit_from_crypt_cpus | 98.87% | 69.26% |
Right out of the gate, we see that deploying OSDs on LUKS has only a minor impact on 4MB read performance but a major impact on write performance. This may be slightly misleading, since we are network-limited in the read tests at ~11GB/s per node. For write tests, however, the performance impact is significant. The good news is that using the --perf-submit_from_crypt_cpus option mitigates some of the performance loss.
While we couldn't test the workqueue-related performance options in these tests, there is already a PR in Ceph to enable those options that has been tested by Josh Baergen.
One of the observations that Josh made is that the workqueue options may help improve CPU usage. While we don't have those numbers, let's look at CPU usage in the tests we were able to run.
During reads, system CPU usage increases dramatically when LUKS is utilized (up to 2X CPU usage). In the write tests the overall CPU consumption appears similar or even lower, but once the lower performance is taken into account, the CPU usage per IO is actually higher overall. In both the read and write tests, utilizing a 4K sector size for LUKS appears to result in a slight CPU usage drop.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4KB | Unencrypted | 100.00% | 100.00% |
4KB | LUKS | 89.63% | 83.19% |
4KB | LUKS, 4k sectors | 89.73% | 84.14% |
4KB | LUKS, submit_from_crypt_cpus | 89.35% | 83.04% |
The performance impact for 4KB random IOs was lower than for larger 4MB IOs. 4KB reads suffered roughly an 11-12% hit, while writes took closer to a 20% hit.
Aggregate CPU usage was fairly close; however, system usage was slightly higher while user usage was lower (correlated with the lower performance when LUKS is enabled).
The general takeaway from these tests is that the --perf-submit_from_crypt_cpus option can improve LUKS large-write throughput and is likely worth using; we'll be using it for the remainder of the tests in this article. 4K sectors may also be worth enabling for devices that support them natively and may help slightly improve CPU usage in some cases.
Now that we've done some base-level testing for LUKS, let's see what effect the msgr V2 secure mode has with the same high-concurrency workload.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4MB | Unencrypted | 100.00% | 100.00% |
4MB | LUKS | 98.87% | 69.26% |
4MB | LUKS, 4k sectors | 94.90% | 87.43% |
4MB | LUKS, submit_from_crypt_cpus | 91.54% | 64.63% |
There is an additive overhead associated with enabling msgr v2 secure mode; however, the bigger effect in these tests is from LUKS.
LUKS again results in significant CPU usage overhead. Interestingly, enabling the secure messenger appears to decrease system CPU consumption when using unencrypted block volumes (i.e., no LUKS). It's not entirely clear why this is the case, though a slight reduction in the IO workload hitting the block layer might explain it.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4KB | Unencrypted | 100.00% | 100.00% |
4KB | LUKS | 89.35% | 83.04% |
4KB | LUKS, 4k sectors | 100.25% | 94.96% |
4KB | LUKS, submit_from_crypt_cpus | 89.18% | 81.82% |
The biggest effect is again from LUKS.
The biggest CPU usage effects also come from LUKS, with slightly more system CPU and slightly less user CPU (in association with lower performance).
One of the effects of throwing so much IO at this cluster is that we see high-latency events. In this case, we are actually pushing things a little harder on the read side and seeing worst-case latencies as high as ~900ms. The good news is that neither LUKS nor secure mode has a significant effect on tail latency in this saturation workload. We also know from our previous article, Ceph Reef - 1 or 2 OSDs per NVMe?, that putting multiple OSDs on a single NVMe drive can significantly reduce tail latency at the expense of higher resource consumption.
Last year we looked at QEMU/KVM performance with Msgr V2 AES Encryption and saw that encryption did have a notable effect on single-client performance. We ran several single-client tests here to verify if that is still the case in Reef.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4MB | Unencrypted | 100.00% | 100.00% |
4MB | LUKS | 98.80% | 96.65% |
4MB | LUKS, 4k sectors | 87.51% | 73.84% |
4MB | LUKS, submit_from_crypt_cpus | 87.36% | 72.01% |
Earlier, in the multi-client tests, we observed that LUKS generally had the biggest impact: it greatly increased CPU consumption for large IOs and degraded the performance of large writes and small random IO. In these single-client large IO workloads, LUKS has very little impact. Enabling messenger encryption, however, has a significant impact on single-client performance, even though it had a much smaller impact on overall cluster performance in the multi-client tests.
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4KB | Unencrypted | 100.00% | 100.00% |
4KB | LUKS | 94.95% | 98.63% |
4KB | LUKS, 4k sectors | 97.68% | 99.92% |
4KB | LUKS, submit_from_crypt_cpus | 96.54% | 99.49% |
The impact of both LUKS and messenger-level encryption on single-client small IO, however, is minimal: usually only a couple of percent. What about latency?
The previous multi-client tests showed significant latency spikes due to how hard we were pushing the OSDs. In these single-client tests, latency is far more even. Typical latency for reads hovers right around 0.9ms, with spikes not exceeding 1.15ms. The write case is more interesting. Latency was still significantly lower than in the multi-client tests, typically around 1.5ms. Spikes were typically below 2.5ms, though in the case where both on-disk and messenger-level encryption were used they grew closer to 3.5ms. The spikes appear to be cyclical over time, and that pattern repeats across multiple tests on completely rebuilt clusters. This effect warrants additional investigation.
In the previous single-client tests we still utilized a high io_depth to allow many IOs to stay in flight. This allows multiple OSDs to service IOs concurrently and improve performance. Some applications however require that IOs be handled sequentially. One IO must complete before the next one can be written or read. The etcd journal is a good example of this kind of workload and is typically entirely latency-bound. Each IO must complete at least one round-trip network transfer along with whatever other overhead is required for servicing it from the disk.
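A sketch of the corresponding sync-style fio job, reusing the same illustrative names as the earlier example; the key difference is iodepth=1, which forces each IO to complete before the next is issued:

fio --name=rbd-4k-syncwrite --ioengine=rbd --clientname=admin --pool=rbdpool \
    --rbdname=cbt-rbd-0 --rw=randwrite --bs=4K --iodepth=1 \
    --time_based --runtime=300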
IO Size | Configuration | Random Reads | Random Writes |
---|---|---|---|
4KB | Unencrypted | 100.00% | 100.00% |
4KB | LUKS | 95.87% | 87.88% |
4KB | LUKS, 4k sectors | 100.89% | 94.64% |
4KB | LUKS, submit_from_crypt_cpus | 95.29% | 86.31% |
The biggest effect in this case came from LUKS. 4KB sync reads were slightly slower, while 4KB sync writes showed larger degradation.
Latency followed a similar pattern, with the unencrypted results showing slightly faster response times than the encrypted ones. Read latencies hovered around 0.13ms, while write latencies hovered around 0.4ms with occasional spikes up to around 0.5ms.
In this article we looked at Ceph performance with both on-disk and over-the-wire encryption in a variety of different RBD test scenarios. The results of those tests showcased several nuanced conclusions.
On-disk Encryption (LUKS) | Over-the-wire Encryption (secure msgr) |
---|---|
* 2x OSD CPU Usage for large IOs | * Low effect on CPU consumption |
* High large write cluster impact | * Low-Moderate cluster impact |
* Moderate small IO cluster impact | * High single-client large IO impact |
* Low single-client impact (high io_depth) | * Low small sync IO impact |
* Moderate small sync write impact | |
* Partially mitigated with tuning | |
In general, we expect users to see increased CPU consumption when using on-disk encryption. The greatest impact is expected to be on large IOs. Thankfully, Ceph typically uses more CPU during small IO workloads, so customers who have designed their OSD CPU specifications around IOPS requirements are unlikely to suffer a serious performance impact due to a lack of CPU. Large writes do see a significant performance impact with on-disk encryption; however, this can be partially mitigated, and there is ongoing work that may mitigate it further. On-disk encryption also has a moderate effect on small synchronous write performance. Over-the-wire encryption's biggest performance impact is on maximum single-client throughput. It does have a small performance impact in other cases as well, though the effect is usually minor. As always, I encourage you to test for yourself and see if your findings match what we saw here. Thank you for reading, and if you have any questions or would like to talk more about Ceph performance, please feel free to reach out.