Scale Indefinitely on S3 With These Secrets of the S3 Masters

In a great article, Amazon S3 Performance Tips & Tricks, Doug Grismore, Director of Storage Operations for AWS, has outed the secret arcana normally reserved for Premium Developer Support customers on how to really use S3:

  • Size matters. Workloads with fewer than 50-100 total requests per second don't require any special effort. Customers who routinely perform thousands of requests per second need a plan.
  • Automated partitioning. Behind the scenes, S3 scales horizontally by continuously splitting data into partitions, triggered by sustained high request rates and by the number of keys in a partition (too many keys makes lookups slow). Lessons you've learned from sharding may also apply to S3.
  • Avoid hot spots. As with most sharding schemes, you avoid hot spots through smart selection of key names. S3 objects are stored in buckets, and each object is identified by a key. Keys are kept in sorted order and are partitioned by prefix, so objects whose keys sort together are stored together. Select key names that spread load around rather than names that all land on the same partition.
    • Creating keys based on incrementally increasing numbers or date-time constructs, as is common when generating IDs, is bad for S3 scaling:
      • All new content is put in a single partition.
      • Partitions storing older content are wasting their potential IOPS because the data they contain is probably colder.
  • Reverse the order of the digits in an identifier. This simple trick starts a key with what is essentially a random number, which fans writes out across many potential child partitions. S3 will detect this parallel write pattern and automatically create multiple child partitions from the same parent simultaneously (see the first sketch after this list).
  • Maintaining sort order. Instead of creating a separate indexing database, many applications depend on the sort order provided by S3 to page through data, and the previous trick ruins that order. New trick:
    • Create a partition-enabling hash and use it as the key prefix, then name keys with the elements you'd like to list by placed furthest to the left (see the second sketch after this list).
    • At 100 operations per second and 20 million stored objects per partition, a four-character hex hash partition set in a bucket or sub-bucket namespace could theoretically grow to support millions of operations per second and over a trillion unique keys before we'd need a fifth character in the hash.
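
To make the digit-reversal trick concrete, here is a minimal sketch (the "uploads/" prefix, padding width, and key layout are illustrative assumptions, not from the original article). It takes the kind of sequential ID that would normally pile every new object onto a single partition and reverses it so the fastest-changing digit leads the key:

```python
# Sketch: turning sequential IDs into spread-out S3 key names by reversing digits.
def reversed_id_key(object_id: int, width: int = 10) -> str:
    """Zero-pad the ID, then reverse it so the fastest-changing digit comes first."""
    padded = str(object_id).zfill(width)   # e.g. 2134857 -> "0002134857"
    return padded[::-1]                    # -> "7584312000"

# Consecutive uploads now start with different characters, fanning writes out
# across many partitions instead of the one holding the newest keys.
for object_id in range(2134857, 2134860):
    print(f"uploads/{reversed_id_key(object_id)}.jpg")
# uploads/7584312000.jpg
# uploads/8584312000.jpg
# uploads/9584312000.jpg
```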
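
The hash-prefix trick from the last two bullets might look like the following sketch (the choice of MD5, the key elements, and the layout are assumptions for illustration): a short hex hash of a stable key element becomes the leading prefix, and the elements you want to list by stay to the left of the rest of the name, so listings under one prefix still come back in sorted order.

```python
import hashlib

# Sketch of a partition-enabling 4-character hex hash prefix (matching the
# capacity estimate in the bullet above). Names and layout are illustrative.
def hashed_key(customer_id: str, timestamp: str, filename: str) -> str:
    """Prefix the key with a short hash of the customer ID so customers spread
    across partitions, while one customer's keys still sort by timestamp."""
    prefix = hashlib.md5(customer_id.encode()).hexdigest()[:4]
    return f"{prefix}/{customer_id}/{timestamp}/{filename}"

print(hashed_key("customer-42", "2012-03-15T10:00:00Z", "photo.jpg"))
# -> something like "3f2a/customer-42/2012-03-15T10:00:00Z/photo.jpg"
#    (hash digits will differ)

# To page through one customer's objects in time order, list with the prefix
# "<hash>/<customer_id>/" -- S3 returns keys in lexicographic (sorted) order.
```

The back-of-the-envelope numbers in the last bullet follow from this kind of scheme: a four-character hex hash gives 16^4 = 65,536 prefixes, so at 100 operations per second and 20 million objects per partition that works out to roughly 6.5 million operations per second and about 1.3 trillion keys.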

The original article is much more detailed, but this is a quick look at how you can structure your S3 key space to optimize throughput, using insight into how S3 actually works.