Database Sharding for startups

The most important aspect of a scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take it out of rotation.



Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day, there will be, but that's for another post). In the meantime, most websites are building their own scalable clusters using sharding.



Read more on LessonLearned blog.