Zen and the Art of Scaling - A Koan and Epigram Approach

This is a guest post derived from an email conversation with Russell Sullivan, a computer architect and creator of Alchemy Database, a hybrid RDBMS/NoSQL datastore.

Russell (AKA Jak Sprats) has been pondering, considering, and implementing distributed databases for many years. In a recent email conversation he shared 44 of the lessons he has learned from developing the infrastructure for high performance / highly scalable systems. Some are well known, some are debatable, and some obviously result from a deep experience that is worth learning from:

  1. There are maybe 20 classic bottlenecks (CPU, NIC overload, memory fragmentation, disk seeks, swap, thread deadlock, packet loss, etc.). Have a basic understanding of each of them, because each is a dark tunnel, and you need a specialised flashlight for each.
  2. The true style of scalable architecture is rooted in the nature of bottlenecks. I always talk about commutative operations on a sharded relational model (see the first sketch after this list), and I figure if that is in place, whatever is built on top of it must scale. So if you can figure out where the endpoints are, then make those scalable, and the rest will already be scalable. This is the concept of not boxing yourself in. It's hard to put into words... when I code something and I get the feeling that it is clean, it always scales; when something is nagging, telling me that it is not clean, it eventually bites me in the ass.
  3. The key to the whole thing is the finest granularity of locking. Once you have a lock deep down in your system, your whole system is a locked system. This "lock" is not just a thread-lock; it could be a cross-shard join, it is anything that is not inherently scalable (a locking sketch follows the list).
  4. RDBMSes have their place. Structured data is superior in performance, but inferior in flexibility. If data makes sense in an RDBMS, if it normalises well, then do it, but don't get locked in: view your data as being in a normalised schema at the moment, but as something that could also be modeled differently. Again, avoid lock-in; it leads to bottlenecks.
  5. I can't seem to express this well in words, but I always know it when I code it; something in my head goes: remember this approach, it ends up messing with you in the end.
  6. Scaling is about compromises. Statelessness scales forever, but does nothing. Commutative operations on a sharded RDBMS schema scale forever, but are hard to model data in. Doing 20-table joins is convenient, but cannot be made to scale horizontally.
  7. I don't know how to word this, but in C code, you can hard-code something and it feels wrong, so you make it a #ifdef, and it feels better; then you make it an enum, and it's now OK; eventually, it evolves into a configuration parameter that can be set via REST or a signal (see the config sketch after this list). Gut feeling needs to tell you how to cut an app into hardcoded values and REST-changeable values.
  8. The art of scaling is getting done what needs to be gotten done, while staying as close to statelessness as possible, without introducing unneeded complexity by doing so.
  9. Scalability is all about loose coupling: about things happening and affecting other things, but with loose cause and effect. Computers have such speed that when they are not bottlenecking they are always undersaturated.
  10. Programming is funny: when you are 22, you can program 20 hours straight, consistently, and as time goes on, this capacity wanes. When you are 30, you have programmed most algorithms at least once, and start appreciating how brilliant most advances in the field are. I'm 35 and I can still code for many hours on end, but my most productive days are ones where I reflect and solve things largely in my subconscious. I have enjoyed this evolution and look forward to where it leads; many programmers drop off, can't take the pain of concentration... they lose.
  11. The future of computing is getting loads of 3.5GHz cores connected by a real fast bus, sharing a constant-speed L3 cache and variable-speed NUMA RAM, to do stuff. In the future there are bound to be some alternative permanent storage options, but they may just be considered slower RAM.
  12. Identify the single point of failure. There will be multiple points of failure.
  13. The decision on whether you will eventually need to horizontally scale data storage is key. Key-value and sharded data will scale almost infinitely; try to follow those lines when designing your system (a shard-routing sketch follows the list), and design so that if you need cross-shard joins, you don't need them very often.
  14. Profiling, tracing, etc. should be built in from the beginning; use an existing package to do this.
  15. The right tool for the right job, but don’t use 1000 tools.
  16. Once you geo-distribute, the game changes radically; everything needs to be loosely coupled. Geo-distribution is often OK as a natural-disaster-level failover, i.e. a cold standby.
  17. Static content is your friend.
  18. Client-side logic is your friend. There are many, many client-side CPU tricks. This is an important part of scaling: pushing logic to where it is abundant, and not repeating anything.
  19. Caching is an art best served by monitoring.
  20. Hardware vendors lie, appliance vendors lie. SSDs are great, but will not save you. Vendor lock-in is the anti-scale.
  21. Open source is good and bad.
  22. The cloud scales great, except for disk-I/O-intensive apps. Its costs can be huge, but so can its savings.
  23. Hadoop scales; learn Hadoop.
  24. Benchmarks lie. Isolated rollouts give you real numbers.
  25. Know the hardware market. Some things are slowing down, some aren’t. Some are getting cheaper, some aren’t.
  26. Async message queues (RabbitMQ, ZeroMQ), when properly implemented, are brilliant (see the ZeroMQ sketch after this list).
  27. Deploy 10GigE in the right places, possibly consider InfiniBand. Use multiple NICs.
  28. Use lightweight async in-RAM data storage when possible.
  29. When you need to go to disk, make it for large objects.
  30. Correct partitioning. Never share a frontend and an analytics DB; use memcached; offload whatever you can. Identify your firehose (your bottleneck). If it is the frontend, it will come down to horizontal data storage. If it is the backend, it will be Hadoop, and it will also come down to horizontal data storage.
  31. Compression can be used to keep more data in RAM, to lessen the amount of disk I/O, to reduce network traffic, and to delay or minimize the need for horizontal scaling... and you probably have enough free CPU ticks for it (a zlib sketch follows the list).
  32. A good single node's and single core's performance is the key to scaling. Everything you scale from that point will be at best linear and more likely exponential.
  33. A bottleneck is usually skinny, i.e. most of the time it is happening in only a few places, so fix those and you are good for a while. If your bottlenecks are happening in more than a few places, you're toast; time to rearchitect.
  34. Threads scale funnily: sometimes they scale well on a few cores, sometimes they don't, and they almost never scale well to many cores... they are effective when applied to yield on I/O waits, but so are events.
  35. Event-driven languages are difficult to program in. Event-driven programming should be used as glue, putting together events. If you need to do complicated nested algorithms, do them at an endpoint, devoid of I/O.
  36. If all your databases are behind the same switch, you don't need to worry about CAP, but you have an obvious SPOF.
  37. The idea of using your own servers and then firing off cloud instances to handle traffic spikes is a real crafty way to use the cloud cost-effectively, but in practice it can be really complicated if you need to sync data to the cloud: you introduce a whole other level of data synchronisation and security issues to contend with.
  38. If I spend 3 months on a project, my previous experience is my greatest asset, at about 9 months, I start to get a feel for the nature of the system and data I am working with, at 18 months, my finger is fully on the pulse, and after that the changes I make are precise solutions to real world problems. Intuition and rumination provide the most important breakthroughs in CS.
  39. Data visualisation is immensely helpful for me because I am a visual thinker; this characteristic might be a real advantage for a programmer who deals w/ data.
  40. Linux gurus are freaks... whom I love working with. It takes a lifetime to understand what is going on w/ the Linux ecosystem, and I am not gonna spend it, so it's great having someone around who has, so I can badger them w/ questions.
  41. Most of the scaling stories you hear about seem to go: blah blah blah, lots of work & changes, took years, we survived. So all this back and forth we are doing on the subject is helping raise awareness. Yay Us.
  42. People being honest about the shortcomings of their platforms is just as helpful as (and much more believable than) people hyping up their platforms.
  43. Long- and short-running queries cannot coexist in peace. Try to segregate your workloads. If you can't segregate, think about relaxing ACID for the long-running queries (i.e. batch them).
  44. Scaling is a black, black art.
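
A few of these lessons are concrete enough to sketch in code. First, lessons 2 and 6: what "commutative operations on a sharded model" buys you. This is a minimal C sketch with hypothetical names (record_view, NUM_SHARDS), not code from Alchemy Database. Because increments commute, each shard applies its writes locally, in any order, and the only cross-shard step is a read-time merge that also commutes.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_SHARDS 4

static int64_t page_views[NUM_SHARDS];  /* one partial counter per shard */

/* Increments commute, so any shard can take any write, with no
   cross-shard lock and no ordering requirement. */
void record_view(uint32_t user_id) {
    page_views[user_id % NUM_SHARDS]++;
}

/* The only cross-shard operation is a read-time merge, which also
   commutes: sum the partials in any order, the answer is the same. */
int64_t total_views(void) {
    int64_t sum = 0;
    for (int i = 0; i < NUM_SHARDS; i++)
        sum += page_views[i];
    return sum;
}

int main(void) {
    for (uint32_t u = 0; u < 1000; u++)
        record_view(u * 2654435761u);   /* scatter keys across shards */
    printf("total: %lld\n", (long long)total_views());
    return 0;
}
```

A 20-table join, by contrast, needs rows from many shards at once; there is no local, order-free way to compute it, which is why it cannot be made to scale horizontally.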
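
Next, lesson 3's point about lock granularity, assuming POSIX threads and hypothetical names. With one global mutex every writer serialises and the whole store is "a locked system"; with per-shard mutexes, writers contend only when they hit the same shard.

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_SHARDS 16

typedef struct {
    pthread_mutex_t lock;    /* one mutex per shard, not per store */
    int64_t         counter;
} Shard;

static Shard shards[NUM_SHARDS];

void shards_init(void) {
    for (int i = 0; i < NUM_SHARDS; i++)
        pthread_mutex_init(&shards[i].lock, NULL);
}

/* Coarse-grained: every writer serialises on one mutex, so adding
   cores adds contention, not throughput. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

void incr_coarse(uint32_t key) {
    pthread_mutex_lock(&global_lock);
    shards[key % NUM_SHARDS].counter++;
    pthread_mutex_unlock(&global_lock);
}

/* Fine-grained: writers contend only on the same shard, so throughput
   grows with cores until some other bottleneck appears. */
void incr_fine(uint32_t key) {
    Shard *s = &shards[key % NUM_SHARDS];
    pthread_mutex_lock(&s->lock);
    s->counter++;
    pthread_mutex_unlock(&s->lock);
}
```

The same reasoning applies when the "lock" is a cross-shard join or any other serialisation point: whatever everything must pass through caps the system's scale.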
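
Lesson 7's evolution from hard-coded value to live knob, compressed into one hypothetical file. The SIGHUP handler stands in for re-reading a config file; a REST endpoint would set the same variable.

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Stage 1 was a bare `int max_conns = 1024;` buried in the code.
   Stage 2 is a compile-time switch: */
#ifndef MAX_CONNS_DEFAULT
#define MAX_CONNS_DEFAULT 1024
#endif

/* Stage 3: a live configuration parameter, changeable at runtime. */
static volatile sig_atomic_t max_conns = MAX_CONNS_DEFAULT;

static void on_sighup(int sig) {
    (void)sig;
    max_conns = MAX_CONNS_DEFAULT * 2;  /* stand-in for a config reload */
}

int main(void) {
    signal(SIGHUP, on_sighup);
    printf("max_conns starts at %d; send SIGHUP to change it live\n",
           (int)max_conns);
    for (;;)
        pause();                        /* serve requests, honouring max_conns */
}
```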
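
For lesson 13, the routing that keeps single-key operations on one node. FNV-1a and the shard count are illustrative choices; the point is that a stable hash of the key decides placement, so a get or a set never crosses shards.

```c
#include <stdint.h>

#define NUM_SHARDS 64

/* FNV-1a: small, stable, and good enough for shard selection. */
static uint64_t fnv1a(const char *key) {
    uint64_t h = 14695981039346656037ULL;
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 1099511628211ULL;
    }
    return h;
}

int shard_for(const char *key) {
    return (int)(fnv1a(key) % NUM_SHARDS);
}

/* Design note: name rows that are queried together with a common
   prefix (e.g. "user:42:profile", "user:42:orders") and route on the
   prefix, so the occasional join stays inside one shard. */
```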
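
Lesson 26, which is also lesson 9's loose coupling in miniature: a fire-and-forget producer over ZeroMQ's C API (PUSH/PULL). The endpoint host is made up; assumes libzmq (build with cc producer.c -lzmq).

```c
#include <zmq.h>
#include <string.h>

int main(void) {
    void *ctx  = zmq_ctx_new();
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(push, "tcp://jobqueue.internal:5557");  /* hypothetical host */

    /* The send returns once the message is queued; the producer never
       waits for, or even knows about, the worker that PULLs it. */
    const char *job = "resize:img:42";
    zmq_send(push, job, strlen(job), 0);

    zmq_close(push);
    zmq_ctx_destroy(ctx);
    return 0;
}
```

Messages buffer until a ZMQ_PULL worker is there to drain them (subject to queue limits), which is exactly the loose cause-and-effect described above.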
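
Finally, lesson 31's trade of spare CPU ticks for RAM and disk I/O, sketched with zlib: store the value compressed, inflate only on read. Buffer sizes are illustrative; build with cc -lz.

```c
#include <zlib.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *value = "a long, repetitive value: aaaa aaaa aaaa aaaa aaaa";
    uLong src_len = (uLong)strlen(value) + 1;

    /* compressBound() gives the worst-case compressed size. */
    uLong zlen = compressBound(src_len);
    Bytef zbuf[256];
    if (zlen > sizeof zbuf)             /* illustrative fixed buffer */
        return 1;
    if (compress2(zbuf, &zlen, (const Bytef *)value, src_len,
                  Z_BEST_SPEED) != Z_OK)
        return 1;

    /* Keep zbuf/zlen in RAM (or write it to disk); inflate on read. */
    Bytef out[256];
    uLong out_len = sizeof out;
    if (uncompress(out, &out_len, zbuf, zlen) != Z_OK)
        return 1;

    printf("raw %lu bytes -> compressed %lu bytes\n",
           (unsigned long)src_len, (unsigned long)zlen);
    return 0;
}
```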