This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.
The future looks many, big, complex, and adaptive:
- Many clouds.
- Many servers.
- Many operating systems.
- Many languages.
- Many storage services.
- Many database services.
- Many software services.
- Many adjunct human networks (like Mechanical Turk).
- Many fast interconnects.
- Many CDNs.
- Many cache memory pools.
- Many application profiles (simple request-response, live streaming, computationally complex, sensor driven, memory intensive, storage intensive, monolithic, decomposable, etc).
- Many legal jurisdictions. Don't want to perform a function on Patriot Act "protected" systems? Then move the function elsewhere.
- Many SLAs.
- Many data driven pricing policies that, like airplane pricing algorithms, will price "seats" to maximize profit using multi-variate time sensitive pricing models.
- Many competitive products. The need to defend your territory never seems to go away. Though what will map to scent-marking I'm not sure.
- Many and evolving resource gradients.
- Big concurrency. Everyone and everything is a potential source of real-time data that needs to be processed in parallel to be processed at all within tolerable latencies.
- Big redundancy. Redundant nodes in an unpredictable world will provide cover for component failures, with workers ready to take over when another fails.
- Big crushing transient traffic spikes as new mega worldwide social networks rapidly shift their collective attention from new shiny thing to new shiny thing.
- Big increases in application complexity to keep streams synchronized across networks. Event handling will go off the charts as networks grow larger and denser and intelligent behaviour attaches to billions of events generated per second.
- Big data. Sources and amounts of historical and real-time data are increasing at increasing rates.
This challenging, energetic, ever changing world is a very different looking world than today. It's as if Bambi was dropped into the middle of a Velociraptor pack.
If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follows this pattern today, largely because with enough elbow grease, it works.
This was a pre-cloud, relational database dominated world, built from parts scrounged from the remnants of enterprises and datacenters past. Twitter and Digg started in this era, but are evolving into something different, as scaling pressures increase and new purpose built technologies pop into being.
With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era.
The world of scalable databases is not a simple one. They come in every race, creed, and color. Rick Cattell has brought some harmony to that world by publishing High Performance Scalable Data Stores, a nicely detailed one stop shop paper comparing scalable databases solely on the content of their character. Ironically, the first step in that evaluation is dividing the world into four groups:
- Key-value stores: Redis, Scalaris, Voldemort, and Riak.
- Document stores: CouchDB, MongoDB, and SimpleDB.
- Record stores: BigTable, HBase, Hypertable, and Cassandra.
- Scalable RDBMSs: MySQL Cluster, ScaleDB, Drizzle, and VoltDB.
The paper describes each system and then compares them on the dimensions of Concurrency Control, Data Storage Replication, Transaction Model, General Comments, Maturity, K-hits, License Language.
And the winner is: there are no winners. Yet. Rick concludes by pointing to a great convergence:
I believe that a few of these systems will gain critical mass and key players, and will pull away from the others by next year. At that point, open source contributors will likely migrate to those players.
From the paper:
There was a bit of drama earlier when I posted a free job opening for Zynga. It caused unfortunate and just plain wrong accusations. It also caused a number of requests for more free job posts, which I should have anticipated, but obviously I can't let this blog become cluttered with that kind of stuff. Earlier I tried a job board type service, but that never really worked out. So what to do? Someone suggested a sponsored post approach and I think that's a good compromise. It minimizes the noise, lets people know about work, and brings in a little revenue. It works like an advertisement. If you are interested please let me know. When we have any job openings there will be a sponsored post like this one, that you can easily ignore or pay attention to, depending on your situation.
Squarespace Looking for Full-time Scaling Expert
Interested in helping a cutting-edge, high-growth startup scale? Squarespace, which was profiled here last year in Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month and also hosts this blog, is currently in the market for a crack scalability engineer to help build out its cloud infrastructure. Squarespace is very excited about finding a full-time scaling expert.
Interested applicants should go to http://www.squarespace.com/jobs-software-engineer for more information.
Why migrate your database? Efficiency and availability problems are harming your business – reports are out of date, your batch processing window is nearing its limits, outages (unplanned/planned) frequently halt work. Database consolidation – remove the costs that result from a heterogeneous database environment (DBAs time, database vendor pricing, database versions, hardware, OSs, patches, upgrades etc.). OK, so the driving forces for migration are clear, what now?
Read more on BigDataMatters.com
If Twitter is the “nervous system of the web” as some people think, then what is the brain that makes sense of all those signals (tweets) from the nervous system? That brain is the Twitter Analytics System and Kevin Weil, as Analytics Lead at Twitter, is the homunculus within in charge of figuring out what those over 100 billion tweets (approximately the number of neurons in the human brain) mean.
Twitter has only 10% of the expected 100 billion tweets now, but a good brain always plans ahead. Kevin gave a talk, Hadoop and Protocol Buffers at Twitter, at the Hadoop Meetup, explaining how Twitter plans to use all that data to answer key business questions.
What type of questions is Twitter interested in answering? Questions that help them better understand Twitter. Questions like:
While exploring deep into some dusty old library stacks, I dug up Nostradamus' long lost NoSQL codex. What are the chances? Strangely, it also gave the plot to the next Dan Brown novel, but I left that out for reasons of sanity. About NoSQL, here is what Nosty (his friends call him Nosty) predicted are the signs you may need a NoSQL database...
Joel Spolsky and Jeff Atwood are raising VC money for StackOverflow. This is interesting for three reasons: 1) Joel has always seemed like a keep it small and grow organically type of guy, so this is a big step in a different direction. 2) It means they think there's a very big market in the Q&A space and they mean to capture as much of the market as possible. 3) Most importantly for this blog, Joel gives some good advice on when to stay fresh and local and when it's time to jump for the brass ring, scale up your ambition, and go for VC money. Please see Joel's blog post for the details, but here's when to go VC: