This is a guest posting by Marty Abbott and Michael Fisher, authors of The Art of Scalability. I'm still reading their book and will have an interview with them a little later.
If 2010 is the year that you’ve decided to kick off your startup, or if you’ve already got something off the ground and are expecting double- or triple-digit growth, this list is for you. We all want the attention of users to achieve viral growth but, as many can attest, too much attention can bring a startup to its knees. If you’ve used Twitter for any amount of time you’re sure to have seen the “Fail Whale”, which appears so often that it has its own fan club. Take a look at the graph below from Compete.com showing Twitter’s unique visitors. One can argue that limitations in the product offering have as much to do with the flattening of growth over the past six months as availability does, but it’s hard to believe that users’ inability to actually use the service has not hindered growth.
What should you do if you want your startup to scale with double- and triple-digit growth? We’ve put together a list of 11 strategies that will aid in your quest for scalability. In our recently released book “The Art of Scalability” you will find more details about these and other strategies.
Terrastore is a newborn document store that provides advanced scalability and elasticity features without sacrificing consistency.
Here are a few highlights:
- Ubiquitous: based on the universally supported HTTP protocol.
- Distributed: nodes can run and live everywhere on your network.
- Elastic: you can add and remove nodes dynamically to/from your running cluster with no downtime and no changes at all to your configuration.
- Scalable at the data layer: documents are partitioned and distributed among your nodes, with automatic and transparent re-balancing when nodes join and leave.
- Scalable at the computational layer: query and update operations are distributed to the nodes that actually hold the queried/updated data, minimizing network traffic and spreading computational load.
- Consistent: with per-document consistency, you're guaranteed to always get the latest value of a single document, with read-committed isolation for concurrent modifications.
- Schemaless: with a collection-based interface holding JSON documents with no pre-defined schema, you can just create your collections and put anything you want into them.
- Easy operations: install a fully working cluster with just a few commands and no XML to edit.
- Feature rich: support for push-down predicates, range queries and server-side update functions.
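To make the "scalable at the data layer" point concrete, here is a minimal sketch of one common way to partition documents across nodes so that few documents move when a node joins or leaves: a consistent-hash ring. This is an illustration only. The source does not say how Terrastore partitions data, and the `HashRing` class, its method names, and the node and document names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (an assumption, not Terrastore's actual scheme)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes  # virtual points per node, to even out the split
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, doc_key: str) -> str:
        """Return the node owning the first ring point after the key's hash."""
        if not self._points:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_right(self._points, _hash(doc_key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"doc-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # a node joins the running cluster
after = {k: ring.node_for(k) for k in keys}

# Only documents whose owner changed need to be re-balanced.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} documents moved to the new node")
```

In this sketch, every document that changes owner lands on the joining node, and all other documents stay where they were, which is the property that makes re-balancing cheap.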
Ashleigh Anderson from Zynga let me know that they have an opening for a Systems Engineer working on some new games they are developing. Given the state of the job market I thought it worth posting. Here are more details...
Current disk-based RDBMSs can run out of steam when processing large data sets. Can these problems be solved by migrating from a disk-based RDBMS to an IMDB? Are there any limitations? To find out, I tested one of each from the two leading vendors, who together hold 70% of the market share: Oracle's 11g and TimesTen 11g, and IBM's DB2 v9.5 and solidDB 6.3.
Read more at BigDataMatters.com
How many times have we all run across a situation where the performance tests on a piece of software pass with flying colors on the test systems only to see the software exhibit poor performance characteristics when the software is deployed in production? Read More Here...
"But it is not complicated. [There's] just a lot of it."
-- Richard Feynman on how the immense variety of the world arises from simple rules.
- Have We Reached the End of Scaling?
- Applications Become Black Boxes Using Markets to Scale and Control Costs
- Let's Welcome our Neo-Feudal Overlords
- The Economic Argument for the Ambient Cloud
- What Will Kill the Cloud?
- The Amazing Collective Compute Power of the Ambient Cloud
- Using the Ambient Cloud as an Application Runtime
- Applications as Virtual States
We have not yet begun to scale. The world is still fundamentally disconnected and for all our wisdom we are still in the earliest days of learning how to build truly large planet-scaling applications.
Today 350 million users on Facebook is a lot of users and five million followers on Twitter is a lot of followers. This may seem like a lot now, but consider that we have no planet-wide applications yet. None.
Tomorrow the numbers foreshadow a new Cambrian explosion of connectivity that will look as different as the image of a bare lifeless earth looks to us today. We will have 10 billion people, we will have trillions of things, and we will have a great multitude of social networks densely interconnecting all these people to people, things to things, and people to things.
How can we possibly build planet-scalable systems to handle this massive growth if building much smaller applications currently stresses architectural best practices past breaking? We can't. We aren't anywhere close to building applications at this scale, except for perhaps Google and a few others, and there's no way you and I can reproduce what they are doing. Companies are scrambling to raise hundreds of millions of dollars in order to build even more datacenters. As the world becomes more and more global and more and more connected, handling the load may require building applications 4 or 5 orders of magnitude larger than any current system. The cost for an infrastructure capable of supporting planet-scale applications could be in the trillion-dollar range (very roughly estimated at $100 million a data center times 10K data centers, or about $1 trillion).
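The rough estimate in the parenthetical can be reproduced in a couple of lines. This is only a sketch: the variable names are illustrative, and the two inputs are the post's own very rough guesses, not measured figures.

```python
# Inputs quoted in the text; both are rough guesses, not measurements.
cost_per_datacenter = 100_000_000  # $100 million per data center
datacenter_count = 10_000          # "10K" data centers

total = cost_per_datacenter * datacenter_count
print(f"${total:,} (= ${total / 1e12:g} trillion)")
# prints: $1,000,000,000,000 (= $1 trillion)
```

Multiplying the two quoted inputs gives $1 trillion.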
If you aren't Google, or a very few other companies, how can you possibly compete? For a glimmer of a possible direction that may not require a kingdom's worth of resources, please take a look at this short video:
This post draws out some of the common patterns behind the various NOSQL alternatives and shows how they address the database scalability challenge.
Read the full story here
One of the core assumptions behind many of today’s databases is that disks are reliable. In other words, your data is “safe” if it is stored on a disk, and indeed most database solutions rely heavily on that assumption. Is it a valid assumption?
Read the full story here