This is a guest posting by Marty Abbott and Michael Fisher, authors of The Art of Scalability. I'm still reading their book and will have an interview with them a little later.
If 2010 is the year that you’ve decided to kick off your startup, or if you’ve already got something off the ground and are expecting double- or triple-digit growth, this list is for you. We all want the attention of users to achieve viral growth but, as many can attest, too much attention can bring a startup to its knees. If you’ve used Twitter for any amount of time you’re sure to have seen the “Fail Whale”, which is so often seen that it has its own fan club. Take a look at the graph below from Compete.com showing Twitter’s unique visitors. One can argue that limitations in the product offering have as much to do with the flattening of growth over the past six months as availability does, but it’s hard to believe that the inability of users to actually use the service has not hindered growth.
What should you do if you want your startup to scale with double- and triple-digit growth? We’ve put together a list of 11 strategies that will aid in your quest for scalability. In our recently released book “The Art of Scalability” you will find more details about these and other strategies.
1) Scale Out – Not Up
This one shouldn’t come as a surprise to anyone. It’s at the heart of our database and application cube of scale and is a common theme throughout Todd Hoff’s High Scalability website. We often refer to it as “horizontal” scale being preferable to “vertical” scale. Scaling horizontally often means that you can rely more on cost-effective commodity hardware to reduce your cost of operations (see number 4 below). It also means that you can more easily and quickly respond to customer demands by buying one more box rather than having to spec and buy a bigger box. It’s critically important where the database tier is concerned, as that’s where most architectures tend to converge and where most problems with scale tend to occur.
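As a rough illustration of what scaling out in the database tier can look like, here is a minimal Python sketch that spreads keys across several small database nodes instead of one big one. The shard names and the `shard_for` helper are hypothetical, and simple modulo hashing is only one of several possible schemes.

```python
import hashlib
from typing import List

# Hypothetical shard names; in a real system these would be connection details.
SHARDS: List[str] = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Map a key to one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process and would route the same key differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user_id in ("user-1", "user-2", "user-3"):
        print(user_id, "->", shard_for(user_id))
```

Adding "one more box" here means adding an entry to the shard list. Note that with plain modulo hashing most keys move to a different shard when the list changes, so a real deployment would need a migration plan or a scheme such as consistent hashing.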
2) Use Databases Appropriately
We believe in treating the database as nothing more than a persistent data store within very high transaction rate systems. Don’t rely heavily on stored procedures, especially those with business logic embedded within them, as doing so will keep you from quickly and easily moving to competing database products for reasons of cost or availability. Aggressively implement caching at multiple levels such as content delivery networks, object caches, page caches and in-memory caches. Rely upon the database to resolve data concerns around atomicity, consistency, isolation and durability (the ACID properties). Be careful with overreliance on the database as a relational engine, as that will very often lead you into thinking that you need a single monolithic database for all of your data needs, and that is a scalability killer.
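One common way to put an object cache in front of the database is the cache-aside pattern: check the cache first and only go to the database on a miss. The sketch below is a minimal in-process version; `TTLCache`, `get_with_cache` and the `load_from_db` callback are illustrative names, not part of any particular product.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process object cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (time.monotonic() + self.ttl_seconds, value)


def get_with_cache(cache: TTLCache, key: str,
                   load_from_db: Callable[[str], str]) -> str:
    """Return the cached value, touching the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the expensive call we want to avoid
        cache.set(key, value)
    return value
```

With a 60-second expiry, repeated reads of the same key within that window reach the database once. In production the in-process dictionary would typically be replaced by a shared object cache so that all application servers benefit.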
3) Soar Through the Clouds
If you need high availability 24x7, relying solely upon a cloud for your services will likely let you down for the near future. But clouds offer incredible opportunity for spiky traffic, development environments, data warehouse and ETL processing and many other temporary needs. If you need something quickly, like additional capacity, then bring on some cloud resources. If you have work that is only performed for a portion of the day, why pay for a system to sit partially idle when you can utilize a rental system in the cloud? Have development environments that aren’t used 24x7 and which require constant re-imaging – set ‘em up in the cloud!
4) Goldfish not Thoroughbreds
If you have a sick goldfish, you probably don’t take it to the veterinarian but if you have an expensive thoroughbred you are likely to spend several times its purchase price in veterinary bills over its lifetime. We are huge believers in having the beauty of your product nested within the blueprint of your architecture and your design – not in your hardware. We love to buy cheap hardware that’s easily disposed of or replaced rather than painstakingly caring for and maintaining expensive hardware with designer labels. Design your product for goldfish (commodity hardware) and you will nearly always lower your total cost of ownership and increase your ability to leverage clouds and rapid horizontal scale.
5) Simplify, Simplify, Simplify
The best engineers are those who easily reduce complex problems and solve them with simple solutions. While a good engineer can design a wonderfully complex and still workable system, a great engineer can take that system and make it easier to understand and therefore easier to maintain and scale. Take complex workloads and break them into easily digestible and scalable chunks, as in the MapReduce approach. Stay away from complex solutions that bind multiple systems together into failure clusters, such as two-phase commit (2PC).
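To make the idea of breaking a workload into digestible chunks concrete, here is a small single-process sketch in the spirit of MapReduce, using word counting as the workload. The function names are illustrative; in a real system each chunk would be handed to a separate worker.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def map_chunk(lines: Iterable[str]) -> Counter:
    """Map step: count words in one independent chunk of input."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial results."""
    return left + right


def word_count(chunks: List[List[str]]) -> Counter:
    """Run the map step per chunk, then fold the partial counts together."""
    partials = map(map_chunk, chunks)  # each chunk could run on its own worker
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count([["scale out not up"], ["scale out"]]))
```

Because no chunk depends on any other, adding capacity is a matter of adding workers, which is the property that makes this style of decomposition scale.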
6) Be the Master of Your Own Destiny!
This one shouldn’t come as a surprise to anyone either. When it comes to your site and your architecture you simply cannot build in reliance upon third parties to scale. This doesn’t mean that you should build your own database, develop your own object cache and implement your own content delivery network and load balancers. It does mean that you absolutely cannot say that “We scale with databases from SuperDatabaseCorp”. By relying on a third party, you are forcing yourself to upgrade when they say to do so, migrate product lines when they say and potentially lose a product if they choose to move away from that product line or go out of business. Moreover, what if their product doesn’t keep up with your rate of growth? It isn’t that difficult in our experience to build horizontal scale into your architecture, even within the database tier.
7) Learn Aggressively
Santayana’s Repetitive Consequences, “Those who cannot remember the past are condemned to repeat it,” is a universal truth. Some organizations fail so infrequently (such as nuclear power plants) that it is nearly impossible to develop a way to learn and get better at what they do. While there are competing theories on this particular matter, most of us have plenty of opportunity to learn from past mistakes in both our engineering and operations environments. Implement an appropriate postmortem and learn from your mistakes! Take every opportunity to learn and put that learning back into your design and architecture. Leverage review boards and joint architecture designs to achieve this feedback loop. Monitor your systems, and mature that monitoring aggressively so that you can anticipate problems and then design around them.
8) Communicate Asynchronously As Much As Possible
Synchronicity may have been a great Police album, but it kills within high-transaction product platforms. The multiplicative effect of failure tells us that as systems die that are in series (which is what you have with synchronous calls), our availability will go down even if we’ve employed pooling or clustering. Moreover, slow server responses are likely to have downstream effects upon other services calling the slow servers. Developing asynchronous solutions wherever possible removes some of this availability concern. Instead of slowing down everything or making multiple services unresponsive as systems are restarted, the slow services are the only ones that don’t function properly. Other servers waiting for responses don’t “hang” on a response but instead continue doing the rest of their work and simply don’t display the portion of work the slow services were performing. Obviously this approach can’t work everywhere, but it can work in more places than most companies think.
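Two parts of this argument can be sketched in a few lines of Python. First, the multiplicative effect: when a request must pass through several systems in series, the overall availability is the product of the individual availabilities. Second, a caller that fans out to services without hanging on the slow one. The service names, delays and timeout below are made up for illustration, and `asyncio` is just one way to get non-blocking calls.

```python
import asyncio
from typing import Awaitable, Dict, Optional


def series_availability(*availabilities: float) -> float:
    """Availability of a chain of systems that must all be up at once."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


async def call_with_timeout(call: Awaitable[str],
                            timeout: float) -> Optional[str]:
    """Wait for a service up to `timeout` seconds, then give up on it."""
    try:
        return await asyncio.wait_for(call, timeout)
    except asyncio.TimeoutError:
        return None  # the caller carries on without this portion


async def fake_service(name: str, delay: float) -> str:
    """Stand-in for a remote call; `delay` simulates response time."""
    await asyncio.sleep(delay)
    return f"{name} content"


async def build_page(timeout: float = 0.2) -> Dict[str, Optional[str]]:
    """Call all services concurrently; slow ones yield None, not a hang."""
    delays = {"profile": 0.01, "recommendations": 1.0}  # hypothetical values
    results = await asyncio.gather(
        *(call_with_timeout(fake_service(name, delay), timeout)
          for name, delay in delays.items())
    )
    return dict(zip(delays, results))


if __name__ == "__main__":
    # Three 99.9% systems in series are only about 99.7% available together.
    print(series_availability(0.999, 0.999, 0.999))
    print(asyncio.run(build_page()))
```

In this sketch the page is assembled with the profile section present and the recommendations section simply missing, rather than the whole request waiting a full second on the slow service.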
9) Hire The Best People
If you’ve followed our posts on Giga or on our blog in the past you know that we believe that scalability all starts with human capital. Highly available and scalable solutions start with bright and dedicated engineers and good leadership and management. If you are a one-person shop, it’s easy to be all of these things. If you are pumping out 1M lines of code a year to handle content, commerce and social networking features with high ROIs, you had better have some great people, great coordination and great management. You must hire great people quickly, and remove underperforming talent. It is also important not to have too many levels of management, as that creates bureaucracy. Too few levels, on the other hand, can actually slow things down, as managers become overburdened by communication and get information out too slowly to their teams.
10) D-I-D Approach for Scalability
Consider changing your view of scaling your systems into three separate phases: Design, Implementation (in code) and Deployment. Each of these phases has a different type and amount of cost to you. It is comparatively cheap to (D)esign (architect) a system to scale, increasingly more expensive to (I)mplement that architecture within the actual software and fairly expensive to actually buy and (D)eploy all the systems within a production environment. Our D-I-D approach to scale helps you to quickly react to scale needs by always designing well ahead of need, coding or implementing slightly ahead of need and deploying the capital assets “just in time” for your growth.
11) Design with Fault Isolative “Swim Lanes”
Every time you build something, ask yourself “How can this fail, and what will fail as a result of this failing?” Build not only fault tolerance, but fault isolation into your architectures. Network architectures have long had the notion of fault isolation through collision domains. Scalable Internet architectures should leverage this concept such that failures in certain components don’t impact other zones of functionality. We refer to these fault isolation zones as “swim lanes.”
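The question “what will fail as a result of this failing?” can be asked mechanically once each lane’s dependencies are written down. The sketch below is one hedged way to do that; the lane and component names are hypothetical, and a real inventory would come from your own architecture documentation.

```python
from typing import Dict, List, Set

# Hypothetical swim lanes: each feature lists the components it depends on.
LANES: Dict[str, Set[str]] = {
    "search": {"search-web", "search-db"},
    "checkout": {"checkout-web", "checkout-db"},
    "login": {"login-web", "login-db"},
}


def shared_components(lanes: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return components used by more than one lane (isolation violations)."""
    owners: Dict[str, List[str]] = {}
    for lane, components in lanes.items():
        for component in components:
            owners.setdefault(component, []).append(lane)
    return {c: sorted(ls) for c, ls in owners.items() if len(ls) > 1}


def blast_radius(lanes: Dict[str, Set[str]], failed: str) -> List[str]:
    """Return the lanes that stop working when one component fails."""
    return sorted(lane for lane, components in lanes.items()
                  if failed in components)


if __name__ == "__main__":
    print(shared_components(LANES))          # no shared components here
    print(blast_radius(LANES, "search-db"))  # only the search lane is hit
```

If two lanes were both to depend on, say, a common session database, `shared_components` would flag it and `blast_radius` would show both lanes failing together, which is exactly the coupling that swim lanes are meant to remove.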