Some intrepid company blogs are posting their technical challenges and how they solve them. I wish more would open up and talk about what they are doing as it helps everyone move forward. Here are a few blogs documenting their encounters with the bleeding edge:
Update 3: The book was released! Find it on Amazon at The Art of Capacity Planning. Update 2: Maybe the iPhone can use a little capacity planning? What's Behind the iPhone 3G Glitches: One source says Apple programmed the Infineon chip to demand a more powerful 3G signal than the iPhone really requires. So if too many people try to make a call or go on the Internet in a given area, some of the devices will decide there's insufficient power and switch to the slower network—even if there is enough 3G bandwidth available. Update: To get a taste of what will be served, mySQL DBA has a nice post titled Capacity Planning, Architecture, Scaling, Response time, Throughput. You learn how to figure out when your application will break by building a 3rd order polynomial. Cool stuff! John Allspaw who is the Operations Engineering Manager at Flickr is about to publish a book with O'Reilly. There are not much details so far but it seems interesting and relevant to High Scalability. Allspaw combines personal anecdotes from many phases of Flickr's growth with insights from his colleagues in many other industries to give you solid guidelines for measuring your growth, predicting trends, and making cost-effective preparations. Topics include:
- Evaluating tools for measurement and deployment
- Capacity analysis and prediction for storage, database, and application servers
- Designing architectures to easily add and measure capacity
- Handling sudden spikes
- Predicting exponential and explosive growth
- How cloud services such as EC2 can fit into a capacity strategy
This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real-time. The process:
Queues Give You Lots of New Knobs to Play WithAs features are added data consumers multiply, so throwing a new task into a sequential process has a good chance of blowing latencies. Queueing gives much more control and flexibility over the performance of a system. With queues some advanced strategies you have at your disposal are:
What Can You do with Your Queue?The options are endless, but here are some uses I found out in the wild:
Queuing Implies an Event Driven State Machine Based Client ArchitectureMoving to queuing has architecture implications. The client and server are nolonger connected in a direct request-response sort of way. Instead, the server continually sends events to clients. The client is event driven instead of request-response driven. Internally clients often simulates the reqest-response model even though Ajax is asynchronous. It might be better to drop the request-response illusion and just make the client an event driven state machine. An event can come from a request, or from asynchronous jobs, or events can be generated by others performing activities that a client should see. Each client has an event channel that the system puts events on for a client to consume. The client is responspible for making sense of the event in its current context and is capable of handling any event regardless of its original source.
Queuing SystemsIf you are in the market for a queuing system take a look at:
One of the cool things about Mr. Scoble is he doesn't pretend to know everything, which can be an deadly boring affliction in this field. In this case Robert is asking for help in an upcoming interview. Maybe we can help? Here's Robert's plight: I’m really freaked out. I have one of the biggest interviews of my life coming up and I’m way under qualified to host it. It’s on Thursday and it’s about Scalability and Performance of Web Services. Look at who will be on. Matt Mullenweg, founder of Automattic, the company behind WordPress (and behind this blog). Paul Bucheit, one of the founders of FriendFeed and the creator of Gmail (he’s also the guy who gave Google the “don’t be evil” admonishion). Nat Brown, CTO of iLike, which got six million users on Facebook in about 10 days. What would you ask?
Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for the fraction of the cost of earlier possibilities. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?
This is a useful post by Frank Mashraqi, Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month.
Since scalability is considered a non-functional requirement, it is often overlooked in the hopes of decreasing time to market. Adding scalability down the road can decrease the time to market but only after assuming significant technical debt.
Balancing performance and scalability vs. fast iteration and cost efficiency can be a significant challenge for startups. The good news is that achieving this balance is not impossible.
Read the rest of the article here and view a presentation here.
I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to:
- Introduce Scalability
- Optimize Algorithm
- Add Hardware
- Add Parallelism
- Add Intra-Process Parallelism
- Add Inter-Porcess Parallelism
- Add Hybrid Parallelism
- Optimize Decentralization
- Control Shared Resources
- Automate Scalability
Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids
Kent Langley was kind enough to create a profile template for Joyent, Kent's new employer. Joyent is an infrastructure and development company that has put together a multi-site, multi-million dollar hosting setup for their own applications and for the use of others. Joyent competes with the likes of Amazon and GoGrid in the multi-player cloud computing game and hosts Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App. The template was originally created with web services in mind, not cloud providers, but I think it still works in an odd sort of way. Remember, anyone can fill out a profile template for their system and share their wonderfulness with the world.
Getting to Know You
How is your system architected?
How is your team setup?
What infrastructure do you use?
How do you handle customer support?We have a customer support team that is dedicated to helping our customers. Our services pretty much assume that you will have some degree of ability with building and deploying systems. However, if you don't, we have standard, extended plan, and partners that can all be combined in various ways to help our clients. Our support follows the sun around the world.
How is your data center setup?
SUMMARYThe Joyent Accelerator is an extremely flexible tool for building and deploying all manner of infrastructure. If you have questions, please just contact us at firstname.lastname@example.org. Email or at an address is the best way to reach us usually.
Software design patterns are an emerging tool for guiding and documenting system design. Patterns usually describe software abstractions used by advanced designers and programmers in their software. Patterns can provide guidance for designing highly scalable distributed systems. Let's see how! Patterns are in essence solutions to problems. Most of them are expressed in a format called Alexandrian form which draws on constructs used by Christopher Alexander. There are variants but most look like this:
- The pattern name
- The problem the pattern is trying to solve
- Design rationale: This tells where the pattern came from, why it works, and why experts use it
- Pipes and Filters