Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest 

This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we’re refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real-time. The process:

  • Identify the minimum feedback the client (UI, API) needs to know an operation succeeded. It's enough, for example, to update a client's view when a posting a message to a microblogging service. The client probably isn't aware of all the other steps that happen when a message is added and doesn't really care when they happen as long as the obvious cases happen in an appropariate period of time.
  • Queue all work not on the critical path to a job queueing system so the critical path remains unblocked. Work is then load balanced across a cluster and completed as resources permit. The more sharded your architecture is the more work can be done in parallel which minimizes total throughput time. This approach makes it much easier to bound response latencies as features scale.

    Queues Give You Lots of New Knobs to Play With

    As features are added data consumers multiply, so throwing a new task into a sequential process has a good chance of blowing latencies. Queueing gives much more control and flexibility over the performance of a system. With queues some advanced strategies you have at your disposal are:
  • Horizontal scaling. Add more processing resources to do more work in parallel.
  • Priority order processing. Paying customers, can be processed first, for example. Take measures to avoid starvation.
  • Aggregation. Work sitting on the same queue for the same user can be aggregated together so it can be processed as a batch.
  • Work canceling. A request later in the queue can cancel work earlier in the queue. These can just be dropped.
  • CPU limitting. When jobs have unbounded CPU time it destroys the latency for other jobs sitting in the queue. Bounding CPU limits on jobs evens out latency for everyone.
  • Low priority work dropping. Under load low priority jobs can be dropped. Just make you have background sweep processes that catch work that should have been done and redoes it.
  • Admission control. Under load clients can be told about when to retry. This is the best form of flow control, end-to-end flow with the client. We want to push back on work as high up the stack as we can. Stop the client from pushing work to you and you've accomplished something. Just having blind retries and timeouts puts immense pressure on the whole system. These ideas have been employed in embedded real-time systems forever and now it seems they'll move into web services as well.

    What Can You do with Your Queue?

    The options are endless, but here are some uses I found out in the wild:
  • Backfill jobs. Backfill is what Flickr calls asynchronous job that: alter database tables in preparation for a new feature; fix existing features; or other operation that touch a lot of accounts, photos, or groups. For example, a sharding approach means related data is spread through many different shards. To delete a user account would require visiting each shard to delete that users data. Each of those deletes would be queued to they could be done in parallel. Now lets say a bug prevented some of the user data from deleting. After the bug was fixed the user data for all the impacted user accounts would have to be scheduled to be deleted again.
  • Low latency funciton call router.
  • Scatter/gather calls in paralellel.
  • Defer expensive library calls.
  • Parellize database queries.
  • Job queue system for a cluster. Efficiently use all your pool of CPU power.
  • Sending scheduled mail merged emails.
  • Creating guest hosts
  • Put heavy code on backend instead of the web server.
  • Call a cron script to update topic hits and popular article hits.
  • Clean useless data from database because it's outdated.
  • Resize photos.
  • Run daily reports.
  • Update search indexes.
  • Speed up batch jobs by running them in parallel.
  • SpamAssassin spamtraps.

    Queuing Implies an Event Driven State Machine Based Client Architecture

    Moving to queuing has architecture implications. The client and server are nolonger connected in a direct request-response sort of way. Instead, the server continually sends events to clients. The client is event driven instead of request-response driven. Internally clients often simulates the reqest-response model even though Ajax is asynchronous. It might be better to drop the request-response illusion and just make the client an event driven state machine. An event can come from a request, or from asynchronous jobs, or events can be generated by others performing activities that a client should see. Each client has an event channel that the system puts events on for a client to consume. The client is responspible for making sense of the event in its current context and is capable of handling any event regardless of its original source.

    Queuing Systems

    If you are in the market for a queuing system take a look at:
  • Gearman - Open Source Message Queuing System
  • Amazon's SQS. The latencies for this service tend to be high and variable so it may not be appropriate for all tasks.
  • beanstalkd.
  • Apache ActiveMQ.
  • Spread Queue
  • Rabbit MQ
  • Open AMQ
  • The Schwartz
  • Starling
  • Simple MQ
  • Roll your own.

    Related Articles

  • Flick Engineers Do it Offline by Myles Grant
  • Queue everything and delight everyone by Leslie Michael Orchard.
  • Gearman - Open Source Message Queuing System
  • GridGain: One Compute Grid, Many Data Grids

    Click to read more ...

  • Tuesday

    Help a Scoble out. What should Robert ask in his scalability interview?

    One of the cool things about Mr. Scoble is he doesn't pretend to know everything, which can be an deadly boring affliction in this field. In this case Robert is asking for help in an upcoming interview. Maybe we can help? Here's Robert's plight: I’m really freaked out. I have one of the biggest interviews of my life coming up and I’m way under qualified to host it. It’s on Thursday and it’s about Scalability and Performance of Web Services. Look at who will be on. Matt Mullenweg, founder of Automattic, the company behind WordPress (and behind this blog). Paul Bucheit, one of the founders of FriendFeed and the creator of Gmail (he’s also the guy who gave Google the “don’t be evil” admonishion). Nat Brown, CTO of iLike, which got six million users on Facebook in about 10 days. What would you ask?

    Click to read more ...


    Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview

    Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for the fraction of the cost of earlier possibilities. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to

    Click to read more ...


    Scalability for Startups: How to Grow Up without Blowing Up

    This is a useful post by Frank Mashraqi, Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month.

    Since scalability is considered a non-functional requirement, it is often overlooked in the hopes of decreasing time to market. Adding scalability down the road can decrease the time to market but only after assuming significant technical debt.

    Balancing performance and scalability vs. fast iteration and cost efficiency can be a significant challenge for startups. The good news is that achieving this balance is not impossible.

    Read the rest of the article here and view a presentation here.

    Click to read more ...


    Paper: Scalability Design Patterns

    I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to:

    • Introduce Scalability
    • Optimize Algorithm
    • Add Hardware
    • Add Parallelism
      • Add Intra-Process Parallelism
      • Add Inter-Porcess Parallelism
      • Add Hybrid Parallelism
    • Optimize Decentralization
    • Control Shared Resources
    • Automate Scalability

    Click to read more ...


    Is MapReduce going mainstream?

    Compares MapReduce to other parallel processing approaches and suggests new paradigm for clouds and grids

    Click to read more ...


    Joyent - Cloud Computing Built on Accelerators

    Kent Langley was kind enough to create a profile template for Joyent, Kent's new employer. Joyent is an infrastructure and development company that has put together a multi-site, multi-million dollar hosting setup for their own applications and for the use of others. Joyent competes with the likes of Amazon and GoGrid in the multi-player cloud computing game and hosts Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App. The template was originally created with web services in mind, not cloud providers, but I think it still works in an odd sort of way. Remember, anyone can fill out a profile template for their system and share their wonderfulness with the world.

    Getting to Know You

  • What is the name of your system and where can we find out more about it? Joyent Accelerator Cloud Computing IaaS My name is Kent Langley, Sr. Director, Joyent, Inc. ( The Joyent website is located at The scope of this exercise is the Joyent Accelerator product.
  • What is your system is for? It is essentially a system that provides infrastructure primitives as a service (IaaS) for building cloud computing applications, migrating enterprise data center operations to secure private clouds, or just hosting your blog. There is a page on the site called what scales on Joyent: Java, PHP, Ruby, Erlang, Perl, Python all work beautifully on Joyent. There is no lock-in. Ever. We try to run an open cloud. It's also a "loving cloud" if you ask our CTO. We have some of the largest Rails applications in the world, very high volume ejabberd XMPP infrastructure, exceptionally large Drupal installations, commerce sites in private clouds, .NET with Mono, TomCat, Resin, Glassfish, and much more all running on Accelerators. Joyent Accelerators are the perfect building blocks for almost any PaaS (Platform as a Service) play as well. Of particular note, Java runs exceptionally well on Accelerators because Accelerators are 64bit and you can also do 64 bit Java and have a JVM that could address as much as 32 GiB of RAM! This gives excellent vertical scalability for any running JVM. more below the fold
  • Why did you decide to build this system? There is demand for a high-end but reasonably priced elastic computing infrastructure.
  • How is your project financed? Self-Funded at this time.
  • What is your revenue model? We sell Joyent Accelerators, do Scale Consulting, and some related Services. We also have a growing Parter Channel.
  • How do you market your product? WebSite, Word of Mouth, Blogs, Email Lists, Twitter, Event Sponsorships, Open Source participation, forums, friendfeed, and more...
  • How long have you been working on it? I, Kent Langley, have been working with Joyent for about 2.5 years as a client. I've been with the company as an employee for about 2 months. Joyent has existed formally for about 4 years.
  • How big is your system? Try to give a feel for how much work your system does. We have hundreds and hundreds of servers representing significant compute power across 1000's of cores in multiple locations.
  • Number of monthly page views? Billions and Billions (multiple billion+ page view per month clients)
  • What is your in/out bandwidth usage? That's a secret.
  • How many documents, do you serve? How many images? How much data? Billions per month.
  • How fast are you growing? Fast enough to give me grey hairs.
  • What is your ratio of free to paying users? Very low. Most of our users have paid accounts. We do have some free offerings to help people get started. But, the demand for those services has been high so the lines are a little long.
  • What is your user churn? About Average for the industry we think.
  • How many accounts have been active in the past month? Thousands.

    How is your system architected?

  • What is the architecture of your system? Talk about how your system works in as much detail as you feel comfortable with. Our technology stack is predicated on something we call a Pod. We have several pods and plans to add more. From the top to bottom you'd find. BigIP F5 Application Switches Force 10 (1GB and 10GB switching) Custom Dell Hardware with some secret sauce Tier1 Hosting Providers Essentially a custom Solaris Nevada based OS Core w/ a Pkgsrc install system
  • What particular design/architecture/implementation challenges does your system have? Automation. Automation. Automation. Self-Service.
  • What did you do to meet these challenges? We have an amazing team of Systems Developers that work very hard to improve our ability to grow and manage systems each day. We have some great updates on our Roadmap coming up that should be very exciting for existing and potential customers.
  • How does your system evolve to meet new scaling challenges? Our system is by it's nature evolutionary. As technology grows and changes, we grow with it. A recent example is when a client needed a private cloud computing environment to achieve PCI compliance in a cloud environment. So, we worked with the client to create this. While it is in production for two clients already we consider this a beta product. But you should expect to see it as a formal offering soon. This is an example of a way we have evolved our systems to respond to the changing cloud computing market place.
  • Do you use any particularly cool technologies are algorithms? ZFS, BigIP, DTrace, OpenSolaris Nevada, and an in-house custom provisioning system we call MCP (hat's off to Tron)
  • What do you do that is unique and different that people could best learn from? We know our approach to things is a little different. But, we think that helps us inhabit a space that is different enough from other vendors in the Cloud Computing space that we offer a significant value proposition to a large cross-section of the IT industry. From the lone developer with a great idea that comes in and picks up a $199 per year 1/4 GiB Acclerator to a deployment that has literally 100's of Acclerators running the largest Rails applications on the planet. We are able to take good care of them both.
  • What lessons have you learned? If at first you don't succeed. Try, try again. Get over your mistakes and move on.
  • Why have you succeeded? We care. Our clients care. That's a nice fit.
  • How are you thinking of changing your architecture in the future? MORE secret sauce... But seriously, we have some great additions coming up soon. I'll be in touch.

    How is your team setup?

  • How many people are in your team? Joyent has a small employee to client ratio. But, that's because we do what we do well. We are divided into several of the normal divisions you might expect like client support, marketing, sales, development, operations, and the business units.
  • Where are they located? Our corporate office is in Sausalito, CA. We have a development team in Seattle. We have a support organization that follows the sun and spans the globe. IM is a big deal at Joyent.
  • Is there anything that you would do different or that you have found surprising? I'd say managing expectations is the most challenging thing. I think that's where we stand to improve the most and where most of the surprises come from.

    What infrastructure do you use?

  • Which languages do you use to develop your system? Ruby
  • How many servers do you have? Not as many as Google!
  • How is functionality allocated to the servers?
  • How are the servers provisioned? We have a custom cloud provisioning system called MCP.
  • What operating systems do you use? Customized Sun Solaris Nevada
  • Which web server do you use? Apache and Nginx are the work horses in Joyent Accelerators
  • Which database do you use? MySQL and PostGRES are included w/ every Accelerator. Oracle works of you bring your own licenses. CouchDB works. We are certifying more all the time.
  • Do you use a reverse proxy? Well, our clients often use Nginx and now that there is a viable port of Varnish to OpenSolaris we are seeing more of that. Some of our clients use Squid as well. Most popular reverse proxy software will run find on our setup.
  • Do you collocate, use a grid service, use a hosting service, etc? We are that.
  • What is your storage strategy? DAS/SAN/NAS/SCSI/SATA/etc/other? We provide NFS to our clients for $0.15/GiB. 1 GiB = 1024 MB.
  • How much capacity do you have? Many Terabytes
  • How do you grow capacity? Add hardware
  • Do you use a storage service? We are a storage service.
  • Do you use storage virtualization? Not really. It's been and continues to be tested. But, you can't beat the real thing still in many cases.
  • How do you handle session management? Our clients do this depending on their development platform of choice at the application layer. Also, we can of course use our BigIP load balancing infrastructure to help out with that also.
  • How is your database architected? Master/slave? Shard? Other? All of the above, client by client. We know that Master-Master MySQL, Master-Slave MySQL, Oracle Clusters, MySQL Clusters, PostGRES, etc. They all work fine.
  • How do you handle load balancing? We have F5 BigIP's and we do what we call a managed load balancing service. For example, if you have two application servers, you need to load balance. Just ask us to set you up a VIP and we'll add the nodes you specify for a cost per node. All the pricing information is here.
  • Which web framework/AJAX Library do you use? We have clients that use just about everything you can think of.
  • Which real-time messaging frame works do you use? We have very large clients running ejabberd. Erlang works great on our systems.
  • Which distributed job management system do you use? This is client by client. We do not offer this out of the box.
  • How do you handle ad serving? This is up to the client. We've seen just about all of them.
  • What is your object and content caching strategy? We usually recommend memcached, it's pre-installed and ready to turn on.
  • What is your client side caching strategy? I'd say most of our clients use cookies.

    How do you handle customer support?

    We have a customer support team that is dedicated to helping our customers. Our services pretty much assume that you will have some degree of ability with building and deploying systems. However, if you don't, we have standard, extended plan, and partners that can all be combined in various ways to help our clients. Our support follows the sun around the world.

    How is your data center setup?

  • How many data centers do you run in? Several. :) Currently only domestic on both coasts and elsewhere.
  • How is your system deployed in data centers? In-House Automated provisioning systems
  • Are your data centers active/active, active/passive? Everything is always on. Our clients often co-locate in multiple locations so that they can have solid DR scenarios to keep investors happen and recover quickly should a truck hit a telephone pole or something.
  • How do you handle syncing between data centers and fail over and load balancing? This is a complex topic and can be very simple of very complex. It's a bit out of scope for this document.
  • Which DNS service do you use? We run our own based on PowerDNS
  • Which switches do you use? Force10
  • Which email system do you use? Mostly Postfix
  • How do you handle spam? Filter at a variety of levels
  • How do you backup and restore your system? High level snap shots, clients are responsible for their own data primarily. However, we have ways to help them.
  • How are software and hardware upgrades rolled out? We do quarterly releases of key software and our Accelerators. Sometimes we get a little behind but try to roll with it. You get root on your Accelerator so you are not dependent on the Joyent release cycle at all.
  • How do you handle major changes in database schemas on upgrades? This is up to the clients and highly platform and applications specific.
  • What is your fault tolerance and business continuity plan? Lots of redundancy.
  • Do you have a separate operations team managing your website? No. We do it ourselves.
  • Do you use a content delivery network? If so, which one and what for? Yes. We are currently partnered with Limelight.
  • How much do you pay monthly for your setup? Accelerator plans range from $199 per year to $4000 per month. Significant discounts can be had if you pay ahead. But, it's very important to note that we do not require or even want contracts. Some companies try to force us into contracts and if you just MUST lock yourself in for years, we'll tie you down. But, we don't recommend it at all. In essence pay for what you need when you need it on a month to month granularity.


    The Joyent Accelerator is an extremely flexible tool for building and deploying all manner of infrastructure. If you have questions, please just contact us at Email or at an address is the best way to reach us usually.

    Related Articles

  • Scaling in the Cloud with Joyent's Jason Hoffman (podcast)
  • Amazon Web Services or Joyent Accelerators: Reprise by Jason Hoffman

    Click to read more ...

  • Wednesday

    The Pattern Bible for Distributed Computing

    Software design patterns are an emerging tool for guiding and documenting system design. Patterns usually describe software abstractions used by advanced designers and programmers in their software. Patterns can provide guidance for designing highly scalable distributed systems. Let's see how! Patterns are in essence solutions to problems. Most of them are expressed in a format called Alexandrian form which draws on constructs used by Christopher Alexander. There are variants but most look like this:

    • The pattern name
    • The problem the pattern is trying to solve
    • Context
    • Solution
    • Examples
    • Design rationale: This tells where the pattern came from, why it works, and why experts use it
    Patterns rarely stand alone. Each pattern works on a context, and transforms the system in that context to produce a new system in a new context. New problems arise in the new system and context, and the next ‘‘layer’’ of patterns can be applied. A pattern language is a structured collection of such patterns that build on each other to transform needs and constraints into an architecture. The latest POSA book Pattern-Oriented Software Architecture Volume 4: A Pattern Language for Distributed Computing will guide the readers through the best practices and introduce them to key areas of building distributed software systems using patterns. The book pulls together 114 patterns and shows how to use them in the context of distributed software architectures. Although somewhat theoretical it is still a great resource for practicing distributed-systems architects. It is as close as you're going to get to a one-stop "encyclopedia" of patterns relevant to distributed computing. However it is not a true encyclopedia since "over 150" patterns are referenced but not described in POSA Volume 4. The book does not go into the details of the pattern's implementations, so the reader should already be familiar with the patterns, or be prepared to spend some time researching. The pattern language for distributed computing includes patterns such as:
    • Broker
    • Client-Dispatcher-Server
    • Pipes and Filters
    • Leaders/Followers
    • Reactor
    • Proactor
    Patterns can indeed be useful in designing highly scalable systems and solving various problems related to concurrency, synchronization and resource management and other topics. Wikipedia has more details on Pattern languages to check out.

    Click to read more ...


    Scalability Worst Practices

    Brian Zimmer, architect at travel startup Yapta, highlights some worst practices jeopardizing the growth and scalability of a system: * The Golden Hammer. Forcing a particular technology to work in ways it was not intended is sometimes counter-productive. Using a database to store key-value pairs is one example. Another example is using threads to program for concurrency. * Resource Abuse. Manage the availability of shared resources because when they fail, by definition, their failure is experienced pervasively rather than in isolation. For example, connection management to the database through a thread pool. * Big Ball of Mud. Failure to manage dependencies inhibits agility and scalability. * Everything or Something. In both code and application dependency management, the worst practice is not understanding the relationships and formulating a model to facilitate their management. Failure to enforce diligent control is a contributing scalability inhibiter. * Forgetting to check the time. To properly scale a system it is imperative to manage the time alloted for requests to be handled. * Hero Pattern. One popular solution to the operation issue is a Hero who can and often will manage the bulk of the operational needs. For a large system of many components this approach does not scale, yet it is one of the most frequently-deployed solutions. * Not automating. A system too dependent on human intervention, frequently the result of having a Hero, is dangerously exposed to issues of reproducibility and hit-by-a-bus syndrome. * Monitoring. Monitoring, like testing, is often one of the first items sacrificed when time is tight.

    Click to read more ...


    Product: Happy = Hadoop + Python

    Has a Java only Hadoop been getting you down? Now you can be Happy. Happy is a framework for writing map-reduce programs for Hadoop using Jython. It files off the sharp edges on Hadoop and makes writing map-reduce programs a breeze. There's really no history yet on Happy, but I'm delighted at the idea of being able to map-reduce in other languages. The more ways the better. From the website:

    Happy is a framework that allows Hadoop jobs to be written and run in Python 2.2 using Jython. It is an 
    easy way to write map-reduce programs for Hadoop, and includes some new useful features as well. 
    The current release supports Hadoop 0.17.2.
    Map-reduce jobs in Happy are defined by sub-classing happy.HappyJob and implementing a 
    map(records, task) and reduce(key, values, task) function. Then you create an instance of the 
    class, set the job parameters (such as inputs and outputs) and call run().
    When you call run(), Happy serializes your job instance and copies it and all accompanying 
    libraries out to the Hadoop cluster. Then for each task in the Hadoop job, your job instance is 
    de-serialized and map or reduce is called.
    The task results are written out using a collector, but aggregate statistics and other roll-up 
    information can be stored in the happy.results dictionary, which is returned from the run() call.
    Jython modules and Java jar files that are being called by your code can be specified using 
    the environment variable HAPPY_PATH. These are added to the Python path at startup, and 
    are also automatically included when jobs are sent to Hadoop. The path is stored in happy.path 
    and can be edited at runtime. 

    Click to read more ...