Real World Web: Performance & Scalability

We've referenced this 189 slide masterpiece by Ask Bjorn Hansen before, but it was hidden without its own first class link. He describes his presentation as 3 hours of 5 minute lightening talks and that sounds about right.

The presentation covers: overall platform and architecture considerations involved in tuning applications from a holistic perspective. You’ll be shown design scalable architectures for dynamic, high-volume web sites. Topics covered include caching, scalable database design, replication architecture, load-balancing, and architectural decisions derived from many years of experience.

His prime directive of scaling: Think Horizontally at every point in your architecture, not just at the web tier.

You may not agree with everything, but there's a lot of useful advice. Here's a summary of some of what is covered:

  • Benchmarking
  • Vertical scaling sucks.
  • Horizontal scaling rocks.
  • Run many application servers
  • Don't keep state in the app server
  • Be stateless
  • Optimization is necessary, but is different than scalability.
  • Cache things you hit all the time.
  • Measure, don't assume, check.
  • Make pages static.
  • Caching is a trade-off.
  • Cache full pages.
  • Cache partial pages.
  • Cache complex data.
  • MySQL query cache is flushed on update.
  • Cache invalidation is hard.
  • Replication scales reads, not writes.
  • Partition to scale writes. 96% of applications can skip this step.
  • Master-master setup facilitates on-line schema changes.
  • Create summary tables and summary databases rather than do COUNT and GROUP-BY at runtime.
  • Make code idempotent. If it fails you should just be able to run it again.
  • Load data asynchronously. Aggregate updates into batches.
  • Move processing to application and out of the database as much as possible.
  • Stored procedures are dangerous.
  • Add more memory.
  • Enable query logging and take a look at what your app is doing.
  • Run different MySQL instances for different work loads.
  • Config tuning helps, query tuning works.
  • Reconsider persistent DB connections.
  • Don't overwork the database. It's hard to scale.
  • Work in parallel.
  • Use a job queuing system.
  • Log http requests.
  • Use light processes for light tasks.
  • Build on APIs internally. Clean loosely coupled APIs are easy to scale.
  • Don't incur technical debt.
  • Automatically handle failures.
  • Make services that always work.
  • Load balancing is the key to horizontal scaling.
  • Redundancy is not load-balancing. Always have n+1 capacity.
  • Plan for disasters.
  • Make backups.
  • Keep software deployments easy.
  • Have everything scripted.
  • Monitor everything. Graph everything.
  • Run one service per server.
  • Don't ever swap memory for disk.
  • Run memcached if you have extra memory.
  • Use memory to save CPU or IO. Balance memory vs CPU vs IO.
  • Netboot your application servers.
  • There's lot of good slides on what to graph.
  • Use a CDN.
  • Use YSlow to find client side problems.

    This is just a high level blitz through the presentation. Topics are given a lot more detail in the presentation. Audio of Ask's dulcet tones would be nice, but there's still a lot to learn here.
  • Sunday

    TechDev Stages

    Tech Dev Stages explains the basic steps involved for the product development given business problems. A must read for newbie or starters for architecture development.


    ThePort Network  Architecture

    ThePort Network's Director of Engineering, TJ Muehleman was kind of enough to share some of the architectural details for their white label social media system. It currently runs about 50 social networks varying in size from less than 1000 members to more than 300,000 members, all on a Microsoft stack. In addition to their social networking platform, they offer Javascript APIs and web service APIs (both REST and SOAP) which account for a significant percentage of overall system usage.

    ThePort is an excellent example of a real world in-the-trenches product offering real value to customers. One of the most interesting problems they have to solve is multi-tenancy. How do you provide good performance, complete customization, support, develop new features, and provide individual search indexes for each customer? It's not an easy problem to solve.

    How did they solve their problems and build a successful system? 



  • Microsoft.NET 3.5
  • C# / VB.NET
  • SQL Server 2005
  • Visual Studio 2008 Pro Edition
  • Prototype
  • Subversion
  • TortoiseSVN
  • Trac (for internal defect tracking. Will possibly move all internal and external issue tracking to it)
  • Beyond Compare 3
  • Web Tier
    * 6 x Dell blade servers running windows 2008 / IIS 7
  • Data Tier
    * 1 r/w SQL Cluster – dell 6850s (6 single core processors, 32 GB RAM)
    * 2 read-only dell 2950 (2 quad core processors, 16 GB RAM)
    * 1 distribution server – dell 2950 (2 quad core processors, 16 GB RAM)
  • We also use SQL Server Service Broker as a queuing system for some of our saves. It's an alternative to MSMQ that uses the DB for persistence in case of failure. We will most likely be moving to MSMQ in the near future to remove us from SQL dependence.
  • Caching
    * 2 Dell blade servers 8 GB RAM each to total 16 GB of available RAM
    * Running SharedCache (Basically an open source .NET port of MemCacheD. We initially looked at MemCacheD but our internal benchmarking indicated SharedCache had better performance – at least w/in a Microsoft environment. We may still investigate Microsoft's Velocity cache platform when it goes live)
  • Search
    * 2 Dell 2950s with 725 GB Storage
    * Running Lucene + SOLR
    * We chose Lucene over Lucene.NET because Lucene.NET's wildcard search was a little buggy in our initial beta testing. SQL Full Text wasn't a viable option because there was no clear and easy way to split indexes between customers. SOLR cores make this part easy. Above and beyond that, Lucene is lightning fast and is available with features we couldn't turn down (proximity search, searching w/in documents, and built-in RESTful APIs to name a few)

    How do you handle multi-tenancy?

    A multi-tenant platform has two primary hurdles to overcome:

    1. Preventing a single, large customer from overwhelming the system?

    The primary bottleneck for this is in the data layer. Our current DB architecture has helped mitigate this problem. The read-only servers help offset most of this by absorbing the bulk of the data calls. We did have to beef up the distribution server because latency between the r/w server and the read only servers had crept too high. Getting a new machine (2 quad cores with 16 GB of RAM) helped reduce the latency to less than a second.

    However robust the cluster is, we've concluded that we will eventually have to move to a sharded architecture with MySQL. MS SQL licensing fees makes both continuing to enhance the cluster and scaling out to multiple machines prohibitive. Additionally, sharding allows us to scale either by customer (because some may be more active than others) or by functional area (photos, comments, etc).

    2. Allowing clients to have total control over the look, feel, and user experience of their sites.

    Allowing CSS control isn't enough; we needed a templating system that allows total control over the site. We looked at using .NET master pages and user controls to accomplish this. But that assumes a level of knowledge in .NET for outside developers. We built a proprietary templating system that unfortunately became too limiting and would one day lead to a drag on performance.

    So we settled on using XML / XSLT. All of our business / entity objects are serializable to XML. This made XSLT a natural choice from the templating angle. We've seen a considerable boost in performance from this upgrade and an even greater increase in flexibility in terms of what our designers can do. Once the learning curve is overcome, the web designers love the amount of control they get.

    What did you do that was especially cool that people could learn from?

    XSLT as Custom Templating System

    Building a templating system in XSLT that actually allows the template author to make a web service call to our internal web service layer (or external web services) straight from the templating system. This allowed the development team to build a flexible, powerful system that allows a web designer to embed real-time calls into a given template. We accomplish this using XSLT Extension Objects. What we've found in our internal testing is that these extension objects scale way better than our previous templating system (a homegrown proprietary system). We've used ANTS profiler to compare the two and the difference is in orders of magnitude.

    Obviously we have to cache the hell out of this or the performance of the pages the calls are embedded in would suffer. For now, we make the internal web services calls via HTTP, but we will soon be moving this to a TCP call to take advantage of the better connection pooling offered by TCP. We're most likely to use WCF because of it's native support of TCP bindings. However, we haven't yet benchmarked that so it's possible it could change.

    Not Using the Database to Build Collections

    Another cool thing we've done is to move strongly away from using the database to retrieve collections of 'things'. For instance, if we needed a collection of comments, previously we'd hit the database for the 5, 10, 100, etc comments we wanted, do the sorting / filtering in the DB, return a single dataset, cache that, and then display.

    However, this is a database intensive operation, especially if you're going to join against user data (which you inevitably will). What we've started doing recently is caching the recent comment objects, and using our cache providers MultiGet ability to simultaneously retrieve all comments at the same time. We then sort / filter in memory in the application tier, discard whatever comments we don't need, and then display. We found that doing it this way, we save lots of hits to our database and in fact, saw a considerable performance gain from it.

    Our tests (on a developer laptop) fetched 10,000 objects from cache in about 1 second, then sorted them by date time in about .015 second.

    What prompted you to move to a SOA architecture?

    To better compartmentalize our code.

    Given the growth of our templating system mentioned above, we realized it was best to truly separate the tiers into discrete areas. Since our application is easily accessed via a set of REST APIs and our own internal skinning system (and who knows what in the future), dividing the application like this gives us a lot of leeway in being able to swap out components. Additionally, we're doing more and more queuing which lines up nicely with SOA.


    Since modern web apps deal with complex data, breaking the work into more discrete operations handled by offline processes on their own infrastructure makes a lot of sense from a performance point of view.

    How do you handle consistency between the database and the search engine?

    We have a multi-threaded windows service that scans our database once every 5 minutes looking for new data. The service then adds the new items to the Lucene index. We keep audit columns on all our database tables so capturing new data is pretty simple. Once a night, we purge the Lucene index and run a full rescan of the database. We think this system will work for the near to mid term but long term, we'll take advantage of a queuing system to keep the index in sync.

    How you handle your release, support, bug fixing, development, etc.

    We have a decent sized dev team. 1 platform architect responsible for overall system architecture (selecting which systems to use, tuning them), 1 lead software architect, and 3 senior – mid level developers. Since we're a start-up in a fast evolving market (social media) we find that we're constantly having to adjust to market demands and the latest in social functionality. So we have a 2 month build cycle which is pretty aggressive.

    In terms of actual development, we've found the following to be keys to success:
    1. Daily stand-ups: it's absolutely necessary for everyone on the team to know what the other is doing. A code base as large as ours, it's very likely I'm writing a function someone has already written or solving a problem someone has solved previously. Daily stand-ups help with that
    2. Iterate: Build the core functionality, get it into QA and / or beta, beat the bugs out of it, move to the next piece. We've found this to be easier said than done. Market pressures sometimes dictate you roll with something more feature rich than you'd like. Sticking to an iterative cycle creates better code and more market ready products.
    3. Beta test: This goes hand-in-hand w/ #2 above. Get something done and get it in the hands of actual users. This is the best way to find where your app falls down

    With regard to support / bug fixing, we're moving to a forums based support model for many of our customers. We've found the same problems, especially in an app as configurable as ours, occur over and over. Getting those answers into an open, searchable format should hopefully cut down on confusion and get developers talking directly to developers.

    Internally we use Trac for bug tracking and devote roughly 20% of our week maintaining, supporting, and fixing issues. That may seem like a lot but given how configurable our system is, we're essentially running 50 heavily data driven websites.

    WCF sounds like a buggy underpeforming mess. How is it working?

    So far we have no complaints with WCF. I think baking it directly into .NET 3.5 helped iron a lot of the big kinks out. It does come with it's quirks, no doubt. We built our REST libraries on top of it and found that posting XML is not exactly the easiest thing in the world. But it was more than made up for with the ease in deploying all our GET operations with REST. Our next step will be to set up TCP and MSMQ bindings with WCF to handle our internal service requests and queuing, respectively. Since WCF exposes all of these bindings natively, we think we will see a lot of effective code re-use out of this.

    I'd like to thank TJ for taking the time and making the effort to write up their architecture for people to learn from. I'm sure it will help others when they are trying to build their own systems.

    You too can share the architecture for your amazing system. Come on, you've learned a lot from others, it's time to return the favor and give back. It's not that hard, really. If interested please contact me and we can get started.
  • Thursday

    Reconnoiter - Large-Scale Trending and Fault-Detection

    One of the top recommendations from the collective wisdom contained in Real Life Architectures is to add monitoring to your system. Now! Loud is the lament for not adding monitoring early and often. The reason is easy to understand. Without monitoring you don't know what your system is doing which means you can't fix it and you can't improve it. Feedback loops require data.

    Some popular monitor options are Munin, Nagios, Cacti and Hyperic. A relatively new entrant is a product called Reconnoiter from Theo Schlossnagle, President and CEO of OmniTI, leading consultants on solving problems of scalability, performance, architecture, infrastructure, and data management. Theo's name might sound familiar. He gives lots of talks and is the author of the very influential Scalable Internet Architectures book.

    So right away you know Reconnoiter has a good pedigree. As Theo says, their products are born of pain, from the fire of solving real-life problems and that's always a harbinger of good things to come.

    The problem Reconnoiter is trying to solve is monitoring thousands of nodes across many datacenters where the nodes can vary widely in power, architecture, and software configuration. With that kind of problem what they really want is the ability to:


  • Configure everything from one place.
  • Cheap checks that are made on the specified time interval and aren't late and don't cause a heavy load on the machine.
  • Change the configuration from any datacenter without coordination.
  • Add checks in the field.
  • Separate data collection from visualization and fault-detection.
  • Analyze trends for long-term capacity planning and postmortem analysis.
  • Detect when faults have happened and when they are about to happen.
  • Support trending: the intelligent data correlation, regression analysis/curve fitting and looking into the past to see how much you go where you are now so you can do better next time.
  • Create a monitoring system that doesn't require a separate powerful network and its own set of hosts on which to run.

    If you've ever used or written a distributed stats collection system the architecture of Reconnoiter will look somewhat familiar:

    Some of the more interesting bits of the architecture are:
  • PostgresSQL stores all the data. The data isn't stuck in funky little files.
  • Fault-detection is based on Esper, a streaming complex event processing system. It's not clear how well this approach will work but the hooks are there.
  • A Comet-style web server is used to feed real-time updates. Much better than your traditional polling cycle.
  • Although the web console is PHP based, PHP is used mainly to execute Json calls. Rendering happens in the browser in an AJAX client.
  • Canvas is used for real time graphics. No images are created on the fly.
  • Data is transferred securely over SSL.
  • The system is robust against failures.
  • Data is not thrown away as it is with some systems so you can check against history.

    Reconnoiter isn't completely pain free. Lua for an extension language is an interesting choice. The installation and configuration process is very complex. There are a lot of separate steps and bits to configure. Another potential problem is monitoring produces a lot of real-time data. I have to wonder if PostgresSQL can handle that flow for very large systems. The data is partitioned by month, but a large number of machines and a large number of events can be crushing. And I wasn't sure if graph data could be correlated with released features or other system changes. In the video Theo mentions seeing in the graphs that using deflate improved performance, but I'm not sure just looking at the graph how you would be able correlate system data with system changes.

    It's droolingly clear where Reconnoiter shines is on creating complex graphs, charts, and other visualizations. The graphs look useful and quick to render. The real time visualizations are spectacular and extremely are difficult to do in other systems.

    Related Articles

  • OmniTI Reconnoiter: Web Management and Analysis by Eric J. Bruno
  • Reconnoiter Update by Theo Schlossnagle
  • Reconnoiter Project Home Page
  • Video: Reconnoiter: a whirlwind tour
  • Big Picture of the Overall System
  • Reconnoiter: Monitoring and Trend Analysis from OSCON
  • OmniTI Unveils Open Source Monitoring Tool, Reconnoiter by Jayashree Adkoli
  • The sad state of open source monitoring tools by Grig Gheorghiu
  • How to Succeed at Capacity Planning Without Really Trying : An Interview with Flickr's John Allspaw on His New Book.
  • New open source IT management tool: Lighter-weight than Nagios, more granular than Cacti by Matt Stansberry
  • Tuesday

    13 Scalability Best Practices

    AFK Partners has release what they feel are the Best Practices for Scalability:

    1. Asynchronous - Use asynchronous communication when possible. 
    2. Swim Lanes – Create fault isolated “swim lanes” of hardware by customer segmentation.
    3. Cache - Make use of cache at multiple layers.
    4. Monitoring - Understand your application’s performance from a customer’s perspective. 
    5. Replication - Replicate databases for recovery as well as to off load reads to multiple instances.
    6. Sharding - Split the application and databases by service and / or by customer using a modulus. 
    7. Use Few RDBMS Features – Use the OLTP database as a persistent storage device as much as possible. 
    8. Slow Roll – Roll out new code versions slowly, to a small subset of your servers without bringing the entire site down.
    9. Load & Performance Testing – Test the performance of the application version before it goes into production. 
    10. Capacity Planning / Scalability Summits – Know how much capacity you have on all tiers and services in your system. 
    11. Rollback – Always have the ability to rollback a code release.
    12. Root Cause Analysis - Ensure you have a learning culture that is evident by utilizing Root Cause Analysis to find and fix the real cause of issues.
    13. Quality From The Beginning – Quality can’t be tested into a product, it must be designed in from the beginning.
    This is just a quick summary, more details on their site.


    Writing about cisco loadbalancer?


    At one of my jobs I have to administer a CISCO ACE (application control engine) hardware load-balancer.
    I don't particularly love this beast, but it's very very powerful.

    There appears to be little real-world info out there, so it could be interesting writing an article on that.

    But I don't have other HW LB's to compare it to and I don't want to rehash the product page.

    What would interest you in a 'product review' of a loadbalancer?
    No replies means it's not an interesting topic, so no article then ;-)


    NoSQL: If Only It Was That Easy

    In this blog post BJ Clark overviews the available key-value stores that aren't SQL based.
    On some projects he yields interesting experiences. Make sure to check out the comments as well.

    The post has spread throughout social tiny text services so there is a chance you've already read it, but still worthwhile to put here.


    1dbase vs. many and cloud hosting vs. dedicated server(s)?

    Me and my partner are making a blueprint for an online webshop service. The purpose of this project is to make webshops available for small company's/ individuals automatically just by creating an account with us. Our webapp can be used to add products/pages/... to the store and we'll handle secure checkout by paypal.

    Our app should be scalable and manageable. Because we also want to offer free webshops, the amount of webshops could be +10.000 within a few years. We are building on the Zend framework and are using mysql for database.

    From the start we want to build our application for optimal and easy scalability in the future, to avoid a lot changes to our app/database in the future.

    Now our questions are:

    Should we use?:
    * one database for all shops (or limited to X shops );
    * one database for each new shop (each having products, orders... tables);

    I think both approaches have PRO/CONS. What do you think ? Does anyone has experience with this kind of structure ?

    one database: easier to make changes to database layout
    multiple databases:more scalable, easier to backup/restore

    one database: harder to code because of extra keyfields in tables , slower or more difficult to backup/restore.
    multiple databases: Takes longer to push changes to database layout

    We are totally not clear on what hosting we should use. Would it be a solution to use a cloud service such as mosso/amazon/gogrid? Or is it better to just start with one dedicated server and expand 'manually' later?

    Thanks in advance for your help!


    Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?

    So far every massively scalable database is a bundle of compromises. For some the weak guarantees of Amazon's eventual consistency model are too cold. For many the strong guarantees of standard RDBMS distributed transactions are too hot. Google App Engine tries to get it just right with entity groups. Yahoo! is also trying to get is just right by offering per-record timeline consistency, which hopes to serve up a heaping bowl of rich database functionality and low latency at massive scale:

    We describe PNUTS [Platform for Nimble Universal Table Storage], a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of con-current requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

    Some of the cool things about PNUTS are:

  • They actually talk about the hard problem of how to scale a system to 10 different data centers (each with 1,000+ servers) while supporting secondary indexes, materialized views, the ability to create multiple tables, and hash-organized tables. Multi-datacenter operation is so difficult it's usually ignored. PNUTS is designed specifically to operate in many datacenters with a strongish consistency model, which makes it a very interesting design point.
  • You can subscribe to a reliable ordered stream of updates on a table. This is massively convenient. For many applications numerous processes are tied to data changes and this is normally a pain to implement.
  • The consistency model is a per-record timeline consistency: all replicas of a given record apply all updates to the record in the same order. This provides a consistency model that is between the two extremes of serialized transactions and eventual consistency. Conflicting records can't exist at the same time as is allowed by Dynamo.
  • Supports records, but accepts queries only on individual tables. There's no fixed schema for records and columns can be typed or be blobs. Transactions exist only at the record level. This means like with other NoSQL databases denormalization is the modeling strategy as there's no way to have transactions across tables.
  • The degree of read consistency desired can be specified. Records are versioned. You can ask for the latest record version or allow for potentially stale records.
  • Asynchronous replication is used to ensure low write latency while providing geographic replication. Reads should be fast everywhere and may return older versions, writes should be fast locally (in the same datacenter). [3]
  • A message broker that serves both as the replication mechanism and redo log of the database. The message broker guarantees no replica can receive updates out of order because it provided a reliable, totally ordered message channel. This also means you can't have transactions across tables, consistent joins, or foreign keys. They chose this approach over a gossip mechanism (like Dynamo) because it can be optimized for geographically distant replicas and because replicas do not need to know the location of other replicas.
  • PNUTS is hosted and centrally managed. It was built to reduce the overhead of creating and maintaining new applications rather than every property creating their own system. Failover, adding capacity, resharding, performance isolation and supporting applications with different usage profiles are all completely automated.
  • Predicate queries are supported using a scatter-gather mechanism which sends the query to every relevant storage tablet at once. The are gathered sent back to the client.
  • Performance is 1-10ms/request when caching layers are in place.

    From a system perspective PNUTS offers a lot of the good things: hosted, reliability, lowish latency, automation, scalability, supports many application models, and there's a lot of room to improvement that all applications will be able to take advantage of when available.

    From a programmer perspective the are also a lot of good things: it's hosted so fewer worries, notifications, flexible schemas, ordered records, secondary indexes, lowish latency, strong consistency on a single record, scalability, high write rates, reliability, and range queries over a small set of records.

    Unfortunately Goldilocks still needs to keep searching for just right, though she may be getting closer. From a system perspective Yahoo!'s ideas are good, but they don't help you as the system isn't available for you to use. From a programmer perspective the programmer's job is still way too hard. To be just right programmer's need low latency aggregate operators, complex transactions, scalable counters, automatic relationship management, and all the other features that will help them just buy instant porridge and be done with it.

    Related Articles

  • Anti-RDBMS: A list of distributed key-value stores
  • Details on Yahoo's distributed database by Greg Linden
  • Thoughts on Yahoo's PNUTS distributed database by Marton Trencseni
  • Data Challenges at Yahoo! - Ricardo Baeza-Yates & Raghu Ramakrishnan
    Yahoo! Research
  • PNUTS: Yahoo!'s Hosted Data Serving Platform by Lucian
  • Yahoo’s PNUTS by Henry Robinson. A very thoughtful and informative overview of the paper.
  • How robust are gossip-based communication protocols? by Lorenzo Alvisi et al.
  • Trading consistency for scalability in distributed architectures.
  • Asynchronous View Maintenance for VLSD Databases by Parag Agrawal et al.
  • BigTable
  • Dynamo
  • Are Cloud Based Memory Architectures the Next Big Thing?
  • The Story of Goldilocks and the Three Bears
  • PNUTS - Platform for Nimble Universal Table Storage
  • Friday

    Strategy: Break Up the Memcache Dog Pile 

    Update: Asynchronous HTTP cache validations. A proposed HTTP caching extension: if your application can afford to show slightly out of date content, then stale-while-revalidate can guarantee that the user will always be served directly from the cache, hence guaranteeing a consistent response-time user-experience.

    Caching is like aspirin for headaches. Head hurts: pop a 'sprin. Slow site: add caching. Facebook must have a lot of headaches because they popped 805 memcached servers between 10,000 web servers and 1,800 MySQL servers and they reportedly have a 99% cache hit rate. But what's the best way for you to cache for your application? It's a remarkably complex and rich topic. Alexey Kovyrin talks about one common caching problem called the Dog Pile Effect in Dog-pile Effect and How to Avoid it with Ruby on Rails. Glenn Franxman also has a Django solution in MintCache.

    Data is usually cached because it's too expensive to calculate for every hit. Maybe it's a gnarly SQL query you want to avoid and a little stale data is OK. Or maybe the amount of data you have is simply larger than physical memory on any one machine. Or maybe you have the temerity to write to your database and cause its cache to flush so database caching isn't sufficient at a certain level of scale.

    Typical examples are for caching article vote counts, comment threads, and event streams. One familiar example that bit me hard is displaying the the top N blog articles. Do you want to scan through your entire access log table for every page display? Absolutely not. Especially when the nightly backups are going on and the network is very slow. Not good :-) Yet you still want to update the results every X minutes so the stats stay fresh.

    Data freshness requires a refrigeration truck or an expiry time on your cache entry that causes stats to be periodically recalculated. Now, what happens when your cached data expires and a 1000 requests simultaneously try to recalculate the expensive to calculate data? Database load spikes and the world nearly ends. And since memcached operations are not atomic it's possible stale data could be cached and you'll serve stale data. Which kind of defeats of the purpose of taking load off the data while providing accurate data. So, how do you unpile the dogs?

    No Expire Solution

    If cache items never expire then there can never be a recalculation storm. Then how do you update the data? Use cron to periodically run the calculation and populate the cache. Take the responsibility for cache maintenance out of the application space. This approach can also be used to pre-warm the the cache so a newly brought up system doesn't peg the database.

    The problem is the solution doesn't always work. Memcached can still evict your cache item when it starts running out of memory. It uses a LRU (least recently used) policy so your cache item may not be around when a program needs it which means it will have to go without, use a local cache, or recalculate. And if we recalculate we still have the same piling on issues.

    This approach also doesn't work well for item specific caching. It works for globally calculated items like top N posts, but it doesn't really make sense to periodically cache items for user data when the user isn't even active. I suppose you could keep an active list to get around this limitation though.

    Stale Date Solution

    This solution introduces a stale date in addition to the expiration date. Glen describes it as:

    The first client to request data past the stale date is asked to refresh the data,
    while subsequent requests are given the stale but not-yet-expired data as if it
    were fresh, with the understanding that it will get refreshed in a 'reasonable'
    amount of time by that initial request

    In the memcached FAQ a one key approach is described:

  • Set the cache item expire time way out in the future.
  • Embed the "real" timeout serialized with the value. For example, set the item to timeout in 24 hours, but the embedded timeout might be five minutes in the future.
  • On a get from the cache determine if the stale timeout expired and on expiry immediately set a time in the future and re-store the data as is. This closes down the window of risk.
  • Fetch data from the DB and update the cache with the latest value.

    Alexey describes a different two key approach:
  • Create two keys in memcached: MAIN key with expiration time a bit higher than normal + a STALE key which expires earlier.
  • On a get read STALE key too. If the stale has expired, re-calculate and set the stale key again.

    I dislike embedding meta data with data so I like Alexey's approach a bit better, even though it doubles the key space.

    None of these options prevent the problem for ever happening, but they do greatly reduce the failure window for relatively little cost.

    Related Articles

  • Memcached Tag at High Scalability
  • Caching Makes Your Brain Explode by Craig Ambrose.
  • The Secret to Memcached by Tobias Lütke.
  • Memcached FAQ.
  • Dog-pile Effect and How to Avoid it with Ruby on Rails memcache-client Patch by Alexey Kovyrin.
  • MintCache by Glenn Franxman.
  • Advanced Rails Caching.. on the Edge by Aaron Batalion.