
Google Pro Tip: Use Back-of-the-envelope-calculations to Choose the Best Design

How do you know which is the "best" design for a given problem? If, for example, you were given the problem of generating an image search results page of 30 thumbnails, would you load images sequentially? In parallel? Would you cache? How would you decide?

If you could harness the power of the multiverse you could try every possible option in the design space and see which worked best. But that's crazy impractical, isn't it?

Another option is to consider the order of various algorithm alternatives. As a prophet for the Golden Age of Computational Thinking, Google would definitely do this, but what else might Google do?

Use Back-of-the-envelope Calculations to Evaluate Different Designs

Jeff Dean, Head of Google's School of Infrastructure Wizardry—instrumental in many of Google's key systems: ad serving, BigTable, search, MapReduce, Protocol Buffers—advocates evaluating different designs using back-of-the-envelope calculations. He gives the full story in this Stanford video presentation.

Back-of-the-envelope calculations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements. Dr. Dean thinks an important skill for every software engineer is the ability to estimate the performance of alternative systems, using back-of-the-envelope calculations, without having to build them.

He gives a great example of the process in the video, but first...

Numbers Everyone Should Know

To evaluate design alternatives you first need a good sense of how long typical operations will take. Dr. Dean gives this list:

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns 
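The table drops straight into code for quick what-if arithmetic. A minimal sketch (the dict keys are ad-hoc names I've chosen, not from the talk):

```python
# The latency table above as a Python dict, all values in nanoseconds.
NS = {
    "l1_cache_ref": 0.5,
    "branch_mispredict": 5,
    "l2_cache_ref": 7,
    "mutex_lock_unlock": 100,
    "main_memory_ref": 100,
    "compress_1k_zippy": 10_000,
    "send_2k_1gbps": 20_000,
    "read_1mb_memory": 250_000,
    "round_trip_datacenter": 500_000,
    "disk_seek": 10_000_000,
    "read_1mb_network": 10_000_000,
    "read_1mb_disk": 30_000_000,
    "packet_ca_nl_ca": 150_000_000,
}

# Example: how many main-memory references fit in one disk seek?
print(NS["disk_seek"] / NS["main_memory_ref"])  # 100000.0

# And memory vs. disk for sequential 1 MB reads:
print(NS["read_1mb_disk"] / NS["read_1mb_memory"])  # 120.0
```

Keeping a table like this at hand makes the estimates in the next section a few seconds' work.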

Some things to notice:

  • Notice the magnitude differences in the performance of different options.
  • Datacenters are far away so it takes a long time to send anything between them.
  • Memory is fast and disks are slow.
  • By using a cheap compression algorithm, a lot of network bandwidth (a factor of 2 or more) can be saved.
  • Writes are 40 times more expensive than reads.
  • Global shared data is expensive. This is a fundamental limitation of distributed systems. Lock contention on heavily written shared objects kills performance as transactions become serialized and slow.
  • Architect for scaling writes.
  • Optimize for low write contention.
  • Optimize wide. Make writes as parallel as you can.
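The compression bullet can be sanity-checked against the table. A rough sketch, assuming (hypothetically) that Zippy achieves a 2:1 ratio on this data:

```python
SEND_2K_NS = 20_000      # send 2 KB over 1 Gbps
COMPRESS_1K_NS = 10_000  # compress 1 KB with Zippy

def send_ns(kb):
    """Time to push `kb` kilobytes over a 1 Gbps link, in ns."""
    return (kb / 2) * SEND_2K_NS

uncompressed = send_ns(1024)                       # 1 MB on the wire
compressed = 1024 * COMPRESS_1K_NS + send_ns(512)  # compress, then 0.5 MB

print(uncompressed / 1e6, "ms uncompressed")  # 10.24 ms
print(compressed / 1e6, "ms compressed")      # 15.36 ms
```

Note the wall-clock time on a single link is actually worse with compression, because the CPU cost dominates; the win is that bytes on the shared network drop by half.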

Example: Generate Image Results Page of 30 Thumbnails

This is the example given in the video. Two design alternatives are used as thought experiments.

Design 1 - Serial 

  • Read images serially. Do a disk seek. Read a 256K image and then go on to the next image.
  • Performance: 30 seeks * 10 ms/seek + 30 * 256 KB / 30 MB/s ≈ 560 ms

Design 2 - Parallel 

  • Issue reads in parallel.
  • Performance: 10 ms/seek + 256 KB / 30 MB/s ≈ 18 ms
  • There will be variance in the disk reads, so the more likely time is 30-60 ms.
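The two estimates reduce to a few lines of arithmetic. A sketch using round numbers (256 KB = 0.25 MB gives 550 ms for the serial case, which the talk rounds to ~560 ms):

```python
# Back-of-the-envelope for the two designs:
# 30 thumbnails of 256 KB each, 10 ms per seek, 30 MB/s sequential read.
SEEK_MS = 10.0
READ_MB_PER_S = 30.0
IMAGE_MB = 0.25  # 256 KB
N_IMAGES = 30

read_ms = IMAGE_MB / READ_MB_PER_S * 1000  # ~8.3 ms to read one image

serial_ms = N_IMAGES * (SEEK_MS + read_ms)  # seek + read, one image at a time
parallel_ms = SEEK_MS + read_ms             # all reads issued at once

print(round(serial_ms))    # 550
print(round(parallel_ms))  # 18
```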

Which design is best? It depends on your requirements, but given the back-of-the-envelope calculations you have a quick way to compare them without building them.

Now you have a framework for asking yourself other design questions and comparing different design variations:

  • Does it make sense to cache single thumbnail images?
  • Should you cache a whole set of images in one entry?
  • Does it make sense to precompute the thumbnails?

To make these estimates realistic you'll have to know the performance of your services. If there is an unknown variable, then perhaps you could rapidly prototype just that part to settle the question. To know whether caching is a good design alternative, for example, you'll have to know how long it takes to write into your cache.
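For instance, the first caching question can be roughed out with the same table: serving one 256 KB thumbnail from main memory versus one seek plus a sequential read from disk. This sketch ignores, as noted above, the cache-write cost and hit rate you'd have to measure for a real answer:

```python
# Numbers from the latency table, in ns.
READ_1MB_MEMORY_NS = 250_000
DISK_SEEK_NS = 10_000_000
READ_1MB_DISK_NS = 30_000_000
IMAGE_MB = 0.25  # one 256 KB thumbnail

mem_ns = IMAGE_MB * READ_1MB_MEMORY_NS                # 62,500 ns
disk_ns = DISK_SEEK_NS + IMAGE_MB * READ_1MB_DISK_NS  # 17,500,000 ns

print(disk_ns / mem_ns)  # 280.0 -- a cache hit is ~280x cheaper
```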

Lessons Learned

  • Back-of-the-envelope calculations allow you to take a look at different variations.
  • When designing your system, these are the kind of calculations you should be doing over and over in your head.
  • Know the back-of-the-envelope numbers for the building blocks of your system. It's not good enough to just know the generic performance numbers; you have to know how your subsystems perform. You can't make decent back-of-the-envelope calculations if you don't know what's going on.
  • Monitor and measure every part of your system so you can make these sorts of projections from real data.

I personally quite like this approach. It seems much more grounded in the end-to-end nature of a system than is common. The practice today is to focus on the trickeration of various algorithms, which are really a researchable and pluggable part of this larger, more holistic analysis.


Reader Comments (12)

If the images are all on the same disk (the most likely case), there is no benefit to parallelism, as the disk will be single-threaded. The problem with guesstimates is that they are often too abstract, especially when thinking inside the box.

It's still a great technique, and there's a very good book called "Guesstimation" (http://www.amazon.com/Guesstimation-Solving-Worlds-Problems-Cocktail/dp/0691129495) that walks you through the process for real-world (non-computer) examples.

January 26, 2011 | Unregistered Commenter Ross Patterson

So, in the design 2 example 30 reads will be spinning and reading a single disk all at the same time. That is some magical disk!
In that case, I suggest a design 3 example - invent a time machine, travel back in time 560ms, and read 30 files sequentially.
Back-of-the-envelope that!

"Round trip within same datacenter 500,000 ns" - this number is off by at least a factor of 2.

January 26, 2011 | Unregistered Commenter Tim

@Ross, @Tim.

So you think there's some single magical disk that holds ALL the thumbnails stored by Google???? Really???

If Google (or any website of non-trivial scale) needs to "generate image results page of 30 thumbnails", it's almost certain that those 30 thumbnails are stored in 30 different disks.

January 26, 2011 | Unregistered Commenter John

Another Google lesson that stuck with me is the phrase, "we usually try not to spend too long optimizing any one binary." In other words, first try to make your application use more hardware effectively. After that, if it's costing too much in hardware or it's still slow with plenty of resources, you can tune the code.

January 26, 2011 | Unregistered Commenter Randall

"So you think there's some single magical disk that holds ALL the thumbnails stored by Google?"

No, but that's how the problem was posed. If you want to spread those out, you need to add network transit time too - but that still wouldn't be the problem that was posed.

January 26, 2011 | Unregistered Commenter Ross Patterson

Round trip within same datacentre:

Northern California to the Netherlands is about 9,000 km great-circle according to Google Erf. *2 = 18000km/18,000,000 metres.

Let's say two randomly selected servers in a Google data centre are 100 metres apart = 200 metres roundtrip.

18,000,000/200 = 90,000 = the round trip should be 90,000 times longer between Google.com and Google.nl than it is between server17.switch23.datacentre4.google.com and server3.switch22.datacentre4.google.com.

150,000,000/500,000 = 3,000. Hmm, there's something wrong floating about.

c in glass is about 200,000 km/s, so 18,000 km = 0.09 s = 90 ms = 90,000,000 ns. Obviously it won't be a great circle, and there are router buffers and the target's response time to tot up, but it's far enough that the speed of light would dominate.

January 27, 2011 | Unregistered Commenter Alex

Re: If Google (or any website of non-trivial scale) needs to "generate image results page of 30 thumbnails", it's almost certain that those 30 thumbnails are stored in 30 different disks.

Not really. I program/architect a top 100 Web site and thumbnails are not sharded to anything like this degree. (1) See birthday paradox. (2) We serve better than 99% of images from in-memory cache, so there would be little point.

February 7, 2011 | Unregistered Commenter Pete Austin

How do you know those stats are correct? In fact, it's pretty obvious that no one set of stats will work in even a majority of situations. That type of data is too hardware- and environment-dependent to transfer from situation to situation. And your analysis is pretty worthless unless you can rely on your stats. That's why people end up going with intuition and experimentation rather than relying on techniques like this. They're too much of a crutch, and the danger of getting lazy is too great.

March 6, 2012 | Unregistered Commenter D

'parallel' reads of multiple files from the same spindle might outperform 'serial' reads if you consider that the files themselves may be fragmented. It is conceivable that requesting all 30 image files at once might result in less seeking overall. If you have an apple pie cut into four pieces, one in Seattle, one in Denver, one in Miami and one in Dallas, and you have a cherry pie spread among Orlando, Kansas City, Nashville, and L.A., you could make a faster trip knowing you need both pies and having their locations up front, allowing you to map a more optimized round trip than being sent out for each pie one at a time. You can't assume a file exists sequentially on disk.

October 7, 2015 | Unregistered Commenter Mike Eldredge

If I am not mistaken, the thumbnail calculations are wrong in both cases.

According to the provided data, it takes 30 ms to read 1 MB, i.e. a disk speed of 30 ms/MB;
therefore reading 256 KB takes 0.4 MB * 30 ms/MB = 12 ms.

Based on this, the sequential calculation should be:

30 seeks * 10ms/seek + .4MB * 30ms/MB * 30 files = 660 ms

What the heck is 256K / 30 MB /s = 560ms mixing KB with MB and seconds with milliseconds in the same calculation?

Am I missing anything?

October 26, 2017 | Unregistered Commenter XavyBoy

Stanford presentation link is broken, alternate link

May 20, 2018 | Unregistered Commenter Sathish

Good read. Now it makes sense why they ask estimation questions during product management or tech PM job interviews. It's good to know these basics, as they will help a candidate with the day-to-day job.

Thanks for an awesome writeup!

December 24, 2018 | Unregistered Commenter Sweety
