Handle 700 Percent More Requests Using Squid and APC Cache

This post documents some impressive system performance improvements gained by adding Squid Cache (a caching proxy) and APC Cache (an opcode cache for PHP).

  • Apache is able to deliver roughly 700% more requests per second with Squid when serving 1KB and 100KB images.
  • Server load is reduced using Squid because the server does not have to create a bunch of Apache processes to handle the requests.
  • APC Cache took a system that could barely handle 10-20 requests per second and let it handle 50-60 requests per second, roughly a 400% increase.
  • APC allowed the load times to remain under 5 seconds even with 200 concurrent threads slamming on the server.
  • These two caches are easy to set up and install, and they let you get a lot more performance out of your servers.
The post has an in-depth discussion and a number of supporting charts. The primary point is how simple it can be to improve performance and scalability by adding caching.
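To make the caching idea concrete, here is a minimal in-process sketch of a read-through cache with a time-to-live. It is not how Squid or APC work internally; `TTLCache`, `render_page`, and the 60-second TTL are hypothetical names and values chosen purely for illustration.

```python
import time


class TTLCache:
    """Tiny read-through cache: call the slow loader only on a miss or after expiry."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: do the expensive work once and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def render_page(path):
    # Hypothetical stand-in for expensive work (compiling PHP, fetching from origin, ...).
    return f"<html>content of {path}</html>"


cache = TTLCache(render_page, ttl_seconds=60.0)
first = cache.get("/index.php")   # miss: runs render_page
second = cache.get("/index.php")  # hit: served from memory
```

The second call never reaches `render_page`, which is the same effect that lets a caching layer absorb repeated requests before they reach Apache or the PHP compiler.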


Latency is Everywhere and it Costs You Sales - How to Crush it

Update 8: The Cost of Latency by James Hamilton. James summarizes some latency info from Steve Souders, Greg Linden, and Marissa Mayer. Speed [is] an undervalued and under-discussed asset on the web.

Update 7: How do you know when you need more memcache servers?. Dathan Pattishall talks about using memcache not to scale, but to reduce latency and reduce I/O spikes, and how to use stats to know when more servers are needed.
Update 6: Stock Traders Find Speed Pays, in Milliseconds. Goldman Sachs is making record profits off a 500 millisecond trading advantage. Yes, latency matters. As an interesting aside, Libet found 500 msecs is about the time it takes the brain to weave together an experience of consciousness from all our sensor inputs.
Update 5: Shopzilla's Site Redo - You Get What You Measure. At the Velocity conference Phil Dixon, from Shopzilla, presented data showing a 5 second speed up resulted in a 25% increase in page views, a 10% increase in revenue, a 50% reduction in hardware, and a 120% increase in traffic from Google. Built a new service oriented Java based stack. Keep it simple. Quality is a design decision. Obsessively measure everything. Used agile and built the site one page at a time to get feedback. Use proxies to incrementally expose users to new pages for A/B testing. Oracle Coherence Grid for caching. 1.5 second page load SLA. 650ms server side SLA. Make 30 parallel calls on server. 100 million requests a day. SLAs measure 95th percentile, averages not useful. Little things make a big difference.
Update 4: Slow Pages Lose Users. At the Velocity Conference Jake Brutlag (Google Search) and Eric Schurman (Microsoft Bing) presented study data showing delays under half a second impact business metrics and delay costs increase over time and persist. Page weight not key. Progressive rendering helps a lot.
Update 3: Nati Shalom's Take on this article. Lots of good stuff on designing architectures for latency minimization.
Update 2: Why Latency Lags Bandwidth, and What it Means to Computing by David Patterson. Reasons: Moore's Law helps BW more than latency; Distance limits latency; Bandwidth easier to sell; Latency helps BW, but not vice versa; Bandwidth hurts latency; OS overhead hurts latency more than BW. Three ways to cope: Caching, Replication, Prediction. We haven't talked about prediction. Games use prediction, i.e., project where a character will go, but it's not a strategy much used in websites.
Update: Efficient data transfer through zero copy. Copying data kills. This excellent article explains the path data takes through the OS and how to reduce the number of copies to the big zero.

Latency matters. Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.

The Amazon results were reported by Greg Linden in his presentation Make Data Useful. In one of Greg's slides Google VP Marissa Mayer, in reference to the Google results, is quoted as saying "Users really respond to speed." And everyone wants responsive users. Ka-ching! People hate waiting and they're repulsed by seemingly small delays.

The less interactive a site becomes the more likely users are to click away and do something else. Latency is the mother of interactivity. Though it's possible through various UI techniques to make pages subjectively feel faster, slow sites generally lead to higher customer defection rates, which lead to lower conversion rates, which result in lower sales. Yet for some reason latency isn't a topic that gets talked about much for web apps. We talk a lot about building high-capacity sites, but very little about how to build low-latency sites. We apparently do so at the expense of our immortal bottom line.

I wondered if latency went to zero if sales would be infinite? But alas, as Dan Pritchett says, Latency Exists, Cope!. So we can't hide the "latency problem" by appointing a Latency Czar to conduct a nice little war on latency. Instead, we need to learn how to minimize and manage latency. It turns out a lot of problems are better solved that way.

How do we recover that which is most meaningful--sales--and build low-latency systems?

I'm excited that the topic of latency came up. There are a few good presentations on this topic I've been dying for a chance to reference. And latency is one of those quantifiable qualities that takes real engineering to create. A lot of what we do is bolt together other people's toys. Building high-capacity low-latency systems takes mad skills. Which is fun. And which may also account for why we see latency as a core design skill in real-time and market trading type systems, but not web systems. We certainly want our nuclear power plant plutonium fuel rod lowering hardware to respond to interrupts with sufficient alacrity. While less serious, trading companies are always in a technological arms race to create lower latency systems. He with the fastest system creates a sort of private wire for receiving and acting on information faster than everyone else. Knowing who has the bestest price the firstest is a huge advantage. But if our little shopping cart takes an extra 500 milliseconds to display, the world won't end. Or will it?

Latency Defined

My unsophisticated definition of latency is that it is the elapsed time between A and B where A and B are something you care about. Low-latency and high-latency are relative terms. The latency requirements for a femtosecond laser are far different than for mail delivery via the pony express, yet both systems can be characterized by latency. A system has low-latency if it's low enough to meet requirements, otherwise it's a high-latency system.

Latency Explained

The best explanation of latency I've ever read is still It's the Latency, Stupid by admitted network wizard Stuart Cheshire. A wonderful and detailed rant explaining latency as it relates to network communication, but the ideas are applicable everywhere.

Stuart's major point: If you have a network link with low bandwidth then it's an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.

I like the parallel with sharding in this observation. We put shards in parallel to increase capacity, but request latency through the system remains the same. So if we want to increase interactivity we have to address every component in the system that introduces latency and minimize or remove its contribution. There's no "easy" scale-out strategy for fixing latency problems.
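Stuart's point can be put into rough numbers. The sketch below uses a deliberately simplified model (delivery time = fixed one-way latency + payload size divided by combined bandwidth); the function name and the link figures are made up for illustration and are not taken from his article.

```python
def transfer_time_s(payload_bits, link_bps, one_way_latency_s, parallel_links=1):
    """Time to deliver a payload: fixed latency plus serialization over the combined links."""
    if parallel_links < 1:
        raise ValueError("parallel_links must be at least 1")
    return one_way_latency_s + payload_bits / (link_bps * parallel_links)


# Hypothetical link: 10 Mbps with 50 ms one-way latency, carrying an 8,000-bit request.
single = transfer_time_s(8_000, 10e6, 0.050, parallel_links=1)
bonded = transfer_time_s(8_000, 10e6, 0.050, parallel_links=10)
saved_ms = (single - bonded) * 1000
```

Under these assumptions, ten links in parallel shave well under a millisecond off a request that still takes more than 50 ms, because the latency term is untouched by adding bandwidth.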

Sources of Latency

My parents told me latency was brought by Santa Claus in the dead of night, but that turns out not to be true! So where does latency come from?

  • Low Level Infrastructure. Includes OS / Kernel, Processors / CPUs, Memory, Storage related I/O, and Network related I/O.
  • High Level Infrastructure. Analysis of sources of latency in downloading web pages by Marc Abrams. The study examines several sources of latency: DNS, TCP, Web server, network links, and routers. Conclusion: In most cases, roughly half of the time is spent from the moment the browser sends the acknowledgment completing the TCP connection establishment until the first packet containing page content arrives. The bulk of this time is the round trip delay, and only a tiny portion is delay at the server. This implies that the bottleneck in accessing pages over the Internet is due to the Internet itself, and not the server speed.
  • Software Processing. Software processing accounts for much of the difficult to squeeze out latency in a system. In very rough terms a 2.0 GHz microprocessor can execute a few hundred lines of code every microsecond. Before a packet is delivered to an endpoint many thousands of instructions have probably already been executed. Then the handling software will spend many thousands more processing the message and then sending a reply. It all can add up to a substantial part of the latency budget. Included in this category are support services like databases, search engines, etc.
  • Frontend. 80-90% of the end-user response time is spent on the frontend, so it makes sense to concentrate efforts there before heroically rewriting the backend.
  • Service Dependency Latency. Dependent components increase latency. If component A calls component B then the latency is the sum of the latency for each component and overall availability is reduced.
  • Propagation Latency. The speed at which data travels through a link. For fibre optic cable, the rate of signal propagation is roughly two-thirds the speed of light in vacuum. Every 20km takes about 100 microseconds of propagation latency. To reduce latency your only choice is to reduce the distance between endpoints.
  • Transmission Latency. The speed at which data is transmitted on a communication link. On a 1Gbps network a 1000 bit packet takes about one millionth of a second to transmit. It's not dependent on distance. To reduce latency you need a faster link.
  • Geographical Distribution. BCP (Business Continuity Planning) requires running in multiple datacenters which means added WAN latency constraints.
  • Messaging Latency. The folks at 29west provide a great list of forces that increase message latency: Intermediaries, Garbage Collection, Retransmissions, Reordering, Batching, CPU Scheduling, Socket Buffers, Network Queuing, Network Access Control, Serialization, Speed of Light.
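The propagation and transmission figures in the list above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the function and constant names are my own, and the two-thirds velocity factor is the rough figure quoted in the text.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_VELOCITY_FACTOR = 2 / 3          # rough signal speed in fiber, as quoted above


def propagation_delay_s(distance_km):
    """Time for a signal to travel the length of the link; depends only on distance."""
    return distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_VELOCITY_FACTOR)


def transmission_delay_s(packet_bits, link_bps):
    """Time to push all the bits of a packet onto the link; depends only on link speed."""
    return packet_bits / link_bps


fiber_20km_us = propagation_delay_s(20) * 1e6            # about 100 microseconds
packet_1gbps_us = transmission_delay_s(1000, 1e9) * 1e6  # about 1 microsecond
```

The two functions make the difference between the two bullets explicit: one has no bandwidth term and the other has no distance term.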

    Draw out the list of every hop a client request takes and the potential number of latency gremlins is quite impressive.
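As a rough illustration of drawing out the hops, the sketch below adds up per-hop latency and multiplies per-hop availability for a serial call chain, which is the Service Dependency Latency effect described above. The hop names and numbers are invented, and the model assumes the hops are called strictly one after another and fail independently.

```python
from math import prod


def chain_latency_ms(latencies_ms):
    """Serial calls: end-to-end latency is the sum of each hop's latency."""
    return sum(latencies_ms)


def chain_availability(availabilities):
    """Serial calls: the request succeeds only if every hop is up (independence assumed)."""
    return prod(availabilities)


# Hypothetical three-hop request: (name, latency in ms, availability)
hops = [("web", 5.0, 0.999), ("service", 12.0, 0.999), ("database", 8.0, 0.999)]
total_ms = chain_latency_ms(ms for _, ms, _ in hops)
availability = chain_availability(a for _, _, a in hops)
```

Three hops that are each "three nines" give a chain that is slower than any single hop and slightly less available than any of them.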

    The Downsides of Latency

    Lower sales may be the terminal condition of latency problems, but the differential diagnosis is made of many and varied ailments. As latency increases work stays queued at all levels of the system which puts stress everywhere. It's like dementia, the system forgets how to do anything. Some of the problems you may see are: Queues grow; Memory grows; Timeouts cascade; Paging increases; Retries cascade; State machines reset; Locks are held longer; Threads block; Deadlock occurs; Predictability declines; Throughput declines; Messages drop; Quality plummets.

    For a better list take a look at The Many Flavors of System Latency.. along the Critical Path of Peak Performance by Todd Jobson. A great analysis of the subject.

    Managing Latency

    The general algorithm for managing latency is:
  • Continually map, monitor, and characterize all sources of latency.
  • Remove and/or minimize all latency sources that are found.

    Hardly a revelation, but it's actually rare for applications to view their work flow in terms of latency. This is part of the Log Everything All the Time mantra. Time stamp every part of your system. Look at mean latency, standard deviation, and outliers. See if you can't make the mean a little nicer, pinch in that standard deviation, and chop off some of those spikes. With latency, variability is the name of the game, but that doesn't mean that variability can't be better controlled and managed. Target your latency slimming efforts where it matters the most and you get the most bang for your buck.
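A minimal sketch of that kind of characterization, using only the Python standard library. The timings are made-up sample data, and the nearest-rank percentile is just one common definition; a real monitoring system may compute percentiles differently.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples_ms):
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.pstdev(samples_ms),  # population standard deviation
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "max": max(samples_ms),
    }


# Made-up request timings in milliseconds; a single slow outlier skews the mean.
timings_ms = [12, 11, 13, 12, 14, 11, 12, 13, 12, 250]
summary = summarize(timings_ms)
```

In this sample the median is 12 ms while the mean is 36 ms, which is why looking at spikes and percentiles tells you more than an average alone.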

    Next we will talk about various ideas for what you can do about latency once you've found it.

    Dan Pritchett's Lessons for Managing Latency

    Dan Pritchett is one of the few who has openly written on architecting for latency. Here are some of Dan's suggestions for structuring systems to manage latency:
  • Loosely Couple Components. Loose coupling has a number of benefits: Tightly coupled systems are impossible to distribute across data centers, tightly coupled systems fail together, and loosely coupled systems can be independently scaled and engineered for latency.
  • Use Asynchronous Interfaces. Set an expectation of asynchronous behavior between components. This allows you to add latency when you need to make changes. Getting users hooked on synchronous low-latency interactions doesn't allow for architecture flexibility. So start from the beginning with asynch semantics.
  • Horizontally Scale from the Start. It's very difficult to change a monolithic schema once you meet a scaling wall. Start with a horizontal architecture so you don't build in too many problems that will be hard to remove later.
  • Create an Active/Active Architecture. Most approaches to BCP take an active/passive approach, only one data center is active at a time. Creating an active/active system, where all data centers operate simultaneously allows users to be served from the closest data center which decreases latency.
  • Use a BASE (basically available, soft state, eventually consistent) Instead of ACID (atomicity, consistency, isolation, durability) Shared Storage Model. BASE is derived from the CAP Theorem which is the highly counter intuitive notion that database services cannot ensure all three of the following properties at once: Consistency, Availability, Partition tolerance. A BASE based system is more tolerant to latency because it is an inherently partitioned and loosely coupled architecture and it uses eventual consistency. With eventual consistency you can make an update to one partition and return. You don't have to coordinate a transaction across multiple database servers, a step that gives a system higher and more variable latency.

    Clearly each of these principles is a major topic all on its own. For more details, read Dan Pritchett's excellent papers on managing latency: The Challenges of Latency, Architecting for Latency, and Latency Exists, Cope!.
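To give a feel for the "Use Asynchronous Interfaces" suggestion, here is a small sketch of a component whose only public call returns immediately with a future. It is not taken from Dan's papers; `AsyncInventoryService` and its stock data are hypothetical, and a single worker thread is used so that updates are serialized without a lock.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncInventoryService:
    """Hypothetical component whose only interface is asynchronous."""

    def __init__(self):
        # One worker serializes all updates, so _stock needs no lock.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._stock = {"widget": 3}

    def reserve(self, item):
        # Returns a Future right away; the caller decides when (or whether) to wait.
        return self._pool.submit(self._reserve_blocking, item)

    def _reserve_blocking(self, item):
        remaining = self._stock.get(item, 0)
        if remaining == 0:
            return False
        self._stock[item] = remaining - 1
        return True

    def close(self):
        self._pool.shutdown(wait=True)


service = AsyncInventoryService()
pending = service.reserve("widget")   # does not block on the reservation itself
reserved = pending.result(timeout=5)  # wait only at the point the answer is needed
service.close()
```

Because callers never assume an immediate answer, the work behind `reserve` could later move to another process or data center without changing the calling code.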

    GigaSpaces Lessons for Lowering Latency

    GigaSpaces is an in-memory grid vendor and as such is on the vanguard of the RAM is the New Disk style of application building. In this approach disk is pushed aside for keeping all data in RAM. Following this line of logic GigaSpaces came up with these low latency architecture principles:

  • Co-location of the tiers (logic, data, messaging, presentation) on the same physical machine (but with a shared-nothing architecture so that there is minimal communication between machines)
  • Co-location of services on the same machine
  • Maintaining data in memory (caching)
  • Asynch communication to a persistent store and across geographical locations

    The thinking is the primary source of latency in a system centers around accessing disk. So skip the disk and keep everything in memory. Very logical. As memory is orders of magnitude faster than disk it's hard to argue that latency in such a system wouldn't plummet.

    Latency is minimized because objects are kept in memory and work requests are directed directly to the machine containing the already in-memory object. The object implements the request behavior on the same machine. There's no pulling data from a disk. There isn't even the hit of accessing a cache server. And since all other object requests are also served from in-memory objects we've minimized the Service Dependency Latency problem as well.
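The combination of "data in memory" and "asynch communication to a persistent store" can be sketched as a write-behind store. This is an illustration of the general pattern only, not GigaSpaces' API; `WriteBehindStore` and the dictionary standing in for the persistent store are hypothetical.

```python
from collections import deque


class WriteBehindStore:
    """Serve reads and writes from memory; persist to a slower store later."""

    def __init__(self, persistent_store):
        self._memory = {}
        self._pending = deque()  # writes not yet persisted
        self._persistent = persistent_store

    def put(self, key, value):
        self._memory[key] = value  # caller returns as soon as memory is updated
        self._pending.append((key, value))

    def get(self, key):
        return self._memory[key]  # never touches the persistent store

    def flush(self):
        """Drain queued writes; a real system would run this on a background thread."""
        written = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._persistent[key] = value
            written += 1
        return written


disk = {}  # hypothetical stand-in for a database or disk-backed store
store = WriteBehindStore(disk)
store.put("order:1", {"qty": 2})
in_memory = store.get("order:1")
persisted_before_flush = "order:1" in disk
flushed = store.flush()
```

The trade-off is visible in the sketch: requests never wait on the slow store, but anything still sitting in the pending queue is lost if the process dies before a flush, which is why in-memory grids typically pair this with replication.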

    GigaSpaces isn't the only player in this market. You might want to also take a look at: Scaleout Software, GridGain, Terracotta, GemStone, and Coherence. We'll have more on some of these products later.

    Miscellaneous Latency Reduction Ideas

  • Cache. No, really? Well it had to be said. See A Bunch of Great Strategies for Using Memcached and MySQL Better Together.
  • Use a CDN. No, really? See What CDN would you recommend?.
  • Use a Caching Proxy Server. At least this is a little less obvious. See Strategy: Front S3 with a Caching Proxy.
  • Enhance Your Web Operations Capability. There are plenty of available tools to help you pinpoint and correct operation related problems. See Velocity Conference for more information.
  • Use Yslow to Make Your Pages Go. Yslow is a tool to show sources of latency on the client side and suggest ways to fix any problems found. See Yslow to speed up your web pages.
  • Use an Edge DNS Accelerator. This type of service "will ensure that a name server most accessible to the end user will pick up the request and respond." See Edge Acceleration Strategies: Akamai.
  • Optimize Virtual Machines. People often forget VMs exact a performance tax. Virtualized I/O can suffer a substantial performance penalty. See if it can't be tuned.
  • Use Ajax to minimize perceived latency to the user. Clever UI design can make a site feel faster than it really is.
  • Use a faster network. A high speed InfiniBand link can have an end-to-end latency of about 1 microsecond. Another option is a 10 GigE network.
  • Scale up. Faster processors means less software induced latency.
  • Optimize firewalls. An often hidden latency enhancer is your firewall system.
  • Use Small Memory Chunks When Using Java. GC in Java kills latency. One way to minimize the impact of garbage collection on latency is to use more VMs with less memory in each VM instead of one VM with a lot of memory. This prevents a large GC run and makes latency more predictable.
  • Use a TCP Offload Engine (TOE). TOE tech offloads the TCP/IP stack from the main CPU and puts it on the network controller. This means network adapters can respond faster which means faster end-to-end communication. Network adapters respond faster because bus wait time is reduced as the number of transactions across the system I/O bus and memory bus are reduced.
  • Design low latency network topologies. Phil Dykstra, in Issues Impacting Gigabit Networks: Why don't most users experience high data rates?, pinpoints poor network design as one major source of latency: On a single high performance network today, measured latencies are typically ~1.5x - 3x that expected from the speed of light in fiber. This is mostly due to taking longer than line-of-sight paths. Between different networks (via NAPs) latency is usually much worse. Some extra distance is required, based on the availability of fiber routes and interconnects, but much more attention should be given to minimizing latency as we design our network topologies and routing.
  • Make TCP Faster. FastTCP, for example, tweaks TCP to provide smoother and faster data delivery.
  • Copy Data Zero Times. Efficient data transfer through zero copy. Copying data kills. This excellent article explains the path data takes through the OS and how to reduce the number of copies to the big zero.
  • Increase the speed of light. Warp capability could really help speed up communication. Get to work on that!
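For the "Copy Data Zero Times" idea in the list above, Python exposes the operating system's sendfile facility through `socket.sendfile`. The sketch below is my own small demonstration, not code from the referenced article: it ships a temporary file across a local socket pair. Whether the kernel truly avoids intermediate copies depends on the platform; where `os.sendfile` is not usable, `socket.sendfile` falls back to ordinary `send()` calls.

```python
import socket
import tempfile


def send_file_via_sendfile(data: bytes) -> bytes:
    """Write data to a temp file, ship it with socket.sendfile, return what the peer received."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(data)
            f.flush()
            f.seek(0)
            # The file's bytes are handed to the socket by the OS where supported,
            # rather than being read into a Python buffer and written back out.
            sender.sendfile(f)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()


payload = b"latency" * 512  # 3,584 bytes, small enough to sit in the socket buffer
echoed = send_file_via_sendfile(payload)
```

The payload is kept small on purpose so the single-threaded demo does not block waiting for the receiver; a real server would send to a remote peer that reads concurrently.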

    Application Server Architecture Matters Again

    With the general move over the past few years to a standard shared nothing two-tierish architecture, discussion of application server architectures has become a neglected topic, mainly because there weren't application servers anymore. Web requests came in, data was retrieved from the database, and results were calculated and returned to the user. No application server. The web server became the application server. This was quite a change from previous architectures which were more application server oriented. Though they weren't called application servers, they were called daemons or even just servers (as in client-server).

    Let's say we buy into RAM is the New Disk. This means we'll have many persistent processes filled with many objects spread over many boxes. A stream of requests is directed at each process and those requests must be executed in each process. How should those processes be designed?

    Sure, having objects in memory reduces latency, but it's very easy through poor programming practice to lose all of that advantage. And then some. Fortunately we have a ton of literature on how to structure servers. I have a more thorough discussion here in Architecture Discussion. Also take a look at SEDA, an architecture for highly concurrent servers and ACE, an OO network programming toolkit in C++.

    A few general suggestions:
  • Stop Serializing/Deserializing Messages. It boggles my mind why we still serialize and deserialize messages. Leave messages in a binary compressed format and decode only on access. Very few activities waste more CPU and cause more lock contention through the memory library than does serialization.
  • Load Balance Across Read Replicas. The more copies of objects you have the more work you can perform in parallel. Consider keeping object replicas for both high availability and high scalability. This is the same strategy distributed file systems use to handle more load. It works in-memory as well.
  • Don't Block. The goal for a program is to use its whole CPU time quanta when it's scheduled to run. Don't block. Don't give the processor back to the OS for someone else to get it. Block for any reason and your performance tanks because not only do you incur the latency of the operation but there's added rescheduling latency as well. Who knows when your thread will get scheduled again?
  • Minimize Paging. Thrashing is when a system experiences excessive page faults. More work is spent on moving memory around than is being given to tasks to perform real work. It's usually three orders of magnitude slower to access a page from disk instead of memory. Unfortunately, memory managers in most languages make reducing paging difficult as you have no control over where memory is placed or how it is used. With CPU speeds these days basically an operation is free when you are operating on paged-in memory.
  • Minimize/Remove locking. Locks add latency and variability to a processing pipeline. A lock is a blocking operation. So you are choosing not to run when you have the CPU which means you incur a number of different forms of latency. Select a server architecture that minimizes the need for locks.
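The first suggestion in the list above, decoding only on access, can be sketched with the standard `struct` module. The wire layout and the `LazyTrade` class are hypothetical; the point is only that the raw bytes are kept as received and a field is unpacked when someone actually reads it.

```python
import struct


class LazyTrade:
    """Wraps a fixed-layout binary record and decodes a field only when it is read."""

    # Hypothetical wire layout (big-endian, no padding):
    # 8-byte order id, 8-byte float price, 4-byte quantity.
    _ORDER_ID = struct.Struct(">Q")
    _PRICE = struct.Struct(">d")
    _QUANTITY = struct.Struct(">I")

    def __init__(self, buffer):
        self._buf = memoryview(buffer)  # a view over the bytes, not a copy

    @property
    def order_id(self):
        return self._ORDER_ID.unpack_from(self._buf, 0)[0]

    @property
    def price(self):
        return self._PRICE.unpack_from(self._buf, 8)[0]

    @property
    def quantity(self):
        return self._QUANTITY.unpack_from(self._buf, 16)[0]


wire = struct.pack(">QdI", 42, 101.25, 300)  # what would arrive off the network
trade = LazyTrade(wire)
qty = trade.quantity  # only these 4 bytes are decoded; id and price stay raw
```

A router that only needs one field to pick a destination can forward `wire` untouched, so no intermediate objects are allocated for fields nobody looks at.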


    Locating applications together reduces latency by reducing data hops. The number and location of network hops a message has to travel through is a big part of the end-to-end latency of a system.

    For example, from New York to the London Stock Exchange a round trip message takes 84 milliseconds to send, from Frankfurt it takes 18 milliseconds, and from Tokyo it takes 208 milliseconds. If you want to minimize latency then the clear strategy is to colocate your service in the London Stock Exchange. Distance is minimized and you can probably use a faster network too.

    Virtualization technology makes it easier than ever to compose separate systems together. Add a cloud infrastructure to that and it becomes almost easy to dramatically lower latencies by colocating applications.

    Minimize the Number of Hops

    Latency increases with each hop in a system. The fewer hops the less latency. So put those hops on a diet. Some hop reducing ideas are:
  • Colocation. Colocation is one hop reducing strategy. It reduces the number of WAN links, routers, etc that a message has to go through. If a router takes 400 microseconds for each packet, for example, getting rid of that router reduces latency. Colocation also works for code and data, as in the GigaSpaces architecture. They maintain a sharded in-memory object cache so an extra database hop is avoided when executing an operation.
  • Simplify Software Architecture. Remove intermediate daemons, brokers and other latency adding components. Dispatch work to where it will be processed as simply and fast as possible. Peer-to-peer architectures and sharding approaches are good at this. Avoid sending work into a hub for central dispatching. Dispatch as far out at the edge as possible.
  • Open a New Datacenter. Facebook opened a new datacenter on the east coast in order to save 70 milliseconds.

    Build Your own Field-programmable Gate Array (FPGA)

    This one may seem a little off the wall, but creating your own custom FPGA may be a killer option for some problems. An FPGA is a semiconductor device containing programmable logic. Typical computer programs are a series of instructions that are loaded and interpreted by a general purpose microprocessor, like the one in your desktop computer. Using an FPGA it's possible to bypass the overhead of a general purpose microprocessor and code your application directly into silicon. For some classes of problems the performance increases can be dramatic.

    FPGAs are programmed with your task specific algorithm. Usually something compute intensive like medical imaging, modeling bond yields, cryptography, and matching patterns for deep packet inspections. I/O heavy operations probably won't benefit from FPGAs. Sure, the same algorithm could be run on a standard platform, but the advantage FPGAs have is even though they may run at relatively low clock rates, FPGAs can perform many calculations in parallel. So perhaps orders-of-magnitude more work is being performed each clock cycle. Also, FPGAs often use content addressable memory which provides a significant speedup for indexing, searching, and matching operations. We also may see a move to FPGAs because they use less power. Stay lean and green.

    In embedded projects FPGAs and ASICs (application-specific integrated circuits) are avoided like the plague. If you can get by with an off-the-shelf microprocessor (Intel, AMD, ARM, PPC, etc) you do it. It's a time-to-market issue. Standard microprocessors are, well, standard, so that makes them easy to work with. Operating systems will already have board support packages for standard processors, which makes building a system faster and cheaper. Once custom hardware is involved it becomes a lot of work to support the new chip in hardware and software. Creating a software only solution is much more flexible in a world where constant change rules. Hardware resists change. So does software, but since people think it doesn't we have to act like software is infinitely malleable.

    Sometimes hardware is the way to go. If you are building a NIC that has to process packets at line speed the chances are an off-the-shelf processor won't be cost effective and may not be fast enough. Your typical high end graphics card, for example, is a marvel of engineering. Graphics cards are so powerful these days distributed computation projects like Folding@home get a substantial amount of their processing power from graphics cards. Traditional CPUs are creamed by NVIDIA GeForce GPUs which perform protein-folding simulations up to 140 times faster. The downside is GPUs require very specialized programming, so it's easier to write for a standard CPU and be done with it.

    That same protein folding power can be available to your own applications. ACTIV Financial, for example, uses a custom FPGA for low latency processing of high speed financial data flows. ACTIV's competitors use a traditional commodity box approach where financial data is processed by a large number of commodity servers. Let's say an application takes 12 servers. Using an FPGA the number of servers can be collapsed down to one because more instructions are performed simultaneously which means fewer machines are needed. Using the FPGA architecture they process 20 times more messages than they did before and have reduced latency from one millisecond down to less than 100 microseconds.

    Part of the performance improvement comes from the high speed main memory and network IO access FPGAs enjoy with the processor. Both Intel and AMD make it relatively easy to connect FPGAs to their chips. Using these mechanisms data moves back and forth between your processing engine and the main processor with minimal latency. In a standard architecture all this communication and manipulation would happen over a network.

    FPGAs are programmed using hardware description languages like Verilog and VHDL. You can't get away from the hardware when programming FPGAs, which is a major bridge to cross for us software types. Many moons ago I took a Verilog FPGA programming class. It's not easy, nothing is ever easy, but it is possible. And for the right problem it might even be worth it.

    Related Articles

  • The Challenges of Latency by Dan Pritchett
  • Latency Exists, Cope! by Dan Pritchett
  • Architecting for Latency by Dan Pritchett
  • BASE: An ACID Alternative by Dan Pritchett, eBay
  • Comet: Sub-Second Latency with 10K+ Concurrent Users by Alexander Olaru
  • It's the Latency, Stupid by Stuart Cheshire
  • Latency and the Quest for Interactivity by Stuart Cheshire
  • The importance of bandwidth versus latency by Dion Almaer
  • Fallacies of Distributed Computing - The second fallacy is "Latency is Zero"
  • List of device bandwidths
  • Computing over a high-latency network means you have to bulk up by Raymond Chen
  • AJAX Latency problems: myth or reality? by Jep Castelein
  • RAM Guide: Part I DRAM and SRAM Basics and Part 2 by Jon "Hannibal" Stokes
  • The Many Flavors of System Latency.. along the Critical Path of Peak Performance and Processors and Performance : Chips, MIPS, and Sizing blips.. by Todd Jobson. A very detailed and helpful analysis of latency sources.
  • Ethernet Latency: The Hidden Performance Killer by Kevin Burton
  • Network latency vs. end-to-end latency by Nati Shalom
  • Low-Latency Delivery Enters Mainstream; But Standard Measurement Remains Elusive by Andrew Delaney
  • The three faces of latency by Scott Parsons, Chief Scientist at Exegy, Inc.
  • Architecture Discussion
  • The JVM needs Value Types - Solving the next bottleneck. Value types use less space, less paging, less memory allocation, better cache usage, better garbage collection profile.
  • Latency, Bandwidth, and Response Times by Chris Loosley.
  • Anatomy of real-time Linux architectures by M. Tim Jones.
  • True Cost of Latency by GemStone

    Paper: Parallelizing the Web Browser

    There have been reports that software engineering is dead. Maybe, like the future, software engineering is simply not evenly distributed? When you read this paper I think you'll agree there is some real engineering going on, it's just that most of the things we need to build do not require real engineering. Much like my old childhood tree fort could be patched together and was "good enough." This brings to mind the old joke: If a software tree falls in the woods would anyone hear it fall? Only if it tweeted on the way down...

    What this paper really showed me is we need not only to change programming practices and constructs, but we also need to design solutions that allow for deep parallelism to begin with. Grafting parallelism on later is difficult. Parallel execution requires knowing precisely how components are dependent on each other and that level of precision tends to go far beyond the human attention span.

    In particular this paper deals with how to parallelize the browser on cell phones. We are entering a multi-core smartphone dominated world. As network connections become faster, applications, like the browser, become CPU bound:

    On an equivalent network connection, the iPhone browser is 5 to 10 times slower than Firefox on a fast laptop. The browser is CPU-bound because it is a compiler (for HTML), a page layout engine (for CSS), and an interpreter (for JavaScript); all three tasks are on a user’s critical path.

    To speed up the browser they worked on offloading computation, removing the abstraction tax, and parallelizing the browser using energy efficient data and task approaches. The problem is that technologies like HTML, CSS, DOM, JavaScript, events, and page layout were not designed to be parallel. They were designed to run on a single CPU. And the paper goes to brilliant and heroic lengths to parallelize this part of the stack. The techniques include new work-efficient FSM algorithms, speculative parallelization for flow layouts, elimination of as much shared state as possible, callback dependency analysis, actors to implement behaviours, and many more.

    What's clear though is their job would have been a heck of a lot easier if the stack had been designed with parallelization in mind from the beginning.

    Leo Meyerovich, one of the authors of the paper, talks about the need for a more rigorous underpinning in the blog post The Point of Semantics:

    As part of the preparation for a paper submission, I'm finishing up my formalization of a subset of CSS 2.1 (blocks, inlines, inline-blocks, and floats) from last year. My first two, direct formalization approaches failed the smell test so Ras and I created a more orthogonal kernel language. It's small, and as the CSS spec is a scattered hodge-podge of prose and visual examples riddled with ambiguities, we phrase it as a total and deterministic attribute grammar that is easy to evaluate in parallel. 

    I asked Leo what rules we could follow to create more parallelizable constructs from the beginning and he said that's what he'll be working on for the next couple years :-) Some advice he had was:

  • Be clear on what you want to parallelize. Figuring out where the parallelism should be, at a conceptual level, is always the first step.
  • Understand how it should run in parallel.
  • Focus on making it easy to do just that (and worry about the rest later).
  • It's better to completely solve a problem for some folks than almost solve a problem for many: you can help more and more in the former, but with the latter, you might never end up helping anybody.

    Some things Leo will be working on are:
    I've been enjoying higher-order data flow models (Flapjax) and task parallelism (Cilk++) for awhile now and have been thinking about this, including support for controlled sharing (e.g., SharC for type qualifiers and I'm still trying to figure out implicitly transactional flows for FRP). For a browser, I think it will remain as specialized libraries written in privileged languages where good engineers can rock and put together and be exposed in higher-level languages. Hopefully gradual typing will extend into lower levels to support this. The above hints at a layered framework with the bulk in the high-level -- think parallel scripting. However, as a community, we don't know how to include performance guides in large software, so parallelism is a challenge. I prototyped one of my algorithms in a parallel python variant: the sequential C was magnitudes faster than the 20-core python. Of course, the parallel C++ was even faster :) 

    Related Articles

  • Parallelizing the Web Browser by Christopher Grant Jones, Rose Liu, Leo Meyerovich, Krste Asanović, Rastislav Bodík.
  • Parallelizing the Web Browser
    Browsing Web 3.0 on 3 Watts
  • Leo Meyerovich's Project Website
  • Flapjax - a new programming language designed around the demands of modern, client-based Web applications: event-driven, reactive evaluation, event-stream abstraction for communicating with web services, interfaces to external web services.
  • Bell's Law of Computer Classes
  • Leo Meyerovich's Blog
  • Monday

    A Scalability Lament

    In Scalability issues for dummies Alex Barrera talks movingly about the challenges he faces trying to scale his startup inkzee as the lone developer. Inkzee is an online news reader that automatically groups similar topics. This is a cool problem and is one you know right away is going to have some killer scalability problems as the number of feeds and the number of users increase. And these problems lead to the point of the post, to explain here what are scalability problems and how deep the repercussions are for a small company, which Alex does admirably.

    Some takeaways:

  • Sites are composed of a frontend and backend. The backend isn't visible to users, but it does all the work and this is where the scalability problems show up.
  • As more and more users use a site it becomes slow because more users reveal bottlenecks in the system that weren't visible before.
  • There can be many many reasons for these bottlenecks. They are often very hard to find because the backend systems are very complex and have a lot of complex interactions.
  • It takes a while to find the bottlenecks and create fixes for them. You are never quite sure if the fixes will really work. Many of these problems are unique to the problem space so pre-canned solutions aren't always available. And because you don't want to destroy your production servers it takes a while to put fixes into the system. This means your release cycle is slow which means progress on your site is slow.
  • The process of redesign sucks up all your resources so progress on the site stalls almost completely, especially for a small development group. This stops growth as you can't handle new customers and your existing customers become disenchanted.
  • The process is unfortunately iterative. Solving one problem just puts you in the queue for the next problem. There's a reason it took Twitter a while to get on their feet.
  • Scalability problems aren’t something you can discard as being ONLY technical; their roots might be technical but their effects will shake the whole company.

    I found Alex's commentary quite touching and familiar. As I imagine many of you do too. It's the modern equivalent of an explorer following a dream. Going alone into uncharted territory where the Dragons live and trying to survive when everything seems against you. For every great returning hero there are 10 who do not make it back. And that's hard to deal with.

    Everyone will certainly have their ideas on how to "fix" the problem, as that's what engineers do. But it also doesn't hurt to use our Venus brain for a moment and simply recognize the toll this process can take. It can be dispiriting. The continual stream of problems and lack of positive feedback can wear you down after a while. To stick with it takes a bit of craziness in the heart.

    Switching back to being Mars brained I might suggest:
  • Find a partner. There's always a Jerry to go along with Ben. A Martin to go along with Lewis. And Rambo is just a movie. Going it alone is hard hard hard. Partners can pick each other up and give each other a breather when necessary.
  • Use the Cloud. Whatever your opinions of the economics of the cloud, it makes testing of the type Alex was needing a breeze. Architect for the cloud and you can spin up a system in parallel, run a load generator, and never touch your production system at all. And do it all for cheap. This is one of the biggest wins for the cloud and would seem to solve a lot of problems.

    Related Articles

  • Scalable Web Architectures and Application State by Marton Trencseni
  • Scalability for dummies by Royans Tharakan
  • Friday

    Against all the odds

    This article is not about Mariah Carey or her song. It's about storage systems and databases.

    First, let's describe what I mean by the odds: in my social network, I found that 93% of mainstream developers sanctify the database, or at least treat it as the ultimate, superhero, undefeatable solution to any data persistence challenge.

    Personally, I think this problem comes from education, and I think some companies are also involved in it.

    To start fixing this bad thinking, we should all agree on the following points:

    • Every challenge has its own solutions, so whatever you want to save or persist, there are always many options. For example, web search engines such as Google, Kngine, Yahoo, and Bing don't use a database at all; instead we use indexes (index files) for better performance.
    • A database in general, whatever the vendor, is slow compared with other solutions such as key-value stores, index files, and DHTs.
    • Databases currently employ the relational data model or the object-relational data model, so don't convince yourself to save non-relational data in a relational store such as a database.
    • Database system architecture hasn't changed very much in the last 30 years, and it carries a lot of limits and falls short in performance and scalability. If you don't believe me, check out these papers:
    1. The End of an Architectural Era (It's Time for a Complete Rewrite)

    2. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
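
    As a rough illustration of the index-file idea from the first point, here is a toy inverted index in Python. It is an in-memory sketch with made-up documents. Real search engines keep far more elaborate on-disk structures, and nothing here reflects how any of the engines named above is actually built.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up documents, purely for illustration.
docs = {
    1: "squid cache speeds up apache",
    2: "apc cache speeds up php",
    3: "latency costs sales",
}
index = build_index(docs)
print(search(index, "cache speeds"))
```

    A lookup touches only the postings for the query words, with no query planner, joins, or transactions involved.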

    I hope you agree with me on the previous points. So the question is: do we really need a database in every application?

    There are many scenarios where a database shouldn't be used, such as web search engines, caching, file sharing systems, DNS systems, etc. On the other hand, there are many scenarios where a database should be used, such as customer databases, address books, ERP, etc.

    Tiny URL services, for example, shouldn't use a database at all because their needs are very simple: just map a small/tiny URL to the real/big URL. If you are starting to agree with me, you will likely want to ask: what can we use beside or instead of databases?
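
    To show how little a tiny URL service needs, here is a minimal sketch in Python. The TinyUrlStore class and encode_id function are hypothetical names, and a plain dict stands in for whatever key-value store you would actually deploy.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe symbols


def encode_id(number):
    """Encode a non-negative integer as a short base-62 string."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class TinyUrlStore:
    """Toy shortener; the dict is a stand-in for a real key-value store."""

    def __init__(self):
        self._by_key = {}
        self._next_id = 1

    def shorten(self, long_url):
        key = encode_id(self._next_id)
        self._next_id += 1
        self._by_key[key] = long_url  # a single put
        return key

    def resolve(self, key):
        return self._by_key.get(key)  # a single get, None if unknown


store = TinyUrlStore()
key = store.shorten("http://example.com/a/very/long/path?with=parameters")
print(key, "->", store.resolve(key))
```

    The whole workload is one put and one get per URL, which is exactly the access pattern key-value stores are built for.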

    There are a lot of tools that follow the CAP or BASE model instead of the ACID model. But first let's describe ACID:

    • Atomicity: A transaction is all or nothing
    • Consistency: Only valid data is written to the database
    • Isolation: Pretend all transactions are happening serially and the data is correct
    • Durability: What you write is what you get
    1. The problem with ACID is that it gives you too much; it trips you up when you are trying to scale a system across multiple nodes.

    2. Down time is unacceptable. So your system needs to be reliable. Reliability requires multiple nodes to handle machine failures.

    3. To make scalable systems that can handle lots and lots of reads and writes you need many more nodes.

    4. Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.
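
    To see what the first two ACID properties look like in practice, here is a small sketch using Python's built-in sqlite3 module on a single node. The accounts table and the transfer function are invented for the example. The second transfer breaks a CHECK constraint halfway through, and the database undoes the half that had already been applied.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, inside one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)  # valid, so it commits

try:
    transfer(conn, "alice", "bob", 500)  # debit would make alice negative
except sqlite3.IntegrityError:
    pass  # the credit to bob was rolled back along with the failed debit

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

    This is cheap on one machine. The points above are about what happens when the same guarantee has to hold across many nodes and an unreliable network.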

    On the other hand, the CAP model is about:

    • Consistency: Your data is correct all the time. What you write is what you read.

    • Availability: You can read and write your data all the time.

    • Partition Tolerance: If one or more nodes fail the system still works and becomes consistent when the system comes on-line.
    1. CAP is easy to scale, distribute. CAP is scalable by nature.

    2. Everyone who builds big applications builds them on CAP. Who uses CAP: Google, Yahoo, Facebook, Kngine, Amazon, eBay, etc.
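
    The "becomes consistent when the system comes on-line" behavior can be sketched with two toy replicas. This assumes a last-write-wins rule with version numbers supplied by the caller, which is only one of several reconciliation strategies and is not claimed to be what any of the companies above use. The Replica class is hypothetical.

```python
class Replica:
    """Toy replica that keeps the highest-versioned value for each key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Pull every entry from another replica; newer versions win."""
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)
b.sync_from(a)  # both replicas agree

# Simulated partition: this write only reaches replica a.
a.write("cart", ["book", "pen"], version=2)
stale = b.read("cart")  # b stays available but serves the older value

# The partition heals and the replicas exchange state.
b.sync_from(a)
a.sync_from(b)
print(stale, b.read("cart"))
```

    During the partition both replicas keep answering reads, at the price of replica b temporarily returning stale data.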

    For example, in any in-memory or on-disk caching system you will never need all the database features. You just need a CAP-like system. Today there are a lot of column-oriented and key-value-oriented systems. But first let's describe column-oriented:

    A column-oriented DBMS is a database management system (DBMS) which stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items. This approach is contrasted with row-oriented databases and with correlation databases, which use a value-based storage structure. For more information check the Wikipedia page.
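
    The difference between the two layouts can be sketched in a few lines of Python. The library catalogue records below are made up, and plain lists and dicts stand in for on-disk storage.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"title": "Dune", "year": 1965, "copies": 4},
    {"title": "Emma", "year": 1815, "copies": 2},
    {"title": "Ulysses", "year": 1922, "copies": 1},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_copies_by_row(records):
    """Visits every record, even though only one field is needed."""
    return sum(record["copies"] for record in records)


def total_copies_by_column(cols):
    """Reads just the one column the aggregate cares about."""
    return sum(cols["copies"])


print(total_copies_by_row(rows), total_copies_by_column(columns))
```

    Both functions return the same total. The column version never touches titles or years, which is where the advantage for aggregates over many similar items comes from.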

    Distributed key-value stores:

    Distributed column stores (Bigtable-like systems):

    Something a little different:



    Scaling Traffic: People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar

    Update 17: Are Wireless Road Trains the Cure for Traffic Congestion? BY ADDY DUGDALE. The concept of road trains--up to eight vehicles zooming down the road together--has long been considered a faster, safer, and greener way of traveling long distances by car

    Update 16: The first electric vehicle in the country powered completely by ultracapacitors. The minibus can be fully recharged in fifteen minutes, unlike battery vehicles, which typically take hours to recharge.

    Update 15: How to Make UAVs Fully Autonomous. The Sense-and-Avoid system uses a four-megapixel camera on a pan tilt to detect obstacles from the ground. It puts red boxes around planes and birds, and blue boxes around movement that it determines is not an obstacle (e.g., dust on the lens).
    Update 14: ATNMBL is a concept vehicle for 2040 that represents the end of driving and an alternative approach to car design. Upon entering ATNMBL, you are presented with a simple question: "Where can I take you?" There is no steering wheel, brake pedal or driver's seat. ATNMBL drives for you. Electric powered plus solar assist, with wrap-around seating for seven, ATNMBL offers living and/or working comfort, views, conversations, entertainment, and social connectedness.

    Update 13: The Next Node on the Net? Your Car!. A new radio system developed in Australia is transforming the vehicles on the street into nodes on a network.
    Update 12: United Arab Emirates building network of driverless electric taxis. When the system's fully built, planners say the podcars will be able to deliver riders within 100 meters of any location in the city. The whole network of tracks for the cars will be two stories beneath street level.

    Update 11: Self-driving cars set to cut fuel consumption. Large-scale test seeks to put humans in the back seat. NEDO says it will start testing several key technologies that allow for autonomous driving between 2010 and 2012.
    Update 10: Fighting Traffic Jams With Data. Researchers from different universities are working on ways for cars to better communicate with each other and relay crucial driver information such as traffic speed, weather and road conditions.
    Update 9: Accident Ahead? New Software Will Enable Cars To Make Coordinated Avoidance Maneuvers. In dangerous situations, the cars can independently perform coordinated maneuvers without their drivers having to intervene. In this way, they can quickly and safely avoid one another.
    Update 8: Great article in Wired on Better Place's proposal for a new electric car distribution system. The idea is to blanket the country with "smart" charge spots. You buy your car from them and purchase a recharge plan. Profits come from selling electricity.
    Update 7: Capturing solar energy from asphalt pavements. An interesting way to make the system self-sufficient.
    Update 6: Why We Drive the Way We Do Unlocks How to Unclog Traffic. Vanderbilt says: The fundamental problem is that you've got drivers who make user-optimal rather than system-optimal decisions. Josh McHugh replies: Make the packets (cars) dumb and able to take marching orders from traffic routing nodes.
    Update 5: Traffic jams are not caused by flaws in road design but by flaws in human nature. Nearly 80 percent of crashes involve drivers not paying attention for up to three seconds. The good and scary thing about computers is they always pay attention.
    Update 4: Volvo Says It Will Have An Injury Proof Car By 2020.
    Update 3: Map Reading For Dummies. Europe (again) is developing a system that will read satellite navigation maps and warn the driver of upcoming hazards – sharp bends, dips and accident black spots – which may be invisible to the driver. Even better, the system can update the geographic database. Another key capability of the People Pod system.
    Update 2: Road Safety: The Uncrashable Car?. A European research project could lead to a car that is virtually uncrashable. An uncrashable car would definitely ease people's concerns over computerized navigation.
    Update: Shockwave traffic jam recreated for first time - "Pinpointing the causes of shockwave jams is an exercise in psychology more than anything else. 'If they had set up an experiment with robots driving in a perfect circle, flow breakdown would not have occurred. Human error is needed to cause the fluctuations in behaviour.'"

    Traffic in the San Francisco Bay area is like Dolly Parton, 10 pounds in a 5 pound sack. Mass transit has been our unseen traffic woe savior for a while. But the ring of political fire circling the bay has prevented any meaningful region wide transportation solution. As everyone scrambles to live anywhere they can afford, we really need a region wide solution rather than the local fixes that can never go quite far enough. The solution: create a People Pod Pool of On Demand Self Driving Robotic Cars who Automatically Refuel from Cheap Solar.

    Commuters are Satisfied Not Carpooling

    You might think we would car pool more. But people of the bay don't like carpools and they don't much like mass transit either. In the Metro, a local weekly, they published a wonderful article Fueling the Fire, on how we need to cure our car addiction using the same marginalization techniques used to "stop" smoking.

    A telling quote shows how difficult going cold turkey off our cars will be:

    Mitch Baer, a public policy and environment graduate student at George Mason University in Virginia, recently surveyed more than 2,000 commuters in the Washington, D.C., area. He found that people who drove to work alone were more emotionally satisfied with their commute than those who rode public transportation or carpooled with others.

    Even stuck in traffic jams, those commuters said they felt they had more control over their arrival and departure times as well as commuting route, radio stations and air conditioning levels.

    Commuters said that driving alone was both quicker and more affordable, according to the study.

    "They will have a tougher time moving people out of their cars," Baer said. "It's easier for most people to drive than take mass transit."

    The key phrase to me is: people who drove to work alone were more emotionally satisfied. How can people jostled in the great pinball machine that are our roadways be emotionally satisfied? That's crazy talk. Shouldn't we feel less satisfied?

    In Our Cars We Feel Good Because We Are in Control

    Solving the mystery of why we feel satisfied while stuck in traffic turns on an important psychological clue: the more we perceive ourselves in control of a situation the less stress we feel. Robert Sapolsky talks about this surprising insight into human nature in Why Zebras Don't Get Ulcers.

    Notice we simply need more "perceived" control. Take control of a situation in your mind and stress goes down. You don't actually need to be in more control of a situation to feel less stress. If you have diabetes, facing your possibly bleak future can be less stressful if you try to control your blood sugars. If you are a speed demon, buying a radar detector can make you feel more in control and less stressed as you zoom along the seldom empty highways. If you are bullied, figuring out ways to avoid your torturer puts you more in control and therefore less stressed.

    Figure out a way to control an out of control situation and you'll feel happier. That's what I think we are accomplishing by driving alone in cars. In our car we have complete control. Cars are our castles with a 2 inch air moat cushion. Most cars are plusher than any room in your average house. Fine leather, a rad sound system, perfect temperature control, and a nice beverage of choice within easy reaching distance. In our cars we've created a second womb. The result is we feel more control, less stress, and more satisfaction, even when outside, across the moat, a tempestuous sea of stressors awaits.

    Our Mass Transit System Must Supply Perceived Control

    Given the warm inner glow we feel from being wrapped in the cold steel of our cars, if you want people to get out of their cars and onto mass transit you must provide the same level of perceived control. None of our mass transit options do that now. Buses are on fixed schedules that don't go where I want to go when I want to go. Neither do trains, BART, or light rail. So the car it is. Unless a system could be devised that provided the benefits of mass transit plus the pleasing characteristics of control our cars give us.

    With Recent Technological Advances We Can Create a New Type of Mass Transit System

    New technologies are being developed that will allow us to create a mass transit system that matches our psychological and physical needs. Just berating people and telling them they should take mass transit to save the planet won't work. The pain is too near and the benefits are too far for the mental cost-benefit calculation to go the way of mass transit.

    The technologies I am talking about are:

  • Inexpensive solar with $1/watt solar panels. Our mass transit must of course be green and cost effective.
  • Breakthrough battery could boost electric cars. Toshiba promises 'energy solution' with nearly full recharge in 5 minutes.
  • Personal transportation pods. A reusable vehicle that can take anyone anywhere they want to go.
  • Self driving vehicles. We are making great strides in creating robot cars that can drive themselves in traffic. Already they drive better than most humans can drive (low bar, I know).

    Mix these all together and you get a completely different type of mass transit system. A mashup, if you will.

    Create a People Pod Pool of On Demand Autonomous Self Driving Robotic Cars that Automatically Refuel from Cheap Solar

    Many company campuses offer a pool of bicycles so workers can ride between buildings and make short trips. Some cities even make bikes available to their citizens. The idea is to do the same for cars, but with a twist or two.

    The cars (people pods) can be stored close to demand points and you can call for one anytime you wish. The cars are self driving. You don't actually drive them and are free to work or play during transit. Different kinds would be available depending on your purpose. Just one person on a shopping trip would receive a different car than a family. The pods would autonomously search out and find energy sources as needed to recharge. There's no reason to assume a centralized charging and storage facility. When repair is needed they could drive themselves to a repair depot or wait for the people pod ambulance service.

    The advantages of such a system are:
  • Perceived control. You have a personal "car" you control the destination for, the interior environment of, and your own actions inside. This gets over the biggest hurdle with current mass transit options.
  • Better regional traffic flow. The autonomous cars could drive cooperatively to smooth out traffic jams. Traffic jams are largely caused by people speeding up and slowing down which causes ripples of slowness up and down the road. An automated system could prevent that.
  • Go where you want to go. It would be used because people can go to exactly where they need to go and be picked up exactly where they need to leave from at exactly the time they wish. None of these are characteristic of current systems.
  • Leverage existing road ways. Creating light rail and trains is expensive and wasteful (except for the high speed point-point variety). They don't extend to where people live and they don't go where people go. So it creates a multi-hop mess out of every trip. We already have an expansive road system that goes where everyone wants to go. Using the road infrastructure more efficiently makes a lot more sense than creating hugely expensive partial solutions. And since these cars would be eco-friendly, most arguments against using cars fall away.
  • Cheaper delivery. One force keeping truly distributed manufacturing and retailing from blossoming is high delivery costs. A $2 item is simply too expensive to buy remotely and ship because shipping costs more than the product. An automated transportation system would make this model more affordable.
  • Live where you want to live. Most mass transit systems are based on trying to socially reengineer our current suburban and exurban living pattern into a high density live-work pattern. While this should be an option, most mass transit proposals assume this pattern as a given and can't deal with current realities. For the foreseeable future people will not give up their houses or their lifestyles. The People Pod approach solves the mass transit problem and the "difficulties" of having to change a whole populace to behave in a completely different way for less than compelling reasons.
  • Still can own your own car. This isn't a replacement for the current car culture. It's leveraging the car culture. You can still own and drive your own car. Nobody is trying to steal your car away from you.
  • Cleaner and safer. Mass transit is disliked by many because it is perceived as dirty and unsafe. The pods would be safe and clean.
  • Road safety. Our new robot overlords will make our lives safer. Hopefully, possibly, maybe...


  • Current transportation budgets. Money could be redeployed from existing less than successful approaches.
  • Advertising. The outside of vehicles could contain advertising as could the inside, especially from the internal search system. Imagine wanting a new place to eat and asking the pod to suggest one. That's prime targeted marketing. Social networks and massive multi-player games could also be created between pods.
  • In-flight services. Movies on demand and so on.
  • Efficiencies. The plug-in cars are electric and efficient and low maintenance. That will save money.
  • Up sells. Individuals could buy their own pods and trick them out. Also, people could pay for a higher class of pod from the pod pool.

  • Licensing. Technology used in making the pods could be sold to other manufacturers. Create a standardized market so competition and cooperation can erupt.
  • Sponsorship. Companies could buy rights to play music, stock the food locker, use their equipment, etc.
  • Naming rights. The rights to name parts of the system could be sold.


  • Challenge prize. Maybe someone with a vision and a dream can put up a $50 million prize to get it going. Something like the Xprize.
  • Government funding. Don't laugh, it might happen.
  • Startup. I'm available if interested :-) With a large enough challenge prize this is a viable model.

    It's a Usable System so People Would Use It

    After a lot of reading on the topic and a lot of self-examination on why I am such a horrible person that I don't use mass transit more, this is the type of system I could really see myself using. It doesn't try to change the world, it uses what we got, and gives people what they want. It just might work.
  • Thursday

    Scalable Web Architectures and Application State

    In this article we follow a hypothetical programmer, Damian, on his quest to make his web application scalable.

    Read the full article on Bytepawn


    SPHiveDB: A mixture of the Key/Value Store and the Relational Database.

    Key/value stores are becoming more and more popular. When we use a key/value store to store objects, we need to serialize and deserialize the objects to and from binary buffers. There are many ways to do this. One possible way is to use a relational database: every value we store in the key/value store is a SQLite instance. We can then use the power of the relational database to manipulate the value, since SQL is very powerful for processing queries.

    SPHiveDB = TokyoCabinet + SQLite

    SPHiveDB is a server for SQLite databases. It uses JSON-RPC over HTTP to expose a network interface to SQLite databases. It supports combining multiple SQLite databases into one file (through Tokyo Cabinet). It also supports the use of multiple files.
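
    The "every value is a SQLite instance" idea can be sketched with Python's built-in sqlite3 module. This is not SPHiveDB's actual API or storage format. A plain dict stands in for Tokyo Cabinet, each value is kept as a SQL text dump rather than a binary database image, there is no JSON-RPC layer, and run_sql is a hypothetical helper.

```python
import sqlite3

store = {}  # key -> SQL dump of that key's private SQLite database


def run_sql(key, statements):
    """Load the database stored under key, run statements, save it back."""
    conn = sqlite3.connect(":memory:")
    if key in store:
        conn.executescript(store[key])  # rebuild the database from its dump
    results = []
    for sql, params in statements:
        results.append(conn.execute(sql, params).fetchall())
    conn.commit()
    store[key] = "\n".join(conn.iterdump())  # serialize back into the store
    conn.close()
    return results


# Each user key owns an independent little relational database.
run_sql("user:1", [
    ("CREATE TABLE inbox (id INTEGER PRIMARY KEY, subject TEXT)", ()),
    ("INSERT INTO inbox (subject) VALUES (?)", ("hello",)),
])
rows = run_sql("user:1", [
    ("INSERT INTO inbox (subject) VALUES (?)", ("again",)),
    ("SELECT subject FROM inbox ORDER BY id", ()),
])[-1]
print(rows)
```

    The lookup by key stays as simple as any key/value get, while everything inside the value can be queried with ordinary SQL.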


    No to SQL? Anti-database movement gains steam – My Take

    In this post I wrote up my view on the anti-SQL database movement and where the alternative approach fits in:

    - SQL databases are not going away anytime soon.
    - The current "one size fits all" database thinking was and is wrong.
    - There is definitely a place for more specialized data management solutions alongside traditional SQL databases.

    In addition to the options that were mentioned in the original article, I pointed out the in-memory alternative and how it fits into the puzzle. I used a real-life scenario, a scalable social-network-based eCommerce site, to outline how an in-memory approach was the only option that let them scale and meet their application performance and response time requirements.


    Server Components - How to Choose and Build the Perfect Server

    There are a lot of questions about server components and how to build the perfect server while considering power consumption. Today I will discuss server components and how we can choose better ones while considering power consumption, efficiency, performance, and price.

    Key points:

    • What kind of components does a server need?
    • Green computing and server components
    • How much power does a server consume?
    • Choosing the right components:
      • Processor
      • Hard Disk Drive
      • Memory
      • Operating system
    • Build a server, or buy?