Paper: Parallelizing the Web Browser
There have been reports that software engineering is dead. Maybe, like the future, software engineering is simply not evenly distributed? When you read this paper I think you'll agree there is some real engineering going on, it's just that most of the things we need to build do not require real engineering. Much like my old childhood tree fort could be patched together and was "good enough." This brings to mind the old joke: If a software tree falls in the woods would anyone hear it fall? Only if it tweeted on the way down...
What this paper really showed me is we need not only to change programming practices and constructs, but we also need to design solutions that allow for deep parallelism to begin with. Grafting parallelism on later is difficult. Parallel execution requires knowing precisely how components are dependent on each other and that level of precision tends to go far beyond the human attention span.
In particular this paper deals with how to parallelize the browser on cell phones. We are entering a multi-core smartphone dominated world. As network connections become faster, applications, like the browser, become CPU bound:
On an equivalent network connection, the iPhone browser is 5 to 10 times slower than Firefox on a fast laptop. The browser is CPU-bound because it is a compiler (for HTML), a page layout engine (for CSS), and an interpreter (for JavaScript); all three tasks are on a user’s critical path.
To speed up the browser they worked on: offloading computation, removing the abstraction tax, and parallelizing the browser using energy efficient data and task approaches. The problem is technologies like HTML, CSS, DOM, Javascript, events, and page layout were not designed to be parallel. They were designed to be run on a single CPU. And the paper goes to brilliant and heroic lengths to parallelize this part of the stack. They designed new work-efficient FSM algorithms, speculative parallelization for flow layouts, eliminating as much shared state as possible, callback dependency analysis, using actors to implement behaviours, and many more.
What's clear though is their job would have been a heck of a lot easier if the stack would have been designed with parallelization in mind from the beginning.
Leo Meyerovich, one of the authors of the paper, talks about the need for a more rigorous underpinning in blog postThe Point of Semantics:
As part of the preparation for a paper submission, I'm finishing up my formalization of a subset of CSS 2.1 (blocks, inlines, inline-blocks, and floats) from last year. My first two, direct formalization approaches failed the smell test so Ras and I created a more orthogonal kernel language. It's small, and as the CSS spec is a scattered hodge-podge of prose and visual examples riddled with ambiguities, we phrase it as a total and deterministic attribute grammar that is easy to evaluate in parallel.
I asked Leo what rules we could follow to create more parallelizable constructs from the beginning and he said that's what he'll be working on for the next couple years :-) Some advice he had was:
Some things Leo will be working on are:
I've been enjoying higher-order data flow models (Flapjax) and task parallelism (Cilk++) for awhile now and have been thinking about this, including support for controlled sharing (e.g., SharC for type qualifiers and I'm still trying to figure out implicitly transactional flows for FRP). For a browser, I think it will remain as specialized libraries written in privileged languages where good engineers can rock and put together and be exposed in higher-level languages. Hopefully gradually typing will extend into lower levels to support this. The above hints at a layered framework with the bulk in the high-level -- think parallel scripting. However, as a community, we don't know how to include performance guides in large software, so parallelism is a challenge. I prototyped one of my algorithms in a parallel python variant: the sequential C was magnitudes faster than then 20-core python. Of course, the parallel C++ was even faster :)
Related Articles
Browsing Web 3.0 on 3 Watts