Stuff The Internet Says On Scalability For June 3, 2011

Submitted for your scaling pleasure:

Twitter indexes an average of 2,200 TPS (peak is 4x that) while serving 18,000 QPS (1.6B queries per day). eBay serves 2 billion page views every day, requiring more than 75 billion database requests.
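A quick sanity check on the figures above, as a rough sketch. It assumes the quoted per-second rates hold steady over a full day, which real traffic does not, and the variable names are only for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(rate_per_second):
    """Convert a steady per-second rate into a daily total."""
    return rate_per_second * SECONDS_PER_DAY


# 18,000 QPS sustained for a day: about 1.56 billion, quoted as 1.6B.
twitter_queries_per_day = per_day(18_000)

# Peak indexing rate is 4x the 2,200 TPS average.
twitter_peak_tps = 4 * 2_200

# "More than 75 billion" requests over 2 billion page views is a lower bound.
ebay_db_requests_per_page = 75e9 / 2e9
```

By this estimate, each eBay page view fans out into at least 37 or so database requests.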
Quotable Quotes:
- Infrastructure is adaptation --Kenneth Wright, referencing reservoir building by the Anasazi
- bnolan: I see why people are all 'denormalize' / 'map reduce' / scalability. I've seen a bunch of megajoins lately, and my macbook doesnt like them.
- MattTGrant: You say: "Infinite scalability" - I say: "fractal infrastructure"
Like the rich, More is different, says Zillionics. Large quantities of something can transform the nature of those somethings. Zillionics is a new realm, and our new home. The scale of so many moving parts requires new tools, new mathematics, new mind shifts. Amen.
Data mine yourself, says the Quantified Self. All that jazz about monitoring and measuring services to continually improve them: that works for you too! You may not be a number, but self-numbers are a path towards being all you can be. Motivated by this same spirit, some time ago I published an empirical process control method for weight control centered on creating and using a feedback system. More at: The 10 Designer Principles for Controlling Your Weight and The Designer Way.
Josh Berkus with Scale Fail (part 2). More sources of failure: No Caching, Scaling the Impossible Things, SPoF, Cloud Addiction
Twitter is also talking about their new search features in The Engineering Behind Twitter’s New Search Experience. Twitter created new core search components Blender and Earlybird. Blender replaces the Ruby on Rails front-end, which improved search latencies by 3x, gave 10x throughput, and allowed the removal of RoR from the search infrastructure. Earlybird, a real-time, reverse index based on Lucene, not only gave us an order of magnitude better performance than MySQL for real-time search, it doubled our memory efficiency and provided the flexibility to add relevance filtering.
Gnutella, often a favorite scalability disaster example, because it used a broadcast network, has a defender: Why Gnutella scales quite well. GnuFU explains the architecture used in later versions. Ultrapeers keep a really large address book so not everyone must be pinged to find a file. Dynamic Querying, File-Magnets, Swarming, and Partial File Sharing were other improvements.
The Architecture of PIER: an Internet-Scale Query Processor. PIER is the first general-purpose relational query processor targeted at a peer-to-peer (p2p) architecture of thousands or millions of participating nodes on the Internet. Some day we may actually be able to use all this power.
The Quora post that killed Bitcoins. Please discuss if his arguments are valid. The age-old dilemma: don't use it because it won't scale vs. why build a scalable version until you know you have to.
0-60: Deploying Goliath on Heroku Cedar. Ilya Grigorik shows how to deploy Goliath, an async Ruby web server, on Heroku's new process model, which allows any application to be run in their cloud. With GAE now supporting threading, which results in much better performance and lower costs, we might be seeing a trend away from synchronous single threading. Michael van Rooijen has more on that subject in More concurrency on a single Heroku dyno with the new Celadon Cedar stack.
Supporting multiple users: do you use a database per customer or multiple customers per database? Brent Ozar with a good discussion on How to Design Multi-Client Databases. "It depends" is the ultimate answer. Salesforce built an entire infrastructure in order to do multi-tenancy well. It's not easy.
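To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module with in-memory databases. The `orders` table, the `tenant_id` column, and the sample customers are all made up for illustration; none of this comes from Brent Ozar's article.

```python
import sqlite3


def shared_database(rows):
    """Multiple customers per database: one table, each row tagged with tenant_id."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)")
    db.execute("CREATE INDEX orders_by_tenant ON orders (tenant_id)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return db


def orders_shared(db, tenant_id):
    # Every query must filter on tenant_id; forgetting it exposes other customers' rows.
    cur = db.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item", (tenant_id,)
    )
    return [item for (item,) in cur]


def database_per_customer(rows):
    """One database per customer: isolated by construction, but more databases to run."""
    dbs = {}
    for tenant_id, item in rows:
        if tenant_id not in dbs:
            dbs[tenant_id] = sqlite3.connect(":memory:")
            dbs[tenant_id].execute("CREATE TABLE orders (item TEXT NOT NULL)")
        dbs[tenant_id].execute("INSERT INTO orders VALUES (?)", (item,))
    return dbs


def orders_isolated(dbs, tenant_id):
    cur = dbs[tenant_id].execute("SELECT item FROM orders ORDER BY item")
    return [item for (item,) in cur]


# Hypothetical sample data: (customer, item) pairs.
ROWS = [("acme", "anvil"), ("acme", "rocket"), ("globex", "laser")]
```

The shared design keeps schema changes and backups in one place at the cost of a filter on every query; the per-customer design flips that trade, which is roughly why "it depends".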
Stumbleupon shows how they use ElasticSearch when Searching for Serendipity in 2.5 billion new data points a week. ElasticSearch is the search analogue to HBase that frees us from some restrictions that Solr imposes.
J2EE & Massive Main Memories (or: rewrite it all from scratch?). Chet Murthy argues persuasively that J2EE has failed to penetrate a new generation of startups because its design limits the use of the large main memories that are required for modern, well-performing applications.
Using Redis to Control Recurring Sets of Tasks. Alex Rockwell with a nice short example of how to use Redis features to check the availability of a large number of URLs. Let the database do the data structures.
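One way such a recurring check could be structured, shown as a sketch and not as Alex Rockwell's actual code: keep every URL in a sorted set scored by the time it is next due, pull the members whose score has passed, and write each one back with a later score. So that the example runs without a server, `SortedSet` below is a hypothetical in-memory stand-in that only mimics the behavior of the Redis ZADD and ZRANGEBYSCORE commands; `run_due_checks` and the example URLs are likewise made up.

```python
class SortedSet:
    """In-memory stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # As with ZADD, adding an existing member just updates its score.
        self._scores[member] = score

    def zrangebyscore(self, low, high):
        # As with ZRANGEBYSCORE, return members with low <= score <= high, by score.
        hits = [(s, m) for m, s in self._scores.items() if low <= s <= high]
        return [m for s, m in sorted(hits)]


def run_due_checks(schedule, now, interval, check):
    """Check every URL that is due at `now`, then reschedule it `interval` later."""
    results = {}
    for url in schedule.zrangebyscore(float("-inf"), now):
        results[url] = check(url)
        schedule.zadd(url, now + interval)
    return results


# Usage: two hypothetical URLs, one due immediately and one due at t=100.
schedule = SortedSet()
schedule.zadd("http://a.example", 0)
schedule.zadd("http://b.example", 100)
first_pass = run_due_checks(schedule, now=10, interval=60, check=lambda url: True)
```

The point of the design is that "which tasks are due?" becomes a single range query on the data structure, with no separate bookkeeping in the application.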
Online migration for geodistributed storage systems, Usenix ATC 2011. Murat Demirbas explores a paper on migrating data between data centers. Data needs to be moved from one center to another based on the access patterns. A significant problem when you are working out how to be highly available.
Netflix shares their take on the Performance on Top ISP Networks. Useful and hard-to-get intel.
Transparent Caching and A Summary Of Transparent Caching Architectures and Associated Performance by Dan Rayburn. The high bandwidth required for video causes a conflict in financial incentives among the different players. How would you like to be on the hook to provide bandwidth for YouTube and Netflix without access to their higher margins? While CDNs make their money from content publishers who typically pay based on volume, network service providers' money comes from their subscribers who pay a fixed amount per month. So while CDNs (theoretically at least) stand to gain from this increase in video traffic, network service providers are stuck between the proverbial rock and hard place. Transparent caching is network-wide caching instead of caching based on contracts à la CDNs. It is embedded inside the carrier's network and provides the operator control over what to cache, when to cache, and how fast to accelerate the delivery. Many Telcos, MSOs, and Mobile Operators are now looking at transparent Internet caching as a required element in their network to control “over-the-top” content consumption and to provide the best possible end-to-end user experience. It is a unique technology that simultaneously benefits a content owner, a network operator, and most importantly a broadband or wireless subscriber.
In order to think more effectively about clouds, it is helpful to think about them as one of four types: inside/outside and legacy/webscale, says Troy Angrignon in Private Cloud Choices. If you do not require a legacy architecture for your private cloud, you’d gain substantial CAPEX, OPEX and flexibility advantages by choosing the webscale architecture. The other thing to realize is that webscale clouds are moving quickly up the feature ramp as they disrupt the legacy clouds. We’ve seen this recently as AWS has embraced Oracle and SAP.
