
Stuff The Internet Says On Scalability For August 26, 2011

High Scalability

26 Aug 2011 — 4 min read

You may not scale often, but when you scale, please drink HighScalability:

Facebook Hits 1 Trillion Pageviews; IBM builds 120 petabyte cluster out of 200,000 hard drives; Codecademy Surges To 200,000 Users, 2.1 Million Lessons Completed In 72 Hours;
Potent quotables:
- @kimberlycraven : To address scalability @ integration, use asynchronous communication, granularity, proximity and partitioning
- @scankp : Turn up the bitrate to the data furnace honey. We’re expecting a frost tonight. We can use the extra cash, too
- @gazzar_rj : The need for scale-out, automated, one storage pool +PB, linear performance scalability, the same admins" Sujal Patel
- @tweetimages : We served 551,296,643 @twitter avatars for the month of July. Not bad!
- @codie : Tech interview pro tip: when you're asked for a high-level app design, do factor in biz analytics. Its importance is at par w/ scalability.
You thought a memcache service would be a great product idea for Amazon? So did Amazon. They've released ElastiCache - a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. Aren't you glad you waited on the idea? Sebastian Kreutzberger warns Amazon is making the same mistakes by hosting the service in only one Availability Zone. Also take a look at Sebastian's very well done Scalability for Dummies series.
Is That Review a Fake? I've noticed spam articles and comments follow a similar pattern: trying to write something while knowing nothing.
Why is Stack Overflow so awesome?, a happy user from China asks. CDN, clean living, clean code, small code, and a little effort continuously applied: A day in the life of a slow page at Stack Overflow
Twitter is not lying when they say they've released Finagle: A Protocol-Agnostic RPC System. Finagle is a protocol-agnostic, asynchronous RPC system for the JVM that makes it easy to build robust clients and servers in Java, Scala, or any JVM-hosted language. It consists of: connection pools, failure detector, failover strategies, load-balancers, back-pressure techniques. In other words, it's all growns up.
Distributed Storage, Phase Change Memory and the Rebirth of the In-Memory Database. Ben Stopford conjectures: whilst the disruption of late may have been led by the ‘big-data’ driven, Internet behemoths, the next set of disruptive technologies may well come from enterprise space. Enterprise users’ need for fast analytical processing will drive the reinvention of in-memory databases. By enterprise Ben is talking OLAP, which can fit in RAM for many enterprises.
William Louth will take a Pass on PaaS. When applications become mere containers for piping data between other services over a grid, a programmer somewhere loses a compiler. Insightful stuff: The last thing we need at the moment is a black box container for the edge execution node that restricts the service world to that of a particular PaaS vendor.
Switch Fabrics: Fabric Arbitration and Buffers by Greg Ferro. Packets go from port A to port B over some sort of a high speed switch fabric. Greg dives into more detail on Crossbar switching. It's all about buffer management, which is why you should never use a light weight fabric like chiffon.
Meet zCloud: The Private Cloud Infrastructure Behind Zynga. Games have demand arcs. Plot the slope of that curve (is it vertical or a more traditional growth?) and plan your mix of public and private cloud resources. CityVille rapidly grew to millions of users in just six weeks on the back of AWS, trading the cost of operating expense in AWS for capital expense. Their infrastructure: redundant power to each rack, state-of-the-art servers with high memory capacity, a fully non-blocking network infrastructure, the use of inline hardware-based load balancers and local disk storage, Rightscale, Cloud.com, Apache, MySQL, memcache, Couchbase and Nagios.
How do you make scalable high speed counters on Google App Engine's HRD? Start with super scalable counter? Nope, try memcache with periodic writes using cron or tasks and/or backends. Another thread on the same question. Memcache won't be fast enough. Ikai Lan talks about sharding counters.
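For anyone who hasn't met the sharding idea before, here is a minimal in-memory sketch of it. It is not App Engine code: the `ShardedCounter` class and its list of integers are hypothetical stand-ins for a set of datastore entities, and the shard count of 8 is arbitrary. The point is only the shape of the technique: writes land on a randomly chosen shard so no single entity becomes a write hotspot, and a read sums all the shards.

```python
import random


class ShardedCounter:
    """In-memory stand-in for a counter split across several datastore entities."""

    def __init__(self, num_shards=20):
        # Each slot plays the role of one independently writable shard entity.
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Pick a shard at random so concurrent writers rarely hit the same one.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # A read sums every shard; reads cost more so that writes can spread out.
        return sum(self.shards)


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.increment()
print(counter.value())
```

The trade is more expensive reads for cheaper, less contended writes, which is presumably why the threads above also suggest caching the total in memcache.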
Reliable, Scalable, and Kinda Sorta Cheap: A Cloud Hosting Architecture for MongoDB. Cody Powell with a really nice description of how to set up replica sets on MongoDB.
If you are interested in building high availability using VMware, Ivan Pepelnjak has some High Availability Fallacies you may want to consider. I have some bad news for the true believers in virtualization-supported high availability – quite a few of them probably don’t understand how it works.
Cassandra does not use vector clocks and here's the fascinating long and intricate story of why. The primary use case for classic vector clocks of merging non-conflicting updates to different fields w/in a value is already handled by cassandra breaking a row into columns.
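A toy illustration of that last point, with the loud caveat that this is not Cassandra's code: the `merge_rows` function, the `(timestamp, value)` tuples, and the sample data are all made up for the example, and tie-breaking on equal timestamps is left out. It only shows why per-column timestamps let two replicas that each updated a different field be merged without losing either update.

```python
def merge_rows(row_a, row_b):
    """Merge two replica copies of a row, column by column.

    Each row maps column name -> (timestamp, value). For every column the
    copy with the higher timestamp wins (last write wins, per column).
    """
    merged = dict(row_a)
    for column, (timestamp, value) in row_b.items():
        if column not in merged or timestamp > merged[column][0]:
            merged[column] = (timestamp, value)
    return merged


# Replica 1 saw a newer email; replica 2 saw a newer city.
replica_1 = {"email": (105, "new@example.com"), "city": (100, "Oslo")}
replica_2 = {"email": (100, "old@example.com"), "city": (110, "Bergen")}
merged = merge_rows(replica_1, replica_2)
print(merged)
```

Both updates survive the merge because they touched different columns. Concurrent updates to the same column are a different story, and that is where the linked discussion gets interesting.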
Script to install CouchDB on an AMI. On a related note, using Puppet to install HBase.
So you're using S3 to serve your assets, eh? You should rethink that. S3 Bandwidth is Expensive; S3 Requests come from a central location; CloudFront is expensive. MaxCDN is used as a better solution. A commenter points out the potentially interesting economics of a CDN: CloudFront REDUCED our total bill. CloudFront only pulls the file from S3 once and then caches it, so the cost there is extremely small. That means switching to CloudFront would save you 92.5% on the requests portion of your bill.
Don't write lexers and parsers with regular expressions as the starting point. Your code will be faster, cleaner, and much easier to understand and to maintain says Rob Pike. On Hacker News. I think if you are Rob Pike this is sound advice.
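For the rest of us, a hand-written scanner is less scary than it sounds. Below is a small sketch of the general approach, not anything taken from Rob Pike's post: the `tokenize` function, the three token kinds, and the tiny operator set are all invented for illustration. It walks the input one character at a time and never touches a regular expression.

```python
def tokenize(text):
    """Scan text one character at a time into (kind, value) tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Skip whitespace between tokens.
            i += 1
        elif ch.isdigit():
            # Consume a run of digits as one NUMBER token.
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif ch.isalpha() or ch == "_":
            # Consume letters, digits and underscores as one IDENT token.
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("IDENT", text[start:i]))
        elif ch in "+-*/()=":
            # Single-character operators and parentheses.
            tokens.append(("OP", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("total = price * 42"))
```

Each branch is an explicit state you can step through in a debugger, which is a good part of the "easier to understand and to maintain" argument.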
Cloud Wars: Windows Azure vs. Amazon AWS, an IaaS perspective. Nice table comparing the different features. Amazon is better at naming, but they match well point by point. Speaking of Amazon, here are some spot instance videos to watch between Khan Academy and cat videos.
Netmap: A Novel Framework for High Speed Packet I/O
Scaling with Single Threading by John Urberg. While a single threaded architecture may not fit every application, it is an important architecture to keep in your tool belt if you are having trouble scaling your application.
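One common shape of the single-threaded idea, sketched below, is to let exactly one thread own the mutable state and have everyone else send it commands through a queue. This is an assumption about the general pattern, not necessarily the design the linked article describes; `run_worker`, `producer`, and the account names are hypothetical. Because only the worker touches `balances`, no locks are needed around it.

```python
import queue
import threading


def run_worker(commands, balances):
    """Apply commands one at a time; only this thread touches balances."""
    while True:
        command = commands.get()
        if command is None:
            # Sentinel value: no more work is coming.
            break
        account, amount = command
        balances[account] = balances.get(account, 0) + amount


def producer(commands, account):
    """Submit 1000 deposits for one account without touching shared state."""
    for _ in range(1000):
        commands.put((account, 1))


commands = queue.Queue()
balances = {}
worker = threading.Thread(target=run_worker, args=(commands, balances))
worker.start()

producers = [
    threading.Thread(target=producer, args=(commands, name))
    for name in ("alice", "bob")
]
for p in producers:
    p.start()
for p in producers:
    p.join()

commands.put(None)  # All producers are done; tell the worker to stop.
worker.join()
print(balances)
```

The queue serializes every update, so the final totals do not depend on how the producer threads happen to interleave.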
Two New Videos: SuperPages and NanoBSD. Superpages aggregate together standard-sized hardware pages into much larger "superpages".
