Stuff The Internet Says On Scalability For July 22, 2011

Submitted for your scaling pleasure: 

For a lot more Stuff the Internet says, please read more below...



Netflix: Harden Systems Using a Barrel of Problem Causing Monkeys - Latency, Conformity, Doctor, Janitor, Security, Internationalization, Chaos

With a new Planet of the Apes coming out, this may be a touchy subject with our new overlords, but Netflix is using a whole lot more trouble-injecting monkeys to test and iteratively harden their systems. We learned previously how Netflix used Chaos Monkey, a tool that tests failover handling by continuously failing EC2 nodes. That was just a start. More monkeys have been added to the barrel. Node failure is just one problem in a system. Imagine a problem and you can imagine creating a monkey to test whether your system handles that problem properly. Yury Izrailevsky talks about just this approach in this very interesting post: The Netflix Simian Army.

I know what you are thinking: if monkeys are so great, then why has Netflix been down lately? Dmuino addressed this potential embarrassment, putting all fears of cloud inferiority to rest:

Unfortunately we're not running 100% on the cloud today. We're working on it, and we could use more help. The latest outage was caused by a component that still runs in our legacy infrastructure where we have no monkeys :)

To continuously test the resilience of Netflix's system to failures, they've added a number of new monkeys, and even a gorilla:
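The core idea behind this kind of monkey is simple enough to sketch: keep killing pieces of the system and verify it still serves traffic after every failure. The toy model below is purely illustrative (it is not Netflix's code, and the class and method names are invented for this sketch):

```python
import random

class Cluster:
    """A toy service cluster: requests succeed while at least one node is up."""
    def __init__(self, nodes):
        self.nodes = set(nodes)

    def kill_random_node(self):
        # What Chaos Monkey does for real against EC2 instances.
        victim = random.choice(sorted(self.nodes))
        self.nodes.discard(victim)
        return victim

    def handle_request(self):
        # A resilient design keeps serving as long as any replica survives.
        return "ok" if self.nodes else "outage"

def chaos_test(cluster, kills):
    """Continuously fail nodes, checking the service after each failure."""
    results = []
    for _ in range(kills):
        cluster.kill_random_node()
        results.append(cluster.handle_request())
    return results

cluster = Cluster(["i-a", "i-b", "i-c", "i-d"])
print(chaos_test(cluster, 3))  # three kills still leave one live node
```

A real monkey runs this loop continuously in production, which is the point: failures stop being rare surprises and become a routine condition the system is proven to handle.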



Building your own Facebook Realtime Analytics System  

Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job of summarizing this video by Alex Himel, Engineering Manager at Facebook.

In the first post, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Realtime Analytics for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as experience gathered through recent work with a few GigaSpaces customers. The second post provides a summary of that new approach, as well as a pattern and a demo for building your own Real Time Analytics system.



New Relic Architecture - Collecting 20+ Billion Metrics a Day

This is a guest post by Brian Doll, Application Performance Engineer at New Relic.

New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds.  We believe that good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service.  Here we'll show you how we do it.

  •  New Relic is Application Performance Management (APM) as a Service
  •  In-app agent instrumentation (bytecode instrumentation, etc.)
  •  Support for 5 programming languages (Ruby, Java, PHP, .NET, Python)
  •  175,000+ app processes monitored globally
  •  10,000+ customers
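Sustaining 100,000+ metrics per second hinges on rolling raw samples up into fixed time windows before persisting them, so storage grows with the number of metrics and windows rather than with raw sample volume. Here is a minimal sketch of that kind of timeslice rollup; the window size and the summary fields are assumptions for illustration, not New Relic's actual schema:

```python
from collections import defaultdict

def aggregate(samples, window=60):
    """Roll raw (timestamp, metric, value) samples into per-window summaries."""
    buckets = defaultdict(
        lambda: {"count": 0, "total": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for ts, metric, value in samples:
        # Bucket key: metric name plus the start of its time window.
        key = (metric, ts - ts % window)
        b = buckets[key]
        b["count"] += 1
        b["total"] += value
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
    return dict(buckets)

samples = [
    (0, "response_time_ms", 120.0),
    (10, "response_time_ms", 180.0),
    (65, "response_time_ms", 90.0),
]
rollup = aggregate(samples)
print(rollup[("response_time_ms", 0)]["count"])  # first two samples share a window
```

Agents batch samples like these and ship the rollups upstream, which is what keeps per-second collection tractable at this scale.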

The Stats



Stuff The Internet Says On Scalability For July 15, 2011

Submitted for your scaling pleasure: 

  • That's a lot of data...CERN: ATLAS produces up to 320 MB per second, followed by CMS with 220 MBps. Amazon Cloud Now Stores 339 Billion Objects. CERN also has an open source hardware effort.
  • Domas Mituzas on why Facebook may just outlast their MySQL heritage: I feel somewhat sad that I have to put this truism out here: disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data. Think Wikipedia without history, think comments that disappear on old posts, together with old posts, think all 404s you hit on various articles you remember from the past and want to read. Building the web that lasts is completely different task from what academia people imagine building the web is. What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
  • Quotes that are quoted because they are quotable:
    • @Werner - If you have never developed anything of that scale you cannot be taken seriously if you call for the reengineering of Facebook's data store
    • @Werner - Scaling data systems in real life has humbled me. I would not dare criticize an architecture that holds social graphs of 750M and works
    • Dwight Merriman -  I'm not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational.
For a lot more Stuff the Internet says, please read below...



Google+ is Built Using Tools You Can Use Too: Closure, Java Servlets, JavaScript, BigTable, Colossus, Quick Turnaround

Joseph Smarr, former CTO of Plaxo (which explains why I recognized his picture), in "I'm a technical lead on the Google+ team. Ask me anything," reveals the stack used for building Google+:



Sponsored Post: New Relic, eHarmony, TripAdvisor, NoSQL Now!, Surge, BioWare, Tungsten, deviantART, Aconex, Hadapt, Mathworks, AppDynamics, ScaleOut, Membase, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • TripAdvisor is Hiring Engineers at all Levels: Scalable Web Engineering Program. To apply for our Scalable Web Engineering Program, visit
  • Are you a scalability expert? eHarmony is looking for Senior Java Engineers to help implement and scale our Matching compatibility systems. Please visit:
  • BioWare Austin is looking for a Performance Test Engineer for our Austin team. To apply, please visit
  • BioWare Austin is looking for a Contract Build Engineer for our Austin team. To apply, please visit
  • deviantART is looking for Network and Systems Operations Engineer. Please apply here.
  • Aconex is looking for a Systems Engineer in San Bruno. Please apply here.
  • Hadapt brings high-performance SQL to Hadoop, and is looking for a systems engineer to join this fast-growing company. Please apply at
  • MathWorks Looking for Multiple, Full-time Scaling Experts. Apply now: 

Fun and Informative Events

  • NoSQL Now! is a new conference covering the dynamic field of NoSQL technologies. August 23-25 in San Jose. For more information please visit:
  • Surge 2011: The Scalability and Performance Conference. Surge is a chance to identify emerging trends and meet the architects behind established technologies. Early Bird Registration.
  • Join our webinar as we introduce Tungsten Enterprise Summer '11 Edition with improved usability, performance and ease of management for MySQL and PostgreSQL clusters.
  • Couchbase is having a Special Offer for Apache CouchDB Developer Training!

Cool Products and Services

  • New Relic - real user monitoring optimized for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit
  • ScaleOut StateServer - Scale Out Your Server Farm Applications! 
  • CloudSigma. Instantly scalable European cloud servers.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • Site24x7: Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...



ATMCash Exploits Virtualization for Security - Immutability and Reversion

This is a guest post by Ran Grushkowsky, Head of Technology at ATMCash.

Virtualization and cloud-based systems are heavily hyped in the industry; however, most financial companies shy away from those solutions. At ATMCash, we’ve approached virtualization not for the usual reason of scalability, but for the usually missed value of security.

In this article, I will introduce the security value added by virtualization and explain why people should consider deploying mini-clouds for these use cases.

How do virtual machines help mitigate risk?

I am sure most of you have heard of the recent hackings in the financial sector. Financial companies are under constant attack, and security is of the utmost importance. One of the bigger risks in system deployment is a breach of one of the stack components. Regular system patches and maintenance fix known exploits and issues, but sometimes the patch comes too late: the component has already been breached, and a Trojan horse or other malicious code has already been injected (as may have happened in recent cases). Virtual machines provide a great hidden gem: immutability and reverting images.
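The security property at work here can be modeled in a few lines: snapshot a known-good image, and any tampering that happens afterward is discarded on revert. This is a toy model of the snapshot/revert semantics, not ATMCash's actual stack (the class and field names are invented; in practice this would be the snapshot facility of your hypervisor):

```python
import copy

class VirtualMachine:
    """Toy model of VM snapshot/revert: post-snapshot changes are disposable."""
    def __init__(self, disk):
        self.disk = dict(disk)
        self._snapshot = None

    def snapshot(self):
        # Capture a known-good image before exposing the VM to traffic.
        self._snapshot = copy.deepcopy(self.disk)

    def revert(self):
        # Any post-snapshot tampering (an injected trojan, say) vanishes.
        self.disk = copy.deepcopy(self._snapshot)

vm = VirtualMachine({"/bin/login": "trusted"})
vm.snapshot()
vm.disk["/bin/login"] = "trojaned"  # simulated compromise after deployment
vm.revert()
print(vm.disk["/bin/login"])  # back to the known-good state
```

Reverting on a schedule, rather than only after a detected breach, is what turns this from a recovery tool into a preventive one: an attacker's foothold has a bounded lifetime.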

Example of how ATMCash uses those features for security in the stack:



Stuff The Internet Says On Scalability For July 8, 2011

Submitted for your scaling pleasure: 

For even more Stuff the Internet says, please read more below...



Myth: Google Uses Server Farms So You Should Too - Resurrection of the Big-Ass Machines

For a long epoch the strategy was to scale up by building ever-bigger supercomputers. I had the pleasure of programming on a few large massively multi-processor machines from SGI and DEC. Beautiful, highly specialized machines that were very expensive. These met the double-tap extinction event of Moore's law and a Google-inspired era of commodity machine clusters and extreme software parallelism. Has the tide turned? Does it now make more sense to use big machines instead of clusters?

In Big-Ass Servers™ and the myths of clusters in bioinformatics, Jerm makes the case that for bioinformatics, it's more cost-effective to buy a Big-Ass Server than to use a cluster of machines and a lot of specialized parallel programming techniques. It's a classic scale-up argument that has been made more attractive by the recent development of relatively inexpensive large machines. SeaMicro has developed a 512-core machine. Dell has a new 96-core server. Supermicro has 48-core machines. These are new options in the scale-up game that have not been available before and could influence your architecture choice.

Jerm's reasoning for preferring big-ass servers is:
