Stuff The Internet Says On Scalability For July 8, 2011

Submitted for your scaling pleasure:

  • Facebook confirms 750 million users, sharing 4 billion items daily; Yahoo: 42,000 Hadoop nodes storing 180-200 petabytes; Formspring hits 25 million users.
  • Zynga's Cadir Lee: "It’s not the amount of hardware that matters. It’s the architecture of the application. You have to work at making your app architecture so that it takes advantage of Amazon. You have to have complete fluidity with the storage tier, the web tier. We are running our own data centers. We are looking more at doing our own data centers with more of a private cloud."
  • Love the sense making described by Hunch’s Infographic on their Taste Graph: 500 million people, 200 million items, and 30 billion edges, all on 48 processors and 1 TB of RAM.
  • Is MongoDB the New MySQL? Stephen O'Grady thinks so, using a worse-is-better argument: wide adoption by applications, enterprise inroads, a simple feature set, and the number of complainers. Who plays PostgreSQL in this movie?
  • Java is the startup founder getting kicked out of the startup it helped create. Programmers want to use the JVM as a common integration point while not using Java. How embarrassing. On this theme, Havoc cries: Keep the JVM, dump the rest (Scala+Play+MongoDB).
  • Nice Q&A on ServerFault describing how to scale out a software load balancer. Why chickens are involved I have no idea.
  • Apache Hadoop Goes Realtime at Facebook. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort, and discusses the application requirements for consistency, availability, partition tolerance, data model, and scalability.
  • From our scalability in different contexts department: Large Scale, Decentralized Humanure Toilets in Haiti. Dry, composting toilets truly create a complete cycle of life, as the wastes are collected, treated, and transformed into fertile gardens, thus reducing the need for foreign aid, interrupting the spread of disease and providing jobs, education, and dignity.
  • Curt Monash with some great stats on petabyte clusters: 7 Vertica installations with a petabyte plus; Cloudera contributes 22 petabyte-size Cloudera Distribution [of] Hadoop clusters; probably 10 petabyte clusters live at Yahoo.
  • leveldb - a fast and lightweight key/value database library from the authors of MapReduce and BigTable. It's a library and not a server, stores data in sorted order, is written in C++, uses variable-sized keys to save memory, and uses Log-Structured Merge Trees for better random write performance (compared to B-trees). It always appends to a log file, or merges existing files together to produce new ones. So an OS crash will cause a partially written log record (or a few partially written log records). Leveldb recovery code uses checksums to detect this and will skip the incomplete records. On Hacker News.
  • InfoQ with a good article on Twitter Shifting More Code to JVM, Citing Performance and Encapsulation As Primary Drivers. Title says it all. "But as we move into a light-weight Service Oriented Architecture model, static typing becomes a genuine productivity boon. And Scala gives you the same thing."
  • Let's wish IBM luck with their new kind of “universal” memory chip that can record data 100 times faster than today’s flash memory chips. That means scientists are one step closer to creating a universal memory chip that is fast, permanent, and has lots of capacity. Someday one of these technologies may even hit the real world. That would be cool.
  • Brawny cores still beat wimpy cores, most of the time. Google's Urs Hölzle finds slower but energy efficient “wimpy” cores only win for general workloads if their single-core speed is reasonably close to that of mid-range “brawny” cores.
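The leveldb crash-recovery idea above (every appended log record carries a checksum, so a torn write at the tail is detected and skipped) is simple enough to sketch. This is an illustrative stand-in, not leveldb's actual on-disk log format; the record layout and helper names here are made up for the example:

```python
# Sketch of checksummed log recovery in the style leveldb uses: records are
# only ever appended, and a crash mid-append leaves a partial record at the
# tail that the checksum exposes. Hypothetical format: [crc32][length][payload].
import struct
import zlib

HEADER = struct.Struct("<II")  # crc32 of payload, payload length

def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one checksummed record to the log buffer."""
    buf += HEADER.pack(zlib.crc32(payload), len(payload)) + payload

def recover(buf: bytes) -> list[bytes]:
    """Read records until the end of the log or the first partial/corrupt one."""
    records, pos = [], 0
    while pos + HEADER.size <= len(buf):
        crc, length = HEADER.unpack_from(buf, pos)
        payload = buf[pos + HEADER.size : pos + HEADER.size + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write at the tail: skip it, as leveldb's recovery does
        records.append(payload)
        pos += HEADER.size + length
    return records

log = bytearray()
append_record(log, b"put k1 v1")
append_record(log, b"put k2 v2")
# Simulate an OS crash mid-append: a header promising 9 bytes, only 3 written.
torn = bytes(log) + HEADER.pack(0, 9) + b"put"
print(recover(torn))  # the two complete records survive; the torn tail is dropped
```

The append-only discipline is what makes this cheap: recovery never has to repair anything in place, it just truncates at the first record that fails its checksum.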