Stuff The Internet Says On Scalability For April 12, 2013

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)

  • 877,000 TPS: Erlang and VoltDB. 
  • Quotable Quotes:
    • Hendrik Volkmer: Complexity + Scale => Reduced Reliability + Increased Chance of catastrophic failures
    • @TheRealHirsty: This coffee could use some "scalability"
    • @billcurtis_: Angular.js with Magento + S3 json file caching = wicked scalability
    • Dan Milstein: Screw you Joel Spolsky, We're Rewriting It From Scratch!
    • Anil Dash: Terms of Service and IP trump the Constitution
    • Jeremy Zawodny: Yeah, seek time matters. A lot.
    • @joeweinman: @adrianco proves why auto scaling is better than curated capacity management. < 50% + Cost Saving
    • @ascendantlogic: Any "framework" naturally follows this progression. Something is complex so someone does something to make it easier. Everyone rushes to it but needs one or two things from the technologies they left behind so they introduce that into the "new" framework. Over the years everyone's edge cases are accounted for with frameworks on top of frameworks and suddenly everyone is looking for the next big simplification.
  • Imagine if you had a Beowulf cluster of tiny antennas? You could build a TV rebroadcasting service, keeping old media running for the Galt's Gulch of pay TV.

  • As a technologically advanced nation, why haven't we done this yet? Nationwide Google Fiber would cost $11B over five years, but it will probably never happen. I say this while using my nationwide power/telephone/road/defense system.

  • Great list of technical talks. I'm partial to Big Ball of Mud.

  • Making Black Swans work for you: Stick to simple rules; Decentralize; Develop layered systems; Build in redundancy and overcompensation; Resist the urge to suppress randomness; Ensure everyone has skin in the game; Give higher status to practitioners rather than theoreticians.

  • Edmund Jorgensen goes all counter intuitive in When it Comes to Chaos, Gorillas Before Monkeys: I believe that startups should (mostly) worry less about EC2 instances failing, and more about entire AZs degrading. This leads to a different kind of initial tech/devops investment—one that I believe represents a better return for most early-stage companies and products.

  • Do you really feel comfortable with a currency whose supply can be grown by anyone with a 50 GH/s Bitcoin miner?

  • Good thread on Measuring Riak disk usage. Riak stores 3 copies of the data to automatically detect and correct data loss/corruption. This can be turned off.

  • How can applications be built on eventually consistent infrastructure given no guarantee of safety? UC Berkeley's Peter Bailis and Ali Ghodsi "describe several notable developments in the theory and practice of eventual consistency, with a focus on immediately applicable takeaways for practitioners running distributed systems in the wild. We—and several others—are developing transactional algorithms that show this need not be the case. By rethinking the concurrency-control mechanisms and re-architecting distributed databases from the ground up, we can provide safety guarantees in the form of transactional atomicity, ANSI SQL Read Committed and Repeatable Read, and causality between transactions—matching many existing ACID databases—without violating high availability. "

  • LOL: Edward Capriolo replicates Netflix's Cassandra 1,000,000 inserts/sec benchmark on a Raspberry Pi

  • That's one way to look at it: Why Startups Should Choose Canada Over Silicon Valley: But for all the positives of the Bay Area, there’s one downside that few talk about which can kill startups: false positives. False positives lead to premature scaling. And premature scaling leads to a startup’s death.

  • Nice list of tools for analyzing time-series data.

  • Sweet series of articles by Steve Sistare on Massive Solaris Scalability. In the age of Linux it's sad to think how much impressive OS work may go unremembered. The series starts by asking "How do you scale a general purpose operating system to handle a single system image with 1000's of CPUs and 10's of terabytes of memory?" and goes on with a great breakdown of all the different issues involved.

  • Kickstarter dissects a MySQL replication problem in The Day the Replication Died that caused inconsistent data for over a month. A nonstarter. The eventual cause: ORDER BY clauses are sometimes ignored, which caused data to be written in a different order on the master than on the slaves.

  • A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo: Except for big messages, RabbitMQ seems to be the best bet as it outperforms the others by a factor of 3.

  • What's new in Lucene and Solr 4.x. There's always been the thought: if I have this nifty search engine, why would I need a separate key-value database too? Looks like Solr is adding NoSQL capabilities that may make that true. Lots of other cool new features too: multi-tenancy support; more powerful geospatial queries; and SolrCloud, with dynamically scalable auto-sharding.

  • The continued adventures... Actually running Magento on Amazon’s Elastic Beanstalk Cloud platform. That's a lot of steps, but I guess it's always a lot of steps.

  • The Internet will provide. In the past I've wanted exactly this, now it exists - elephant: Elephant is an S3-backed key-value store with querying powered by Elastic Search. Your data is persisted on S3 as simple JSON documents, but you can instantly query it over HTTP. Suddenly, your data becomes as durable as S3, as portable as JSON, and as queryable as HTTP.

  • Forget the battle between good and evil. Sometimes more is better. Sometimes less is better. Here's another win for less - In-kernel memory compression: Wouldn't it be nice if it were possible to increase the effective amount of data stored in RAM? And, since those CPUs are waiting anyway, perhaps we could use those spare CPU cycles to contribute towards that objective? This is the goal of in-kernel compression: We keep more data — compressed — in RAM and use otherwise idle CPU cycles to execute compression and decompression algorithms.
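  The trade-off is easy to demonstrate in userspace. Here's a minimal Python sketch (using zlib rather than the kernel's own algorithms, and made-up page data) of the idea: spend CPU cycles on compression to stretch a fixed RAM budget for compressible data.

```python
import zlib

# Redundant, page-like data -- the kind of memory that compresses well.
page = b"log line: request served in 12ms\n" * 128  # ~4 KiB

# Fast compression level, mirroring the low-latency goal of in-kernel use.
compressed = zlib.compress(page, 1)
ratio = len(page) / len(compressed)

# The round trip is lossless; we simply traded CPU cycles for effective RAM.
assert zlib.decompress(compressed) == page
print(f"{len(page)} bytes -> {len(compressed)} bytes ({ratio:.1f}x)")
```

  The same principle applies in the kernel, with the added wrinkle that incompressible pages must be detected and stored uncompressed.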

  • Do 100Gbps links mean we are going back to a mainframe architecture? dekhn says it best: The PC (including servers) world went down a path of high physical integration- the bus, the CPU, the memory, and the peripherals are all one unit that couldn't really be carved up. Now people are realizing they can build a server from a bunch of parts connected by fast fabric, and you can pull/place items in the fabric and use them immediately, or take them apart. The MULTICS machine was actually carved into two pieces live, every night, to run two instances, then re-merged in the morning(!) Hardware virtualization - and component aggregation - were always great ideas, and now the technology exists to deploy it at the middle-tier commodity server level.

  • In Parallel Parsing isn't Hard (Or, Parallel JSON via Web Workers!), Leo Meyerovich identifies a powerful technique that can help anyone parsing large chunks of data.

  • Harmonic Averaging of Monitored Rate Data: The correct way to average rates (i.e., inverse-time metrics) is to apply the harmonic mean, not the arithmetic mean.
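  A worked example of the point, with made-up numbers: two equal-sized 600 MB transfers running at 100 MB/s and 50 MB/s. The arithmetic mean claims 75 MB/s, but the actual end-to-end throughput is the harmonic mean.

```python
rates = [100.0, 50.0]   # MB/s for two transfers
work_each = 600.0       # MB per transfer (equal workloads)

arithmetic = sum(rates) / len(rates)                 # 75.0 MB/s -- overstates it
harmonic = len(rates) / sum(1.0 / r for r in rates)  # ~66.7 MB/s

# Ground truth from totals: 1200 MB moved in 6 s + 12 s = 18 s.
true_rate = (len(rates) * work_each) / sum(work_each / r for r in rates)
assert abs(true_rate - harmonic) < 1e-9
print(f"arithmetic={arithmetic:.1f} MB/s, harmonic={harmonic:.1f} MB/s")
```

  For unequal workloads the weighted harmonic mean applies instead, with each rate weighted by its share of the total work.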

  • Alex Podelko tackles a big topic - Performance vs. Scalability: The frontend performance indeed doesn’t matter here and is independent of scalability. But the backend performance is directly related to scalability. The relationship, of course, may be non-linear and quite sophisticated – but it does exist.

  • Facebook's LinkBench: A database benchmark for the social graph. Most interesting is the explanation of the model used to think about social graph problems.

  • Interview with Michael Blaha on Graphs vs. SQL: “For traditional business applications, the schema is known in advance, so there is no need to use a graph database which has weaker enforcement of integrity. If instead, you’re dealing with at best a generic model to which it conforms, then a schema-oriented approach does not provide much. Instead a graph-oriented approach is more natural and easier to develop against.”

  • The Bw-Tree: A B-tree for New Hardware: Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. 

  • 17 answers to why Quora uses MySQL instead of NoSQL... easier to spell is not among the reasons. Some reasons: it just works; NoSQL isn't stable yet; Quora is read-heavy rather than write-heavy; it's not real-time; Quora is solving relationship-type problems; architecture is more important than technology. These answers are a couple of years old and no doubt the NoSQL-sucks arguments have been largely iterated out of existence.