Stuff The Internet Says On Scalability For April 27, 2012

It's HighScalability Time:

  • Quotable quotes:
    • The unbearable cost of the cloud? @marcoarment: My experiment with Amazon CloudSearch was going very well until I imported half of subscribers' bookmarks and saw it would cost $10k/month.
    • @JBossMike: Serious question: Do the Node.js hipsters enjoy not knowing WTF they're talking about when they talk about software scalability?
    • @timoelliott: The #Fashion industry doesn't have #bigdata -- they call it size XXL data instead...
  • Some steamy employee policies at Valve: no managers, 100% self-directed projects, a crepe-flat organizational structure, completely self-owned.
  • Good discussion on the advisability of early stage ventures developing APIs that must be supported forever. Charlie Kindel says Don’t Build APIs: until you get more experience with your product you'll just get them wrong and be stuck with supporting them. Pretty much everyone else disagrees, thinking the value of APIs outweighs the risk. Insightful discussion on the Guerrilla Capacity Planning group.
  • Amen.  John Sloan in All the Interesting Problems Are Scalability Problems is spot on: Writing software for these tiny microcontrollers forces you to consider serious resource constraints. To face time-space tradeoffs right up front. To really think about not just how to scale up, but how to scale down. To come to grips with how things work under the hood and make good decisions. There is no room to be sloppy or careless.
  • A New Breed of Heterogeneous Computing. Interesting idea: the big/little model puts serial CPU cores and throughput coprocessor cores on the same die. That provides better throughput and higher energy efficiency while keeping single-threaded performance. This approach differs from CPU-GPU integration because the big/little processors share the same instruction set.
  • IBM has their own ideas on what future computers will look like with their 98,304-core IBM BlueGene/Q supercomputer. “Today the Blue Gene/Q has 16GB of memory per core, if we are going to 1GB or 0.5GB per core [in future machines] we’re going to have to do a major redesign of the code. Future CPUs are likely to include additional circuitry to aid processing, such as field programmable gate arrays. Software will also need to be rewritten to take advantage of the more diverse range of processing units inside the chips of the future.”
  • If you are looking for detailed tech talk on hardware then watch This Week in Computer Hardware. Excellent and insightful discussions. 
  • Baseball, Business, And Big Data: Place a lot of small, low-ball bets, rather than a few big ones, and those you win will really pay off. You won’t get the stars, but you’ll get a lot of good producers. And it’s the way businesses need to deal with the uncertainties in innovation and technological progress as well.
  • Code green. How we program for mobile devices has a huge impact on battery life. Quite an interesting explanation in Bloated website code drains your smartphone's battery. Please help. We can make a difference if we all program this together. 
  • A parallel and lock-free bonanza: Multithreaded data structures for parallel computing, Practical lock-free data structures, Intel's Threading Building Blocks, Hacker News.
  • Are you running on the right size instance? 53% are paying for more than double the amount of power that they need. Also, interesting thread on the lack of caps.
  • I'd like to write more about cloud providers other than Amazon, but there's really not much material. Here's something about moving from EC2 to Rackspace. AWS wins on features and price. Rackspace has better support. Akamai as the CDN provider is superior, and the performance is better. Good discussion on Hacker News.
  • Edward Capriolo makes the point that even a 5% overhead for virtualization means you are paying a penalty and getting nothing for it. Virtualization is a win for cloud providers, but since they don't pass the savings on to you, all you are doing is paying more.
  • Airbrake is changing and a new stack may be evolving. Ruby is being replaced with Go and Mongo is being replaced with Riak. Go is a go because it "provides the right mix of message queuing, systems programming and exceptional concurrency." Riak was chosen to handle their expanding traffic (storing 100 million errors a day).
  • Clicks in search. Hugh Williams with a fascinating discussion of how inverse power law distributions are everywhere: search results click curve, query curve, document access curve, clicks on related searches, clicks on just about any list, and word frequencies. In fact: when you see a curve that isn’t an inverse power law distribution, you should worry. There’s probably something wrong. Also, Ranking at eBay (Part #1).
  • A High Frequency Trader's Apology, Pt 1. Simple and clear explanation of the gambling technology that is HFT.
  • Efficient online index maintenance for contiguous inverted lists: In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query
  • Lu Dongxiang was kind enough to translate 7 Years Of YouTube Scalability Lessons In 30 Minutes into Chinese.
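
To make the inverse power law item above a little more concrete, here is a minimal Python sketch. It is not taken from Hugh Williams' post. It assumes a click share proportional to rank raised to a negative exponent, and the function names (power_law_curve, fit_exponent) and the exponent of 1.0 are made up for illustration. A curve that follows an inverse power law is a straight line on log-log axes, so a slope fit is a quick sanity check on your own click or query data.

```python
import math


def power_law_curve(n_ranks, exponent=1.0):
    """Share of clicks at each rank, assuming share(rank) ~ rank ** -exponent."""
    weights = [rank ** -exponent for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fit_exponent(shares):
    """Estimate the exponent as the negated least-squares slope of
    log(share) against log(rank). All shares must be positive."""
    xs = [math.log(rank) for rank in range(1, len(shares) + 1)]
    ys = [math.log(share) for share in shares]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Hypothetical ten-result page with an assumed exponent of 1.0.
shares = power_law_curve(10, exponent=1.0)
estimated = fit_exponent(shares)
```

If the log-log fit on real data is badly non-linear, that is the "you should worry" case from the item above: something such as a logging or ranking bug may be distorting the curve.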