Stuff The Internet Says On Scalability For August 29th, 2014

Hey, it's HighScalability time:


In your best Carl Sagan voice...Billions and Billions of Habitable Planets.

  • Quotable Quotes:
    • @Kurt_Vonnegut: Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.
    • @neil_conway: "The paucity of innovation in calculating join predicate selectivities is truly astounding."
    • @KentBeck: power law walks into a bar. bartender says, "i've seen a hundred power laws. nobody orders anything." power law says, "1000 beers, please".
    • @CompSciFact: RT @jfcloutier: Prolog: thinking in proofs Erlang: thinking in processes UML: wishful thinking

  • For your acoustic listening pleasure let me present...The Orbiting Vibes playing Scaling Doesn't Matter. I don't quite understand how it relates to scaling, but my deep learning algorithm likes it. 

  • The Rise of the Algorithm. Another interesting podcast with James Allworth and Ben Thompson. Much pondering of how to finance content. Do you trust content with embedded affiliate links? Do you trust content written by writers judged on their friendliness to advertisers? Why trust at all is the bigger question. Facebook is the soft news advertisers love. Twitter is the hard news advertisers avoid. A traditional newspaper combined both. Humans are the new horses. < Capitalism doesn't care if people are employed any more than it cared about horses being employed. Employment is simply a byproduct of inefficient processes. The faith that the future will provide is deliciously ironic given the rigorous rationalism underlying most of the episodes.

  • Great reading list for Berkeley CS286: Implementation of Database Systems, Fall 2014. 

  • Is it just me or is it totally weird that all the spy systems use the same diagrams that any other project would use? It makes it seem so...normal. The Surveillance Engine: How the NSA Built Its Own Secret Google.

  • The Mathematics of Herding Sheep. My little border collie Annie embodies a very smart algorithm to herd sheep: When sheep become dispersed beyond a certain point, dogs put their effort into rounding them up, reintroducing predatory pressure into the herd, which responds according to selfish herd principles, bunching tightly into a more cohesive unit. < What's so disturbing is how well this algorithm works with people.
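
    A minimal sketch of that collect-or-drive rule, assuming 2D positions and a made-up dispersion threshold. The names dog_action and max_spread are hypothetical and not taken from the article; the only thing borrowed from it is the idea that the dog rounds up stragglers once the herd is too dispersed and otherwise pushes the herd along.

```python
import math

def centroid(points):
    """Average position of the herd."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def dog_action(sheep, goal, max_spread):
    """Decide what the dog does next (hypothetical rule, not the article's exact model).

    Returns ("collect", straggler) when the farthest sheep is more than
    max_spread from the herd centre, else ("drive", position), where position
    is on the side of the herd opposite the goal.
    """
    centre = centroid(sheep)
    straggler = max(sheep, key=lambda p: math.dist(p, centre))
    if math.dist(straggler, centre) > max_spread:
        return ("collect", straggler)
    # Herd is tight enough: stand behind it, on the line from the goal through the centre.
    dx, dy = centre[0] - goal[0], centre[1] - goal[1]
    norm = math.hypot(dx, dy) or 1.0
    return ("drive", (centre[0] + dx / norm * max_spread,
                      centre[1] + dy / norm * max_spread))
```

    Calling it in a loop, with the sheep reacting to the dog's position, would be the next step; that part of the simulation is left out here.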

  • Inside Google's Secret Drone-Delivery Program. What I really want are pick-up drones, where I send my drone to pick stuff up. Or are pick-up and delivery cars a better bet? Though I can see swarms of drones delivering larger objects in parts that self-assemble.

  • Lambda Architecture at Indix: "break down the various stages in your data pipeline into the layers of the architecture and choose technologies and frameworks that satisfy the specific requirements of each layer." Started with HBase to keep a copy of millions of web pages. Hadoop to run machine learning algorithms. They experienced data corruption and data loss issues and operational issues. The new improved version uses the Lambda Architecture: batch, serving, speed. Batch computes arbitrary functions using HDFS and Scalding. Serving serves precomputed views at low latency. Uses HBase, Solr, Oogway. Speed exposes new data. Uses a stripped down version of the batch layer. Result: handles 3x more data. 
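
    To make the three layers concrete, here is a toy sketch in plain Python. It is not Indix's pipeline (they use HDFS, Scalding, HBase and Solr); the per-URL event count and the names batch_view, SpeedView and query are assumptions picked only to show how a query merges a precomputed batch view with a small realtime view.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute the whole view from the immutable master dataset."""
    return Counter(event["url"] for event in master_dataset)

class SpeedView:
    """Speed layer: incrementally counts events the last batch run has not seen yet."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["url"]] += 1

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.counts.clear()

def query(url, batch, speed):
    """Serving layer: answer from the batch view plus whatever the speed layer has."""
    return batch[url] + speed.counts[url]

master = [{"url": "a"}, {"url": "a"}, {"url": "b"}]
batch = batch_view(master)
speed = SpeedView()
speed.update({"url": "a"})       # arrived after the batch run
total_a = query("a", batch, speed)
```

    The design choice worth noticing is that the speed layer only has to be correct for the short window between batch runs, which is why it can be a stripped down version of the batch logic.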

  • Is your product insightful? Jonathan Rosenberg: At Google we realized a few years back that our successful products were based on a strong set of technical insights...Pixar pioneered the art of using computer animation to tell compelling human stories...Salesforce understood that enterprise software could be built in the cloud...Uber enabled people to use their smartphones to hail and pay for cabs. < The comments have more great examples.

  • Not wanting to be left out of the Backend-as-a-Service market, Azure has released a preview of their Azure DocumentDB. Here's a good primer. Looks like a good first cut at a document database. It supports search out of the box, which is unusual. Ayende@Rahien came up with a large list of where it falls short: no sorting options; lots of table scans; poor API; small document sizes; no cross document transactions; etc. Keep in mind it's a preview. These are not uncommon limitations for early document database versions. I didn't see pricing details, which could make a big difference.

  • Lots of specific advice here. A must read if this is your problem to solve. Billion Messages - Art of Architecting scalable ElastiCache Redis tier: In this article I am going to share my experience on designing large scale Redis tiers supporting billions of messages per day on AWS, step by step guide on how to deploy the same, what are the Implications you face at scale? Best Practices to be adopted while designing sharded+replicated Redis Tiers etc.
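
    For readers new to the sharded+replicated idea, a rough sketch of the routing logic follows. It does not talk to Redis or ElastiCache at all; node names are plain strings, and the class ShardedTier and its methods are hypothetical. It only shows the usual split: hash the key to pick a shard, send writes to that shard's master, and spread reads over its replicas.

```python
import zlib

class ShardedTier:
    """Client-side routing for a sharded, replicated key-value tier (illustrative only)."""

    def __init__(self, shards):
        # shards: list of {"master": node_name, "replicas": [node_name, ...]}
        self.shards = shards

    @staticmethod
    def _hash(key):
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8"))

    def shard_for(self, key):
        return self.shards[self._hash(key) % len(self.shards)]

    def node_for_write(self, key):
        return self.shard_for(key)["master"]

    def node_for_read(self, key):
        shard = self.shard_for(key)
        replicas = shard["replicas"]
        if not replicas:
            return shard["master"]
        return replicas[self._hash(key) % len(replicas)]
```

    Plain modulo hashing remaps most keys whenever the shard count changes; consistent hashing is the common alternative when shards are added or removed often.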

  • Context, Situation, Components, PaaS, Dead or Alive … it's all semantics isn't it?: Forget it, Docker will become a highly useful but also invisible component of PaaS and the success of PaaS will depend upon the limitation of choice and certainly not the exposure of underlying systems like Docker to end users. It’s extremely easy to take a path that will lead you down a route of sprawl. There are some very exceptional edge cases where you will need such flexibility but these are niches. I'm afraid some businesses however will probably get suckered into these dead ends. 

  • Flickr's Performance improvements for photo serving. They added additional regional caches. The result is a 2x improvement in image load times for impacted users. If you aren't in the US a regional cache improves load latencies 10x. Which leads to the idea of prefetching to warm caches before users actually request an image. When a user requests images they are sent to the regional cache so they can be loaded much faster. Result: median latency dropped by more than 200 ms and 95% of photo requests sped up by at least 100 ms. If you want more details the article has a very useful explanation of the process. 
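
    The cache-warming idea can be sketched in a few lines. This is an assumption-laden toy, not Flickr's system: RegionalCache, prefetch and fetch_from_origin are hypothetical names, the cache never evicts, and how the list of likely-to-be-requested photos gets chosen is not covered here.

```python
class RegionalCache:
    """Toy regional cache that can be warmed before users ask for a photo."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # slow call to the far-away origin
        self.store = {}

    def prefetch(self, photo_ids):
        """Warm the cache ahead of time for photos we expect to be requested."""
        for photo_id in photo_ids:
            if photo_id not in self.store:
                self.store[photo_id] = self.fetch_from_origin(photo_id)

    def get(self, photo_id):
        """Serve a request; returns (data, "hit") or (data, "miss")."""
        if photo_id in self.store:
            return self.store[photo_id], "hit"
        data = self.fetch_from_origin(photo_id)
        self.store[photo_id] = data
        return data, "miss"

origin_calls = []

def slow_origin(photo_id):
    origin_calls.append(photo_id)
    return f"bytes-of-{photo_id}"

cache = RegionalCache(slow_origin)
cache.prefetch(["p1", "p2"])     # done before the user shows up
first = cache.get("p1")          # served locally
second = cache.get("p3")         # not prefetched, goes to the origin
```

    The user only pays the long round trip for photos the prefetcher failed to predict.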

  • Without a viable M2M network the Internet of Things won't happen. Here's something interesting: Intel Launches Tiny 3G Modem for IoT Devices. Don't see costs or power draw however.

  • A very cool post on Simulating a global Ebola outbreak using Wolfram tools. It's one of those things that seems straightforward to the author, but looks like magic to me. 

  • Perl lives! We have the proof of life videos...from YAPC::Europe 2014, which are now available.

  • Good overview of How Flash changes the design of database storage engines: the increased speed and parallelism of Flash (perhaps 100,000 operations per second, compared to 300 for typical disks) create the most change to database design. The traditional design of relational databases, with one thread per connection, worked fine when disks were the bottleneck, but now the threads become the bottleneck.

  • Irmin: a library to persist and synchronize distributed data structures both on-disk and in-memory. It enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. The general idea is that you want every active node to get a local (partial) copy of a global database and always be very explicit about how and when data is shared and migrated.
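
    Irmin itself is an OCaml library and the snippet below is not its API. It is only a hedged Python illustration of the Git-style step that makes the fork/merge workflow possible: a three-way merge of two forked copies against their common ancestor. Plain dicts stand in for the store, the function name three_way_merge is hypothetical, and None is assumed to never be a real value because it marks an absent key.

```python
def three_way_merge(base, ours, theirs):
    """Merge two forks of a key-value store against their common ancestor.

    Returns (merged, conflicts). A key lands in conflicts, and is left out of
    merged, when both forks changed it in different ways.
    """
    merged, conflicts = {}, []
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (including both deleted)
            value = o
        elif o == b:      # only theirs changed
            value = t
        elif t == b:      # only ours changed
            value = o
        else:             # both changed, differently
            conflicts.append(key)
            continue
        if value is not None:
            merged[key] = value
    return merged, sorted(conflicts)

base = {"a": 1, "b": 2}
ours = {"a": 1, "b": 3, "c": 9}      # our fork edited b and added c
theirs = {"a": 5, "b": 2}            # their fork edited a
merged, conflicts = three_way_merge(base, ours, theirs)
```

    Being explicit about the common ancestor is what lets each node keep a local copy and still reconcile later, which is the point the Irmin description makes about sharing data deliberately.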