Stuff The Internet Says On Scalability For January 2nd, 2015

Hey, it's HighScalability time:


From Introduction to Metabolic Scaling Theory - From cells to ecosystems

  • 53 kilobytes: total amount of RAM in the world in 1953; 180-200 million: daily transactions at The Weather Channel
  • Quotable Quotes
    • Enquist, Brian: Life operates over 21 orders of magnitude in size - From Unicells to Whales and Giant Sequoias 
    • George Dyson: Digital computers translate between these two forms of information—structure and sequence—according to definite rules. Bits that are embodied as structure (varying in space, invariant across time) we perceive as memory, and bits that are embodied as sequence (varying in time, invariant across space) we perceive as code. Gates are the intersections where bits span both worlds at the moments of transition
    • : what is “scaling”? In its most elemental form, it simply refers to how systems respond when their sizes change
    • @muratdemirbas: Eventual consistency should not come to mean "Only God can judge me".
    • Raffi Krikorian: Every Problem is a Scaling Problem
    • The High-Interest Credit Card of Technical Debt: Experience has shown that the external world is rarely stable.
    • @Apcera: "#HybridCloud ROI isn’t there, & the complexity is huge." via @stevesi @Recode http://ow.ly/Gspxq  Time for a new solution in 2015. #PaaS
    • Nathan Bronson: I believe that to tackle big problems one must factor complexity into pieces that can each fit in someone’s brain, and that the key to such factoring is to create abstractions that hide complexity behind a simple mental model.

  • A prediction for the new year: algorithm profilers will be a hot new job category. Optical Illusions That Fool Google-Style Image Recognition Algorithms. SEO and HFT are a kind of profiling, but with the spread of algorithms through the consumption of the world by software, the hacking of all sorts of algorithms for advantage will become a permanent fixture of modern life. One more layer to the game.

  • Interesting idea from Brett Slatkin. Our approach to manufacturing is as quaint as punchcards: You'd turn in your punch cards and hope to get the output a week later — sooner if you were lucky...3D printing is slow. Even though laser printing can produce precision parts like rocket engines, it doesn't scale...To build cars, cell phones, and soda cans you need to produce high volumes quickly...What we need is a way to click a button and launch a manufacturing process.

  • If you need to optimize your Rails app for concurrency, here's a good source: Heroku and Puma vs. Heroku and Unicorn. Puma was the winner, improving quality of service and reducing hosting costs. With Puma, many fewer dynos were needed. The comment section has a vigorous debate.

  • The Current State of the Blockchain: Bitcoin, in its current state, cannot act as a major transaction network. Because blocks are currently limited to be 1 MB in size, Bitcoin is limited to handle roughly 7 transactions per second. In comparison, thousands of credit card transactions happen per second across the world. Good discussion on reddit. Also, The Blockchain is the New Database, Get Ready to Rewrite Everything.
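
Where does "roughly 7 transactions per second" come from? A back-of-the-envelope sketch, assuming a block every 10 minutes or so and an average transaction size of about 250 bytes (that size is an illustrative assumption, not a number from the article):

```python
# Back-of-the-envelope estimate of Bitcoin's transaction throughput.
BLOCK_SIZE_BYTES = 1_000_000    # 1 MB block size limit (from the quoted text)
BLOCK_INTERVAL_SECONDS = 600    # roughly one block every 10 minutes
AVG_TX_SIZE_BYTES = 250         # ASSUMPTION: illustrative average transaction size


def max_transactions_per_second(block_size, tx_size, interval):
    """Upper bound on throughput if every block is completely full."""
    txs_per_block = block_size // tx_size
    return txs_per_block / interval


tps = max_transactions_per_second(
    BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS
)
print(f"{tps:.1f} transactions per second")
```

Larger average transactions push the number lower, so the figure is best read as a ceiling rather than a measurement.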

  • Golang gets a lot of love, now here's some tough love. Golang is Trash: Everything about the implementation of the language environment feels amateurish. The best thing they could do at this point is start working on golang 2.0, with the intent to completely discard the entire toolchain and much of the runtime; Everyday hassles in Go: To show the limitations of some of the archaic concepts present in Go - here is the analysis of some of the features (or the lack of them) I consider unfortunate, with use cases and accompanying code.

  • In your tools are always the seeds of destruction. My $2375 Amazon EC2 Mistake. Crawlers find API keys on GitHub and then use the keys to spin up bitcoin mining instances. In this case 140 servers. Ouch. Great discussion on reddit. You can buy a private account on GitHub, but GitHub should check for keys as well. Clowncopter gives some sage advice: Never use root credentials with AWS. Create an IAM group, attach policies granting it the absolute minimal required permissions, place a user in that group, and use the user's credentials.
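
What might "absolute minimal required permissions" look like? A minimal sketch of an IAM policy document that only allows reading objects from a single S3 bucket. The bucket name `example-bucket` and the choice of `s3:GetObject` are hypothetical; the right actions depend entirely on what your application does:

```python
import json


def least_privilege_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one bucket.

    ASSUMPTION: the application only needs to fetch objects from S3.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-bucket" is a hypothetical name used only for illustration.
policy_json = json.dumps(least_privilege_policy("example-bucket"), indent=2)
print(policy_json)
```

A leaked key bound to a policy like this can read one bucket, but it cannot launch 140 EC2 instances.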

  • Read this article because it's the law. Introduction to Scaling Laws: There are many different scaling laws. At one extreme, there are simple scaling laws that are easy to learn, easy to use, and very useful in everyday life...Area versus Length...Volume, Area, and Length in Three Dimensions...Non-Integer Scale Factors...Polygons, Polyhedra, etc...The Pythagorean Theorem via Scaling.
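
The simplest of these laws fits in a few lines. A small sketch (the function name is made up for illustration): scale every length by a factor k, and areas grow by k squared while volumes grow by k cubed, including for non-integer factors.

```python
def scale_measure(value, scale_factor, dimension):
    """Scale a measurement of the given dimension (1=length, 2=area, 3=volume)."""
    return value * scale_factor ** dimension


# A unit cube: side 1, total surface area 6, volume 1. Double every length.
side = scale_measure(1.0, 2, 1)
surface_area = scale_measure(6.0, 2, 2)
volume = scale_measure(1.0, 2, 3)
print(side, surface_area, volume)

# A non-integer scale factor works the same way.
print(scale_measure(1.0, 1.5, 2))
```

Doubling the side gives four times the surface area but eight times the volume, which is why surface-to-volume ratios shrink as things get bigger.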

  • DeepSpeech: Scaling up end-to-end speech recognition: Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called DeepSpeech, outperforms previously published results on the widely studied Switchboard Hub5’00, achieving 16.5% error on the full test set. 

  • Love the passion and the attitude. Eventual Consistency in Concurrent Data Structures: What do I intend to foist upon the world? I love teasing every inch of mileage out of a data structure by matching its constraints to my constraints. I don’t see much consistent discussion on this, so I figure I may as well try my hand at it. Perhaps I can infect the world with some of my enthusiasm, and we can escape the pervasive fear of “rolling your own” data structure. Not that this is usually a good idea, but I would love to see a greater proliferation of varieties available for general use, so that we don’t perpetually use the hammer of an STL hash map, tree or queue for every screw we see.

  • Rapid Auto Scaling with Amazon SQS using a custom CloudWatch metric and a pair of Auto Scaling Groups.

  • Chip Overclock doesn't think the General-Purpose Processor is a Myth: I would say being general purpose means that modern processors can run graphical or digital signal processing applications at all, not that they are necessarily optimal for doing so.

  • Nice reading list. The 12 most popular web performance posts of 2014

  • You may not be aware that such things are possible. Nick Sullivan with a wonderful explanation of Speeding up HTTPS with session resumption. Save costly handshake roundtrips using Session ID resumption and Session ticket resumption. Load balancing across servers is still possible with a centralized key generation system.

  • Scaling CloudFlare’s Massive WAF (Web Application Firewall). Billions of transactions per day run in less than 1ms each in Lua. CloudFlare operates one of the world’s largest deployments of nginx + LuaJIT. Systemtap and flamegraphs are used to identify hotspots and optimize them. To make matching functions faster, a custom version of the Aho-Corasick algorithm has been created. 
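
For anyone who hasn't met Aho-Corasick: it matches many patterns in a single pass over the input, which is why it suits a firewall with lots of rules. Below is a sketch of the textbook algorithm in Python. It is not CloudFlare's custom version, which runs in Lua and whose details aren't given here.

```python
from collections import deque


class AhoCorasick:
    """Textbook multi-pattern matcher: a trie plus failure links."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first: depth-1 states keep their failure link at the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every match in text."""
        state = 0
        matches = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                matches.append((i - len(pattern) + 1, pattern))
        return matches


matcher = AhoCorasick(["he", "she", "his", "hers"])
print(sorted(matcher.search("ushers")))
```

The text is scanned once no matter how many patterns are loaded; the failure links do the work that would otherwise require restarting the match at every position.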

  • Another good reading list. Readings in conflict-free replicated data types

  • Nicely put by Chris Dixon: Two eras of the internet: pull and push. Pull is when you are seeking information, usually an answer to a question. Push is when you are using the internet in a more passive way and content comes to you. I had the idea of the move to the push paradigm as a move to a zero-dimensional world: a "point-world" where everything happens in the same place and everything revolves around you. But the push-pull metaphor is much clearer.

  • The Lambda architecture: principles for architecting realtime Big Data systems: The premise behind the Lambda architecture is you should be able to run ad-hoc queries against all of your data to get results, but doing so is unreasonably expensive in terms of resource. So the idea is to precompute the results as a set of views, and you query the views.
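
The precompute-then-query idea can be sketched in a few lines. This toy example counts page views; the event shape, the function names, and the `SpeedLayer` class are all made up for illustration. The speed layer, which covers events that arrived after the last batch run, is part of the Lambda architecture as usually described rather than something spelled out in the quote above.

```python
from collections import Counter


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from scratch over all stored events."""
    return Counter(event["url"] for event in master_dataset)


class SpeedLayer:
    """Realtime layer: counts events not yet absorbed by a batch run."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, event):
        self.view[event["url"]] += 1

    def reset(self):
        # Called once a new batch view covers these events.
        self.view.clear()


def query_pageviews(url, batch_view, speed_layer):
    """Serving layer: answer from the views, never from the raw data."""
    return batch_view[url] + speed_layer.view[url]


master = [{"url": "/a"}, {"url": "/a"}, {"url": "/b"}]
batch_view = build_batch_view(master)
speed = SpeedLayer()
speed.ingest({"url": "/a"})
print(query_pageviews("/a", batch_view, speed))
```

The query never touches the master dataset, so its cost depends on the size of the views rather than the size of all your data.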

  • Jeff Sheffield lists Things that influenced me as a developer in 2014, which includes stackoverflow podcast, JavaScript Jabber podcast, Watchmecode, Memory Management Masterclass with Addy Osmani, Scott Hanselman's “Virtual Machines, JavaScript and Assembler”, Conversation with Sal Khan, Mighty Messaging Patterns, django and postgres and transactions oh-my, Connect to postgreSQL with pgAdmin3, Making Apache Suck less for hosting Python web applications, Messaging at Scale at Instagram. What's your list?

  • Timehop switched from Ruby to Go: We tried a few languages and Go was the hands down winner for us. The main reasons we chose Go were for its incredible performance, sane concurrency constructs, and type safety.

  • BadMagicNumber lists the Interesting papers from NIPS 2014, which is a deep learning conference. It's a good sign we still need human-curated lists :-)

  • Distributed resource locking using memcached: We finally decided to go with the cache service, mainly because of the timeout capabilities that would allow us to easily circumvent the deadlock issue, better performance and it was much simpler to implement than the database option.
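
The trick that makes a cache usable as a lock is memcached's `add` operation, which succeeds only if the key does not already exist, combined with an expiry time so that a crashed lock holder can't block everyone forever. A sketch of the idea follows; `InMemoryCache` is a hypothetical stand-in for a real memcached client so the example runs on its own, and the article's actual implementation may differ.

```python
import time
import uuid


class InMemoryCache:
    """Hypothetical stand-in mimicking memcached-style add/get/delete with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def _purge(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]

    def add(self, key, value, ttl):
        """Store the value only if the key is absent; report whether it was stored."""
        self._purge(key)
        if key in self._data:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def get(self, key):
        self._purge(key)
        entry = self._data.get(key)
        return entry[0] if entry else None

    def delete(self, key):
        self._data.pop(key, None)


class CacheLock:
    def __init__(self, cache, name, ttl_seconds=30):
        self.cache = cache
        self.key = f"lock:{name}"
        self.ttl = ttl_seconds
        self.token = uuid.uuid4().hex  # identifies this particular holder

    def acquire(self):
        # The expiry means a crashed holder releases the lock automatically.
        return self.cache.add(self.key, self.token, self.ttl)

    def release(self):
        # Only delete the lock if we still own it. Note that this
        # check-then-delete pair is not atomic in this sketch.
        if self.cache.get(self.key) == self.token:
            self.cache.delete(self.key)
            return True
        return False
```

The timeout is a trade-off: too short and a slow worker loses the lock while still working, too long and a crashed worker stalls everyone else.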

  • The Wheel of Reincarnation: a pattern whereby specialized hardware gets spun out from the "main" system, becomes more powerful, then gets folded back into the main system.

  • The Synapse Memory Doctrine Threatened?: This is hot because it suggests that a ‘memory trace’ was not stored in the form of synapses. Rather, it suggests that the sensory neuron itself has a memory of how many synapses it ought to be forming – with the actual synapses being merely an expression of this memory. This is pretty radical: it amounts to saying that the location of memory is not in the synapses, but (probably) in the cell nuclei of presynaptic neurons.

  • The Strange Behaviors of Facebook’s Metastable Failures: The solution was simple and effective: The team switched from a most recently used (MRU) connection pool to a least recently used (LRU). Attempting to re-create the anomalous behavior with the LRU scheme was unsuccessful. This is the type of re-engineering work that developers at Internet scale will find themselves doing continually from this point forward, as the communications networks of today continue to exhibit the strange behavioral characteristics of the electronics devices of yesterday.
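
To see how differently the two policies spread load, here is a toy connection pool. It is a sketch for illustration only, not Facebook's implementation, and the class and function names are made up.

```python
from collections import deque


class ConnectionPool:
    """Toy pool that hands out idle connections by LRU or MRU policy."""

    def __init__(self, connections, policy="LRU"):
        if policy not in ("LRU", "MRU"):
            raise ValueError("policy must be 'LRU' or 'MRU'")
        # Left end = least recently used, right end = most recently used.
        self._idle = deque(connections)
        self._policy = policy

    def acquire(self):
        if not self._idle:
            raise RuntimeError("no idle connections")
        if self._policy == "LRU":
            return self._idle.popleft()
        return self._idle.pop()

    def release(self, conn):
        self._idle.append(conn)


def usage_counts(policy, names, requests):
    """Run sequential requests and count how often each connection is used."""
    pool = ConnectionPool(names, policy)
    counts = {name: 0 for name in names}
    for _ in range(requests):
        conn = pool.acquire()
        counts[conn] += 1
        pool.release(conn)
    return counts


print(usage_counts("MRU", ["a", "b", "c"], 6))
print(usage_counts("LRU", ["a", "b", "c"], 6))
```

With MRU, every request in this run lands on the same connection, while LRU rotates through all of them.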