Stuff The Internet Says On Scalability For February 13th, 2015

Hey, it's HighScalability time:


Stunning depiction of every space mission over the past 50 years. (Max Roser)

  • $700 billion: Apple's valuation; 1: number of lines of code it took to bring down UK air traffic control; 20: age of that line of code, in years; 2: failures behind the outage, which was of course a never-before-seen double failure; 1: thickness, in atoms, of silicon transistors that may mean super-fast computing; a few: how many data points it takes to identify you
  • Quotable Quotes:
    • @EpicureanDeal: The Uber model is everywhere: using the internet to connect infrequent, consumers of ad hoc, spot market services with fragmented suppliers.
    • @awendt: “I’m sorry you learned about transactions and joins in college, but you’ll have to de-normalize for #microservices” – @adrianco #microxchg
    • @samnewman: @adrianco "JSON was 10x faster than XML. Protobufs 10x faster than JSON. Avro same speed as Protobufs, but half the size"
    • @RichardWarburto: Premature Optimization isn't the root of all evil: misunderstood domain models are.
    • @MichaelPisula: Says @ewolff at #microxchg: start big with your microservices, splitting is easier than joining and your architecture will be wrong anyway
    • @MJFKlewitz: "With vertical scaling the problem is you end up giving a lot of money to Larry Ellison" #greatquote @crichardson #microxchg
    • Jenny Rood: Species of ants which differ in size can coexist peacefully, but the insects will chase away similarly sized competing species.
    • Steven Levy: The nonlinear gains that Moore predicted are so mind-bending that it is no wonder that very few were able to bend their minds around it.
    • ntoshev: There seems to be a fundamental trade-off between latency and throughput, with stream processors optimizing for latency and batch processors optimizing for throughput.
    • Sam Altman: Nobody cares if you’re using an Intel Edison or a 555 to blink the LED in the prototype you show them: people care about whether you’ve made something that they want.
    • @swardley: 30-50 years from genesis to industrialisation is about the average these days
    • Alex Clemmer: 84% of a single-threaded 1KB write in Redis is spent in the kernel
    • @allspaw: Psst: while lots of folks hope for fully "autonomous" tech to solve all the world's ills, I'll just be over here getting some work done.
    • @alejandrocrosa: “The database you read from is just a cached view of the event log”
    • @viktorklang: Optimizing for latency (as in "time to serve") will also yield higher throughput. Thank you, Mr Little.
    • rakoo: using GOMAXPROCS doesn't automagically turn your program into a parallel one
    • fluidcruft: Data science manifesto: The purpose of computing is numbers.
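The "Mr Little" in @viktorklang's quote is Little's law, L = λW: the average number of requests in a system equals the arrival rate times the average time each request spends there. A minimal sketch, assuming a hypothetical service that always has 100 requests in flight, shows why cutting latency raises throughput when concurrency is the constraint:

```python
def throughput(concurrency: float, latency_s: float) -> float:
    """Little's law rearranged: arrival rate = requests in flight / time in system."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency / latency_s


# Hypothetical service that always has 100 requests in flight.
before = throughput(100, 0.050)  # 50 ms average latency
after = throughput(100, 0.025)   # 25 ms average latency
print(f"{before:.0f} req/s -> {after:.0f} req/s")
```

Halving latency doubles throughput here. The law describes long-run averages of a stable system, so it says nothing about short bursts.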

  • Is the golden age of the cheap startup over? The Rising Costs Of Scaling A Startup. In San Francisco it is: scaling there was twice as expensive in 2014 as it was in 2009. Wages have doubled. Op-ex has doubled. And thus startup round sizes have increased. People and place costs dwarf the savings in compute infrastructure.

  • Just in case you hold the fashionable opinion that Perl code must look like line noise, take a gander: Real measurement is hard. Nice, eh?

  • Magic tricks for algorithms. This may prove helpful in your new job as Algorithm Profiler...Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images:  It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects. 

  • Here's the rare Docker knocker. Many reasons why you should stop using Docker. One side: Idea good, implementation leaves something to be desired. Other side: it tastes great and is less filling. Interesting from dacjames: In general, I agree 100%. With the case of Docker, the concept of containerization is more important than the current project. Decoupling the application environment from the infrastructure environment is an immensely valuable paradigm.

  • Bounty Hunter. A job title that conjures up romantic images and dreams of the never was. You can still be one in the digital age. And make some money too. 11 Essential Bug Bounty Programs of 2015. Hundreds of thousands of dollars are available. Good hunting.

  • When algorithms rule the world you are just one weighting factor away from insignificance. Apple, Apps and Algorithmic Glitches. Divination used to be how we attempted to control the future. Now we attempt to penetrate to the unknowable heart of opaque algorithms with something different...data.

  • How do you prevent too much of a good thing? Better Rate Limiting With Redis Sorted Sets. I've used leaky buckets before, but they preferred a distributed multi-user solution using Redis sorted sets and atomic operations to prevent race conditions. Elegant.
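A sliding-window limiter of this kind keeps, per user, a sorted set of request timestamps: trim entries older than the window, count what remains, and record the new request. The sketch below is an in-memory Python stand-in, not the article's code; the comments name the Redis commands (ZREMRANGEBYSCORE, ZCARD, ZADD) that would play each role. In Redis, a check-then-add sequence like this would need to run atomically, for example in a Lua script via EVAL, to avoid the race conditions the article mentions. The class name, limit, window and user names are hypothetical.

```python
import bisect
from collections import defaultdict


class SlidingWindowLimiter:
    """In-memory stand-in for a per-user Redis sorted set scored by timestamp."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self.events = defaultdict(list)  # user -> sorted list of timestamps

    def allow(self, user: str, now: float) -> bool:
        log = self.events[user]
        # Like ZREMRANGEBYSCORE key 0 (now - window): drop timestamps outside the window.
        cutoff = now - self.window_s
        del log[:bisect.bisect_right(log, cutoff)]
        # Like ZCARD key: how many requests are still inside the window?
        if len(log) >= self.limit:
            return False
        # Like ZADD key now now: record this (allowed) request.
        bisect.insort(log, now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_s=1.0)
results = [limiter.allow("alice", t) for t in (0.0, 0.2, 0.4, 0.6, 1.3)]
print(results)
```

This version only records allowed requests; recording rejected ones as well is an equally valid choice that keeps penalizing clients who retry aggressively.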

  • Fred Wilson on the 40% rule for SaaS companies: Your annual revenue growth rate + your operating margin should equal 40%. So, if you are growing 100% year over year, you can lose money at a rate of 60% of your revenues. If you are growing 40% year over year, you should be breaking even.

  • Ready to darken your soul with secret knowledge? The Black Magic of (Java) Method Dispatch - Everything you wanted to know about Black Deviously Surreptitious Magic in low-level performance engineering. Here's what Sorcerer Martin Thompson has to say about it: Mr Shipilёv has outdone himself with his latest master class exploring the anatomy of Java method dispatch. A brilliant example of the scientific method in action that is compelling and educational to read.

  • The DOM is too slow. 60 FPS on the Mobile Web: "It’s not just slow, it’s really slow. If you touch the DOM in any way during an animation you’ve already blown through your 16ms frame budget." The solution: use canvas. Mind blown.

  • Need a good microservices intro? Microservices Architecture and Containers distilled

  • Using Event Sourcing as a way to globally distribute applications. Event Sourcing at Global Scale: At the system’s core is a globally replicated event log that preserves the happens-before relationship (= potential causal relationship) of events. Happens-before relationships are tracked with vector timestamps.
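Happens-before tracking with vector timestamps can be sketched with the textbook vector-clock rules: tick your own counter on a local event, take the element-wise maximum when applying a remote event, and say a happened before b when a is less than or equal to b in every component and strictly less in at least one. This is a generic illustration, not the article's implementation; the datacenter names "eu", "us" and "ap" are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's own counter advanced (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, remote: VectorClock, node: str) -> VectorClock:
    """Apply a remote event: element-wise max of both clocks, then tick locally."""
    nodes = local.keys() | remote.keys()
    merged = {n: max(local.get(n, 0), remote.get(n, 0)) for n in nodes}
    return increment(merged, node)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff a <= b in every component and a < b in at least one."""
    nodes = a.keys() | b.keys()
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and any(
        a.get(n, 0) < b.get(n, 0) for n in nodes
    )


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event could have influenced the other."""
    return not happened_before(a, b) and not happened_before(b, a)


e1 = increment({}, "eu")   # event written in the EU datacenter
e2 = merge({}, e1, "us")   # US replica applies e1, then writes e2
e3 = increment({}, "ap")   # independent write in Asia
print(happened_before(e1, e2), concurrent(e2, e3))
```

Concurrent events are exactly the ones a globally replicated log cannot order causally, so an application has to resolve them by some other rule.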

  • Short video on setting up Scalable WordPress on Azure. Looks really easy to create a blog in datacenters around the world. Straightforward master-master database replication across datacenters.

  • What are the 8 Fallacies of Distributed computing? Lust, sloth...no that's not right. Gareth Wilson has them right in his talk: The network is not safe, there are all these people in charge, we don’t know who they all are, it’s slow, it’s unreliable, and it won’t send everything we want and it’s constantly changing, but [sarcasm] at least it’s free.

  • Murat with another highly available paper summary, Perspectives on the CAP theorem: So here is the difference between consensus and atomic storage. Consensus is supposed to dutifully remember a value that is anchored (stored by a majority number of nodes). Consensus is loyal to making that value persist as the committed decision. Atomic storage doesn't have that responsibility. The nodes don't need to commit to a decision value, so the system doesn't need to keep track of and remember whether a value is anchored.

  • What are the effects of branch mispredictions? Jean-Philippe BEMPEL tells all in Branches: I have lost my path!: Remember that a misprediction for CPU means a flush of the pipeline to decode instructions from the new address and it causes a Stop-Of-The-World during this time. Depending on the CPU it lasts 10 to 20 cycles.
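To see why data-dependent branches hurt, here is a toy simulation of a 2-bit saturating-counter predictor, a textbook scheme that real CPUs have long since improved on, for a branch like `if v >= 128`. The threshold and data are hypothetical. On sorted input the predictor misses only around the single switchover point, while random input leaves it guessing about half the time; on real hardware each of those misses is a pipeline flush.

```python
import random


def count_mispredictions(values, threshold=128):
    """Simulate a 2-bit saturating-counter predictor for the branch `v >= threshold`."""
    state = 0  # states 0-1 predict "not taken", states 2-3 predict "taken"
    misses = 0
    for v in values:
        taken = v >= threshold
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1  # on a real CPU: pipeline flush, roughly 10-20 cycles lost
        # Saturating update: move one step toward the actual outcome.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


rng = random.Random(42)
data = [rng.randrange(256) for _ in range(10_000)]
print("random order:", count_mispredictions(data))
print("sorted order:", count_mispredictions(sorted(data)))
```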

  • HipChat's new web interface was built using React.js, Flux, gulp, webpack, lodash, karma.

  • Need to compare metrics across cloud providers? Google has released PerfKit: PerfKit is a living benchmark framework, designed to evolve as cloud technology changes, always measuring the latest workloads so you can make informed decisions about what’s best for your infrastructure needs.

  • Mesos and YARN, can there be only one? Nope, there are a myriad of ways they can work together says Jim Scott: This open source software project [Myriad] is both a Mesos framework and a YARN scheduler that enables Mesos to manage YARN resource requests.

  • Not a new idea, but a good explanation. Distributed System Building Block: Flake Ids: The basic idea behind flake ids is simple: instead of incrementing a counter each time you need an ID, use some of the top bits in an id to represent time, and then some others to represent a “node id”, such that id generation across nodes is unique.
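A minimal sketch of a flake-id generator, assuming a Snowflake-style 64-bit layout: 41 bits of milliseconds since a custom epoch, 10 bits of node id, 12 bits of per-millisecond sequence. The layout, the epoch and the class name are illustrative assumptions; other flake schemes pick different widths, and some use 128-bit ids.

```python
import threading
import time


class FlakeIdGenerator:
    """Hypothetical layout: [ms since EPOCH_MS | 10-bit node id | 12-bit sequence]."""

    EPOCH_MS = 1_420_070_400_000  # 2015-01-01T00:00:00Z, an arbitrary custom epoch
    NODE_BITS = 10
    SEQ_BITS = 12

    def __init__(self, node_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id < (1 << self.NODE_BITS):
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self.clock = clock  # injectable millisecond clock, handy for testing
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicates")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self.seq == 0:  # sequence exhausted: spin until the next millisecond
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.NODE_BITS + self.SEQ_BITS))
                | (self.node_id << self.SEQ_BITS)
                | self.seq
            )


gen = FlakeIdGenerator(node_id=7)
print([gen.next_id() for _ in range(3)])
```

Because time occupies the top bits, sorting ids roughly sorts them by creation time, and uniqueness across machines rests entirely on no two generators sharing a node id.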

  • It's locked behind closed doors, but this might be of help: Low Latency Performance Tuning for Red Hat Enterprise Linux 7.

  • Meaning extraction is not just for philosophers anymore. It's something everyone will soon need to do. How Yelp finds meaning in queries. Reading Between the Lines: How We Make Sense of Users’ Searches. An extraction process in the form of a series of translation steps is run on a plain query, turning it into a rich query, which is fed into Lucene. Result: much better click through rates. 

  • Summarization is another way to mine meaning. Flipboard's Approach to Automatic Summarization: In extractive summarization, the objective is to identify the essential, or central, sentences in a document. One way of modeling a document is as a graph, with each sentence of the document represented with a node and the relationships between those sentences represented with weighted edges.
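The sentence-graph idea can be sketched in the style of TextRank: weight edges by word overlap between sentences, score nodes with a PageRank-style iteration, and keep the top-scoring sentences in document order. This is an illustration built on assumptions (Jaccard overlap as the edge weight, damping 0.85, made-up sample sentences), not Flipboard's actual method.

```python
import re


def sentence_graph(sentences):
    """Weighted adjacency matrix: edge weight = Jaccard overlap of word sets."""
    words = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                weights[i][j] = len(words[i] & words[j]) / len(words[i] | words[j])
    return weights


def centrality(weights, damping=0.85, iterations=50):
    """PageRank-style power iteration over the weighted sentence graph."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_totals = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_totals[j]
                for j in range(n)
                if out_totals[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Pick the k most central sentences, returned in original document order."""
    scores = centrality(sentence_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sents = [
    "Redis sorted sets make rate limiting simple.",
    "Rate limiting with Redis sorted sets uses atomic operations.",
    "Redis sorted sets keep timestamps for rate limiting.",
    "My cat likes sunny windows.",
]
print(summarize(sents, k=2))
```

The off-topic sentence shares no words with the others, so it gets no incoming weight and drops out of the summary.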

  • 5 Brilliant Articles About Newsfeeds & Activity Streams

  • Interesting approach. Sequence: Automated Analyzer for Reducing 100,000's of Log Messages to 10's of Patterns: While the analyzer is about reducing a large corpus of raw log messages down to a small set of unique patterns, the parser is all about matching log messages to an existing set of patterns and determining whether a specific pattern has matched. Based on the pattern, it returns a sequence of tokens that basically extracts out the important pieces of information from the logs. The analysts can then take this sequence and perform other types of analysis.
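For contrast, a deliberately naive analyzer can collapse messages by masking tokens that look variable (IP addresses, hex values, numbers) and counting the resulting patterns. The token rules, placeholder names and sample messages are hypothetical, and Sequence's own scanner and analyzer are considerably smarter than regex masking.

```python
import re
from collections import Counter

# Hypothetical placeholder rules, checked in order; the first match wins.
TOKEN_RULES = [
    (re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"), "%ipv4%"),
    (re.compile(r"^0x[0-9a-fA-F]+$"), "%hex%"),
    (re.compile(r"^-?\d+(\.\d+)?(ms|s)?$"), "%num%"),
]


def to_pattern(message: str) -> str:
    """Replace variable-looking tokens with placeholders; keep the rest literally."""
    out = []
    for token in message.split():
        for regex, placeholder in TOKEN_RULES:
            if regex.match(token):
                out.append(placeholder)
                break
        else:
            out.append(token)
    return " ".join(out)


def analyze(messages):
    """Collapse raw messages into unique patterns with occurrence counts."""
    return Counter(to_pattern(m) for m in messages)


logs = [
    "Accepted connection from 10.0.0.7:5123",
    "Accepted connection from 192.168.1.20:443",
    "Request took 35ms",
    "Request took 120ms",
]
for pattern, count in analyze(logs).items():
    print(count, pattern)
```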

  • Google maps has come a long way. Ten Years of Google Maps, From Slashdot to Ground Truth. It was all so simple then.

  • I pretty much failed at my attempt of implementing actors in golang. This looks good. spigo: Simulate Protocol Interactions in Go using nanoservice actors. Suitable for fairly large scale simulations, runs well up to 100,000 independent nanoservice actors. Two architectures are implemented. One creates a peer to peer social network (fsm and pirates). The other is based on NetflixOSS microservices in a more tree structured model.

  • This will not bore you. There's a new release of Apache Aurora, originally developed at Twitter: Aurora leverages the Apache Mesos cluster manager, which provides information about the state of the cluster. Aurora uses that knowledge to make scheduling decisions. For example, when a machine experiences failure Aurora automatically reschedules those previously-running services onto a healthy machine in order to keep them running.

  • openvswitch/ovs - Optimistic Concurrent Cuckoo Hash: A "cuckoo hash" is an open addressing hash table schema, designed such that a given element can be in one of only a small number of buckets 'd', each of which holds up to a small number 'k' elements.  Thus, the expected and worst-case lookup times are O(1) because they require comparing no more than a fixed number of elements (k * d). 
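A single-threaded sketch with d = 2 candidate buckets per key and up to k entries per bucket makes the O(1) lookup bound concrete: `get` inspects at most d · k entries. When both candidate buckets are full, insertion displaces a random victim toward its alternate bucket, and the table doubles after too many displacements. The bucket count, k, the kick limit and the BLAKE2-based hashing are assumptions, and the optimistic concurrency that gives the OVS version its name is omitted.

```python
import hashlib
import random


class CuckooHash:
    """Cuckoo hash: d = 2 candidate buckets per key, each holding up to k entries."""

    def __init__(self, n_buckets: int = 8, k: int = 4, max_kicks: int = 100):
        self.n_buckets = n_buckets
        self.k = k
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(n_buckets)]
        self.rng = random.Random(0)  # deterministic victim choice

    def _candidates(self, key):
        """Two bucket indexes derived from independent halves of one digest."""
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
        h1 = int.from_bytes(digest[:4], "little") % self.n_buckets
        h2 = int.from_bytes(digest[4:], "little") % self.n_buckets
        return h1, h2

    def get(self, key, default=None):
        # At most d * k = 2 * k key comparisons, whatever the table size.
        for b in self._candidates(key):
            for stored_key, value in self.buckets[b]:
                if stored_key == key:
                    return value
        return default

    def put(self, key, value):
        # Overwrite in place if the key is already present.
        for b in self._candidates(key):
            for i, (stored_key, _) in enumerate(self.buckets[b]):
                if stored_key == key:
                    self.buckets[b][i] = (key, value)
                    return
        entry = (key, value)
        target = self._candidates(key)[0]
        for _ in range(self.max_kicks):
            for b in self._candidates(entry[0]):
                if len(self.buckets[b]) < self.k:
                    self.buckets[b].append(entry)
                    return
            # Both candidates full: swap entry with a random victim in `target`,
            # then send the victim toward its other candidate bucket.
            i = self.rng.randrange(self.k)
            self.buckets[target][i], entry = entry, self.buckets[target][i]
            c1, c2 = self._candidates(entry[0])
            target = c2 if c1 == target else c1
        # Too many displacements: grow the table, then place the homeless entry.
        self._grow()
        self.put(*entry)

    def _grow(self):
        entries = [e for bucket in self.buckets for e in bucket]
        self.n_buckets *= 2
        self.buckets = [[] for _ in range(self.n_buckets)]
        for key, value in entries:
            self.put(key, value)


table = CuckooHash()
for n in range(200):
    table.put(f"key{n}", n)
print(table.get("key42"), table.get("missing"), table.n_buckets)
```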

  • TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries:  In this work, we build upon these recent efforts and propose an integrated PAQ planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching.