Stuff The Internet Says On Scalability For January 22nd, 2016

Hey, it's HighScalability time:


The Imaginary Kingdom of Aurullia. A completely computer generated fractal. Stunning and unnerving. If you like this Stuff then please consider supporting me on Patreon.

  • 42,000: drones from China securing the South China Sea; 1 billion: WhatsApp active users; 2⁻¹²²: odds of two GUIDs with 122 random bits colliding; 25,000 to 70,000: memory chip errors per billion hours per megabit; 81,500: calories in a human body; 62: people as wealthy as half of the world's population; 1.66 million: App Economy jobs in the US; 521 years: half-life of DNA; 0.000012%: air passenger fatalities; $1B: Microsoft free cloud resources for nonprofits; 4000-7000+: BBC stats collected per second; $1 billion: Google's cost to taste Apple's pie;

  • Quotable Quotes:
    • @mcclure111: 1995: Every object in your home has a clock & it is blinking 12:00 / 2025: Every object in your home has a IP address & the password is Admin
    • @notch: Coming soon to npm: tirefire.js, an asynchronous framework for implementing helper classes for reinventing the wheel. Based on promises.
    • @ayetempleton: Fun fact: You are MORE likely to win a million or more dollars in the #powerball lottery than to lose an #AWS #S3 object in a given year.
    • @viktorklang: IMO biggest lie in performance work: constant factors don't matter in Big-Oh.
    • Flavien Boucher: We all came to the conclusion that Docker is adding a complexity layer compare to a virtual machine approach, and this complexity will be for the deployment, development and build.
    • @Frances_Coppola: Uber is a cab cartel. And AirBNB is wealthy - though its suppliers aren't. They are simply firms with apps.
    • Susan Sontag: The method especially appeals to people handicapped by a ruthless work ethic – Germans, Japanese and Americans. Using a camera appeases the anxiety which the work driven feel about not working when they are on vacation and supposed to be having fun. They have something to do that is like a friendly imitation of work: they can take pictures.
    • @SachaNauta: "It's never been easier to be a billionaire and never been harder to be a millionaire" @profgalloway #DLD16
    • @Techmeme: Google Play saw 100% more downloads than iOS App Store, but Apple generated 75% more revenue 
    • Ryan Shea: we’ve concluded that 8MB blocks are simply too large to be considered safe for the network at this point in time, considering the current global bandwidth levels.
    • @RichRogersHDS: "In the old world you spent 30% of your time building a great service & 70% shouting about it. In the new world, that inverts." - Jeff Bezos
    • @thetinot: When you have an SDN, yes, networking throughput does grow on trees. Why @googlecloud is faster than #AWS and #Azure 
    • @GOettingerEU: Digital tech has contributed to around 1/3 of EU GDP growth in over the past decade and I believe this number will continue to grow #wef16
    • @COLRICHARDKEMP: More women fly F16s in Israel than drive cars in Saudi Arabia. KA. 
    • @JoshZumbrun: The total collapse in shopping mall construction
    • @jeffjarvis: 44 million people saw NY Fashion Show content on Instagram last year says Instagram's Marne Levine. Attn: Conde & Hearst!  #DLD16
    • @HackerNewsOnion: Developer Accused Of Unreadable Code Refuses To Comment
    • Lloyds online banking: in a 60-second period: 12,900 people visit its website, 400 bills are paid, 1,500 customers log onto the mobile app, 350 transfers are made and 3,000+ logins
    • @bdha: 2013: DevOps 2014: Docker 2015: Containers 2016: Unikernels 2017: Threads 2018: Syscalls 2019: Inodes
    • hacknat: Two things need to happen to make unikernels attractive. A new Hypervisor needs to get made, one that is just as extensible as an OS around the isolated primitives. It should also have something extra too (like the ability to fine tune resource management better than an OS can). Secondly a user friendly mechanism like Docker needs to happen.

  • It's a winner take all world, but not everywhere. Brian Brushwood on Cordkillers with an insightful breakdown of how the new diversified market for TV content has actually become far less of a winner take all system. We have more good content than ever. Gone are the days of M*A*S*H when everyone watched the same show at the same time. Is it bad that actors are making less? No. We are seeing the destruction of the tournament, which, as explained in the book Freakonomics, is the idea that those at the very top make all the money while those at the bottom of the pyramid make next to nothing. And the winners only have to win by a nose to reap all the rewards; they don't even need to win on merit. This is an inefficient system. Now we are reaching an artistically efficient system. If you have a story to tell and no budget you can tell it on YouTube. This is the democratization of talent. It's inconvenient for those who used to be at the top. What we have now is more working actors producing more content than ever. And since a lot of this content does not have to pander to advertisers to get made, the content is more diverse and more interesting than ever as well.

  • The RAMCloud Storage System: RAMCloud combines low latency, large scale, and durability. Using state of the art networking with kernel bypass, RAMCloud expects small reads to complete in less than 10µs on a cluster of 10,000 nodes. This is 50 – 1,000 times faster than storage systems commonly in use.

  • All Change Please. Adrian Colyer makes the case that we are transitioning to a new part of the technology cycle that promises great change. Networking: 40Gbps and 100Gbps ethernet. Memory: battery backed RAM; 3D XPoint, MRAM, MeRAM, etc. Storage: NVRAM and fast PCIe. Processing: GPUs; integrated on processor FPGAs; hardware transactional memory. This is the question: What happens when you combine fast RDMA networks with ample persistent memory, hardware transactions, enhanced cache management support and super-fast storage arrays? It’s a whole new set of design trade-offs that will impact the OS, file systems, data stores, stream processing, graph processing, deep learning and more. And this is before we’ve even introduced integration with on-board FPGAs, and advances in GPUs…

  • Martin Thompson on why latency is important: As we evolve into world of ubiquitous distributed computing we need better means of communicating, not just between machines, but also between threads and processes on the same machine. As our core counts soar we are effectively getting data centres in a box. Amdahl’s, Little’s, and Universal Scalability Law’s can no longer be ignored as we get more cores but no significant increases in speed per core. These are the ruling laws when it comes distributed and parallel computing.
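
To make the three laws named above concrete, here is a minimal sketch of their standard textbook forms. The parameter values (a 95% parallel fraction, a contention coefficient of 0.05, a coherency coefficient of 0.001) are assumptions chosen purely for illustration; they do not come from Martin Thompson's post.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup is capped by the serial part of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def usl_capacity(cores: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: relative capacity at `cores` processors.

    `contention` penalizes queueing on shared resources; `coherency`
    penalizes the pairwise cost of keeping data consistent across cores.
    """
    return cores / (1.0 + contention * (cores - 1)
                    + coherency * cores * (cores - 1))


def littles_law_in_flight(arrival_rate: float, latency: float) -> float:
    """Little's Law: mean items in the system = arrival rate x mean latency."""
    return arrival_rate * latency


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measurements).
    for n in (1, 8, 64, 512):
        print(n,
              round(amdahl_speedup(0.95, n), 2),
              round(usl_capacity(n, 0.05, 0.001), 2))
```

With a nonzero coherency term the USL curve peaks and then declines, which is one way to read the point that adding cores without faster cores does not scale for free.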

  • Differentiating an employee from an Independent Contractor (IC) is a lot like figuring out what is a catch in the NFL: easy until you have to apply the rules to an actual instance. Here's a rare win for the IC: JORGE QUINTANILLA vs COMMISSIONER OF INTERNAL REVENUE. Surprisingly pithy and well written, this is quite an interesting case that could support the Independent Contractor standing of the gig economy. A W2 doesn't always have to mean you aren't an independent contractor, if you are willing to go to court that is. This isn't likely to help Uber et al because if you tell people how to do the job and you can fire them without cause, they aren't ICs.

  • It turns out there are a lot of venture backed open-source companies. An impressive list from agibsonccc: Mesos (mesosphere); Spark (Databricks); Flink (Data Artisans); Zeppelin (NFLabs); Scala (typesafe); Linux (Red hat); Hadoop (Horton, Cloudera, MapR); Elasticsearch (Elastic); PredictionIO (PredictionIO Inc); Meteor (Meteor Inc); Deeplearning4j (My company skymind); RethinkDB (RethinkDB inc); Redis (Redis Labs); Wordpress (Automattic); Drupal (Acquia); Docker (docker inc); Coreos (coreos inc); NGINX (nginx inc).

  • Never thought of it this way, but that little yellow first down line they display during NFL game broadcasts was an early form of augmented reality. The Super Bowl, Football and Computers.

  • In which we learn data is an important part of every business model. Apple To Disband iAd Sales Team: In 2014, one ad exec told Ad Age that Apple’s refusal to share data “makes it the best-looking girl at the party, forced to wear a bag over her head.” In 2015, iAd’s share of mobile display advertising revenue was just 5.1%, according to data compiled by EMarketer; meanwhile Facebook claimed 37.9% and Google 9.5%.

  • Looks like a fun course: UC Berkeley CS188 Intro to AI. It uses Pac-Man to teach an array of AI techniques. 

  • Does it make sense to say insurance companies will help drive the adoption of self-driving cars when the technology could actually help push insurance companies out of the business? Self-driving car makers can provide insurance themselves, knowing that their cars are safe and that costs will be so low they can be rolled into the price of the car.

  • Etsy stock has lost 76% of its value in 9 months. puranjay with an insightful question to ask yourself: how indispensable are these companies? People can live without Groupon. It hasn't changed habits fundamentally. Uber and Airbnb, on the other hand, have changed consumers' habits. If you can make something a habit, you're going to win (see: cigarette companies).

  • Need to spark your creativity? Turn your world upside down. “Schema violations” are said to be The Secret of Immigrant Genius. What does SQL have to do with creativity? Nothing. The idea: A schema violation occurs when our world is turned upside-down, when temporal and spatial cues are off-kilter...[Immigrants] uprooted from the familiar, they see the world at an angle, and this fresh perspective enables them to surpass the merely talented.

  • The brain uses more bits. Research Reveals Memory Capacity of Brain is Ten Times More Than Previously Thought: Our new measurements of the brain's memory capacity increase conservative estimates by a factor of 10 to at least a petabyte, in the same ballpark as the World Wide Web..."Our data suggests there are 10 times more discrete sizes of synapses than previously thought," says Bartol. In computer terms, 26 sizes of synapses correspond to about 4.7 "bits" of information. Previously, it was thought that the brain was capable of just one to two bits for short and long memory storage in the hippocampus.
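
The "26 sizes, about 4.7 bits" figure above is just a base-2 logarithm: a quantity that can take one of N distinguishable values carries log2(N) bits. A small sketch of that arithmetic (the function name is made up for illustration):

```python
import math


def bits_per_synapse(distinct_sizes: int) -> float:
    """Information carried by one of `distinct_sizes` equally likely states."""
    return math.log2(distinct_sizes)


if __name__ == "__main__":
    print(round(bits_per_synapse(26), 2))  # prints 4.7
```

By the same formula, the earlier estimate of one to two bits corresponds to only two to four distinguishable synapse sizes.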

  • Nest smart thermostat glitch leaves cold feet and steaming mad customers. Just to be clear, embedded software doesn't have to have this broken an update process. Once the device has power, recovering from a bad upgrade by loading a new image should be automatic. Usually the design is to have a golden boot image in ROM that can bootstrap a new valid image from an origin.

  • Here's a reason it's often better to buy rather than build. Why Big Companies Keep Failing: The Stack Fallacy: Stack fallacy is the mistaken belief that it is trivial to build the layer above yours...The stack fallacy is a result of human nature  — we (over) value what we know...In a surprising way, it is far easier to innovate down the stack than up the stack...It is therefore no surprise that Apple had an easier time building semiconductor chips than building Apple Maps.

  • What OOP users claim vs What actually happens. This is a cute picture, but I'd also like to see a picture of specialization via a gazillion anonymous lambdas being passed around or an infinite regress of containment hierarchies. At least with inheritance I can actually look at the code and see what's going on.

  • Bayes's Theorem: What's the Big Deal? A warning for those who are too in love with algorithms: If you aren’t scrupulous in seeking alternative explanations for your evidence, the evidence will just confirm what you already believe.

  • Building alerts on BBC iPlayer A/V consumption: Receiving lots of data, analyzing it and taking actions on the results is not a unique problem...we settled on the following technology stack...For throughput and latency, Kinesis and Lambda...For visualisation and alerting Cloudwatch...Events are collected and pushed to Kinesis via td-agent, a packaged version of fluentd, in combination with the plugin aws-fluent-plugin-kinesis that we have installed across all Lumberjack servers...As the data arrives, Sawmill consumes them via Lambda in batches of around 1000 events each...Once the data is in Cloudwatch, we can view the graphs that describe stream counts for all of our channels...With this data, we make use of CloudWatch alarms to alert operations teams.

  • Fascinating spy-vs-spy type battle in My Experience With the Great Firewall of China. More in How the Great Firewall discovers hidden circumvention servers. Liked this comment: The OP is missing the point of the GFW. It's not really about censorship. It's mainly about providing a market for local tech startups and about keeping the lower classes from organizing and causing trouble.

  • BuzzFeed used data analysis to uncover an almost inconceivable system of match fixing in professional tennis. The Tennis Racket. They found a pattern: heavy betting against a player, followed by that player’s loss. And here's the code on github. How cool is that? Look for a new inspired-by-real-life crime-fighting show on Amazon or Netflix.

  • ScaleSwarm: Auto-scaling a swarm cluster: Swarm and Machine are individually very powerful projects, but when combined, they can work wonders. The project attempts to create a self-scaling cluster using Swarm and Machine. Here, to set up an AWS Docker Auto Scaling cluster, all you need is swarm, machine and Amazon API Keys.

  • Marko Topolnik defines lock-free: A method is lock-free if it is nonblocking and, additionally, guarantees that infinitely often some method call finishes in a finite number of steps.

  • Looks like WhatsApp is going to join the make-money-on-messaging-as-the-UI-to-the-world club. WhatsApp Is Finally Inviting Businesses Onto Its Massive Network This Year.

  • Numbers don’t lie—it’s time to build your own router: When the download tests were finished, there were no more questions about the Homebrew Special. Yes, it could beat the Nighthawk... and walk away yawning afterward. Aside from one small dip at the 10K file size/10 concurrent connections level (which challenges the CPU with the absolute most made-and-broken connections), it performed almost identically to the direct network connection itself.

  • Rules engines are so seductive. They work great, until they don't, and then you are stuck. But in the right circumstances they make for a much simpler and easier to understand system. Experience with Rules-Based Programming for Distributed Concurrent Fault-Tolerant Code: To demonstrate applicability outside of the RAMCloud system, the team also re-wrote the Hadoop Map-Reduce job scheduler (which uses a traditional event-based state machine approach) using rules. The original code has three state machines containing 34 states with 163 different transitions, about 2,250 lines of code in total. The rules-based re-implementation required 19 rules in 3 tasks with a total of 117 lines of code and comments.
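
For readers who have not seen the style, here is a toy illustration of the general idea behind rules-based code: each rule is a condition on the current state plus an action, and a loop keeps firing applicable rules until a goal holds. This is a generic sketch, not the RAMCloud or Hadoop implementation; the replication task, its state fields, and the rule names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A rule is (name, condition on state, action that mutates state).
Rule = Tuple[str, Callable[[State], bool], Callable[[State], None]]


def run_rules(state: State, rules: List[Rule],
              goal: Callable[[State], bool], max_passes: int = 100) -> List[str]:
    """Fire the first applicable rule each pass until the goal is met."""
    fired: List[str] = []
    for _ in range(max_passes):
        if goal(state):
            return fired
        applicable = [rule for rule in rules if rule[1](state)]
        if not applicable:
            raise RuntimeError("stuck: no rule applies and the goal is not met")
        name, _condition, action = applicable[0]
        action(state)
        fired.append(name)
    raise RuntimeError("goal not reached within max_passes")


def _send(state: State) -> None:
    state["sent"] += 1


def _ack(state: State) -> None:
    state["acked"] += 1


# Hypothetical task: get `needed` replicas sent and acknowledged.
REPLICATION_RULES: List[Rule] = [
    ("send", lambda s: s["sent"] < s["needed"], _send),
    ("ack", lambda s: s["acked"] < s["sent"], _ack),
]


def replication_done(state: State) -> bool:
    return state["acked"] == state["needed"]


if __name__ == "__main__":
    fresh = {"needed": 3, "sent": 0, "acked": 0}
    print(run_rules(fresh, REPLICATION_RULES, replication_done))
```

Because rules look only at the current state, starting from a partially completed state (say `sent=3, acked=1`) needs no special recovery path: the same rules simply fire the remaining acknowledgements. That is the appeal compared with enumerating states and transitions by hand.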

  • Another dream dashed. Why Spiderman could never climb walls: Superhero would need sticky pads covering 40 per cent of his body to scale skyscrapers. 

  • The Facebook-Loving Farmers of Myanmar: Almost all of the farmers we spoke with were Facebook users. None had heard of Twitter. How they used Facebook was not dissimilar to how many of us in the West see and think of Twitter: as a source of news, a place where you can follow your interests. The majority, however, didn’t see the social platform as a place to be particularly social or to connect with and stay up to date on comings and goings within their villages.

  • Nice Lecture #01 - Course Information & History of Databases from Carnegie Mellon.

  • Confused as to why oil prices keep dropping? Me too. $20 is the new $40 from the Economist says there are both demand and supply side reasons. The comment section is perhaps more informative than the post. An interesting thought is that by the time the prices might be in a position to rise again technology changes could serve to depress the demand side further and keep prices low.

  • A different kind of historical preservation. The Elephant In The Digital Dark Room: “We were very close to the difficult decision of having to stop manufacturing film,” said Jeff Clarke, Kodak’s chief executive, according to the Wall Street Journal. “Now with the cooperation of major studios and film makers, we’ll be able to keep it going.”

  • Cloudonaut explains in detail how to perform a rolling upgrade using AWS CloudFormation. Looks fairly straightforward. 

  • Brendan Gregg extols the virtues of Off-CPU profiling and shows some dynamite Off-CPU Time Flame Graphs of the Linux Kernel: With my eBPF stack hack, I'm able to try out off-CPU flame graphs now, at least for kernel stacks, on x86_64, and up to 20 frames deep. 

  • The physics of life: By studying the spontaneous flows of microtubules and proteins confined in small, doughnut-shaped containers, they hope to lay the groundwork for a self-pumping fluid that could move molecules around in microfluidic devices similar to those that are becoming increasingly common in experimental biology, medicine and industry. Active matter “changes our ideas of what materials can do.”

  • Mark Papadakis on Coroutines and Fibers. Why and When: if you care for performance (throughput, latency) and your thread(s) are executing tasks/jobs that may be long-running and/or block, and if you appreciate having a programming model that makes sense, then you need to use co-routines.

  • ScyllaDB: Cassandra compatibility at 1.8 million requests per node: it is a resilient NoSQL database, currently in beta testing, designed from the ground up to take advantage of multiple core systems and to provide very high performance.

  • RaftLib: Simple, easy to use stream computation library for C++.

  • Can Neural Activity Propagate by Endogenous Electrical Field?: the purpose of this paper is to provide an explanation for experimental data showing that neural signals can propagate by means other than synaptic transmission, gap junction, or diffusion. The results indicate that electric fields (ephaptic effects) are capable of mediating propagation of self-regenerating neural waves. This novel mechanism coupling cell-by-volume conduction could be involved in other types of propagating neural signals, such as slow-wave sleep, sharp hippocampal waves, theta waves, or seizures.

  • Nano Lambda: gives you the flexibility to deploy anonymous functions to create powerful micro-services. 

  • fastos/tcpdive: A TCP performance profiling tool