Stuff The Internet Says On Scalability For January 15th, 2016

Hey, it's HighScalability time:


Space walk from 2001: A Space Odyssey? Nope. A base jump from the CN Tower in Toronto. If you like this Stuff then please consider supporting me on Patreon.

  • 13.5TB: open data from Yahoo for machine learning; 1+ exabytes: data stored in the cloud; 13: reasons autonomous cars should have steering wheels; 3,000: kilowatt-hours of energy generated by the solar bike path; 10TB: helium-filled hard disk; $224 Billion: 2016 gadget spending in US; 85: free ebooks; 17%: Azure price drop on some VMs; 20.5: tons of explosives detonated on Mythbusters; 20 Billion: Apple’s App Store Sales; 70%: Global Internet traffic goes through Northern Virginia; 12: photos showing the beauty of symmetry; 

  • Quotable Quote:
    • @WhatTheFFacts: Scaling Earth's 'life' to 46 years, the industrial revolution began 1 minute ago -- In that time we've destroyed half the world's forests.
    • David Brin: The apotheosis of Darth Vader was truly disgusting. Saving one demigod—a good demigod, his son—wiped away all his guilt from slaughtering billions of normal people.
    • Brian Brazil: In today’s world, having a 1:1 coupling between machines and services is becoming less common. We no longer have the webserver machine, we have one machine which hosts one part of the webserver service. 
    • @iamxavier: "Snapchat is said to have 7 billion mobile video views vs Facebook’s 8 bil. The kicker: Fb has 15x Snapchat’s users."
    • Charlie Stross: Do you want to know the real reason George R. R. Martin's next book is late? it's because keeping track of that much complexity and so many characters and situations is hard work, and he's not getting any younger. 
    • @raju: Unicorn-Size Losses: @Uber lost $671.4 million in 2014 & $987.2 million in the first half of 2015
    • @ValaAfshar: 3.8 trillion photos were taken in all of human history until mid-2011. 1 trillion photos were taken in 2015 alone
    • @ascendantlogic: 2010: Rewrite all the ruby apps with javascript 2012: Rewrite all the javascript apps with Go 2014: Rewrite all the Go apps with Rust
    • @kylebrussell: “Virtual reality was tried in the 90s!” Yeah, with screens that had 7.9% of the Oculus Rift CV1 resolution
    • @kevinmarks: #socosy2016 @BobMankoff: people don't like novelty - they like a little novelty in a cocoon of familiarity, that they could have thought of
    • @toddhoffious: The problem nature has solved is efficient variable length headers. Silicon doesn't like them for networks, or messaging protocols. DNA FTW.
    • @jaykreps: I'm loving the price war between cloud providers, cheap compute enables pretty much everything else in technology. 
    • The Confidence Game: Transition is the confidence game’s great ally, because transition breeds uncertainty. There’s nothing a con artist likes better than exploiting the sense of unease we feel when it appears that the world as we know it is about to change.
    • @somic: will 2016 be the year of customer-defined allocation strategies for aws spot fleet? (for example, through a call to aws lambda)
    • beachstartup: i run an infrastructure startup. the rule of thumb is once you hit $20-99k/month, you can cut your AWS bill in half somewhere else. sites in this phase generally only use about 20% of the features of aws.
    • @fart: the most important part of DevOps to me is “kissing the data elf”
    • @destroytoday: In comparison, @ProductHunt drove 1/4 the traffic of Hacker News, but brought in 700+ new users compared to only 20 from HN.
    • @aphyr: Man, if people knew even a *tenth* of the f*cked up shit tech company execs have tried to pull... Folks are *awfully* polite on twitter.
    • @eric_analytics: It took Uber five years to get to a billion rides, and its Chinese rival just did it in one
    • lowpro: Being a 19 year old college student with many friends in high school, I can say snapchat is the most popular social network, followed by Instagram then Twitter, and lastly Facebook. If something is happening, people will snap and tweet about it, Instagram and Facebook are reserved for bigger events that are worth mentioning, snapchat and Twitter are for more day to day activities and therefore get used much more often.
    • Thaddeus Metz: The good, the true, and the beautiful give meaning to life when we transcend our animal nature by using our rational nature to realize states of affairs that would be appreciated from a universal perspective.
    • Reed Hastings: We realized we learned best by getting in the market and then learning, even if we’re less than perfect. Brazil is the best example. We started [there] four years ago. At first it was very slow growth, but because we were in the market talking to our members who had issues with the service, we could get those things fixed, and we learned faster.

  • Why has Bitcoin failed? From Mike Hearn: it has failed because the community has failed. What was meant to be a new, decentralised form of money that lacked “systemically important institutions” and “too big to fail” has become something even worse: a system completely controlled by just a handful of people. Worse still, the network is on the brink of technical collapse. The mechanisms that should have prevented this outcome have broken down, and as a result there’s no longer much reason to think Bitcoin can actually be better than the existing financial system.

  • Lessons learned on the path to production, from the Docker CEO: 1) IaaS is too low; 2) PaaS is too high: Devs do not adopt locked down platforms; 3) End to end matters: Devs care about deployment, ops cares about app lifecycle and origin; 4) Build management, orchestration, & more in a way that enables portability; 5) Build for resilience, not zero defects; 6) If you do 5 right, agility + control

  • Is this the Tesla of database systems? No Compromises: Distributed Transactions with Consistency, Availability, and Performance: FaRMville transactions are processed by FaRM – the Fast Remote Memory system that we first looked at last year. A 90 machine FaRM cluster achieved 4.5 million TPC-C ‘new order’ transactions per second with a 99th percentile latency of 1.9ms. If you’re prepared to run at ‘only’ 4M tps, you can cut that latency in half. Oh, and it can recover from failure in about 60ms. 
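The summary doesn't give a per-machine figure, but it falls out of simple division. A quick back-of-the-envelope check using only the numbers quoted above (the "half" latency is the summary's rough claim, not a measured value):

```python
# Back-of-the-envelope numbers for the 90-machine FaRM cluster quoted above.
machines = 90
cluster_tps = 4_500_000   # TPC-C 'new order' transactions per second
p99_latency_ms = 1.9      # 99th percentile latency at full load

per_machine_tps = cluster_tps / machines
print(f"{per_machine_tps:,.0f} transactions/sec per machine")

# Backing off to 'only' 4M tps roughly halves p99 latency, per the summary.
reduced_tps = 4_000_000
print(f"give up {1 - reduced_tps / cluster_tps:.0%} of throughput")
print(f"for roughly {p99_latency_ms / 2:.2f} ms p99 latency")
```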

  • Uber tells the story behind the design and implementation of their scalable datastore using MySQL. Uber took the path of many others in writing an entire layer on top of MySQL to create the database that best fits their use case. Uber wanted: the ability to add capacity linearly by adding more servers; write availability; a way of notifying downstream dependencies; secondary indexes; and operational trust in the system, as it contains mission-critical trip data. They looked at Cassandra, Riak, MongoDB, and others. Features alone did not decide their choice. What did? "the decision ultimately came down to operational trust in the system we’d use." If you are Uber, this is a good reason, even if it may not seem as important to those without that accountability. Uber's design was inspired by FriendFeed, and its focus on the operational side by Pinterest.
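To make "a layer on top of MySQL" concrete, here is a minimal sketch of the FriendFeed-style pattern the post cites as inspiration: entities stored as opaque JSON blobs, with each secondary index kept in its own table by the application. This is not Uber's schema. The table names, the `TripStore` class, and the use of `sqlite3` in place of MySQL are all assumptions for illustration, and in a real sharded deployment the index table may live on a different shard, so it could not always share one transaction with the entity.

```python
import json
import sqlite3
import uuid
from typing import List, Optional

# Hypothetical tables in the spirit of FriendFeed's "schemaless" MySQL design:
# entities are opaque JSON blobs, and each secondary index is a separate table
# maintained by the application layer. sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE index_rider (
    rider_id TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    PRIMARY KEY (rider_id, entity_id)
);
"""


class TripStore:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.executescript(SCHEMA)

    def put(self, trip: dict) -> str:
        entity_id = str(uuid.uuid4())
        # One local transaction writes the entity and its index row together.
        with self.conn:
            self.conn.execute("INSERT INTO entities VALUES (?, ?)",
                              (entity_id, json.dumps(trip)))
            self.conn.execute("INSERT INTO index_rider VALUES (?, ?)",
                              (trip["rider_id"], entity_id))
        return entity_id

    def get(self, entity_id: str) -> Optional[dict]:
        row = self.conn.execute("SELECT body FROM entities WHERE id = ?",
                                (entity_id,)).fetchone()
        return json.loads(row[0]) if row else None

    def trips_for_rider(self, rider_id: str) -> List[dict]:
        # Secondary-index lookup: find ids in the index table, then load bodies.
        rows = self.conn.execute(
            "SELECT e.body FROM index_rider AS i "
            "JOIN entities AS e ON e.id = i.entity_id WHERE i.rider_id = ?",
            (rider_id,)).fetchall()
        return [json.loads(body) for (body,) in rows]


store = TripStore(sqlite3.connect(":memory:"))
trip_id = store.put({"rider_id": "r1", "fare": 12.5})
print(store.get(trip_id))
print(len(store.trips_for_rider("r1")))
```

Adding a new field needs no schema migration, which is the main appeal; the cost is that every index is application code you now own.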

  • Does anyone else feel like we are in a webisode of the movie Groundhog Day? The Sad State of Web Development. It has always been thus and will forever be.

  • Why is software different from any other part that goes into the making of a thing? It's not, but we pretend it is. Software is just a part made out of 0s and 1s instead of atoms. Before I Can Fix This Tractor, We Have to Fix Copyright Law: But let’s back up—why do people need to ask permission to fix a tractor in the first place? It’s required under the anti-circumvention section of the Digital Millennium Copyright Act—a law that regulates the space where technology and copyright law collide. 

  • Jake Edge with a look Inside the Volkswagen emissions cheating: Overall, it is an interesting "detective story" of sorts, but it also shows just how much is going on behind the scenes in our cars and other devices we rely on every day. Even in a highly regulated industry like automobiles, though, there is plenty of wiggle room for companies to try to out compete other car makers—or to outfox regulators. It is unclear how widespread this kind of cheating is in the industry, but it seems likely we will hear about more of this kind of chicanery in coming years.

  • A very good question. Why Do Laws Cater to Big Business When Startups Drive Growth?: The answer is, surprisingly enough, money. 

  • Bret Taylor with a Take on Your Competition with These Lessons from Google Maps: With rare exception, the better mousetrap doesn't win simply because it is better...1) Incumbents have entrenched distribution channels...2) Habits die way harder than you’d expect...3) Just being different isn't enough. Your customers have to care...The bottom-line is you have to build a lens to allow users to see a new world rather than features to help them see an old world better.

  • Congratulations to Sebastian Stadil. Scalr Snags $7.35M. The good guys win one. 

  • Cloud companies are devouring datacenter space. Who Leased the Most Data Center Space in 2015?

  • Spot-on thread about the difference between Lambda and Google App Engine. nostrademons: Lambda is something new. It's akin to BaaS (backend-as-a-service, like Parse or Firebase), but operates fairly differently from them, and doesn't have the mobile focus. Not sure if I'd lump it under BaaS or if it's an entirely new category (EaaS, execution-as-a-service?)...the interface to the outside world is different, the granularity of abstraction is different, and the pricing model is different. All those make it suitable for different applications, which indicates it's a new market and not going after the same customers as a PaaS...I get the sense that Amazon Lambda : database triggers :: Google AppEngine : webapps. The former is utility code that glues together events in multiple storage systems; the latter is designed to serve a single application and interface with all of its data needs.
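The "Lambda as database trigger" analogy is easiest to see in code. A minimal sketch of a Python Lambda handler reacting to S3 object-created events: Lambda does call Python handlers as `handler(event, context)`, and the `Records[].s3.bucket.name` / `Records[].s3.object.key` fields follow S3's event notification format, but the downstream action is a hypothetical placeholder and the test event is hand-written and trimmed down.

```python
import json
from urllib.parse import unquote_plus


def handler(event, context):
    """Glue code in the 'database trigger' style: react to S3 object-created
    events and hand each new object to a (hypothetical) downstream step."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # S3 URL-encodes keys
        processed.append(f"s3://{bucket}/{key}")
        # A real function might write to another store or call a service here.
    return {"statusCode": 200, "body": json.dumps(processed)}


# Local run with a minimal, hand-written event (no AWS account needed).
fake_event = {"Records": [{"s3": {"bucket": {"name": "photos"},
                                  "object": {"key": "2016/new+photo.jpg"}}}]}
result = handler(fake_event, None)
print(result)
```

Note the shape: no server, no routing, no app. Just a function bound to an event source, which is why it feels closer to a trigger than to a PaaS webapp.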

  • Sometimes you just have to do things that aren't scalable. Japan Keeps This Defunct Train Station Running for Just One Passenger.

  • Gary Mulder: I've both performance tested and chaos monkey failure tested a Raft-based cluster of three Disruptors and had excellent results with Raft failover. Very fast to failover and never managed to split-brain it. It took significant artificial packet loss (> 30%) on the 10GE point-to-point connections between the Disruptors while running a background load of 40K requests/sec to delay Raft consensus by 10's of secs, and it still eventually achieved consensus.
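The reason a three-node Raft cluster resists split-brain is plain majority arithmetic: a leader needs votes from a strict majority, and two disjoint groups can't both contain one, so Raft elects at most one leader per term. A small sketch of the numbers (function names are just for illustration):

```python
def majority(n: int) -> int:
    """Votes a Raft candidate needs in an n-node cluster: a strict majority."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while the survivors can still form a majority."""
    return n - majority(n)


# Two disjoint partitions can never both reach majority(n) votes,
# so a partitioned cluster cannot elect two leaders in the same term.
for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

A four-node cluster tolerates no more failures than a three-node one, which is why odd cluster sizes are the norm.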

  • Makes a lot of sense: machine learning is being applied to CDNs. Find Anomalies: It’s Time for CDN’s to Use Machine Learning: Think about a world where advanced analytics systems predict bottlenecks and automatically route traffic to its most appropriate CDN and optimal proxy. Predictive analytics can be a great solution to the challenges that CDN providers have faced for years.
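The simplest version of "predict bottlenecks and route traffic" isn't machine learning at all, but it shows the loop. A minimal sketch, assuming a per-CDN exponentially weighted moving average (EWMA) of observed latency as the "prediction", a fixed multiple of that average as the anomaly threshold, and hypothetical CDN names. Real systems would use far richer models than this.

```python
from dataclasses import dataclass, field


@dataclass
class CdnRouter:
    """Toy predictive router: track an EWMA of latency per CDN, flag samples
    far above the average, and send traffic to the lowest predicted latency."""
    alpha: float = 0.3                            # weight of the newest sample
    ewma_ms: dict = field(default_factory=dict)

    def observe(self, cdn: str, latency_ms: float) -> None:
        prev = self.ewma_ms.get(cdn)
        self.ewma_ms[cdn] = latency_ms if prev is None else (
            self.alpha * latency_ms + (1 - self.alpha) * prev)

    def is_anomalous(self, cdn: str, latency_ms: float, factor: float = 3.0) -> bool:
        prev = self.ewma_ms.get(cdn)
        return prev is not None and latency_ms > factor * prev

    def best(self) -> str:
        return min(self.ewma_ms, key=self.ewma_ms.get)


router = CdnRouter()
for ms in (40, 42, 41):
    router.observe("cdn-a", ms)
for ms in (35, 36):
    router.observe("cdn-b", ms)
print(router.best())                      # cdn-b: 35.3 ms vs 40.72 ms
print(router.is_anomalous("cdn-b", 120))  # True: 120 > 3 * 35.3
router.observe("cdn-b", 120)
print(router.best())                      # cdn-a once cdn-b's average jumps
```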

  • Messaging has always been a big tech winner. How Napoleon's semaphore telegraph changed the world.

  • What's the biggest mistake you've ever made while choosing web technology for your business? necrophcodr: Picking something because it was fancy or looked pretty darn cool.

  • Why Is Tech Accelerating?: "Evolution (biological or technological) results in a better next-generation product. That product is thereby a more effective and capable method, and is used in developing the next stage of evolutionary progress. It’s a positive feedback loop. Put differently, we are using faster tools to design and build faster tools." Or to put it another way, we are building stepping stones.

  • Microsoft Neural Net Shows Deep Learning Can Get Way Deeper: It’s called “hyper parameter optimization.” “People can just spin up a cluster [of machines], run 10 models at once, find out which one works best and use that,” Gibson says. “They can input some baseline parameter—based on intuition—and the machines kind of homes in on what the best solution is.” 
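"Run 10 models, find out which one works best" is, in its plainest form, random search. A minimal sketch, assuming a made-up `validation_loss` function in place of actually training a network (its minimum is placed at `learning_rate=0.01`, `depth=8` purely for illustration). Trials run sequentially here; on a cluster each would run on its own machine. This is one common hyperparameter optimization technique, not necessarily the one Microsoft used.

```python
import random


def validation_loss(learning_rate: float, depth: int) -> float:
    """Stand-in for 'train a model and score it': a made-up bowl-shaped
    function whose minimum sits at learning_rate=0.01, depth=8."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (depth - 8) ** 2


def random_search(trials: int, seed: int = 0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = {"learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform
                  "depth": rng.randint(2, 16)}
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


loss, params = random_search(trials=10)  # "run 10 models at once"
print(f"best loss {loss:.3f} with {params}")
```

Sampling the learning rate log-uniformly is the usual choice because its useful values span several orders of magnitude.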

  • Cushion shares all the services they use and how much they pay. Running Costs. Very good discussion on HackerNews. The number of services people use and why is astonishing. codinghorror: Heroku is very, very expensive. We switched an early Discourse setup from Heroku to Digital Ocean (using our Docker setup, which is another reason why it exists) and it saved them probably $100 per month. Everything but the largest communities fit fine on a $20/month, 2 CPU, 2GB ram DO droplet.

  • Where do I host my App? That's a very common question and now there's a book to help answer the question: Hosting for App Developers. Here's an example chapter: What Kinds of Hosting Does the Market Offer? Not very detailed, but it may help.

  • Nice look at Building a Serverless Dynamic DNS System with AWS: "In this post, we describe how to build your own dynamic DNS system with a small script and several AWS services." It uses Lambda, API Gateway, Route 53, S3. This from the AWS Startup Collection blog. Lots of good articles.
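At the heart of any Route 53-based dynamic DNS is an UPSERT of an A record. A minimal sketch of that piece (not the post's script): the `ChangeBatch` shape and `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)` are boto3's Route 53 API, while the function names, hostname, and TTL are assumptions. `boto3` is imported inside `update_dns` so the batch builder can be tried without AWS credentials.

```python
import json


def build_upsert(hostname: str, ip: str, ttl: int = 60) -> dict:
    """ChangeBatch that points hostname's A record at ip, creating it if needed."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": hostname,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }]
    }


def update_dns(zone_id: str, hostname: str, ip: str) -> None:
    """Apply the change (needs boto3, AWS credentials, and a hosted zone)."""
    import boto3  # imported lazily so build_upsert works offline
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=zone_id, ChangeBatch=build_upsert(hostname, ip))


print(json.dumps(build_upsert("home.example.com", "203.0.113.7"), indent=2))
```

A short TTL matters here: the whole point is that clients should notice a changed home IP within about a minute.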

  • Sean Hall makes the case. Is Redshift outpacing Hadoop as the big data warehouse for startups? Redshift is more agile, cheaper, and faster, but has a few problems. A commenter says don't forget BigQuery from Google with "costs saving is more than 70%, also queries are much more faster."

  • Confused about TensorFlow? Murat is here to help. Paper review: TensorFlow, Large-Scale Machine Learning on Heterogeneous Distributed Systems. Good comments on HackerNews.

  • Signal failure is the largest source of delays in the subway: In all there are more than 12,000 signals and 300,000 relays that comprise the system’s interlockings. Signal failure is the largest source of delays in the subway. There is an incident on average every 11 hours. Whenever a signal fails, the ones behind it go red, since they can’t be sure there isn’t a train in that section. Often, a can or a piece of aluminum foil is enough to fool a track circuit.
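The "signals behind it go red" behavior is fail-safe design: when occupancy is unknown, assume the worst. A deliberately simplified sketch (real interlockings also have yellow approach aspects, timing, and much more), assuming each block reports occupied, clear, or unknown:

```python
from enum import Enum
from typing import List, Optional


class Aspect(Enum):
    RED = "red"
    GREEN = "green"


def signal_aspects(blocks_occupied: List[Optional[bool]]) -> List[Aspect]:
    """Aspect of the signal guarding the entrance to each block.

    blocks_occupied[i] is True (train detected), False (clear), or None
    (track circuit failed, state unknown). Fail-safe rule: only a block
    known to be clear gets GREEN; unknown is treated like occupied.
    """
    return [Aspect.GREEN if occupied is False else Aspect.RED
            for occupied in blocks_occupied]


# Block 1's circuit has failed: its signal goes red even if the block is empty.
aspects = signal_aspects([False, None, False, True])
print([a.value for a in aspects])
```

The cost of this safety is exactly the delay the article describes: one failed circuit stops every train approaching it.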

  • So dang funny. You know that really popular author/book you all really like? Well it's/they're bollocks.

  • Crossbar or Crosspoint? Crossbars are quite likely to become prevalent over the next few years.  Intel is spearheading an NVMe over fabric standard, and crossbars could play a role in this effort. Crosspoint memories also seem poised to become a big part of the world of memories with the new 3D XPoint Memory that Micron and Intel recently introduced.  This could get pretty confusing!

  • configuration (mis)management or why I hate puppet, ansible, salt, etc: So that ladies and gentlemen is why I hate anything and everything that uses YAML or some other weird custom DSL to do the obvious. The current top offenders are salt, puppet, and ansible and they’re gaining more followers by the day. Everything that seems intuitive and obvious to me is either an anti-pattern or straight-up impossible to do with any of those tools. Instead of writing libraries of idempotent components they had to layer custom nonsense on top and completely change sensible programming language semantics. Somehow I’m the only one that ended up with the right set of experiences that taught me to avoid all the things these tools champion.
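What "libraries of idempotent components" might look like in an ordinary language rather than a YAML DSL: a minimal sketch of one such component. The function name and the hosts-file example are hypothetical; the property that matters is that running it twice is safe and the second run reports no change.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Idempotently make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already correct.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the new line onto an unterminated last line
    path.write_text(text + line + "\n")
    return True


with tempfile.TemporaryDirectory() as d:
    hosts = Path(d) / "hosts"
    first = ensure_line(hosts, "10.0.0.5 db1")   # True: line added
    second = ensure_line(hosts, "10.0.0.5 db1")  # False: already present
    contents = hosts.read_text()
print(first, second)
```

Because it's just a function, it composes with loops, conditionals, and tests in the usual way, which is the author's point.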

  • The Poor Man's Threading Architecture. Cool traces of how work is distributed to CPUs and GPUs in complicated games.  

  • paulpauper: It may come as a surprise to some, but making money is not important. That's why so many people lost their shirts shorting the 'unprofitable' Amazon.com. Rather, it's the demonstrated ability to make money, which is more important. Should the time come to turn on the advertising money printing press, Pinterest and Snapchat, like Facebook, should have no difficulty making money. There's already a huge line of advertisers ready to plow hundreds of millions of dollars into Pinterest ads. 

  • From the Ground Up: Reasoning About Distributed Systems in the Real World: Working on a messaging platform team, I’ve had countless conversations which resemble the following exchange:
    • Developer: “We need fast messaging.”
    • Me: “Is it okay if messages get dropped occasionally?”
    • Developer: “What? Of course not! We need it to be reliable.”
    • Me: “Okay, we’ll add a delivery ack, but what happens if your application crashes before it processes the message?”
    • Developer: “We’ll ack after processing.”
    • Me: “What happens if you crash after processing but before acking?”
    • Developer: “We’ll just retry.”
    • Me: “So duplicate delivery is okay?”
    • Developer: “Well, it should really be exactly-once.”
    • Me: “But you want it to be fast?”
    • Developer: “Yep. Oh, and it should maintain message ordering.”
    • Me: “Here’s TCP.”
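The usual way out of the "just retry" vs "exactly-once" corner in that exchange is at-least-once delivery plus an idempotent consumer that de-duplicates by message ID. A minimal sketch: the in-memory `seen_ids` set is a simplification, since a real consumer must store processed IDs durably and atomically with the side effect. Ordering, the developer's last demand, is not addressed here.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """At-least-once delivery made safe by de-duplicating on message ID.

    If we crash after processing but before acking, the broker redelivers;
    remembering processed IDs turns that retry into a no-op, so the effect
    is applied once even though delivery is not exactly-once.
    """
    seen_ids: set = field(default_factory=set)
    balance: int = 0

    def handle(self, msg_id: str, amount: int) -> bool:
        if msg_id in self.seen_ids:
            return False           # duplicate: ack again, change nothing
        self.balance += amount     # the side effect we want applied once
        self.seen_ids.add(msg_id)  # real systems: commit with the side effect
        return True


consumer = IdempotentConsumer()
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]  # m1 redelivered after a "crash"
for msg_id, amount in deliveries:
    consumer.handle(msg_id, amount)
print(consumer.balance)  # 15
```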

  • Isn't this a problem for OpenStack? Tricking Out OpenStack To Scale It Out: The first step to trick out your own OpenStack environment for cloud-scale agility and performance is to identify the third-party vendors who work with your OpenStack distribution. 

  • Analyzing Spark's MPP Scalability with the USL: What can we take away from the analysis of these benchmark results? I would suggest that Spark can be improved to scale better. It has some small amount of serialization as the job is distributed and then the results are reassembled, which is always expected in scatter-gather MPP systems. I would also suggest that response time "stretching" may point to internal contention or tail latency, which is often a problem in MPP systems too. You can't figure out which merely by glancing at a graph, but you can get started in the right direction.
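For readers new to the USL: Gunther's Universal Scalability Law models relative capacity as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α captures serialization/contention (like the scatter-gather coordination mentioned above) and β captures coherency or crosstalk costs, which eventually make adding nodes counterproductive. Capacity peaks near N* = √((1 − α)/β). A small sketch; the coefficients are illustrative only, not fitted to the Spark benchmark.

```python
import math


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """Node count where C(N) peaks (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)


# Illustrative coefficients only, not fitted to any real benchmark.
alpha, beta = 0.05, 0.0005
for n in (1, 8, 32, 64, 128):
    print(n, round(usl_capacity(n, alpha, beta), 1))
print("peak near", round(usl_peak(alpha, beta)), "nodes")
```

With β = 0 the curve only flattens (Amdahl's law); any β > 0 means there is a cluster size beyond which throughput actually drops.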

  • Netflix blows up with Dynomite with Redis on AWS - Benchmarks. Dynomite is a proxy layer that provides sharding and replication and can turn existing non-distributed datastores into a fully distributed system with multi-region replication. You know it's Netflix because they test with 3 availability zones, a million keys, on 48 largish nodes. Just one result: At the 99th Percentile with DC_QUORUM enabled, Dynomite produces less than 3ms of latency.
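Sharding plus replication in Dynamo-style systems usually means a token ring with quorum acknowledgements. A minimal generic sketch, not Dynomite's actual token assignment or configuration: node names are hypothetical, each node gets one token from a hash, and `quorum_ok` assumes DC_QUORUM roughly means "a majority of the replicas in the local data center must respond".

```python
import bisect
import hashlib


class TokenRing:
    """Minimal consistent-hashing ring: each node owns one token, a key lives
    on the first node clockwise from its hash, and replicas go to the next
    nodes around the ring."""

    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def replicas(self, key: str, count: int = 3) -> list:
        start = bisect.bisect(self.tokens, self._hash(key)) % len(self.ring)
        count = min(count, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


def quorum_ok(acks: int, replicas: int) -> bool:
    """A request succeeds once a majority of its replicas respond."""
    return acks >= replicas // 2 + 1


ring = TokenRing([f"redis-{i}" for i in range(6)])
owners = ring.replicas("user:42")
print(owners, quorum_ok(acks=2, replicas=len(owners)))
```

Waiting for two of three replicas instead of all three is what keeps tail latency low: the slowest replica stops mattering.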

  • Excellent high level overview of Architectures for Massively Multi-User Environments: In order to achieve the illusion of a single, consistent universe, we are left to having to play with space and time: what runs where (load partitioning) and when (interpolation, delayed execution, time dilation, etc.).
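"Interpolation" in that quote usually means clients render remote entities slightly in the past and blend between the two server snapshots that bracket the render time. A minimal sketch, assuming 2D positions, snapshots as sorted `(time_ms, x, y)` tuples, and a 100 ms render delay (all illustrative choices):

```python
from bisect import bisect_right


def interpolate(snapshots, render_time):
    """Entity position at render_time, linearly interpolated between the two
    server snapshots that bracket it. snapshots: sorted (time, x, y) tuples."""
    times = [t for t, _, _ in snapshots]
    i = bisect_right(times, render_time)
    if i == 0:
        return snapshots[0][1:]       # before the first snapshot
    if i == len(snapshots):
        return snapshots[-1][1:]      # no newer data yet: hold last position
    (t0, x0, y0), (t1, x1, y1) = snapshots[i - 1], snapshots[i]
    f = (render_time - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


# Updates arrive every 100 ms; rendering 100 ms in the past means there is
# usually a later snapshot to interpolate toward.
snaps = [(0, 0.0, 0.0), (100, 10.0, 0.0), (200, 10.0, 5.0)]
now = 250
print(interpolate(snaps, now - 100))  # (10.0, 2.5)
```

This is the "playing with time" trade: every player sees a consistent, smooth world at the cost of a small, fixed delay.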

  • Complexity fails, the microprocessor edition. We Saw Some Really Bad Intel CPU Bugs in 2015, and We Should Expect to See More in the Future: First, there was the bug found by Ben Serebrin and Jan Beulich, which allowed a guest VM to fault in a way that would cause the CPU to hang in a microcode infinite loop, allowing any VM to DoS its host.

  • Testing Chromecast is a lot cooler than you might think. Google “IoT” Testing for Chromecast: Cloud Emulation + Physical Gear.

  • If you didn't go to CES, here's an insightful CES 2016 roundup so you can pretend you did. Some of the topics: The march towards Autonomous Intelligence; Self-driving cars; IoT; Robots; Drones; Netflix; Connected everything; Smart City; VR; and much more. Another good one is CES 2016—Observations for Product People. Major observations impacting product makers: Invisible finally making a clear showing (almost); Capable infrastructure is clearly functional (almost); Residential working now, but expectations high and software not there; Wearable computing focusing on fitness; and much more.

  • Deep Image: Scaling up Image Recognition: We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. 

  • Distributed Sagas: The saga paper outlines a technique for long-lived transactions which provide atomicity and durability without isolation...We are especially interested in the problem of writing sagas which interact with third-party services, where we control the Saga Execution Coordinator (SEC) and its storage, but not the downstream Transaction Execution Coordinators (TECs) themselves. 
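The core saga idea fits in a few lines: run steps in order, and if one fails, run the compensations of the completed steps in reverse. A minimal single-process sketch; the booking steps are hypothetical, and a real Saga Execution Coordinator would durably log progress and retry compensations against the third-party services.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order. If a step fails, run the
    compensations of the already-completed steps in reverse order.

    This gives all-or-compensated atomicity without isolation: other
    transactions may observe intermediate states along the way.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # in practice must be idempotent and retryable
        return False
    return True


log = []


def rent_car():
    raise RuntimeError("car rental service unavailable")


trip = [
    (lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    (lambda: log.append("book hotel"), lambda: log.append("cancel hotel")),
    (rent_car, lambda: log.append("cancel car")),
]
ok = run_saga(trip)
print(ok, log)
```

The failed step's own compensation never runs, because that step never completed.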

  • On The Advantages of Tagged Architecture: This paper proposes that all data elements in a computer memory be made to be self-identifying by means of a tag. The paper shows that the advantages of the change from the traditional von Neumann machine to tagged architecture are seen in all software areas including programming systems, operating systems, debugging systems, and systems of software instrumentation. 
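The paper is about hardware tags, but the idea can be shown in software with low-bit tagging, a trick many language runtimes use. A minimal sketch: the 3-bit tag width and the tag values are hypothetical, and `tagged_add` checks tags before operating, the way a tagged machine would, instead of blindly adding bit patterns.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_INT, TAG_CHAR, TAG_PTR = 0, 1, 2  # hypothetical tag assignments


def tag(value: int, kind: int) -> int:
    """Pack a payload and its type tag into one self-identifying word."""
    return (value << TAG_BITS) | kind


def untag(word: int):
    """Split a word back into (payload, tag)."""
    return word >> TAG_BITS, word & TAG_MASK


def tagged_add(a: int, b: int) -> int:
    """Add two words only if both are tagged as integers."""
    (va, ka), (vb, kb) = untag(a), untag(b)
    if ka != TAG_INT or kb != TAG_INT:
        raise TypeError(f"add on non-integer tags {ka}, {kb}")
    return tag(va + vb, TAG_INT)


x, y = tag(20, TAG_INT), tag(22, TAG_INT)
print(untag(tagged_add(x, y)))  # (42, 0)
try:
    tagged_add(x, tag(65, TAG_CHAR))
except TypeError as err:
    print("trapped:", err)
```

Because every word says what it is, a debugger or instrumentation tool can interpret raw memory without outside type information, which is the paper's broader argument.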

  • Data-centric Programming for Distributed Systems: Disorderly programming—a theme that we explore in this thesis through language design—extends the declarative programming paradigm with a minimal set of ordering constructs. Instead of overspecifying order and then exploring ways to relax it (e.g., by using threads and synchronization primitives), a disorderly programming language encourages programmers to underspecify order, to make it easy (and natural) to express safe and scalable computations. As we will show, disorderly programming also enables powerful analysis techniques that recognize when additional ordering constraints are required to produce correct results. Mechanisms ensuring these constraints can then be expressed and inserted at an appropriately coarse grain to achieve the needs of core tasks like mutable state and distributed coordination.
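Why underspecifying order can be safe: monotone operations like set union are commutative, associative, and idempotent, so replicas reach the same state whatever order updates arrive in, while an order-sensitive operation like "overwrite" does not. A small Python illustration of that distinction (not Bloom, the thesis's own language):

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Set union: order-insensitive, so no coordination is needed."""
    return a | b


def overwrite(_old: frozenset, new: frozenset) -> frozenset:
    """Last write wins: the result depends on delivery order."""
    return new


updates = [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "z"})]
merged = {reduce(merge, order, frozenset()) for order in permutations(updates)}
overwritten = {reduce(overwrite, order, frozenset()) for order in permutations(updates)}
print(len(merged))       # 1: every delivery order converges
print(len(overwritten))  # 3: the outcome depends on which update arrived last
```

Spotting the non-monotone step (here, `overwrite`) is exactly where an analysis would say "ordering or coordination is required".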