Stuff The Internet Says On Scalability For August 19th, 2016

Hey, it's HighScalability time:


Modern art? Nope. Pancreatic cancer revealed by fluorescent labeling.

If you like this sort of Stuff then please support me on Patreon.

  • 4: SpaceX rocket landings at sea; 32TB: 3D Vertical NAND Flash; 10x: compute power for deep learning as the best of today’s GPUs; 87%: of vehicles could go electric without any range problems; 0.06%: visitors that post comments on NPR; 235k: terrorism related Twitter accounts closed; 40%: AMD improvement in instructions per clock for Zen; 15%: apps are slower in summer because of humidity;

  • Quotable Quotes:
    • @netik: There is no Internet of Things. There are only many unpatched, vulnerable small computers on the Internet.
    • @Pinboard: The Programmers’ Credo: we do these things not because they are easy, but because we thought they were going to be easy
    • Aphyr: This advantage is not shared by sequential consistency, or its multi-object cousin, serializability. This much, I knew–but Herlihy & Wing go on to mention, almost offhand, that strict serializability is also nonlocal!
    • @PHP_CEO: I’VE HAD AN IDEA / WE’LL TAKE ALL THE BAD CODE / BUNDLE IT TOGETHER / AND SELL IT TO VCS AS A COLLATERALIZED TECHNICAL DEBT OBLIGATION
    • felixgallo: I agree, the actor model is a significantly more usable metaphor for containers than functions. When you start thinking about supervisor trees, you start heading towards Kubernetes, which is interesting.
    • David Rosenthal: So in practice blockchains are decentralized (not), anonymous (not and not), immutable (not), secure (not), fast (not) and cheap (not). What's (not) to like?
    • @grimmelm: You know, you can’t spell “idiotic” without “IoT”
    • @jroper: 10 years ago, backends were monolithic services and frontends many pages. Now frontends are monolithic pages and backends many services.
    • @jakevoytko: Ordinary human: Hey, this is a fork. You can eat with it! People who comment on programming blogs: You can't eat soup with that.
    • iLoch: Wow $5000/mo for 2000rps, just for the application servers? That's absurd. I think we're paying around $2000/mo for our app servers, a database which is over 2TB in size, and we ingest about 10 megabytes of text data per second, on top of a couple thousand requests per second to the user facing application.
    • @josh_wills: I'm thinking about writing a book on data engineering for kids: "An Immutable, Append-Only Log of Unfortunate Events"
    • Kill Process: What the world needs is not a new social network that concentrates power in a single place, but a design to intrinsically prevent the concentration of power that results in barriers to switching.
    • ljmasternoob: the bump was just Schrödinger's cat stepping on Occam's razor.
    • carsongross: The JVM is a treasure just sitting there waiting to be rediscovered.
    • @mjpt777: When @nitsanw points out some of what he finds in the JVM I often end up crying :(
    • @karpathy: I hoped TensorFlow would standardize our code but it's low level so we've diverged on layers over it: Slim, PrettyTensor, Keras, TFLearn ...
    • @rbranson:  coordination is a scaling bottleneck in teams as much as it is in distributed systems.
    • @mathiasverraes: There are only two hard problems in distributed systems:  2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery
    • @PhilDarnowsky: I've been using dynamically typed languages for a living for a decade. As a result, I prefer statically typed languages.
    • Allyn Malventano: 64-Layer is Samsung's 4th generation of V-NAND. We've seen 48-Layer and 32-Layer, but few know that 24-Layer was a thing (but was mainly in limited enterprise parts).
    • @cmeik: "It's a bit odd to me that programming languages today only give you the ability to write something that runs on one machine..." [1/2]
    • @trengriffin: @amcafee Use of higher radio frequencies will require a lot more antennas creating ever smaller coverage areas. More heterogeneous bandwidth
    • @jamesurquhart: Disagree IaaS multicloud tools will play major role moving forward. Game is in PaaS and app deployment (containers).

  • Linking it all together on a great episode of This Week In Tech. Google’s new OS, Fuchsia, for places where Android fears to tread: smaller, lower power IoT type devices. Intel Optane is an almost shipping non-volatile memory that is 1000X faster than SSD (maybe not), has up to 10X the capacity of DRAM while only being a few X slower than typical DRAM, and is perfect for converged IoT devices. Say goodbye to blocks and memory tiers. IoT devices don't have to be fast, so DRAM can be replaced with this new memory, hopefully making simpler cheaper devices that can last a decade on a small battery, especially when combined with low power ARM CPUs. NVMe is replacing SATA and AHCI for higher bandwidth, lower latency access to non-volatile memory. 5G, when it comes out, will specifically support billions of low power IoT devices. Machine learning ties everything together. That future that is full of sensors may actually happen. As Greg Ferro said~ We are starting to see the convergence of multiple advances. You can start to plot a pathway forward to see where the disruption occurs. The irony, still, is nothing will work together. We have ubiquitous wifi more from a fluke of history than any conscious design. We see how when left up to industry the silo mindset captures all reason, and we are all the poorer for it.

  • We have water rights. Mineral rights. Surface rights. Is there such a thing as virtual property rights? Do you own the virtual property rights of your own property when someone else decides to use it in an application? Pokemon GO Hit With Class Action Lawsuit. Why do people keep coming to this couple’s home looking for lost phones?

  • As data becomes more valuable, the idea that we are the product becomes a given. Provider of Personal Finance Tools Tracks Bank Cards, Sells Data to Investors: Yodlee has another way of making money: The company sells some of the data it gathers from credit- and debit-card transactions to investors and research firms...“Yodlee can tell you down to the day how much the water bill was across 25,000 citizens of San Francisco” or the daily spending at McDonald’s throughout the country...The details are so valuable that some investment firms have paid more than $2 million apiece for an annual subscription to Yodlee’s service.

  • Take that! Humans aren't dead yet. Chinese man famed for superhuman memory beat an AI in a facial recognition contest: after drawing level with Wang in the first two rounds of the competition, Mark [the AI] finally lost in the third round, in which each competitor was required to match women with their childhood photo.

  • Making the census great again. Two university students take just 54 hours to build a Census website that WORKS - for $10 MILLION less than the ABS' disastrous site. How? They went serverless, using Lambda, of course. Scalability? Check. No cost on off years when not in use? Check. Handle bursts? Check. Cheap? Check. Easy admin? Check. Is this fair? Uncheck. Not 100%. The data had to stay in Australia, so this solution could not work. But perhaps the specs could change?

  • If you're using Rails on Heroku, here's what it looks like to scale, and this is how far you can get it to go. Scaling Rails to 125,000 Requests per Minute on Heroku: Watch your database IOPS. We hit the limit here before we hit CPU or memory limits, which was a surprise; you’ll need PgBouncer so that you don’t max out the number of connections to Postgres; Move to single-tenancy (P-medium or P-large) dynos; Instrument the app with New Relic, and use Heroku’s Metrics feature to measure the bottleneck; Use Heroku’s support. Now should you use Heroku for this kind of load? That's a question that always comes up and it's discussed in detail on this HackerNews post. It's not about getting the most performance per $, says ZeeMee. ZeeMee has seasonal load, so this system can sit nearly idle (cheap) most of the year, and it can be cranked up to "expensive" a few days a year when needed. ZeeMee has only about 2.5 people working on backend/Rails, so the hands-off aspect of this solution is a winner for them.

  • Discussion. AMA: We are the Google Brain team. We'd love to answer your questions about machine learning. Hundreds of comments cover a lot of territory. What are the most exciting or underrated things going on in this field right now? Robotics. Random Forests and Gradient Boosting. Evolutionary algorithms. Potential for new techniques (particularly generative models) to augment human creativity. Applications to Healthcare. Applications to Art & Music. Treating neural nets as parametric representations of programs, rather than parametric function approximators.

  • Good discussion. Ask HN: What is the future of back-end development? Lots of debate over the future of serverless as the next big thing. General consensus is yes, but there are downsides.

  • Some good Do’s and Don’ts of AWS Lambda: Don’t substitute FaaS with writing good libraries; Utilize SNS for ephemeral work; Separate pipeline stages; Enable SNS delivery status notifications; Utilize CloudWatch events to pre-warm functions; Keeping track of versioning; Utilize micro-batching for I/O bound work; Cheaper isn’t better; Utilize CloudWatch alarms.

  • Using External Services. A check-list for avoiding the most common errors when wiring up and calling into services both inside and outside of AWS: Plan your execution time;  Implement a circuit-breaker; Use Promises when possible; Return a Promise out of the API handler;  Clean up before responding to users; Set up IAM access privileges; Configure external access keys with stage variables.
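
As a rough illustration of the "implement a circuit-breaker" item above, here is a minimal sketch in Python. It is not code from the linked check-list; the class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions made for the example. After a run of consecutive failures the breaker rejects calls immediately, then lets a single trial call through once the cool-down has passed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical helper: fail fast after repeated errors from a dependency."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

In a function with a hard execution-time limit, failing fast like this can leave enough time to clean up and return a useful error instead of timing out while waiting on a dead dependency.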

  • Nice overview. NoSQL Databases: A Survey and Decision Guidance: this article gives a top-down overview of the field: Instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases.

  • Tired of only seeing average latency in SSD reviews? Don't performance pros always say you need to see a distribution and that averages are useless? Allyn Malventano has created his own Latency Distribution / Latency Percentile tests for SSDs. Here's an example of a recent very detailed review: Triple M.2 Samsung 950 Pro Z170 PCIe NVMe RAID Tested. The greatest differentiator for SSDs is the software that implements all the write/erase and cache magic. As you might expect this software can yield some horrendous tail latencies and that's what Allyn has found. More here and here
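
To see why an average can hide tail latency, here is a small Python sketch. The sample numbers are made up for illustration and are not taken from the review; `percentile` is a hypothetical helper using the simple nearest-rank method, not the methodology of the tests linked above.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in microseconds: 98 fast I/Os and 2 slow outliers.
latencies_us = [100] * 98 + [20_000] * 2

mean_us = statistics.mean(latencies_us)  # 498: matches neither the fast nor the slow case
p50_us = percentile(latencies_us, 50)    # 100: the typical request
p99_us = percentile(latencies_us, 99)    # 20000: the tail the mean smooths over

print(mean_us, p50_us, p99_us)
```

The mean lands at a value no request actually experienced, while the median and the 99th percentile together describe both the common case and the stalls.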

  • iRobot advocates an SDK-based direct resource access to serverless APIs rather than an HTTP based approach. A Wishlist for Serverless Deployment Tooling: We’ve made a conscious decision to accept the ops complexity for the lower cost and better performance. Also included is a nice list of features they've found important in a serverless world: A four step deployment process: local staging, cloud staging, template generation, template based deployment; Support makefiles, run as the last step before packaging the code for a Lambda function; Support microservices declaring their data sources; Deploy from a working directory; Whenever possible, use content-addressable names; Support multi-region deployment; Support querying deployments to determine their contents and discover their artifacts.
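
One way to read the "content-addressable names" item above: derive an artifact's name from a hash of its bytes, so identical content always maps to the same name and any change yields a new one. The sketch below is an assumption about how that could look in Python; the function name, the `lambda-code` prefix, the SHA-256 choice, and the 16-character truncation are illustrative and not taken from the iRobot post.

```python
import hashlib


def content_addressed_name(payload: bytes, prefix: str = "lambda-code", ext: str = "zip") -> str:
    """Return a name derived from the payload's SHA-256 digest (hypothetical scheme)."""
    digest = hashlib.sha256(payload).hexdigest()
    # Truncating keeps names readable; keep the full digest if collisions matter.
    return f"{prefix}-{digest[:16]}.{ext}"


build_a = content_addressed_name(b"handler v1")
build_b = content_addressed_name(b"handler v1")  # same bytes, same name
build_c = content_addressed_name(b"handler v2")  # changed bytes, new name

print(build_a == build_b, build_a == build_c)
```

A side effect of naming by content is that re-uploading an unchanged package is a no-op, and a deployed name tells you exactly which bytes are running.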

  • So Cthulhu was a vision of the future. Robot Octopus Points the Way to Soft Robotics With Eight Wiggly Arms.

  • If you are in the mood for a really nerdy book then you might like Kill Process by William Hertling. It's absolutely frightening what a vigilante with elite hacker skills can accomplish with the resources of the world's largest social network. But that's not all. It's also the story of a plucky startup with idealistic goals of changing the world. It's also the story of a righteous fight against a corrupt government. And it's also the wrenching story of a woman trying to cope with crippling domestic abuse. The evolution of the main character from internally focused victim, frightened of the world, lashing out in compensation, to the Queen of the World is wonderfully done. Highly recommended.

  • How do you orchestrate asynchronous micro-services? Netflix does it with Distributed delay queues based on Dynomite, which is based on Redis.
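
For readers unfamiliar with the idea, a delay queue holds each message until its due time and only then hands it to a consumer. The sketch below is a single-process, in-memory Python illustration of that behavior only. It does not reflect Netflix's distributed implementation on Dynomite; the `DelayQueue` class and its method names are hypothetical.

```python
import heapq
import itertools


class DelayQueue:
    """In-memory delay queue: messages become visible once their due time passes."""

    def __init__(self):
        self._heap = []                # entries are (due_time, sequence, message)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order for equal due times

    def push(self, message, now, delay):
        """Schedule message to become available delay seconds after now."""
        heapq.heappush(self._heap, (now + delay, next(self._seq), message))

    def pop_due(self, now):
        """Remove and return every message whose due time is at or before now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


queue = DelayQueue()
queue.push("retry-task-a", now=0, delay=5)
queue.push("retry-task-b", now=0, delay=1)
print(queue.pop_due(now=1))  # only retry-task-b is due so far
```

Passing `now` explicitly keeps the example deterministic; a real consumer would poll with the current time and would also need acknowledgements and redelivery, which this sketch leaves out.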

  • When Netflix talks about all their Amazon wizardry they are only talking about the control plane. The data plane, the part that actually delivers movies, which they don't do from AWS, is not covered all that much. Now you can take a look at the Netflix CDN. Here's the paper: Open Connect Everywhere: A Glimpse at the Internet Ecosystem through the Lens of the Netflix CDN. Here's a blurb: Researchers map Netflix's content delivery network for the first time: Scientists at Queen Mary University of London (QMUL) have revealed the network infrastructure used by Netflix for its content delivery, by mimicking the film request process from all over the world and analysing the responses.

  • A very nice detailed list and explanation of Best Practices for Building a Microservice Architecture: A microservice architecture shifts around complexity. Instead of a single complex system, you have a bunch of simple services with complex interactions. Our goal is to keep the complexity in check. 

  • Great deep dive. NUMA Deep Dive Part 1: From UMA to NUMA: This series aims to provide insights of the CPU architecture, the memory subsystem and the ESXi CPU and memory scheduler. Allowing you in creating a high performing platform that lays the foundation for the higher services and increased consolidating ratios. Part 2. Part 3. Part 4. And there are 3 more parts to go!

  • USB TYPE-C AND USB 3.1 EXPLAINED: Type-C refers to the physical shape of the newest USB connector; USB Type-C is NOT the same thing as USB 3.1; USB Type-C ONLY describes the physical connector; USB 3.1 ONLY describes the actual capabilities of the port.

  • Parallel I/O and Columnar Storage~ How can we still solve the challenge of searching 100TB of web tracking data? We can't make a single disk go any faster but we can break up the data set into smaller pieces, put each piece on its own disk and read all pieces from all disks in parallel. If we distributed the data over 500k individual disks we could read our whole data set in one second.
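
The arithmetic behind that claim is easy to check. The quoted numbers imply about 200 MB/s of sequential read throughput per disk (100 TB spread over 500k disks is 200 MB each, read in one second); that per-disk figure is inferred here rather than stated in the excerpt, and decimal units are assumed. A small Python sketch:

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def scan_seconds(dataset_bytes, disks, bytes_per_sec_per_disk):
    """Time to read the whole data set when it is split evenly across
    disks and every disk is read in parallel."""
    per_disk_bytes = dataset_bytes / disks
    return per_disk_bytes / bytes_per_sec_per_disk


dataset = 100 * TB
throughput = 200 * MB  # assumed per-disk sequential read rate

single_disk = scan_seconds(dataset, 1, throughput)     # 500000.0 seconds, about 5.8 days
parallel = scan_seconds(dataset, 500_000, throughput)  # 1.0 second

print(single_disk, parallel)
```

The scan time falls linearly with the number of disks only if the data is spread evenly and nothing else (network, CPU, coordination) becomes the bottleneck first.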

  • I wonder what the sarcasm detecting AI would make of this next sentence? Building backdoors into your product is a great idea! Volkswagen Created A 'Backdoor' To Basically All Its Cars... And Now Hackers Can Open All Of Them: Researchers are now revealing that approximately 100 million VW vehicles can be easily opened via a simple wireless hack. The underlying issue: a static key used on basically all of the wireless locks in VWs.

  • morecoffee: I work with HTTP/2 daily, and there are some pain points when running at high speeds: - Headers are still head of line blocking. You must synchronize sending them to maintain the HPACK table state. At a high number of requests per second, this is a bottleneck. - Running over TLS is a CPU bottleneck since encrypted messages are sequential. We get around this by making multiple TCP connections. - Long lived HTTP/2 connections will often break because of NATs (home internet), or changing IP address (mobile). A single dropped packet kills throughput for high-speed links too. All of these are addressed by the QUIC protocol. I suspect that eventually HTTP/2 will be the last major protocol over TCP because most of the aforementioned problems come from running over it.

  • Looks like the last mile chasm will be crossed by wireless. Might also be why Verizon wants to be in the ad business. What happens when even your dumb pipes can be routed around? Google Fiber Stalls as the Industry Gears Up for Ultrafast Wireless: Another reason appears to be a growing sense that gigabit Internet can be delivered much more cheaply if the wires are ditched...Google Fiber’s big move toward wireless came when it announced in June that it is buying Webpass, a company that uses wireless technology to provide homes with gigabit Internet access. 

  • Agreed, if you do it the right way. Pull doesn't scale - or does it?: having worked with pull-based monitoring at the largest scales, this claim runs counter to our own operational experience...Instead of executing check scripts, it only collects time series data from a set of instrumented targets over the network...you [can] monitor over 10,000 machines from a single Prometheus server. The scaling bottleneck here has never been related to pulling metrics, but usually to the speed at which the Prometheus server can ingest the data into memory and then sustainably persist and expire data on disk/SSD...Prometheus is not an event-based monitoring system. You do not send raw events to Prometheus, nor can it store them. Prometheus is in the business of collecting aggregated time series data... in our experience it's slightly more likely for a push-based approach to accidentally bring down your monitoring... If a pull-based approach scales to a global environment with many tens of datacenters and millions of machines, you can hardly say that pull doesn't scale.

  • Pete Lumbis on what's next for networking: Giving up control...we’ve always had very precise control of the network. In order to build more cost effective networks that do more, we have to give up some of that control...a security/policy engine running in the hypervisor. In some shops the network team control that, in others that’s driven by the server team. At Cumulus we are working with customers to run BGP on their servers to get rid of layer 2 and mLAG entirely.

  • Miserlou/omnihash: A tiny little tool to hash strings, files, input streams and network resources using various common hashing algorithms.

  • Should you use a lot of little lambda functions? Should you have just one lambda entry point that implements lots of functions? Should you try to organize around microservices or nanoservices, or is there some other preferred granularity? How do you update lots of interacting functions? What about code upload size limits? No answers of course, but here's a good discussion: One big Lambda function or multiple smaller ones?

  • A Conflict-Free Replicated JSON Datatype: In this paper we present an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state. It supports arbitrarily nested list and map types, which can be modified by insertion, deletion and assignment.

  • IBM-Bluemix/openwhisk-slackapp: This sample shows how to build a serverless Slack app using Slack Events API to receive events, with IBM Bluemix OpenWhisk actions to process these events, and how to expose these actions with API Connect.

  • A Cloud You can Wear: Towards a Mobile and Wearable Personal Cloud: In this paper, we present the concept of a wearable cloud – a complete yet compact and lightweight cloud which can be embedded into the clothing of a user. The wearable cloud makes the design of mobile and wearable devices simple, inexpensive, and lightweight, tapping into the resources of the wearable cloud.

  • Why Philosophers Should Care About Computational Complexity: In this essay, I offer a detailed case that one would be wrong. In particular, I argue that computational complexity theory—the field that studies the resources (such as time, space, and randomness) needed to solve computational problems—leads to new perspectives on the nature of mathematical knowledge, the strong AI debate, computationalism, the problem of logical omniscience, Hume’s problem of induction, Goodman’s grue riddle, the foundations of quantum mechanics, economic rationality, closed timelike curves, and several other topics of philosophical interest.

  • NASA Systems Engineering Handbook