Stuff The Internet Says On Scalability For February 24th, 2017

Hey, it's HighScalability time:

Great example of Latency As A Pseudo-Permanent Network Partition. A slide effectively cleaved Santa Cruz from the North Bay by slowing traffic to a crawl.
If you like this sort of Stuff then please support me on Patreon.

  • 40 TFLOPS: on Lambda; 7: new habitable planets with good beer; dozens: balloons needed in Loon network; 500 TB/sec: rate at which DNA is copied in human body; 1/2: web is encrypted; 34: regions in Azure; $8k: cost of Tesla self-driving hardware; 99.95%: DMCA takedowns are bot BS; 300 nanometers: new microscope; 7%: AMP traffic to publishers; 

  • Quotable Quotes:
    • @jasonlk: Elon Musk: Self-Driving Car Revolution Will Leave 15% of World Population Without Jobs
    • Near death Archimedes: Stand away, fellow, from my diagram!
    • rumpelstilskin21: Angular and React make for popular headlines on reddit but unless you are working for a major, large web site where such things might be deemed useful by management (and no one else) then quit trying to get educated by the amateurs on reddit.
    • StorageMojo: There is a new paradigm about to hit the industry, which will eviscerate large portions of the current storage ecosystem. Like other major shifts, it is powered by a class of users who are poorly served by existing products and technologies. But if our digital civilization is to survive and prosper, it has to happen. And it will, like it or not.
    • ThatMightBePaul: Worst case scenario: you try Go, don't like it, and you head back to Node more confident that it fits you better. That's still a pretty positive outcome, imo. So, invest the time in Go, and then see which feels right :)
    • Russ: it is the job of the application to properly figure out the network’s limits and try to live within them.
    • World's Second-Best Go Player: After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong. I would go as far as to say not a single human has touched the edge of the truth of Go.
    • @mjpt777: After fixing a few more false sharing issues we shaved another ~350ns off Aeron's RTT between machines.
    • @thomasfuchs: 1997: Let’s make a website! *fires up vi* 2007: Let’s make a website! *downloads jQuery* *fires up vi* 2017: Let’s make a website! [very long list of tech]
    • Basho: Do not follow the ancient masters, seek what they sought.
    • hellofunk: If many years ago, someone told me that a humongous company named Alphabet was thinking about deploying balloons all over the world, I'd have told you a thing or two about having a charming imagination. 
    • Russ: Sure, the Internet is broken. But anything we invent will, ultimately, be broken in some way or another. Sure the IETF is broken, and so is open source, and so is… whatever we might invent next. We don’t need a new Internet, we need a little less ego, a lot less mud slinging, and a lot more communication. 
    • @sAbakumoff: Analyzed the sentiment of 80000 Github Commit Comments, it seems that Ruby devs tend to be pretty positive, but c++ are angriest ones!
    • Michael Sawyer: The YouTubers' common enemy is YouTube
    • @jannis_r: "Good size for a microservice: if it fits into one engineers head" @adrianco #AWSTechBreakfast
    • packagecloud: setting [TZ] environment variable can save thousands (or in some cases, tens of thousands) of unnecessary system calls that can be generated by glibc over small periods of time. 
    • @istanboolean: "Hardware has stopped getting faster. Software has not stopped getting slower." @rob_pike
    • Greg Meddles: You're out of memory on some particular Amazon instance, so you bump up to the next biggest in size. That is always the naive solution. Whatever you're doing, you'll usually end up doing more of it. Eventually, you'll end up throwing good money after bad.
    • @viktorklang: Replace the use of sequential, concurrent, and parallel with dependent, coordinated, and independent? Thoughts?
    • Coast Guard Vice Adm. Marshall Lytle: Cyberwarfare is like a soccer game with all the fans on the field with you and no one is wearing uniforms
    • CockroachDB: If you’re serious about building a company around open source software, you must walk a narrow path: introduce paid features too soon, and risk curtailing adoption. Introduce paid features too late, and risk encouraging economic free riders. Stray too far in either direction, and your efforts will ultimately continue only as unpaid open source contribution
    • Veratyr: Deployment [of k8s] is just so much harder than it should be. Fundamentally (I discovered far later on in the process), Kubernetes is comprised of roughly the following services: kube-apiserver, kubelet, kube-proxy, kube-scheduler, kube-controller-manager. The other dependencies are: A CA infrastructure for certificate based authentication, etcd, a container runtime (rkt or Docker) and CNI.
    • @jbeda: I want to go on record: the amount of yaml required to do anything in k8s is a tragedy. Something we need to solve. 

  • What do you get for $5? Quite a lot. $5 Showdown: Linode vs. DigitalOcean vs. Amazon Lightsail vs. Vultr: Linode’s new plan is not only offering the consistently better performance...Linode is still a bit behind the curve when it comes to things like block storage volumes, default SSH keys and yeah, their UI.

  • Another wonderful engineering post from Riot Games. Under the hood of the League Client's Hextech UI: Any given build of the League client is expressed as a list of units called plugins... Back-end plugins that deal purely with data are written as C++ REST microservices...front-end plugins that deal with presentation are written as Javascript client applications and run inside Chromium Embedded Framework...The League client update really is a desktop deployment of an entire constellation of microservices...APIs are thoughtfully designed, any arbitrary combination of features can run cooperatively...In the League client, the common pattern is for dependencies to flow upwards...a WebSocket that allows the front-end plugins to observe back-end plugins for changes...To make implementation of complex video-based elements simpler, we created a state machine library based on Web Components...League client is patched out to players’ local drives, it doesn’t have the same immediate bandwidth constraints...we provide a number of purpose-specific audio channels - UI SFX, Notifications, Music, Voiceover, etc. - through a plugin dedicated to managing audio...We use straight-up native Custom Elements with heavy usage of Shadow DOM.

  • Does insurance cover this? The first SHA1 collision.

  • How Lambda links CPU and memory is not obvious and is as unpredictable as a herd of cats. The Occasional Chaos of AWS Lambda Runtime Performance: So, while a 128MB Lambda might often execute in the same amount of time as its 1536MB cousin, its worst case execution time will be 12 times slower, which is proportional to the memory settings...In some cases lower-memory Lambdas may even execute faster than the 1536MB Lambda...most of the time, the highest memory Lambda is *likely* to be the fastest...AWS Lambda performance scales *at least* proportionally to the memory setting...our 128MB Lambda could take anywhere from 16 seconds to 3 minutes to execute the same algorithm...never count on better performance than what is documented.
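
The quoted numbers can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the worst case scales exactly with the ratio of memory settings as the article describes. The function name and parameters are illustrative and are not part of any AWS API.

```python
def worst_case_seconds(best_case_seconds, memory_mb, max_memory_mb=1536):
    """Worst-case runtime if performance is only guaranteed in proportion to memory.

    Assumption: a function configured with memory_mb may run up to
    (max_memory_mb / memory_mb) times slower than its best observed time.
    """
    slowdown = max_memory_mb / memory_mb
    return best_case_seconds * slowdown


# 128MB vs 1536MB is a 12x ratio, so a 16 second best case
# gives a worst case of a little over 3 minutes.
seconds = worst_case_seconds(16, memory_mb=128)
minutes = seconds / 60
```

Under that assumption the 16 second to 3 minute spread reported for the 128MB Lambda is about what the 12x ratio predicts.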

  • Using modern sysbench to compare MyRocks and InnoDB on a small server: MyRocks did much better on inserts, updates and deletes; InnoDB did much better on range scans and better on point selects; There is little overhead from compression with MyRocks. There is a lot with InnoDB.

  • You want fast queries? Try: MapD & 8 Nvidia Pascal Titan Xs. Summary of the 1.1 Billion Taxi Rides Benchmarks. Crushed Redshift, BigQuery, Vertica, Elasticsearch, Spark, Presto, PostgreSQL, Athena.

  • Aristotle called it the Golden Mean. ryeguy: Is there some kind of word for the phenomenon in tech where we go from one extreme to the other, then end up realizing we went too far and settle in the middle? Like some kind of tech mean reversion? From this article: one extreme: single point of failure relational databases (some have master/master, but never as nice as a db built from the ground up for fault tolerance); other extreme (the "solution"): non-transactional, masterless, non-relational AP databases (e.g. Dynamo, Cassandra); the compromise: distributed, relational CP databases (Spanner).

  • Hook up a stateless and scalable web service to a cloud-based actor-model game service. Building a Scalable Online Game with Azure - Part 2.

  • It's all about latency. Evolution of Business Logic from Monoliths through Microservices, to Functions: As technology has progressed over the last decade, we’ve seen an evolution from monolithic applications to microservices and are now seeing the rise of serverless event driven functions, led by AWS Lambda. What factors have driven this evolution? Low latency messaging enabled the move from monoliths to microservices, low latency provisioning enabled the move to Lambda...The optimal size for a bundle of business logic depends upon the relative costs in both dollars and access time of CPU, network, memory and disk resources, combined with the latency goal for the service...The invocation latency for event driven functions is one of the key limitations that constrains complex applications, but over time those latencies are reducing.

  • Why use a bounded queue in the first place? EntroperZero: Say you're hosting a video streaming service with bounded queues. If your input exceeds your throughput, you start dropping frames, or you disconnect some clients who have filled the buffers. Now use unbounded queues. You don't drop any frames, but instead, all your users start experiencing lag because you aren't sending fast enough. The lag gets worse and worse until your server runs out of memory, and then your service goes down entirely.
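
The trade-off EntroperZero describes can be reproduced with a toy simulation. This is a minimal sketch, assuming a producer that emits more frames per tick than the consumer can send. The `simulate` function and its parameters are hypothetical and exist only for illustration.

```python
import queue


def simulate(ticks, produced_per_tick, consumed_per_tick, maxsize):
    """Return (backlog, dropped) after running the toy producer/consumer loop.

    maxsize=0 means unbounded, following the queue.Queue convention.
    """
    q = queue.Queue(maxsize=maxsize)
    dropped = 0
    frame = 0
    for _ in range(ticks):
        # Producer: try to enqueue without blocking; drop the frame if full.
        for _ in range(produced_per_tick):
            try:
                q.put_nowait(frame)
            except queue.Full:
                dropped += 1
            frame += 1
        # Consumer: send as many frames as throughput allows this tick.
        for _ in range(consumed_per_tick):
            try:
                q.get_nowait()
            except queue.Empty:
                break
    return q.qsize(), dropped


bounded = simulate(100, produced_per_tick=3, consumed_per_tick=2, maxsize=10)
unbounded = simulate(100, produced_per_tick=3, consumed_per_tick=2, maxsize=0)
```

With the bounded queue the backlog stays capped and the overload shows up as dropped frames. With the unbounded queue nothing is dropped, but the backlog (and so the lag and memory use) grows by one frame every tick for as long as the overload lasts.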

  • What happens when NASA does "Better, faster, cheaper"? Dreams burn in the atmosphere of Mars. Nov. 10, 1999: Metric Math Mistake Muffed Mars Meteorology Mission. Code reviews are good because “In this business, you’re either just shy of it working … or the thing looks bulletproof.” Those Mars Exploration Rovers worked pretty well.

  • Why are RNNs the ultimate NNs? JuergenSchmidhuber: Because they are general computers. Like your laptop. The program of an RNN is its weight matrix. Unlike feed forward NNs, RNNs can implement while loops, recursion, you name it. While FNNs are traditionally linked to concepts of statistical mechanics and traditional information theory, the programs of RNNs call for the framework of algorithmic information theory (or Kolmogorov complexity theory).

  • If you are considering microservices, this Poki article on rebuilding from scratch is a good experience report. From Monolith to Microservices: microservices don't automagically make your code amazing, but they do force you to think about decoupling and the communication between the various services. syholloway has a good pros/cons list for microservice adoption.

  • Do we really need swap on modern systems? Christian Horn says: In most environments, a bit of swap makes sense. Definitely for a generalized workstation type of load, but for a specialized node dedicated to a specific function, isn't this another responsibility applications should handle? Like caching, security, networking, scheduling and so on.

  • Big data problems we face today can be traced to the social ordering practices of the 19th century: This is not the first ‘big data’ era but the second. The first was the explosion in data collection that occurred from the early 19th century – Hacking’s ‘avalanche of numbers’, precisely situated between 1820 and 1840. This was an analogue big data era, different to our current digital one but characterized by some very similar problems and concerns. 

  • How much does it cost to run a SaaS? Cushion shows all. It takes a lot of services to raise a service.

  • Apache Kafka vs Amazon Kinesis: Performance: Advantage: Kafka; Setup: Advantage: Kinesis, by a mile; Ops: Advantage: Kinesis, by a mile; Costs: Advantage: Probably Kinesis; Incident Risk: Advantage: Kinesis; TCO: probably significantly lower for Kinesis. So is the risk. 

  • Marco Arment said about his new Overcast 3 release that he has removed all closed-source libraries (Google ads and Fabric) and will no longer use closed-source libraries. Neither will he use third-party analytics services. Reason: desire to guard customer rights and privacy. Will this be a trend? Should we call for all companies to open up their libraries to scrutiny? 

  • Relearning Functional Service Design for Microservices: Developers should familiarise themselves with fault tolerant design patterns, such as circuit breakers, bulkheads, timeouts and retries...Caching, although useful, should be deployed with care, and not used simply to overcome bad system design, such as a long activation path involving many dependent services...developers should not strive for reusability, and instead aim for replaceability...high cohesion, low coupling...separation of concerns...understand organizational boundaries...understand use cases and flows...identify functional domains...find areas that change independently...do not start with data model.
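
Of the fault tolerance patterns listed, the circuit breaker is the easiest to show in a few lines. This is a minimal sketch and not taken from the linked article. The class name, thresholds and injectable `clock` are all assumptions chosen for illustration. A production breaker would also need thread safety and per-dependency metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a failing dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise CircuitOpenError("circuit open; failing fast")
            # Otherwise half-open: let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to use a fallback or return an error immediately.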

  • Epoll is fundamentally broken 1/2: Using epoll() correctly is hard. Understanding extra flags EPOLLONESHOT and EPOLLEXCLUSIVE is necessary to achieve load balancing free of race conditions.
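
To make one of those flags concrete, here is a small demonstration of EPOLLONESHOT using Python's `select.epoll` wrapper (Linux only). It is a sketch of the flag's behaviour, not the load-balancing setup the article discusses, and it does not cover EPOLLEXCLUSIVE. The function name is hypothetical.

```python
import select
import socket


def oneshot_demo():
    """Return the number of events seen by three successive polls.

    Returns None on platforms without epoll.
    """
    if not hasattr(select, "epoll"):
        return None
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        # Register for readability, but only for a single notification.
        ep.register(a.fileno(), select.EPOLLIN | select.EPOLLONESHOT)
        b.send(b"x")
        first = ep.poll(1)   # the event is delivered once
        second = ep.poll(0)  # fd is now disabled, even though data is unread
        # Re-arm the fd; the unread byte makes it ready again immediately.
        ep.modify(a.fileno(), select.EPOLLIN | select.EPOLLONESHOT)
        third = ep.poll(1)
        return len(first), len(second), len(third)
    finally:
        ep.close()
        a.close()
        b.close()
```

Because the descriptor is disabled after each event, only one thread can be woken for it at a time, and that thread is responsible for re-arming it when it has finished reading.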

  • Copyright: the immoveable barrier that open access advocates underestimated: In conclusion, copyright is the immoveable barrier that the open access movement underestimated. In doing so, it has created a situation in which legacy publishers can expect to continue controlling scholarly communication, and profiting excessively from the public purse, even if/when the BOAI dream of universal open access is finally realised.

  • Occupy the Cloud: Distributed Computing for the 99%: We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently.
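
The idea of expressing a BSP-style computation as rounds of stateless function calls can be sketched locally. This example uses Python's standard `concurrent.futures` thread pool as a stand-in for cloud functions. It does not use PyWren or its API, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """A stateless task: the result depends only on the input chunk."""
    return sum(chunk)


def bsp_sum(values, chunk_size=4, workers=4):
    """Sum values in supersteps: map over chunks, wait for all, repeat."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2 so each round shrinks the data")
    level = list(values)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(level) > 1:
            chunks = [level[i:i + chunk_size]
                      for i in range(0, len(level), chunk_size)]
            # Collecting every result acts as the barrier between supersteps.
            level = list(pool.map(partial_sum, chunks))
    return level[0] if level else 0


total = bsp_sum(range(100))
```

Each round maps an independent function over the current partitions and waits for all results before starting the next round, which is the superstep-and-barrier shape of BSP.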