
Stuff The Internet Says On Scalability For August 23rd, 2019

Wake up! It's HighScalability time:


Absurd no more. This Far Side cartoon is now reality.


Do you like this sort of Stuff? I'd love your support on Patreon. I wrote Explain the Cloud Like I'm 10 for people who need to understand the cloud. And who doesn't these days? On Amazon it has 54 mostly 5 star reviews (125 on Goodreads). They'll learn a lot and likely add you to their will.

Number Stuff:

  • 7.11 trillion: calls to the DynamoDB API, peaking at 45.4 million requests per second, during 48 hours of Prime Day. Amazon Aurora also supports the network of Amazon fulfillment centers. On Prime Day, 1,900 database instances processed 148 billion transactions, stored 609 terabytes of data, and transferred 306 terabytes of data. The EBS team added an additional 63 petabytes of storage ahead of Prime Day; the resulting fleet handled 2.1 trillion requests per day and transferred 185 petabytes of data per day.
  • 768 million: US vacation days go wasted. Do something fun! Your creativity depends on it.
  • $1.2 billion: data labeling industry for AI by 2023, up from $500 million last year. 
  • 23: local Texas government agencies struck by ransomware. Attacks against businesses and governments are up by 365%.
  • 1.2 trillion: transistor deep learning processor. 56 times larger than the largest GPU today. 
  • 97%: of code in a modern web app comes from npm.
  • 70%: of all Java apps are bottlenecked on memory churn. 
  • 25 years: time it has taken ecommerce to reach 10% of retail sales in the U.S.
  • 8 million: universes simulated by two-thousand processors crunching data simultaneously over three weeks.
  • 5.3 million: stolen credit card numbers go on sale.
  • 50%: per year growth in bandwidth needs for hyperscalers and cloud builders.

Quotable Stuff:

  • Cardinal Richelieu: Give me six lines written by an honest man, and I will find something in it with which to hang him.
  • @bassamtabbara: Multicloud is knowing that I can change carriers without having to change phone numbers
  • Benjamin Woodruff: Instagram Server is entirely Python powered. Well, mostly. There’s also some Cython, and our dependencies include a fair amount of C++ code exposed to Python as C extensions. Our server app is a monolith, one big codebase of several million lines and a few thousand Django endpoints [1], all loaded up and served together.
  • Stephen Shankland: The fine-tuned AI chips - which have 6 billion transistors apiece - are "smart" enough to power Tesla's full self-driving abilities in the future, according to the company. Their performance has improved by a factor of 21, compared to the earlier Nvidia chips. Ganesh Venkataramanan, one of the chip designers and a former AMD processor engineer, said that in order to meet "performance levels at the power constraints and the form factor constraints we had, we had to design something of our own." The chips, optimized for self-driving cars, run at 2GHz and perform 36 trillion operations a second.
  • @trevorbrindlejs: Hiring is Broken: (research paper) Candidates are concerned about: - Relevance of questions - anxiety during interview - frustration/humiliation (affect) - lack of typical dev env (affordances) - the time required to practice - being disqualified on an unfair criteria
  • @jzawodn: True for me today: "Debugging is like being the detective in a crime movie where you are also the murderer."
  • Dan Goodin: A rash of supply chain attacks hitting open source software over the past year shows few signs of abating, following the discovery this week of two separate backdoors slipped into a dozen libraries downloaded by hundreds of thousands of server administrators.
  • kruzes: Sorry, but at this point, I need to assume Javascript/Node culture is beyond saving. Just today I was trialing Firebase Cloud Functions. The hello world of it requires a package "firebase-functions", by Google themselves. It has 73 dependencies, by 83 maintainers. There's no being careful with that, I have to assume Google was careful on my behalf, right?
  • Bechtolsheim: People always talk about this as if there is some magic to it, but there really is just substitution from a cost/performance and technology standpoint from the previous generation. And the speed of adoption is largely driven by the relative price/performance. In the dark ages between 2000 and 2010, there was a 10 Gb/sec standard in 2000, but the equipment was so expensive that very few people could justify deploying it. It took almost ten years for the cost to come down before there was some adoption. In the cloud, this pace doesn’t work at all because they will never adopt a technology unless it is cheaper on Day One.
  • @copyconstruct: How to make FaaS a fully functional. We now have millions of cores and petabytes of RAM. We need a programming model that’ll allow us to unlock the full potential of the cloud and serverless.
  • Albert Kozłowski: However, after a couple of years and despite how much things have changed in terms of technology, I believe that code ownership and feature teams had the biggest impact on how software is developed within organizations that adopted microservices. In my opinion, having smaller teams with clear ownership brings a lot of joy to the day-to-day development work and gives developers the kind of freedom that sparks creativity.
  • @PDChina: For the first time in China, #AI assistive technology was used in a trial at Shanghai No 2 Intermediate People's Court on Wed, the Legal Daily reported. When the judge, public prosecutor or defender asked the AI system, it displayed all related evidence on a courtroom screen.
  • dwl-sdca: The IT person doesn't make the funding decision. A relative of mine works for a small county agency. The IT department wanted to buy two external drives to support staggered off-site backups. The total cost was less than US$1000. The request was refused. They countered with a request for one backup drive. That too was refused.
  • David Auerbach: Two consequences of this massive increase in data processing are a drive toward ubiquity of the models used, and an increasing human opacity to these models, whether or not such opacity is intended or inevitable. If our lives are going to be encoded (in the non-programming sense) by computers, computer science should assume reductionism, ubiquity, and opacity as intrinsic properties (and problems) of the models its methods generate.
  • Ed Sperling: Putting all of this in perspective, all of the major chipmakers are tackling similar problems in their target markets. They are improving performance per watt through a combination of general-purpose processors and custom accelerators, and in many cases they are making it possible to replace modules more easily and quickly from one market to the next, and as algorithms are updated. They also are improving throughput of data on-chip, off-chip to memory, and prioritizing the movement of different kinds of data.
  • Jason Torchinsky: These pumps are expensive pieces of equipment, costing many thousands of dollars each. This isn’t some toy; the Encore line of pumps are serious machines, designed for serious business. That’s why this is all so baffling: why is that image so, you know, shitty?
  • @benedictevans: China’s ability to fire the CEO of Cathay Pacific shows how irrelevant it is to ask who technically owns Huawei. What matters is whether the state has effective control, not legal control, and we know the answer to that for any company in or even near China.
  • Zak Jason: "I can’t tell you the number of times women have filled our questionnaires with no details except ‘We want an Instagram-perfect spot for brunch.’"
  • Dr. Ian Cutress: One of the key critical future elements about this world of compute is moving data about. Moving data requires power, to the point where calling data from memory can consume more power than actually doing ‘compute’ work on it. This is why we have caches, but even these require extensive management built in to the CPU. For simple operations, like bit-shifts or AND operations, the goal is to move the ability to do that compute onto the main DRAM itself, so it doesn’t have to shuttle back and forth. Citing performance examples, UPMEM has stated that they have seen speedups of 22x-25x on Genomics pattern matching, an 18x speed up in throughput for database index searching at 1/100th the latency, and a 14x TCO gain for index search applications.
  • More_front_IPC says: So it appears that with higher core clocks no longer being as readily available with process node shrinks as it was in the past there is now the impetus for AMD and Intel to have to go wider order superscalar and move more in the IBM Power 8/9/10 direction with some very wide order superscalar offerings in order to get the IPCs higher, with that clock speed increase low-hanging fruit no longer there as an easy way to get more performance.
  • Lorin Hochstein: To get better at avoiding or mitigating future incidents, you need to understand the conditions that enabled past incidents to occur. Counterfactual reasoning is actively harmful for this, because it circumvents inquiry into those conditions. It replaces “what were the circumstances that led to person X taking action Y” with “person X should have done Z instead of Y”.
  • @GossiTheDog: I'm beginning to think that rather than dumping firmware for IoT ovens and toasters and hyping insignificant bugs on blogs, the security industry should be dumping and examining the firmware of security industry products - as this shit looks backdoored to hell.
  • Cade Metz: One day, who knows when, artificial intelligence could hollow out the job market. But for now, it is generating relatively low-paying jobs. The market for data labeling passed $500 million in 2018 and it will reach $1.2 billion by 2023, according to the research firm Cognilytica. This kind of work, the study showed, accounted for 80 percent of the time spent building A.I. technology.
  • TSMC: First, let's discuss the elephant in the room. Some people believe that Moore's Law is dead because they believe it is no longer possible to continue to shrink the transistor any further. Just to give you an idea of the scale of the modern transistor, the typical gate is about 20 nanometers long. A water molecule is only 2.75 Angstrom or 0.275 nanometer in diameter! You can now start counting the number of atoms in a transistor. At this scale, many factors limit the fabrication of the transistor. The primary challenge is the control of materials at the atomic level. How do you place individual atoms to create a transistor? How do you do this for billions of transistors found on a modern chip? How do you build these chips that have billions of transistors in a cost effective manner?
  • ChuckMcM: It wasn't until it started obviously failing to be true, that semiconductor companies started arguing in favor of an interpretation they could meet, rather than admit that Moore's law was dead, as pretty much any engineer actually building systems would tell you. Somewhere there must be a good play on the Monty Python Parrot sketch where Moore's law stands in for the parrot and a semiconductor marketing manager stands in for the hapless pet shop owner. It is really hard to make smaller and smaller transistors. And the laws of physics interfere. Further, it's really hard to get the heat out of a chip when you boost the frequency. Dennard and others have characterized those limits more precisely and as we hit those limits, progress along that path slows to a crawl or stops. Amdahl pretty famously characterized the limits of parallelism, we are getting closer to that one too, even for things that are trivially parallelized like graphics or neural nets.
  • Charlie Demerjian: As you can see from the chart above a 32C Epyc, probably the 7452, beats the best of Intel’s Cascade line by a substantial margin. The Intel 8280L (Note that we will use the L/M variants from here on out because the crippled -nothing parts are not comparable to AMD’s Epyc line in our eyes) is a $17,906 part where the 7452 costs $3400. And has more features like 8-channel DDR at higher speeds, PCIe4, more than 2x the PCIe lanes, etc etc. On the down side the 7452 consumes 20W more to do so. Depending on your TCO calculations, usually $1-3/W/year, this could add almost $300 to AMDs tab reducing the price differential to a mere $14,506 per socket.
  • joezydeco: When working with credit cards and chip/PIN systems the entry of the PIN needs to be secure. This usually means the scanning lines from the keypad go directly to a security-hardened subprocessor inside the pump - the same one reading the PAN from the EMV chip or magstripe. Then the PIN/PAN block is encrypted and sent off to the application processor and/or bank to complete the transaction. If PIN entry was offloaded to the application processor, that processor would need to be audited to make sure of certain requirements (PIN isn't sniffable, it isn't held in RAM after deallocation, encryption isn't breakable, etc).

Useful Stuff

  • Do you have psychic scars from the endless Extreme Programming, Agile, and Lean wars? Would you like an approach to building and delivering software that isn't just another version of identity politics? Then you'll probably appreciate this interview with Ryan Singer, head of Product Strategy at Basecamp. Shaping, betting, and building. It all sounds so reasonable you could probably make some money from it on ASMR YouTube. And there's a free book too! Just go to basecamp.com/shapeup. If you are looking for a methodology you've probably already done worse, many times.

  • Eric Brewer on Why Envoy and Istio are the Future of Networking. Most people when they think of a service they think about the API. But that's only half of a service. Operations is the other half. When deploying a service you think about policies: DDoS, who can call it, quotas, authentication, security, etc. You're not thinking about what the service does, you're not thinking about the API. This means you can decouple developers from operations by moving all the operational concerns to operations. Ideally the two, developers and operations, can coexist without much interaction. But this is not how it has worked historically. Historically developers encode access control checks, quota checks, etc. in source code. The problem is this means you have to negotiate what goes inside every service, which adds coordination overhead. So, put all those things in the service infrastructure. This is where Istio and Envoy come in. Envoy implements the service infrastructure as a proxy and Istio manages it. Operational checks are pushed into a proxy. Developers just write the meat of it. Decoupling lets both sides go faster. Also, Service Mesh Day Recap.
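
A minimal sketch of the decoupling Brewer describes, assuming a toy in-process wrapper rather than Envoy itself. Operational policy (who can call, quotas) lives in a proxy object that operations would own, and the developer's handler contains only business logic. Every name here (`Policy`, `Proxy`, `greet`) is hypothetical; none of this is an Envoy or Istio API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Policy:
    """Operational rules owned by operations, not by the service code (hypothetical)."""
    allowed_callers: Set[str]
    quota_per_caller: int


@dataclass
class Proxy:
    """Toy stand-in for a sidecar proxy: enforce policy, then forward the call."""
    handler: Callable[[str], str]
    policy: Policy
    used: Dict[str, int] = field(default_factory=dict)

    def call(self, caller: str, payload: str) -> str:
        # Access control check, kept out of the service's source code.
        if caller not in self.policy.allowed_callers:
            return "403 forbidden"
        # Quota check, also kept out of the service's source code.
        count = self.used.get(caller, 0)
        if count >= self.policy.quota_per_caller:
            return "429 quota exceeded"
        self.used[caller] = count + 1
        return "200 " + self.handler(payload)


def greet(payload: str) -> str:
    """The developer's code: business logic only."""
    return f"hello {payload}"


proxy = Proxy(handler=greet,
              policy=Policy(allowed_callers={"web"}, quota_per_caller=2))
responses = [proxy.call("web", "a"), proxy.call("web", "b"),
             proxy.call("web", "c"), proxy.call("batch", "d")]
```

Changing the quota or the caller list means editing the `Policy`, not `greet`, which is the coordination saving the talk is pointing at.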

  • Embedded interpreters have been causing problems for millions of years. The Obscure Virus Club. Retroviruses are explained as embedded enzyme interpreters that speak RNA and translate to DNA so the virus can insert its own genetic information into the cells it infects. Also, Reverse transcriptase.

  • AnandTech with wall to wall coverage of the Hot Chips 31 conference. I hope Dr. Ian Cutress gets a few days off. You might like: Tesla Solution for Full Self Driving; NVIDIA Releases GeForce 436.02 Driver: Integer Scaling Support for Turing, Freestyle Sharpening, & More; Dr. Lisa Su, CEO of AMD Live Blog.

  • Highlights from Git 2.23. IshKebab: I'm pretty blown away that they're finally admitting that the current git CLI is an unintuitive mess.

  • Lesson Learned from Queries over 1.3 Trillion Rows of Data Within Milliseconds of Response Time at Zhihu.com
    • Zhihu is the Quora of China. We currently have 220 million registered users, and 30 million questions with more than 130 million answers. With approximately 100 billion rows of data accruing each month and growing, this number will reach 3 trillion in two years. 
    • TiDB, an open source MySQL-compatible NewSQL Hybrid Transactional/Analytical Processing (HTAP) database, empowered us to get real-time insights into our data.
    • TiDB’s key features: Horizontal scalability; MySQL-compatible syntax; Distributed transactions with strong consistency; Cloud-native architecture; Minimal extract, transform, load (ETL) with HTAP; Fault tolerance and recovery with Raft; Online schema changes
    • The top layer: stateless and scalable client APIs and proxies. These components are easy to scale out.
    • The middle layer: soft-state components, and layered Redis caches as the main part. When services break down, these components can self-recover services via restoring data saved in the TiDB cluster.
    • The bottom layer: the TiDB cluster stores all the stateful data. Its components are highly available, and if a node crashes, it can self-recover its service.
    • The 99th percentile response time was about 25 ms, and the 99.9th percentile response time was about 50 ms.
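
For anyone unfamiliar with tail-latency figures like the ones Zhihu reports, here is a small sketch of how p99 and p99.9 can be computed from raw samples using the nearest-rank method. The latency data is synthetic and the `percentile` helper is hypothetical; this is not how Zhihu measures its response times.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # round() guards against floating-point noise before taking the ceiling.
    rank = max(1, math.ceil(round(pct * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Synthetic latencies in ms for 1,000 requests: mostly fast, with a slow tail.
latencies_ms = [5.0] * 985 + [25.0] * 13 + [50.0] * 2

p99 = percentile(latencies_ms, 99)      # 990th smallest sample
p999 = percentile(latencies_ms, 99.9)   # 999th smallest sample
```

With only 1,000 samples, p99.9 is decided by the two slowest requests, which is why tail percentiles need a lot of traffic before they mean much.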

  • A whole bunch of Key Takeaway Points and Lessons Learned from QCon New York 2019.

  • Know thyself. Why our team cancelled our move to microservices: After a month of investigation and preparation, we cancelled the move, instead deciding to stick with our monolith. For us, microservices were not only going to not help us; they were going to hurt our development process...Once everything started getting hard, and the clear path forward started to get lost, we paused, and realized we didn’t know why we were doing any of this. We didn’t have a list of our pain points, and we had no clear understanding of how this would help solve any pain points we do have. Worse, microservices might be just about to create a whole set of new problems for us...After months of investigation and work, we abandoned the project and spent the remaining time performing some minor refactors to our “monolith”.

  • There are a lot of options. Next up? We need stateful solutions. Serverless on GCP: Firebase (serverless applications, BaaS), Cloud Functions (serverless functions, FaaS), App Engine (serverless platforms, PaaS), Cloud Run (serverless containers, CaaS), Kubernetes Engine, Compute Engine. 

  • You need to wait for the push of a button and then let an LED flash exactly five times? You need to control a battery-operated night light? A short survey of sub $0.10 microcontrollers. Amazingly there are quite a few. Also, Making A Three Cent Microcontroller Useful

  • Monolist: By abstracting away retry behavior, ensuring that jobs are idempotent, and making sure that we’re always getting closer to success, our end users are fully oblivious to the errors, and can focus on staying productive, writing code, and being the best they can be at their jobs.
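
A minimal sketch of the two ideas in that quote, retries hidden behind an abstraction and idempotent jobs, assuming an in-memory set as the record of completed job IDs. The names (`TransientError`, `run_with_retries`, `send_digest`) are hypothetical and this is not Monolist's code. A real system would persist the completion record.

```python
import time
from typing import Callable, Set


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def run_with_retries(job_id: str, job: Callable[[], None], done: Set[str],
                     max_attempts: int = 5, base_delay: float = 0.01) -> bool:
    """Run a job until it succeeds, retrying transient failures with backoff.
    A job whose ID is already recorded as done is skipped (idempotency)."""
    if job_id in done:
        return True
    for attempt in range(max_attempts):
        try:
            job()
            done.add(job_id)
            return True
        except TransientError:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return False


# Usage: a job that fails twice with a transient error, then succeeds.
calls = {"n": 0}
sent = []


def send_digest() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("upstream timeout")
    sent.append("digest-42")


completed: Set[str] = set()
first = run_with_retries("digest-42", send_digest, completed)
second = run_with_retries("digest-42", send_digest, completed)  # no-op
```

The caller only sees `True` or `False`. The two failed attempts and the duplicate submission never reach the end user.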

  • We used AWS to create a global on-demand server infrastructure. Spawning Game Servers on AWS. It goes pretty much as you might expect. They went with AWS Elastic Container Service (ECS) using Fargate. They only pay for what they use, and as game play is variable you don't want to stand up a fleet of machines. The biggest issue seemed to be minimizing startup times given the Steam update takes about 40 seconds. They minimized the Docker image and went with a regional architecture.

  • Free Neo4j Data Science and Graph Algorithm courses.

  • There's a DevOps report. The 2019 Accelerate State of DevOps: Elite performance, productivity, and scaling. The shocking conclusion is DevOps is the future. If you're elite at DevOps you're 24 times more likely to fully exploit the cloud, and low performers use more proprietary software than high and elite performers. So if you want to be leet you know what you need to do.

  • Common Design Patterns in Distributed Architectures: Command and Query Responsibility Segregation (CQRS); Two-Phase Commit (2PC); Saga; Sidecar. Saga: Saga is an asynchronous design pattern that is meant to overcome the disadvantages of synchronous patterns, such as 2PC. This design pattern uses Event Bus to communicate with microservices. This bus is used to send and receive requests between services, with each participating service creating a local transaction and emitting an event. Other services listen for events, and the first request to intercept an event performs the required action. Sidecar enables applications to be decomposed into isolated components and includes the dependencies and packages that it requires.
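
A toy sketch of the Saga flow described above, assuming a synchronous in-memory event bus and two made-up services (orders and stock). Each handler performs a local transaction and emits an event for the next participant. The compensating `cancel_order` step is an assumption on my part: it is how sagas commonly undo earlier steps, but the summary above doesn't mention it. All names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Toy synchronous bus; a real one would be asynchronous and durable."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, data: Dict) -> None:
        self.log.append(event)
        for handler in self.handlers[event]:
            handler(data)


bus = EventBus()
orders: Dict[str, str] = {}
stock = {"widget": 1}


def create_order(data: Dict) -> None:
    orders[data["order_id"]] = "pending"        # local transaction
    bus.publish("order_created", data)


def reserve_stock(data: Dict) -> None:
    if stock.get(data["item"], 0) > 0:
        stock[data["item"]] -= 1                # local transaction
        bus.publish("stock_reserved", data)
    else:
        bus.publish("stock_rejected", data)


def confirm_order(data: Dict) -> None:
    orders[data["order_id"]] = "confirmed"


def cancel_order(data: Dict) -> None:
    orders[data["order_id"]] = "cancelled"      # compensating action


bus.subscribe("order_created", reserve_stock)
bus.subscribe("stock_reserved", confirm_order)
bus.subscribe("stock_rejected", cancel_order)

create_order({"order_id": "o1", "item": "widget"})  # stock available
create_order({"order_id": "o2", "item": "widget"})  # stock exhausted
```

No service holds a lock across the whole flow, unlike 2PC. The price is that the second order sits briefly in `pending` before it is cancelled.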

  • Maybe we should just give up on the idea of reuse and make coding applications easier? It seems even with the best intentions and highest skill levels reuse remains elusively situational. Building the New Uber Freight App as Lists of Modular, Reusable Components. Would anyone be surprised in a year to see another post explaining how the app grew more complex over time and the previous reusable component system to rule them all didn't work as well as planned and needed to be reconceptualized?

Pub Stuff:

  • Anna: A KVS For Any Scale: In contrast, we explore how a system can be architected to scale across many orders of magnitude by design. We explore this challenge in the context of a new key-value store system called Anna: a partitioned, multi-mastered system that achieves high performance and elasticity via wait-free execution and coordination-free consistency. Our design rests on a simple architecture of coordination-free actors that perform state update via merge of lattice-based composite data structures. We demonstrate that a wide variety of consistency models can be elegantly implemented in this architecture with unprecedented consistency, smooth fine-grained elasticity, and performance that far exceeds the state of the art.
  • Hiring is Broken: What Do Developers Say About Technical Interviews?: Technical interviews (a problem-solving form of interview in which candidates write code) are commonplace in the software industry, and are used by several well-known companies including Facebook, Google, and Microsoft.
  • UNIVERSEMACHINE: The correlation between galaxy growth and dark matter halo assembly from z = 0−10: We present a method to flexibly and self-consistently determine individual galaxies’ star formation rates (SFRs) from their host haloes’ potential well depths, assembly histories, and redshifts. The public data release (DR1) includes the massively parallel (>10^5 cores) implementation (the UNIVERSEMACHINE), the newly compiled and remeasured observational data, derived galaxy formation constraints, and mock catalogues including lightcones.
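
The Anna abstract above talks about state update via merge of lattice-based composite data structures. A minimal sketch of what that can look like, assuming a max-integer lattice composed into a map lattice. The class names `MaxInt` and `MapLattice` are hypothetical and this is not Anna's implementation. The point is that a merge which is commutative, associative, and idempotent lets replicas exchange state in any order without coordinating.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MaxInt:
    """Lattice over integers where merge is max."""
    value: int

    def merge(self, other: "MaxInt") -> "MaxInt":
        return MaxInt(max(self.value, other.value))


@dataclass(frozen=True)
class MapLattice:
    """Composite lattice: a map whose values are lattices, merged key by key."""
    entries: Dict[str, MaxInt]

    def merge(self, other: "MapLattice") -> "MapLattice":
        merged = dict(self.entries)
        for key, val in other.entries.items():
            merged[key] = merged[key].merge(val) if key in merged else val
        return MapLattice(merged)


# Two replicas accept writes independently, then exchange state.
replica_a = MapLattice({"x": MaxInt(3), "y": MaxInt(1)})
replica_b = MapLattice({"x": MaxInt(2), "z": MaxInt(7)})

ab = replica_a.merge(replica_b)  # a receives b's state
ba = replica_b.merge(replica_a)  # b receives a's state
```

Both replicas end up with the same map whichever way the gossip went, and re-delivering the same state changes nothing.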
