Stuff The Internet Says On Scalability For March 8th, 2019
Wake up! It's HighScalability time:
A highly simplified diagram of serverless. (@jbesw)
Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 40 mostly 5 star reviews. They'll learn a lot and love you even more.
- 5%: France's new digital tax revolution; $15 trillion: AI contribution to global GDP by 2030; 70%: better response time using HTTP keep-alive in lambda; 115 million: Akamai found bots (per day) compromising user accounts by credential stuffing; 83%: of all internet traffic is API calls, not HTML; $1 million: first millionaire bug-bounty hacker is 19 years old; 15%: mooch their Netflix account; 5%: Microsoft's app store take; $15: Tensorflow at the edge; 30%: first quarter drop in DRAM prices; $2 billion: IBM's microkernel folly; ~1TWh: lithium-ion batteries production per year by 2030; 25%: Tesla supercharger time improvement by a software update;
- Quotable Quotes:
- Jeff Bezos: I've witnessed this incredible thing happen on the internet over the last two decades. I started Amazon in my garage 24 years ago — drove packages to the post office myself. Today we have 600,000-plus people, millions and millions of customers, a very large company. How did that happen in such a short period of time? It happened because we didn't have to do any of the heavy lifting. All of the heavy-lifting infrastructure was already in place for it. There was already a telecommunication network, which became the backbone of the internet. There was already a payment system — it was called the credit card. There was already a transportation network called the US Postal Service, and Royal Mail, and Deutsche Post, all over the world, that could deliver our packages. We didn't have to build any of that heavy infrastructure. An even more stark example is Facebook. Here's a guy who literally, in his dorm room, started a company — Mark Zuckerberg started a company in his dorm room, which is now worth half a trillion dollars — less than two decades ago.
- @paul_snively: You say "convention over configuration;" I hear "ambient information stuck in someone's head." You say "configuration over hardcoding;" I hear "information in a different language that must be parsed, can be malformed, or not exist."
- lget: A common phrase you often hear when people try to explain the impressive technological leaps made in the 20th century is "war is the father of all things", meaning of course that WWII was a major catalyst for technological advancement. I always wondered if it shouldn't be phrased as "John von Neumann is the father of all things."
- @JoeEmison: It’s possible for (serviceful) serverless to be the optimal greenfield architecture for 95%+ of all applications without it being the solution for even 5% of all problems actually given to software developers today.
- Berkeley: Serverless computing will become the default computing paradigm of the Cloud Era, largely replacing serverful computing and thereby bringing closure to the Client-Server Era.
- @Cycling_Embassy: This just in: the Dutch #bicycle sales figures are reporting over 1 million bicycles sold in the Netherlands in 2018: 40% of them e-bikes, passing regular bicycles at 34%
- Marlin1975: Threadripper is not just about core count but also PCIe lanes and memory bandwidth/channels. 8 cores have been shown to be memory limited in some cases. So 12/16 would also show weaknesses that a Threadripper system would overcome.
- jiffier: In fact, I don't even use Docker. As someone said on Twitter, Kubernetes is the WebSphere of the millennials, which I have to admit was quite hilarious.
- Antonio García Martínez: No, data isn’t the new oil. And it never will be, because the biggest data repositories don’t want it to be.
- Cliff Click: Hire for enthusiasm not experience.
- Nicole Perlroth: The hack on Google, called Operation Aurora, was historic for an unusual reason: It was the first time a Chinese government hacking victim confronted its attacker. Inside the company, Sergey Brin, one of Google’s co-founders, made it his personal mission to make sure something like Aurora never happened again. Google, known for its motto “Don’t Be Evil,” had a new motto about its cybersecurity: “Never again.” Google poached cyberexperts from the National Security Agency and Silicon Valley. It built a threat analysis group on a par with those at the top intelligence agencies and designed a new security infrastructure. It also created a new team, called Google Project Zero, to hunt for critical security flaws in technology outside Google.
- @QuinnyPig: What the blue hell is this byzantine GCP pricing model?! $120K a year, use whatever you want, at the end of the year you either pay for overages and then walk away, or re-up your contract and the cycle repeats, UNLESS you master the wolf, in which case
- Roger McNamee: The man they chose was Dan Rose, a member of their inner circle with whom I was friendly. I spoke with Dan at least twice before the election. Each time, he listened patiently and repeated what Zuck and Sheryl had said, with one important addition: he asserted that Facebook was technically a platform, not a media company, which meant it was not responsible for the actions of third parties. He said it like that should have been enough to settle the matter.
- c61: The two key takeaways for me are: (1) "Contrary to the common belief that message passing is less error-prone, more blocking bugs in our studied Go applications are caused by wrong message passing than by wrong shared memory protection." (2) "Shared memory synchronization operations are used more often than message passing, ..." So basically, (1) people have more trouble writing correct multithreaded routines using message passing, and (2) in large production applications people tend to fall back to using shared-memory primitives rather than message-passing anyway.
- Percona: The results clearly indicate that for these workloads, THP [Transparent Huge Pages] has a negative impact on the overall database performance. Although the performance degradation is negligible, it is, however, clear that there is no performance gain as one might expect. This is very much in line with all the different databases’ recommendation which suggests disabling the THP.
- HBR: Both types of thinking [convergent and divergent] are important to finding the best final solution, but divergent thinking is particularly important for developing innovative solutions. However, divergent thinking skills are largely ignored in engineering courses, which tend to focus on a linear progression of narrow, discipline-focused technical information. This leads engineering students to become experts at working individually and applying a series of formulas and rules to structured problems with a “right” answer.
- Martin Giles: In attacking the plant [Triton], the hackers crossed a terrifying Rubicon. This was the first time the cybersecurity world had seen code deliberately designed to put lives at risk. Safety instrumented systems aren’t just found in petrochemical plants; they’re also the last line of defense in everything from transportation systems to water treatment facilities to nuclear power stations.
- @PessimistsArc: In 1872 telegrams were criticized for allowing instant publication of criticisms without context often with malicious additions. Writer said mischief caused couldn't be overstated. Remind you of anything?
- @fintanr: If you want to do #CloudNative development at any scale - start with #ContinuousIntegration . If you don't have CI in place the rest will fail. Even if you don't want to use containers, put CI in place. It remains, by far, the best tooling decision you will make for your SDLC.
- @AirMovingDevice: I will be deleting all of my tweets and will no longer be tweeting or responding to DMs. All of my tweets were entirely based on my personal analysis using publicly available data, and did not involve other individuals. It is not my intention to subvert state or Party authority.
- Richard Speed: Just over half of UK's 43 police forces responded to a Freedom of Information (FoI) request, and 13 per cent stated that none of their data and applications were "in the cloud". 71 per cent had sent anywhere between 1 and 25 per cent of workloads cloudwards, while only 4 per cent were in the 26 to 50 per cent bracket.
- @CTOAdvisor: Me: Why do you want multi-cloud? CTO: I want to put the workload on the cheapest compute platform. Me: Multi-cloud for workload mobility is really hard. What value is the savings going to bring the business? CTO: Reduced IT costs. Me: 🤨
- kureikain: I have zero on-call for Go. I had very few for Elixir. But the bug were in logic code. Same with Ruby. But it's a disaster with Node. We used TypeScript so it catch lot of type issue. However, the Node runtime is weird. We run into DNS issue(like yo have to bump the libuv thread pool, cache DNS). JSON parsing issue and block the event loop etc...max memory...
- Nick Matthews: There's a lot of operational problems with how people build DMZs. If you rebuild your DMZ in AWS and don't change your operations...I firmly believe tons of developers came to AWS because the DMZ was so effective at getting nothing done.
- @Carnage4Life: Economics of shared scooter economics (e.g. Bird, Lime) is wild. I thought Uber & Lyft losing $1.13-$1.40 per ride was steep but surmountable by raising prices. Shared scooters last only a month and they lose $300 on each one. Closer to $3 per ride loss😱
- 4 star General Brown: We're in a hyper competition now, always.
- Undrinkable Kool-Aid: Whenever I’ve approached a new system the bottleneck has never been writing more code. The bottleneck has always been in understanding all the implicit assumptions baked into the system that for one reason or another are essential to its correctness. Pretending that the system and the runtime traces of the system are the specification is why there is always a shortage of programmers. It takes a particular kind of masochist to enjoy reverse engineering a black box from just poking at it with simple linear impulses. If these systems had more formal foundations then there would be no shortage because we could teach people the general formalism and let them figure out the runtime mappings
- eBay: users who saw the animated percentage off badge were more likely to click an item as compared to those who saw the static percentage off badge. This helps show that animations and microinteractions not only have numerous qualitative benefits, but also strong quantitative ones. This data has empowered us to spend time integrating microinteractions as part of our design system, and we continually look at how to bring these moments to other parts of the customer experience.
- Lyft: It may become increasingly difficult to maintain and improve our performance, especially during peak usage times, as we expand and the usage of our offerings increases
- @copyconstruct: My own hype curve with WebAssembly: Oct 2018: zomg WASM is the future we all need Dec 2018: zomg it really is the future Jan 2019: let me do a bit of research and see what the state of security is and if linear memory is all that ...Feb 2019: oh dear 👀 March 2019: 😧
- @seldo: At some dumb point in the dumb future Lyft and Uber will declare that they can never be profitable but that they are essential to city transportation and will demand that cities subsidize them and we will have finally closed the circle of stupid.
- Jay Phelps: Let's just dive right in. The question, the ultimate question: what is WebAssembly? Well, it's also known as Wasm, or Wasm, depending on how you want to pronounce it, where you live. I pronounce it Wasm personally. And the quick spiel is that it's an efficient and safe low-level byte code for the web. And that sounds great, but we kind of need to unpack that. Really, what do I mean by efficient? Well, I mean the goal of it is to be fast to load and fast to execute. So fast to load meaning fast to send over the internet, because the primary purpose of WebAssembly is for the browser, but we'll talk about that a little bit later. So fast to load over the internet, small binary sizes are critical, compact format is critical, fast to load meaning fast to parse. And then fast to execute, meaning fast to actually run once it's compiled. So it's going to be just-in-time compiled.
- David Rosenthal: Perhaps once people figure out that Moore's Law is dead, and Kryder's Law is dying they will insist on keeping their equipment longer. IT manufacturers, like US automakers in the 60s and 70s, have become hooked on planned obsolescence, so they are likely to respond by adding fins and chrome to their hardware rather than making it last longer.
- Segment: If you run a B2B business, you typically have to make a hard choice. You can track data on the individual user level, but then you may have a hard time combining it later (how do I understand what a 2,000 person organization is doing?). On the other hand, you can track the health and actions of overall ‘accounts’, but possibly miss data from individual users
- 510TBaguettes: It's not because you have 100TB free of storage on your server that you are a data hoarder, and there is no "good way to do this". Data hoarding isn't about just buying $3000 worth of hard drives just for posting them here. What's interesting is what you do with your storage. If you just have 1TB of storage but you do something freakin' cool with it, what you can share here is way more important than someone buying 30TB of storage and never post again here. Please, focus on what we love, the DATA, not the storage medium, please focus on projects, on archiving, on digital preservation.
- Joel Frohlich: Of course, it’s possible that an artificial consciousness might possess qualia vastly different than our own. In this scenario, questions about specific qualia, such as color qualia, might not click with the AI. But more abstract questions about qualia themselves should filter out zombies. For this reason, the best question of all would likely be that of the hard problem itself: Why does consciousness even exist? Why do you experience qualia while processing input from the world around you? If this question makes any sense to the AI, then we’ve likely found artificial consciousness. But if the AI clearly doesn’t understand concepts such as “consciousness” and “qualia,” then evidence for an inner mental life is lacking.
- Lyft reportedly pays AWS $80 million a year. Is that a lot? Snap reportedly spends $600 million a year on GCP and Azure. Is that a lot? One estimate for Netflix is that they spend $25 million per month on AWS. Is that a lot?
- Netflix says they actually save money in AWS. Their cloud costs per streaming view ended up being a fraction of the cost of its old datacenters. Why? The elasticity of the cloud.
- Many seem to think they could do the same job a lot cheaper on a couple of laptops in their mom's basement, which is pretty much the same as saying all the people working at these companies don't know what they're doing.
- Which is more likely, that people working at these companies don't know what they're doing or people don't know what these companies are doing? Ed Boyden calls this The illusion of reductionism. Pretending a problem is simple doesn't make it simple.
- If you want to know what Netflix is doing: Netflix: What Happens When You Press Play? If you want to know what Lyft is doing there's a short video by their CTO: Lyft Saves Infrastructure Costs, Enables Massive Growth of Ridesharing Platform Using AWS.
- Ever think Netflix's infrastructure was a bit of overkill just to watch video? Me too. But that's because I was shortsighted. The big picture became clear in 2016 when Netflix went live in 130 new countries—simultaneously! All the work Netflix had done up to that point made that possible. Would it be possible if they had to build out their own datacenters? Sure, but how long would it have taken and how much would it have cost? More...a lot more.
- Lyft faces a similar problem. Since 2012 Lyft has experienced exponential growth. In 2014 they launched 24 new cities in 24 hours. They rolled out autoscaling and doubled their footprint overnight. They can also scale down, the same feature Netflix found so valuable. The Saturday night peak is 8x Sunday morning. Lyft could IPO for $25 billion. Without the cloud could they have expanded so fast, so reliably? Unlikely. $80 million a year sounds like a smart investment.
- @QuinnyPig: I have these conversations with clients constantly. "We would save money if we built our own datacenters." Yes, until you staff up to maintain them, and ensure that the communications patterns between teams worked, and... (4/16) Does anyone honestly believe that "losing focus" for a company like Lyft at $20-25 billion is worth whatever cost savings they could realize? It simply isn't. (7/16)
- @samkottler: Lyft’s AWS spend is ~1.2% of bookings, which seems pretty normal? Not sure why everyone is so surprised by those numbers. The most interesting part of the Lyft S-1 is not their AWS spend. It’s that they are sinking a boatload of money into autonomous driving, the delta of which largely explains increased losses.
- @adrianco: The $ commitment from Lyft to AWS is part of an enterprise discount plan. Bigger the commitment, bigger the discount. They are big and growing fast and already increased their commitment once. Key thing is that the $ can be spent on any service, it's not like datacenter spend. The most important thing to understand is that your AWS spend depends on your own efficiency. Cleaning up underused stuff, increasing utilization, tuning code, optimizing instance types, spot, reservations etc. makes a big difference to next month's bill, and AWS will help.
- @stevesi: 1/ Fascinated by discussions of size of cloud hosting bills. At massive scale two things are true: • Someone always says they can run servers on-prem more cheaply • Someone is always under-estimating how much burdened servers cost to run and cost of running them more cheaply 2/ I know everyone "knows" this but it has been so long for many since running on-prem that we forget just how expensive, time, and labor consuming (and rigid) running anything of any scale on-prem (or CoLo) can be. Not to mention unreliable and hard to scale (to more customers).
- @MohapatraHemant: So @lyft is paying $8m/mo to @AWS -- almost $100m/yr! Each ride costs $.14 in AWS rent. I keep hearing they could build their own DC & save. My early days at @Google cloud, heard the same from customers: "at scale, owning is cheaper". It wasn't - they all came around. Here's why: Construction of a mid-sized Enterprise DC (just 5000sqft), at just "tier3" availability (3 9s) will cost around 40m. If you want 5 9s redundancy you will need 1-2 failovers, so 3x that. Incld racks, cooling, power, construction and land. Using a colo @Equinix will likely save 20%. But your DC costs will amortize over 10 years, correct? Yes. But there's more. Construction will take 12-24mos. For that time, company loses focus, hires non-core engg, vendors, and planners that understand bldg codes, fire safety, env rules, security, maintenance etc. Then for 10 years you have: ongoing support, maintenance & repair, costs of power, heating/cooling, and biometric security of physical assets. Power bills alone run in xxMs that's why Google DCs are so remote and near Geo/hydro/solar power sources. Moreover, you need to build for 10yrs out, not today, so you'll likely either keep building more and more, or overbuild capacity by 50-100%. Your initial estimate of 40m (x3) is now 60-80m (x3).
- The wisdom is knowing what you need to do and when you need to do it. You Don’t Need All That Complex/Expensive/Distracting Infrastructure: Kubernetes clusters? Load balancers across multiple cloud regions? Zero-click, Blue-Green deployment based on real-time metrics from a continuous deployment pipeline? The good news? You don’t need any of it*. Your goal, when launching a product, is to build a product that solves a problem for your users. Not to build the fanciest deployment pipelines, or a multi-zone, multi-region, multi-cloud, Nuclear-Winter-proof high availability setup. The underlying idea: every minute spent on infrastructure is a minute less spent shipping features.
- Freetrade on why they ended up Killing Kubernetes: “do less of everything > do more of the right things > ship faster.” This is why we killed our Kubernetes stack. Here’s what our setup roughly looked like in June: Kubernetes configuration, Docker images, Kotlin services, custom Gradle build scripts, Hazelcast, Cloud Endpoints configuration, a deployment orchestration tool. In the end, here’s what our launch stack looked like: Firebase functions. The second implementation was: immediately cheaper (even at 20k users it's still cheaper!), able to scale more quickly, easier to debug, easier to replicate for testing/disaster recovery, has built-in logging support, has built-in autoscaling.
- Storage demand isn't as high as everyone thinks. Demand Is Far From Insatiable: Seagate was caught out by an unexpectedly deep drop in disk drive demand and saw its revenues fall 7 per cent...Western Digital is about to go into cost cutting mode to carve out $800m in savings, after reporting shrinking revenues of $4.23bn for its second fiscal 2019 quarter, down by a fifth compared to the year ago period
- Last week we had an entry on how serverless might incent creating more specialized hardware (perhaps like Azul for Java). In How Might Serverless Impact Node.js Ecosystem? it's made clear how better software packing can save money by reducing load times (from 2 seconds to a hundred milliseconds) and ideally could reduce memory footprints. Related to both these ideas is the fact that cloud providers totally control the backend. We just upload code. Cloud providers could transform our code base into anything, as long as functionality is preserved. So it would be possible to transparently develop specialized representations and hardware that are much faster to load and use less memory.
- Once you've built a solid platform you can quickly build high value services in response to customer demand. Here's an AWS example of that philosophy in action.
- Nick Matthews, Principal Solutions Architect at AWS, in Heavy Networking 433: An Insider’s Guide To AWS Transit Gateways mentions HyperPlane. HyperPlane is a distributed state system. Think of lots of EC2 instances sharing state with each other. It runs NAT gateway, NLB, EFS mounts, and PrivateLink. It works by running a very large fleet of instances, using a technique called shuffle sharding. Each customer can be given a piece of multiple instances simultaneously. If any instance has an issue a very small percentage of traffic might be impacted.
- Following HyperPlane leads to @colmmacc: Ever wonder how we manage multi-tenancy at AWS? or why we want you to use the personal health dashboard instead of the AWS status dashboard? are you pining for a bonus content section with probabilistic math? These slides on Shuffle Sharding are for you!! ... Again think about, we can build a huge big multi-tenant system with lots of customers on it, and still guarantee that there is *no* full overlap between those customers. Just using math. This still blows my mind...
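- The math behind shuffle sharding is easy to sketch. Here's a minimal toy illustration, assuming a 100-node fleet and 2-node shards per customer; all names and numbers are illustrative, not AWS's actual parameters:

```python
# Toy shuffle sharding: each customer is deterministically assigned a small
# random subset ("shard") of a big shared fleet, so two customers almost
# never share their entire shard -- a noisy neighbor only partially overlaps.
import hashlib
import random
from itertools import combinations

FLEET = list(range(100))   # node ids in the shared multi-tenant fleet
SHARD_SIZE = 2             # nodes assigned to each customer

def shard_for(customer_id: str) -> frozenset:
    """Deterministically pick SHARD_SIZE distinct nodes for a customer."""
    seed = int.from_bytes(hashlib.sha256(customer_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return frozenset(rng.sample(FLEET, SHARD_SIZE))

# With 100 nodes and shards of 2 there are C(100, 2) = 4950 distinct shards,
# so any two customers fully overlap with probability ~1/4950. Across 1000
# customers (~500k pairs) we expect roughly 100 full overlaps.
shards = {c: shard_for(f"customer-{c}") for c in range(1000)}
full_overlaps = sum(1 for a, b in combinations(shards, 2) if shards[a] == shards[b])
print(full_overlaps)
```

Larger shard sizes make the "no full overlap" guarantee dramatically stronger: with shards of 5 from 100 nodes there are over 75 million possible shards.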
- The key idea being: Our Cloud gives us this agility because we all pool our resources. So a much bigger, and better than me, team can build much bigger, and better than mine, data centers, which we all share. 10 years in, pretty much everyone understands that this is awesome.
- Which leads to this video: Load Balancing at Hyperscale.
- Which leads to Cindy Sridharan's gloss of the video: I especially enjoyed learning about the SHOCK principle proposed (Self Healing or Constant Work), which suggests that when you build a system, it should be resilient to even large shocks. Or put differently, “if something big changes, the system should be able to carry on as normal”. The talk proposes that: 1. Constant effort and recovery from failure are the natural states 2. Always operate in repair mode. When a node fails, Hyperplane actually does less work! 3. When designing large scale systems, we don’t want them to be complex. We want them to be as simple as possible. To this end, we want as few modes of operation as possible (Hyperplane has no retry mode, for example. It piggybacks on TCP’s innate retry mechanism). 4. The talk also introduces the idea of shuffle sharding, a DDoS mitigation technique (where isolation is the primary mitigation technique) that’s now widely deployed across many AWS services.
- A bold vision for future discovery. Can we do on purpose for biology what happened accidentally in other fields? Episode 43: Ed Boyden | New Tools for Neuroscience. Rather than doing moon shots where we try to address unknowns without getting down to the ground truth first, maybe we should try to get down to the ground truth. What's the right way to do a moon shot? On one extreme you have extreme translational science, where the goal is let's just try to get something into humans quickly. The other extreme is science that might be very curiosity driven, but doesn't connect with an application. I propose a third way of doing science where we think backwards from the goal, but the goal is not just to do a moon shot without knowledge of the underlying mechanisms. Let's accelerate the basic science of those underlying mechanisms to understand what the heck we're doing. It's kind of fusing the two old models. Build new technologies that let you accelerate the discovery of new things. Be more systematic. Make it more high throughput. More comprehensive. Then you might be able to make a moon shot without cutting corners. We might be able to break it down into an orderly series of steps. Can we think backwards from the goal and build technologies that systematically and comprehensively get to the ground truth underlying the mechanisms, and nail it? The big questions arise by considering hypotheses that are alternatives to the classical ones. Why can't we, through new technologies that map and control what's going on in a systematic way, confront entire sets of hypotheses, so that we can broaden our horizons and not just focus on a single thing? The Tile Tree Method. The basic idea is: can you have an idea of every possible way to solve a problem? That's how optogenetics was created. We went through all the laws of physics and tried to decide which would be the best for controlling neurons. 
There's a short list of things you can deliver into a brain that convey energy, and then from that you consider what kind of effect the energy should have. Will it be converted to electrical signals or chemical signals? If it gets converted into an electrical signal, do you make the molecule that converts light into electricity or do you find it? If you think of the logical path described in the last several sentences, we're breaking down the space of all the possibilities as a tree diagram. The laws of physics, do you make or do you find, and so on. Very often that's how projects begin in our group. When you pick a discipline to approach a problem with, do you take a chemical approach, an electrical approach, or a computer science approach? Very often we systematically try to break down the space of possible ideas into an orderly, structured, pipelined approach that can be tested. It doesn't guarantee you are going to have every possible idea, but it might reduce the risk of missing something really important, or it might reduce the risk of getting stuck in a traditional way of thinking. This all points to the significance of making tools before anything else. First get the tools, then get the data, then you can make the theory. The tools that we design are designed to get to the ground truth. We want to design tools that allow us to understand a complex system in terms of building blocks and the interactions between them. Tools are now cool because they worked. After optogenetics people began making new kinds of tools, calcium imaging, new kinds of microscopes, and the question became: how can you make more?
- Must programmers be imprisoned in a boiler room to produce software? Or can remote async teams get the job done? Yes... using the right tools, right people, right culture, and right process. On-Site Friendly: GitLab is a 100% remote company...The (GitLab) distributed system strives for weak consistency, freedom, then adjustment, on short loops...GitLab (and the remote culture) has a radically different approach. There you work trying to solve a somewhat defined business problem, in short iterations, delivering something, and then adjusting.
- You can expect development to stall on all these apps as teams must turn their gaze inward as they pay the great multi-year integration tax. If you want to create a competitor now is the time. Facebook to integrate WhatsApp, Instagram and Messenger.
- Uber on Using Machine Learning to Ensure the Capacity Safety of Individual Microservices: to help prevent these outages, our reliability engineers abide by a concept called capacity safety, in other words, the ability to support historical peaks or a 90-day forecast of concurrent riders-on-trip from a single data center without causing significant CPU, memory, and general resource starvation. On the Maps Reliability Engineering team, we built in-house machine learning tooling that forecasts core service metrics—requests per second, latency, and CPU usage—to provide an idea of how these metrics will change throughout the course of a week or month. Using these forecasts, teams can perform accurate capacity safety tests that capture the nuances of each microservice...At a high level, each service in production at Uber is instrumented to emit metrics, some of which are resource and capacity-related. These metrics are stored in M3 and can be queried via API...The data we use, such as information on CPU and memory usage, comes from reliable sources, such as server logs, event logs, and data center reports, giving us a robust training set for forecasting...For time series forecasting, we leverage an internal forecasting API that uses an ensemble method containing ARIMA, Holt-Winters, and Theta models.
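- To make the idea concrete, here's a toy stand-in for that kind of metric forecasting. Uber's internal API ensembles ARIMA, Holt-Winters, and Theta models; this sketch uses only simple exponential smoothing, so the function, numbers, and flat-forecast assumption are illustrative, not Uber's implementation:

```python
# Simple exponential smoothing over a capacity metric (e.g. requests/sec),
# then project the final smoothed level forward as a naive forecast.
def exp_smooth_forecast(series, alpha=0.5, horizon=3):
    """Smooth a metric series and repeat the last level for `horizon` steps."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level  # blend new sample into level
    return [level] * horizon  # flat forecast from the final smoothed level

rps = [100, 110, 105, 120, 130, 125]  # hypothetical requests/sec samples
print(exp_smooth_forecast(rps))  # → [123.125, 123.125, 123.125]
```

Real seasonal models (Holt-Winters and friends) additionally track trend and weekly seasonality, which is exactly what capacity safety tests need to capture the Saturday-peak-vs-Sunday-morning swings.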
- Improving performance by tracking down bottlenecks. Achieving 100k connections per second with Elixir. Even though they hold the connection open for a second during the test, when you start doing real work over a connection everything changes. You're causing interrupts, potentially saturating the network, creating lock contention, using CPU, triggering garbage collection pauses, live locking other code, etc., so just handling a lot of connections isn't the whole story. Also, The WhatsApp Architecture Facebook Bought For $19 Billion
- A quick look at QUIC: QUIC is a rather forceful assertion that the Internet infrastructure is now heavily ossified and more highly constrained than ever. There is no room left for new transport protocols in today’s network. If what you want to do can’t be achieved within TCP, then all that’s left is UDP...There is a price to pay for this new-found agility, and that price is broad interoperability. Browsers that support QUIC can open up UDP connections to certain servers and run QUIC, but browsers cannot assume, as they do with TCP, that QUIC is a universal and interoperable lingua franca of the Internet. While QUIC is a fascinating adaptation, with some very novel concepts, it is still an optional adaptation. For those clients and servers who do not support QUIC, or for network paths where UDP port 443 is not supported, the common fallback is TCP. The expansion of the Internet is inevitably accompanied by inertial bloat, and as we’ve seen with the extended saga of IPv6 deployment, it is a formidable expectation to think that the entire Internet will embrace a new technical innovation in a timeframe of months, years or possibly even decades! That does not mean that we can’t think new thoughts, and that we can’t realize these new ideas into new services on the Internet. We certainly can, and QUIC is an eloquent demonstration of exactly how to craft innovation into a rather stolid and resistant underlying space.
- Deep Compression for Improved Inference Efficiency, Mobile Applications, and Regularization: The main power sink in modern high performance computing including machine learning at scale isn’t computation, it’s communication. Performing a multiply operation on 32 bits of data typically requires less than 4 pJ, while fetching that same amount of data from DRAM requires about 640 pJ, making off-chip memory reads about 160 times more energy-expensive than commonly used mathematical operators. That lopsided power consumption gets even worse when we consider sending data more than a few centimeters (leading data-centers to replace mid-length data communication wires with optical fibers)...Luckily there’s a biological analogue in the way human brains develop. Using synaptic pruning, the human brain becomes more efficient as people progress from childhood into adulthood...Not only does deep compression result in model sizes that are 35-50 times smaller, run faster, and require less energy (great for deployment in battery-powered applications), but the models often achieve better performance than the original, uncompressed model...Iteratively pruning and re-training the model in what’s called dense-sparse-dense training can remove up to 90% of parameters with no loss in test accuracy...By quantizing the remaining parameters, we can take advantage of efficiency and speedup from features such as Nvidia’s INT8 inference precision and further reduce the model size...Stopping after dense-sparse-dense training and parameter quantization is already enough to reduce storage requirements for the iconic AlexNet by more than 26 times without any significant loss of performance. There is one more step to achieving the full depth of benefits from deep compression, and that’s weight sharing.
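The prune-then-quantize pipeline above can be illustrated with plain Python lists. This is a minimal sketch of the two core steps (magnitude pruning and linear INT8 quantization), not the full dense-sparse-dense training loop or weight sharing:

```python
# Toy sketch of two deep-compression steps: magnitude pruning, then
# linear quantization of the survivors into [-127, 127] integers.

def prune(weights, keep_fraction=0.1):
    """Zero out all but the largest-magnitude weights (magnitude pruning)."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_int8(weights):
    """Map weights to int8-range integers plus one shared float scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

w = [0.02, -0.9, 0.05, 0.4, -0.01, 0.7, 0.03, -0.6, 0.1, 0.08]
pruned = prune(w, keep_fraction=0.3)   # 70% of weights become zero
q, scale = quantize_int8(pruned)       # survivors stored as 1 byte each
```

Storing mostly-zero int8 weights in a sparse format (plus one scale per tensor) is where the 26x-and-beyond size reductions come from.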
- facebook/folly (article): Maintaining scalability and infrastructure reliability are important considerations for any large-scale network. Ensuring consistency when propagating updates across a large network of peers is a problem that has been extensively studied, but enforcing the integrity of those updates is a separate challenge. Traditional methods for update integrity can introduce compromises with respect to efficiency and scalability. To address these concerns, we’ve employed homomorphic hashing, a cryptographic primitive that has been used in other applications, including efficient file distribution and memory integrity checking.
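The homomorphic-hashing idea can be shown with a toy additive hash. This is NOT Facebook's LtHash (which lives in folly and has carefully analyzed security properties); it only illustrates why such hashes allow incremental update verification: each element hashes independently and the set hash is their sum, so an update changes the hash without rehashing the whole set.

```python
# Toy additive set hash: sum of per-element SHA-256 prefixes mod 2**64.
# Illustrative only; not folly's LtHash and not cryptographically vetted.
import hashlib

MOD = 2 ** 64

def elem_hash(item: bytes) -> int:
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big")

def set_hash(items) -> int:
    return sum(elem_hash(i) for i in items) % MOD

def add(h, item):     # incrementally add an element to an existing hash
    return (h + elem_hash(item)) % MOD

def remove(h, item):  # incrementally remove an element
    return (h - elem_hash(item)) % MOD

h = set_hash([b"a", b"b"])
assert add(h, b"c") == set_hash([b"a", b"b", b"c"])
assert remove(add(h, b"c"), b"b") == set_hash([b"a", b"c"])
```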
- BrighterCommand/Brighter: a command dispatcher, processor, and task queue. It can be used to implement the Command Invoker pattern. It can be used for interoperability in a microservices architecture as well.
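Brighter itself is a .NET library, but the dispatcher idea it implements is small enough to sketch. This hypothetical Python toy just shows the shape: commands are routed to a handler registered for their type.

```python
# Minimal command-dispatcher sketch (illustrative; not Brighter's API).

class Dispatcher:
    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        """Map a command class to the single handler that processes it."""
        self._handlers[command_type] = handler

    def send(self, command):
        """Route a command instance to its registered handler."""
        return self._handlers[type(command)](command)

class PlaceOrder:
    def __init__(self, sku):
        self.sku = sku

dispatcher = Dispatcher()
dispatcher.register(PlaceOrder, lambda cmd: f"ordered {cmd.sku}")
print(dispatcher.send(PlaceOrder("sku-42")))
```

In a microservices setting the handler would often enqueue the command on a task queue rather than execute it inline.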
- CraneStation/wasmtime: a standalone wasm-only runtime for WebAssembly, using the Cranelift JIT. It runs WebAssembly code outside of the Web, and can be used either as a command-line utility or as a library embedded in a larger application.
- uber/kraken: a P2P-powered docker registry that focuses on scalability and availability. It is designed for docker image management, replication and distribution in a hybrid cloud environment. With pluggable backend support, Kraken can easily integrate into existing docker registry setups as the distribution layer. Kraken has been in production at Uber since early 2018. In our busiest cluster, Kraken distributes more than 1 million blobs per day, including 100k 1G+ blobs. At its peak production load, Kraken distributes 20K 100MB-1G blobs in under 30 sec.
- awslabs/route53-infima: a library for managing service-level fault isolation using Amazon Route 53. Infima provides a Lattice container framework that allows you to categorize each endpoint along one or more fault-isolation dimensions such as availability-zone, software implementation, underlying datastore or any other common point of dependency endpoints may share. Infima also introduces a new ShuffleShard sharding type that can exponentially increase the endpoint-level isolation between customer/object access patterns or any other identifier you choose to shard on.
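Infima is a Java library, but its ShuffleShard idea is worth a sketch. In this hedged Python toy, each customer gets a small, deterministic pseudo-random subset of endpoints; because two customers rarely share their full shard, a poison-pill customer can only take down a few endpoints rather than the whole fleet.

```python
# Toy shuffle-sharding sketch (illustrative; not Infima's implementation).
import hashlib
import random

def shuffle_shard(customer_id, endpoints, shard_size):
    """Deterministically pick this customer's pseudo-random endpoint subset."""
    seed = int.from_bytes(hashlib.sha256(customer_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return sorted(rng.sample(endpoints, shard_size))

endpoints = [f"ep-{i}" for i in range(8)]
print(shuffle_shard("customer-a", endpoints, 2))
print(shuffle_shard("customer-b", endpoints, 2))
```

With 8 endpoints and shards of 2 there are 28 possible shards, and the isolation grows combinatorially with fleet and shard size, which is the "exponentially increase the endpoint-level isolation" claim above.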
- Azure/mmlspark: an ecosystem of tools aimed towards expanding the distributed computing framework Apache Spark in several new directions. MMLSpark adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources.
- openai/neural-mmo: This environment is the first neural MMO; it attempts to create agents that scale to real world complexity. Simulating evolution on Earth is computationally infeasible, but we can construct a reasonable and efficient facsimile. We consider MMORPGs (Massively Multiplayer Online Role-Playing Games) the best proxy for the real world among human games: they are complete macrocosms featuring thousands of agents per persistent world, diverse skilling systems, global economies, and ad-hoc high stakes single and team based conflict.
- Distributed DNA-based Communication in Populations of Synthetic Protocells: Encapsulating DNA strand-displacement circuits further allows their use in concentrated serum where non-compartmentalized DNA circuits cannot operate. BIO-PC enables reliable execution of distributed DNA-based molecular programs in biologically relevant environments and opens new directions in DNA computing and minimal cell technology.
- Facebook Perspective on Submarine Wet Plant Evolution: From TAT-12/13 and TPC-5 cable systems [1,2] to today, we have seen, first, the development of optically amplified submarine systems and the corresponding technologies (EDFA, line fiber, WDM, DWDM,…), and then, the fast-paced revolution of coherent communications during approximately the last 10 years [3,4]. We went from 5 Gbit/s per fiber pair to more than 20 Tbit/s per fiber pair. Because we are now approaching the limits of capacity per fiber pair, a paradigm shift from an optical system design point of view is happening and we are moving towards cable systems having a much higher number of fiber pairs.
- Understanding Real-World Concurrency Bugs in Go: We studied six popular Go software systems, including Docker, Kubernetes, and gRPC. We analyzed 171 concurrency bugs in total, with more than half of them caused by non-traditional, Go-specific problems. Apart from the root causes of these bugs, we also studied their fixes, performed experiments to reproduce them, and evaluated them with two publicly-available Go bug detectors.
- Degenerate Feedback Loops in Recommender Systems: In this paper, we provide a novel theoretical analysis that examines both the role of user dynamics and the behavior of recommender systems, disentangling the echo chamber from the filter bubble effect. In addition, we offer practical solutions to slow down system degeneracy. Our study contributes toward understanding and developing solutions to commonly cited issues in the complex temporal scenario, an area that is still largely unexplored.
- "If you want, I can store the encrypted password." A Password-Storage Field Study with Freelance Developers: From our research, we offer two contributions. First of all, we reveal that, similar to the students, freelancers do not store passwords securely unless prompted, they have misconceptions about secure password storage, and they use outdated methods. Secondly, we discuss the methodological implications of using freelancers and students in developer studies.
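For the record, the pattern the study found missing is small: never store plaintext (or merely "encrypted") passwords; store a random salt plus the output of a slow key-derivation function. A minimal sketch using only the Python standard library (PBKDF2 here; argon2 or scrypt are also good choices where available):

```python
# Salted, slow password hashing with stdlib PBKDF2-HMAC-SHA256.
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    """Return (salt, iterations, digest) to store instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt, iterations, digest) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, iters, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, iters, digest)
assert not verify_password("hunter2", salt, iters, digest)
```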
- Keeping CALM: when distributed consistency is easy: When it comes to high performing scalable distributed systems, coordination is a killer. It’s the dominant term in the Universal Scalability Law. When we can avoid or reduce the need for coordination things tend to get simpler and faster. See for example Coordination avoidance in database systems, and more recently the amazing performance of Anna which gives a two-orders-of-magnitude speed-up through coordination elimination. So we should avoid coordination whenever we can.
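A concrete taste of coordination-free design in the CALM spirit is the grow-only counter (G-counter CRDT). Each replica increments only its own slot, and merge is an element-wise max, which is monotone, commutative, associative, and idempotent, so replicas converge in any merge order with no coordination round. This sketch is an illustration of the principle, not code from the paper:

```python
# G-counter CRDT: a monotone, coordination-free replicated counter.

def increment(counter, replica, amount=1):
    """Each replica bumps only its own slot."""
    c = dict(counter)
    c[replica] = c.get(replica, 0) + amount
    return c

def merge(a, b):
    """Element-wise max: order of merges never matters."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def value(counter):
    return sum(counter.values())

r1 = increment({}, "r1", 3)
r2 = increment({}, "r2", 2)
# Merging in either order yields the same state, so no coordination is needed:
assert merge(r1, r2) == merge(r2, r1)
assert value(merge(r1, r2)) == 5
```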