
Stuff The Internet Says On Scalability For October 25th, 2019

High Scalability

25 Oct 2019

Wake up! It's HighScalability time:

Is this the PDP-7 Ken Thompson used to create Unix? Our intrepid detective says yes.

Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. I also wrote Explain the Cloud Like I'm 10 for all who need to understand the cloud. On Amazon it has 61 mostly 5-star reviews (136 on Goodreads). Please recommend it. You'll be a cloud hero.

Number Stuff:

7 million: transactions per second on a DynamoDB table. Amazon mandated that all Tier 1 services move to NoSQL for reasons of scale. A Tier 1 service is one that produces revenue. Amazon has about 350 Tier 1 services.
68%: of DBMS revenue growth is in the cloud, while on-premises DBMS revenue is decreasing. In the past two years, AWS and Microsoft represented 72% and 75% of total market growth. By 2022, 75% of all databases will be deployed or migrated to a cloud platform.
240x: speedup over single-core CPU decoding for speech recognition, using an algorithm designed to maximize parallelism on a GPU.
33%: projected annual growth in the number of devices managed on IoT connectivity management platforms.
$157 billion: projected 2023 consumer spending on smart home gadgets and services. $103 billion this year.
0.17%: of websites in the Alexa Top 1 million run WebAssembly. 50% of those use it for malicious purposes, such as mining and obfuscation.
150: satellites in a swarm dedicated to IoT M2M communication.
40%: jump in self-published titles in 2018, 1.68 million, up from 1.19 million in 2017. Amazon is considered to be the largest publisher.
1 petabit per second: first demonstration of a network node operating at this speed.
1 million: Instagram followers for Jennifer Aniston in 5 hours and 16 minutes. A world record. Quite understandably Instagram had a little scale indigestion.
10%: of Americans don't use the Internet.
15 seconds: build times down from 7 minutes using Lambda in parallel.
1.3 million: ATMs installed worldwide.
$70 million: Norsk Hydro's remediation cost for recovering from a ransomware attack.
$4.3trn: for the first time, the pot of passive equity assets managed by computers exceeded that run by humans.
36: pieces of code that changed everything. Includes: Apollo 11, JPEG, the first pop-up ad, the like button, the HTML link, PageRank, and equitable cellular network call scheduling and routing.

Quotable Stuff:

Orbital Index: At closing velocities around 10 km/s, the kinetic energy of even an untrackable chip of paint is greater than the destructive power of an equivalent mass of stationary TNT.
Mr Robot: A few backroom deals, a promise here, a bribe there. Like that, the Deus Group was rich on oil and had a foothold in the military industrial sector. With the right players in place, running the world turned out to be surprisingly easy. I have to say, business was booming. With his network of hackers and terrorists in place, Zhang suggested the Deus Group look in a new direction. Industrial espionage was yesterday's work. What better way to gain leverage on everyone than if everyone was connected? During my service in the Congress, I took the initiative in creating the Internet. The United States was a test case. Americans seemed the most ready to give their lives over to a box.
Bjarne Stroustrup: What amazed me most was the range of applications: from rice cookers to space rockets. It is humbling to realize that just about wherever you look there is C++ involved: cars, movies, games, medicine, finance, computers, farming, etc. I find it most gratifying that C++ has been used in science: the human genome project, the fundamental physics research at CERN and elsewhere, and the Mars Rovers. It feels great to have made a contribution, however small.
Oleg Rogynskyy: If you miss the AI boat the results are very different. If you did not collect the data early enough there’s nothing you can do to make your AI better than your competitors. Unlike automation, unlike the industrial revolution from 100 years ago, the AI arms race is a zero-sum game. My prediction is that 10 years from now the Fortune 500 will look very different than now because some companies did not get into collecting data and training machine learning models early enough.
Rodrigo Ramirez: So, calculating Fibonacci numbers, Java code is a bit slower than Go code: about 24% slower this time. Python, on the other hand, is almost 100 times slower than Go: 9458% slower according to my test.
@FakeRyanGosling: If you're curious about continuous delivery and want to know everything about how cloud deployments work, oh boy should you strap in and read on, I have just the thread for you. Let's look at how Spinnaker performs a simple blue/green deployment on AWS. This is our starting state, 20 instances for an application called egg, in the TEST account, in the us-east-1 region. Let's say that the next artifact version has already been built/baked and an AMI is ready to go. We first need to look up your source server group to create the new one in the same account, region, with the same capacity, security groups, load balancers, target groups, etc.
@Carnage4Life: The big mistake of my generation of techies was assuming the Internet & openness was a force for good. All the Internet does is reduce friction. The same services that make it easier to build good communities also make it easier for toxic ones. Basically hate finds a way.
tyingq: Something of a shallow look into why it's still around. Part of it is that it's not just COBOL, but the ecosystem around it that makes it harder to port. Porting over a COBOL program from MVS, OS/400, MPE, etc, also requires porting over the surrounding stuff. Job schedulers, record (vs stream) based files, monitoring, print formats, character encodings, 3270 screen formats, and so on.
@HollyGraceful: Hackers don't break software, they simply demonstrate that it was already broken.
@kelseyhightower: I like Amazon's definition of the Serverless billing model. Pay for value: Pay for consistent throughput or execution duration rather than by server unit.
@asymco: Piper Jaffray Taking Stock With Teens survey, >9,500 respondents, 83% have an iPhone, the highest percentage ever. 86% of teens anticipating their next phone to be an iPhone, tied for the highest (flat from Spring-19).
kaycebasques: We’ve [1] been advocating strongly for using performance budgets [2] as a means of protecting hard-earned performance improvements. There’s some depressing stat around performance regressions... something like 25-50% of big sites regress in performance 6 months after a big push to optimize.
Instagram: Using cache-first rendering for both feed posts and the stories tray led to a 2.5% and 11% improvement in respective display done times and brought the user experience more in line with what is available on the native iOS and Android Instagram apps.
hinkley: Long ago I had a job where they couldn't understand why the whole site had gotten slower. Well, apparently when we told them that putting an expensive call in the header on every view, that everything was going to get slower, they didn't believe us. Nobody needs that kind of data, the ones who do don't need it to be accurate to the millisecond, and every penny you spend on information the average user is unaffected by is wasted money. There is no big picture. There is only hill climbing and getting stuck on local maxima constantly. And no I'm not bitter, why do you ask?
Goethe: The most foolish of all errors is for clever young men to believe that they forfeit their originality in recognizing a truth which has already been recognized by others.
chickenpotpie: I've been wondering why none of the serverless providers have offered the ability to get better pricing if I let them control when my timed functions run. For example, I tell them that these functions are daily/monthly/whatever tasks and I don't care when they run as long as they run at least once during that time period. They can run my functions when server usage is low. They get better compute utilization and save money (hopefully passing some of those savings onto me).
Delft: Researchers at TU Delft have developed a new supercompressible but strong material without conducting any experimental tests at all, using only artificial intelligence (AI). "AI gives you a treasure map, and the scientist needs to find the treasure," says Miguel Bessa
Benjamin Strick: Through these findings, I am able to say that an automated bot network is being used on Twitter and utilises other major social media platforms to disseminate propaganda about the Indonesian Government’s involvement in West Papua, and that it is doing so by using hashtags on genocide and the West Papua freedom movement to drown out any anti-Indonesian government narrative.
hintymad: This is hardly surprising at all. There used to be lots of complaints, both internally and publicly, about how bloated and ineffective Uber's engineering organization was. Many engineers were bitter about many overlapping projects in Uber. It's also funny that Uber engineers used many bogus reasons to implement NIH projects. A typical example is that Uber had a team to implement their own Bazel-like system because "Bazel (or any other tool) does not scale in Uber-scale". Same went for their own GPU database, their own SQS, their own key-value store, their own resource scheduler, their own datacenter, and their own deployment system. The end result was that almost everything sucked. Engineers suffered productivity loss for years.
balena: On the 13th June 2019, a rocket launched from Kiruna, Sweden, carrying 280kg of scientific experiments; among them, a Raspberry Pi Zero running Docker containers on balenaOS. To our knowledge this is the first time a Moby/Docker container engine has been flown to space!
John Allspaw: Resilience could be described as proactive activities aimed at being prepared to be unprepared. This is different from what we're used to. In software we're used to preventative design. We want to write our code, architect our systems, our infrastructure, all of the stuff that goes into the code and supporting the code as it's running, to take into account scenarios that are untoward or unwanted and be able to handle them gracefully. Graceful degradation is a view. And a lot of the thrust behind tolerance and systems is in that same vein of preventative design. The difference is that resilience engineering is, and resilience manifests in scenarios that are unforeseen.
Manuel Razo-Mejia: Natural selection [seems to be] pushing the system hard enough so that it … reaches a point where the cells are performing at the limit of what physics allows.
Renata Ávila: The Internet of creation disappeared. Now we have the Internet of surveillance and control.
Wired: Now Google is promising something that sounds even better: “negative latency.” While that term sounds like it literally translates to “time travel,” what Stadia’s head of engineering, Majd Bakar, meant when he said it was that emerging technology will eventually allow Stadia to reduce latency to the point where it’s basically nonexistent, making games on the service more responsive than even those on PCs and consoles.
@actualham: We don’t have Alexa, so to be funny, I yelled at my TV, “Alexa, pause the movie!” And Alexa did. And we looked around like 😮. Turns out a new remote we got for our old Firestick turned my TV into Alexa. So now I’m looking around. Anything could be Alexa. Toaster, throw rug.
Werner Vogels: At AWS, when we think about the future of hybrid, we believe that most workloads currently in data centers will be in the cloud and the on-premises infrastructure will be these billions of devices that sit on the edge. On-premises devices will be in our homes, offices, oil fields, space, planes, ships, and more. This makes the cloud more important than ever: connected devices need a secure platform to aggregate and analyze all the data.
@DrQz: [Voyager I]% uptime 11:31 up 15384 days, 21:19, 1 users, load averages: 2.62 2.23 2.05
@davidgerard: Crypto fans are allergic to doing the reading, so they come up with a simpler version of a real thing, that they think they understand. Then they shout at the ones who have done the reading, in the hope that 2+2 really does make 3 if you wish hard enough and use computers.
David Carboni: But if we shift our perspective and look, not at where they are today, but at how they got started and the way that they move, there’s a beautiful release. Now we don’t need to be dazzled by their scale, wondering how we could possibly be like that. Instead we can look at where we’re starting from, face the direction we want to go (which will be unique to each of us) and start putting one foot in front of the other.
aww_dang: Creators are still active. New niches for content are still emerging. Entrenched sites/apps are not immortal. Nothing is impossible. It all starts with individual action, one developer at a time. Don't become hypnotized by the bigness of institutions. We've seen solo developers release sea changing software before. Be the change you want to see.
Bialek: I don’t think optimization is an aesthetic or philosophical idea. It’s a very concrete idea. Optimization principles have time and again pointed to interesting things to measure.
Bruce Dawson: I’m going to close this investigation the way I usually do, with a plea for thread names. The system process has dozens of threads, many of which are special purpose, and none of which are named. The busiest system thread in this trace was the MiZeroPageThread and I had to repeatedly drill down into its stack in order to remind myself that this thread was not of interest. The VC++ compiler also doesn’t name its threads. Thread names are cheap and awesome. Use them. It’s easy.
BAIR: One way to view the problem we have outlined is that AI systems trained via self-play end up using coordination protocols that humans do not use. However, it is possible that this only happens because we are running the algorithms on a single layout at a time, and so they learn a protocol that is specialized to that layout. In contrast, human coordination protocols are likely much more general. This suggests that we could make AI protocols similar to human ones by forcing the AI protocols to be more general. In particular, if we train AI systems via self-play to play on arbitrary maps, they will have to learn more general coordination protocols, that may work well with human protocols.
Aarian Marshall: [Tesla] Share prices shot up 20 percent late Wednesday on the news of Tesla’s $143 million in quarterly net income, the first time the company has been in the black since the end of last year. It attributed the jump to cost reductions, which Chief Financial Officer Zach Kirkhorn said were now “ingrained in the culture of the team.”
DSHR: For example, Seagate's Exos 7E8 drives have a sustained transfer rate of 215Mb/s. Matching the read performance of a single hard drive would need about 13M pores, or about $6.2M worth of MinIONs. (ONT does have higher-throughput products; using the PromethION48 would need 88 units at about $52M.) This doesn't account for Reed-Solomon overhead, or the difficulty of ensuring that each strand was read only 22 times among the 13M pores.
@hwallop: Wow. McDonald's delivery with UberEATS now accounts for just over 10% of McDonald's UK business. We really are a lazy nation...
guycoder: Fortunately the PCI-SIG is not working to upgrade your personal PC. PCIe is used in many places, especially the data center, where extra bandwidth and performance are needed. Upcoming challenges of integrating heterogeneous architectures will require a very large jump in bandwidth and reduced latency, and we seem to be standardizing on the PCIe physical layer to connect all these devices together. Your local desktop and graphics card / NVMe SSD are probably not primary use cases for this upgrade, but they help drive adoption and cost reduction. Think AI/Machine Learning accelerators running in the cloud processing more and more data that ends up as some new feature on your phone or Facebook.
alasdair_: I am speaking solely for myself when I say that, while the company [Niantic] has plenty of flaws, it is very serious about privacy protections. It was drilled into every engineer from day one that we were dealing with really sensitive data. Not just any old data - kids were a target audience and so we were dealing with realtime location data from children under 13. Niantic took a lot of time and effort to ensure that this data was deleted, obfuscated, or had as much precision removed as possible, as quickly as possible. I don’t like how Niantic handled some things, but their stance on privacy was never one of them.
ctdonath: How do current ground wired customers like their internet provider? On the whole, they don’t. What percentage of the world population has internet access of any kind? of land area? of anything over 1Mbps? Huge untapped markets there. I see this building up like onset of digital photography: nobody expected supplanting film & paper, yet the whole world switched seemingly overnight. Kodak went from most valuable brand to “who?” fast. Once people see Starlink working fast, reliably, and anywhere they’ll switch en masse. Personally, as a 100% telecommuter, the only thing stopping me from moving where I want is internet connectivity. One billion customers divided by 40,000 Starlink satellites is a very manageable 25,000 users each. Sometime please run the numbers on supporting 100Mbps for each, plus latency. Given that, I expect a huge migration of customers.
patrickyeon: I2C is perfectly usable in less-than-perfect environments, and LEO isn't really as harsh as people make it out to be. It's really only going to be a problem if you assume that everything operates perfectly all the time, and if you're making those assumptions you're going to have a bad time working on remote systems anyway. Make sure your drivers can handle NAKs and errors on the line. Make sure you can reset subsystems (probably by power-cycling) completely and your system can keep running. Be ready to deal with stale sensor data or having to do a few retries at times. With these and some good testing you'll be fine with I2C, and really there's immense value in having those attitudes in your system design anyway. I was a design EE at Planet Labs for 4 years, have sent something like 200 satellites to space, each with literally dozens of I2C devices on them, and supported them in orbit.
MIT Tech Review: The work of Guillet and co throws a new perspective on all this. It suggests that Grover’s algorithm is not only possible in certain materials; it seems to be a property of nature. And if that’s true, then the objections to Patel’s ideas start to crumble. It may be that life is just an example of Grover’s quantum search at work, and that this algorithm is itself a fundamental property of nature. That’s a Big Idea if ever there was one.
Jason Lyon: “We often neglect how we get rid of the things that are less important,” he added. “And oftentimes, I think that’s a more efficient way of dealing with information.” If you’re in a noisy room, you can try raising your voice to be heard — or you can try to eliminate the source of the noise. Halassa’s findings indicate that the brain casts extraneous perceptions aside earlier than expected. “What’s interesting,” said Ian Fiebelkorn, a cognitive neuroscientist at Princeton University, is that “filtering is starting at that very first step, before the information even reaches the visual cortex.”
debbiedowner: What is really interesting is that the main innovation is never mentioned in the BBC article. The academic they quote, Dave Cowley, is an author on "Using deep neural networks on airborne laser scanning data: Results from a case study of semi‐automatic mapping of archaeological topography on Arran, Scotland" from 11/2018 [0]. The "new 3D technology" that is becoming more "widely available" and allows for "rapid discovery" is not LIDAR (which is very old) but Deep Learning as applied to LIDAR. It's interesting b/c the BBC doesn't mention "Deep Learning" in the article. LIDAR is old enough that it was launched into space as early as 1971. Deep learning is most popular on images, but is becoming more popular recently on less structured data like the unordered collection of points in xD (x >= 3), so this is the new part.
Payman Behnam: Therefore, optical RAM may be considered a promising alternative to electronic RAM where fast access times are necessary. It is worth mentioning that optical RAM is expected to be a more viable solution for optical processors; otherwise, converting information between the optical and electronic domains would be a significant challenge, as it impacts the access latency considerably. Although some efforts have been made to build special-purpose optical processors, such as Fathom Computing for neural networks, there is still a long way to go before a general-purpose optical processor is built.
David Gerard: But the SEC only cares whether an offering fits the Howey test — the test for whether an offering is a security in the US. If you issue a token at a discount to investors, then flood the retail market with it so the investors can profit, the token is a security — even if you haven’t delivered it yet.
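kaycebasques' performance budgets are simple to enforce in CI: compare each measured metric against its budget and fail the build on any overage. A minimal sketch; the metric names and limits are hypothetical, and in practice the measurements would come from a tool such as Lighthouse rather than a hard-coded dict.

```python
from typing import Dict, List

# Hypothetical budgets: metric name -> maximum allowed value.
BUDGETS: Dict[str, float] = {
    "script_kb": 300,
    "largest_contentful_paint_ms": 2500,
    "total_requests": 60,
}


def over_budget(measured: Dict[str, float], budgets: Dict[str, float]) -> List[str]:
    """Return one message per budgeted metric that is over its limit.

    Unbudgeted metrics are ignored. Budgeted metrics that weren't measured are
    reported, since silently skipping them would hide regressions.
    """
    problems = []
    for metric, limit in budgets.items():
        if metric not in measured:
            problems.append(f"{metric}: not measured")
        elif measured[metric] > limit:
            problems.append(f"{metric}: {measured[metric]} > budget {limit}")
    return problems


if __name__ == "__main__":
    results = {"script_kb": 342, "largest_contentful_paint_ms": 2100, "total_requests": 48}
    for line in over_budget(results, BUDGETS):
        print(line)
```

A CI job would exit non-zero whenever the returned list is non-empty, which is what keeps a hard-won optimization from quietly eroding six months later.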
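ctdonath asks someone to run the numbers, so here's a back-of-envelope sketch. The assumptions are mine, not the commenter's: customers are spread evenly over every satellite, every user pulls the full 100Mbps at the same moment, and satellites fly at roughly 550 km, the altitude SpaceX planned for its first shell. The latency figure is only the speed-of-light floor for a satellite directly overhead; real paths add slant range, processing, and terrestrial backhaul.

```python
CUSTOMERS = 1_000_000_000          # one billion customers (from the comment)
SATELLITES = 40_000                # constellation size (from the comment)
PER_USER_MBPS = 100                # target per-user rate (from the comment)
ALTITUDE_KM = 550                  # assumed orbital altitude
SPEED_OF_LIGHT_KM_S = 299_792.458


def users_per_satellite(customers: int, satellites: int) -> int:
    """Average customers per satellite, assuming a perfectly even spread."""
    return customers // satellites


def peak_tbps_per_satellite(users: int, per_user_mbps: float) -> float:
    """Aggregate demand in Tbps if every user is active at once."""
    return users * per_user_mbps / 1_000_000  # 1 Tbps = 1,000,000 Mbps


def min_rtt_ms(altitude_km: float) -> float:
    """Propagation-only round trip: user -> satellite -> ground station and back,
    i.e. four legs of altitude_km with the satellite directly overhead."""
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


if __name__ == "__main__":
    users = users_per_satellite(CUSTOMERS, SATELLITES)
    print(f"{users:,} users per satellite")
    print(f"{peak_tbps_per_satellite(users, PER_USER_MBPS):.1f} Tbps peak per satellite")
    print(f"{min_rtt_ms(ALTITUDE_KM):.1f} ms minimum round trip")
```

That works out to 25,000 users and 2.5 Tbps of peak demand per satellite, with a round-trip floor of about 7.3 ms. ISPs oversubscribe because users are rarely all active at once, so the real requirement would be some fraction of the peak. On the other hand, a satellite only serves users under its footprint, so load would be far from even.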
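patrickyeon's advice (handle NAKs, retry, power-cycle subsystems, tolerate stale data) maps onto a retry-then-reset wrapper. A minimal sketch assuming a hypothetical driver that raises `I2CError` on a NAK or bus error. `FlakyBus` is a stand-in that fails a set number of reads, and `power_cycle` is a placeholder for whatever your hardware offers to reset a subsystem.

```python
from typing import Callable, Optional


class I2CError(Exception):
    """Raised by the (hypothetical) driver on a NAK or bus error."""


class FlakyBus:
    """Stand-in for a real I2C driver: fails a set number of reads, then succeeds."""

    def __init__(self, failures_before_success: int, value: int = 0x42):
        self.remaining_failures = failures_before_success
        self.value = value

    def read_byte(self, addr: int, reg: int) -> int:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise I2CError(f"NAK from device 0x{addr:02x}, register 0x{reg:02x}")
        return self.value


def read_with_recovery(bus, addr: int, reg: int,
                       power_cycle: Callable[[], None],
                       retries: int = 3, resets: int = 1) -> Optional[int]:
    """Read one register, retrying on errors and power-cycling between rounds.

    Returns None if every attempt fails, so the caller can carry on with
    stale data instead of crashing.
    """
    for round_ in range(resets + 1):
        for _ in range(retries):
            try:
                return bus.read_byte(addr, reg)
            except I2CError:
                pass  # NAK or line glitch: just try again
        if round_ < resets:
            power_cycle()  # reset the subsystem before the next round of retries
    return None
```

Returning None instead of raising keeps the rest of the system running on stale data, which is the attitude patrickyeon recommends for remote systems.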

Useful Stuff:

Pornhub's Stack: Most of our sites use the following as a base: Nginx, PHP, MySQL, Memcached and/or Redis. Other technologies like Varnish, ElasticSearch, NodeJS, Go, and Vertica are used where appropriate. For frontend, we run mostly vanilla JavaScript; we’re slowly getting rid of jQuery and we are just beginning to play with frameworks, mostly Vue.js. We have a dedicated team working strictly on the video player; their first priority is to constantly monitor for performance and efficiency.

RepairApp: A serverless overview is an example of why it's hard to write about architecture these days. They all pretty much look alike.

When one company controls both sides of a market you have no way of telling if they are lying. Facebook to Pay $40M Under Proposed Settlement in Video Metrics Suit. That will have to change.

Help! My Azure Site Performance Sucks! — Part 1: But to be honest, when running in the cloud, things don’t always go smoothly. I’ve seen applications that can be lightning fast in one environment crawl to 14.4K speeds when deployed to Azure...One of the biggest reasons for bad Azure performance comes down to education. This is especially true of developers just getting started with the cloud and not fully understanding what the platform is. Azure isn’t one thing. It’s 100s of things (and growing!). It’s infrastructure, connectivity, functionality, scalability, and flexibility, all rolled up into a five-letter name that no one can seem to agree on how to pronounce...If you have multiple apps on the same server, isolate them to their own instances. Are your application database and storage in different data centers?...Determining if the web server (App service) is the root of your problems tends to be a pretty simple process....When it comes to bad performing Azure sites, DTU spikes are usually at the top of the discussion. Whether it’s long running queries, poorly-performing indexes, or just bad code that causes way too many calls, an overworked SQL Database can slow down any application.

Lots of details and code examples. Testing Cloudflare Workers. Different from Lambda, but should be familiar to any Node programmer.

If you're around Pittsburgh, Pennsylvania and have an interest in old computers, you might want to visit the Large Scale Systems Museum. Here's a complete trip report. Story idea: for some reason all computers in the US have been destroyed. The only computers remaining are in this museum and your job is to reboot the US economy using just these computers. It's a guaranteed best seller.

What Breaks Our Systems: A Taxonomy of Black Swans: There are basically six categories of incidents that I want to discuss here. One is hitting limits that we didn't know were there. Spreading slowness - when our system starts to lock up because something has gone slow somewhere. Thundering herds - when we have a big spike of coordinated demand. Automation interactions are becoming a very hot topic as people automate more. Also, there are cyberattacks and dependency loops.
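For the thundering herd case, one widely used mitigation (not from the talk itself) is exponential backoff with "full jitter": each client waits a random time up to an exponentially growing ceiling, so clients that failed together don't retry together. A minimal sketch; `base` and `cap` are illustrative defaults, not recommendations.

```python
import random
from typing import Optional


def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 30.0,
                      rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling grows as base * 2**attempt, capped at `cap`; the actual delay
    is drawn uniformly from [0, ceiling] so simultaneous failures spread out.
    """
    rng = rng or random.Random()
    # Clamp the exponent so huge attempt counts can't overflow the float math.
    ceiling = min(cap, base * 2 ** min(attempt, 60))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    rng = random.Random(42)
    # 1,000 clients that all failed at the same instant, now on their 4th attempt.
    delays = [full_jitter_delay(3, rng=rng) for _ in range(1000)]
    # Count retries per 100 ms bucket instead of all 1,000 landing at once.
    buckets = [0] * 8
    for d in delays:
        buckets[min(int(d / 0.1), 7)] += 1
    print(buckets)
```

Without the random draw, every client would compute the same delay and the herd would simply arrive again, just later.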

Wow, James Bond has nothing on this story. It has drama. Intrigue. Deception. Technical mastery. And that hollow feeling that it will all only get worse. The Untold Story of the 2018 Olympics Cyberattack, the Most Deceptive Hack in History: The GRU hackers known as Sandworm, meanwhile, are still out there. And Olympic Destroyer suggests they've been escalating not only their wanton acts of disruption but also their deception techniques. After years of crossing one red line after another, their next move is impossible to predict. But when those hackers do strike again, they may appear in a form we don't even recognize.

Wanna move your service to the cloud? Prepare for service limits and unlimited costs. MVPs and $100k AWS Bills: Reflections on the launch of Octopus Cloud 1.0:
- To bring Octopus Cloud to market quickly, we did the simplest thing possible; we took our self-hosted Octopus Server product and bundled it into an EC2 instance for each customer that signed up.
- We quickly learned that everything in AWS has some kind of service limit, and we hit all of them. Customers would sign up, we’d hit a limit, we’d ask Amazon to increase it, we’d onboard more customers, and we’d hit another limit. Every time we thought we were in the clear, we’d hit another service limit we didn’t know about. This caused a few issues as we scaled, and at one point, we had to pause new signups while we tried to provision more headroom.
- An EC2 instance for every customer adds up, and as our databases were backed by Amazon RDS, we were limited to 30 databases per RDS instance. Add storage, network, etc. and we were spending over $100 per month to keep a single Octopus Cloud instance online. Octopus Cloud customers could start a free 30-day trial, which meant that those hundreds of trial signups per month, each of which cost us $100 to host, quickly added up.
- We also didn’t have our pricing quite right. We initially launched Octopus Cloud with a $10/month starting tier, with a different pricing model from the one we currently use. Unfortunately, this was one of the most painful lessons we learned because the deficit between what we were charging and spending was magnified by the sheer number of people using Octopus Cloud; continued growth would further amplify the problem.
- In the rest of this series, we’ll go into each of the decisions we made when re-engineering Octopus Cloud for v2. These include: Switching from AWS to Azure. Porting Octopus Server to Linux. Running Octopus in containers and using Kubernetes.
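The economics above are easy to sanity-check. A small sketch using the post's figures: $100/month per hosted instance (the post says "over", so this is a floor), a $10/month starting tier, and 30 databases per RDS instance. The signup and customer counts in the usage lines are hypothetical.

```python
import math

COST_PER_INSTANCE = 100      # USD/month to host one Octopus Cloud instance (a floor)
STARTER_PRICE = 10           # USD/month, the original starting tier
DBS_PER_RDS_INSTANCE = 30    # RDS limit the team ran into


def monthly_trial_cost(trial_signups: int) -> int:
    """Hosting cost of free trials, which bring in no revenue."""
    return trial_signups * COST_PER_INSTANCE


def monthly_margin(paying_customers: int) -> int:
    """Revenue minus hosting cost if every customer is on the starter tier."""
    return paying_customers * (STARTER_PRICE - COST_PER_INSTANCE)


def rds_instances_needed(customers: int) -> int:
    """One database per customer, at most 30 per RDS instance."""
    return math.ceil(customers / DBS_PER_RDS_INSTANCE)


if __name__ == "__main__":
    print(monthly_trial_cost(300))      # hypothetical 300 trials/month
    print(monthly_margin(1_000))        # hypothetical 1,000 starter customers
    print(rds_instances_needed(1_000))
```

Every starter-tier customer loses at least $90 a month, so growth makes the hole deeper, which is exactly the "continued growth would further amplify the problem" point.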

A moving story showing what happens when even the littlest bit has the drive to succeed. The Life and Times of a Backblaze Hard Drive.

A Facebook Systems @Scale 2019 New York recap. You might like Uber on Comprehending incomprehensible architecture, or Monarch, Google’s planet-scale monitoring infrastructure.

If you want innovation and you want competition, you want this. Senators propose near-total ban on worker noncompete agreements. There are several structural reasons California has become a tech center, and this is one of them. Predictably, not everyone likes employees exercising free choice. 4 tech companies are paying a $325M fine for their illegal non-compete pact.

You are not alone in paying a lot for bandwidth. AWS Customers Rack Up Hefty Bills for Moving Data: data transfer charges for one customer, Apple, approached $50 million in 2017. That represented about 6.5% of Apple’s total AWS bill of $775 million for that year...Seven of the 10 companies saw increases of at least 50% in their AWS data transfer bills last year compared to the year before...For 2018, Pinterest had the highest overall AWS data transfer bill on our list at $26.4 million, up 78% from its $14.7 million bill in 2017...Capital One’s data transfer charges grew 181% to $4 million in 2018 from the prior year, while Snap’s grew 588% to around $9.2 million in 2018 from the year before. Airbnb saw its data transfer bill rise 163% to $14.1 million in 2018 from 2017, according to the records.
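The percentages in that excerpt can be run backward to recover the prior-year bills. A quick sketch; the inputs are the article's rounded figures, so the implied 2017 numbers are approximate.

```python
def share_pct(part: float, whole: float) -> float:
    """What percentage `part` is of `whole`."""
    return 100 * part / whole


def implied_prior(current: float, growth_pct: float) -> float:
    """Prior-year value implied by a current value that grew by growth_pct percent."""
    return current / (1 + growth_pct / 100)


if __name__ == "__main__":
    # Apple: ~$50M of data transfer out of a $775M AWS bill in 2017.
    print(f"Apple data transfer share: {share_pct(50, 775):.1f}%")
    # 2018 data transfer bill (USD millions) and reported growth over 2017.
    for name, bill_2018, growth in [("Capital One", 4.0, 181),
                                    ("Snap", 9.2, 588),
                                    ("Airbnb", 14.1, 163)]:
        print(f"{name}: ~${implied_prior(bill_2018, growth):.2f}M in 2017")
```

Snap's 588% jump, for example, means its 2017 data transfer bill was only around $1.3 million.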

Amazon likes to productize their experience with customers. Disaster recovery is one such example, Automated Disaster Recovery using CloudEndure: CloudEndure is an automated IT resilience solution that lets you recover your environment from unexpected infrastructure or application outages, data corruption, ransomware, or other malicious attacks. It utilizes block-level Continuous Data Replication (CDP), which ensures that target machines are spun up in their most current state during a disaster or drill, so that you can achieve sub-second RPOs. In the event of a disaster, CloudEndure triggers a highly automated machine conversion process and a scalable orchestration engine that can spin up machines in the target AWS Region within minutes. This process enables you to achieve RTOs in minutes. The CloudEndure solution uses a software agent that installs on physical or virtual servers. It connects to a self-service, web-based user console, which then issues an API call to the selected AWS target Region to create a Staging Area in the customer’s AWS account designated to receive the source machine’s replicated data.

The State of Serverless, circa 2019:
- Serverless might sound like a technology stack, but it’s really a vision for software development. In contrast to the ever-growing complexity of servers and Kubernetes, attendees at a Serverless conference are looking for ways to do more with less — less infrastructure, less complexity, less overhead, and less waste.
- While the latest Serverlessconf retains its technology and practice focus, it was fantastic to see companies like the Gemological Institute of America, Expedia, T-Mobile, Mutual of Enumclaw Insurance, and LEGO up on stage in 2019 talking about adopting and benefitting from Serverless architectures.
- The buzz was around whether declarative or imperative “Infrastructure-as-Code” is the better approach, alternatives to CloudFormation, easier ways to construct and deploy Serverless architectures...Whatever your position on recent approaches like AWS’s CDK and the utility of declarative approaches like AWS SAM, it’s clear that CloudFormation and other vendor-provided options still aren’t nailing it.
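For anyone who hasn't followed the declarative vs imperative argument, here is the core difference in a toy model. This is not how CloudFormation, SAM, or the CDK work internally; resources are just names mapped to config dicts. The imperative version spells out each step; the declarative version states the desired end state and lets a planner derive the steps by diffing against what exists.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]   # toy model: resource name -> config


def imperative_deploy(state: Resources) -> Resources:
    """Imperative: the script *is* the sequence of steps. It encodes one
    transition and knows nothing about resources it doesn't mention."""
    state = dict(state)
    state["queue"] = {"visibility_timeout": 60}
    state["function"] = {"memory_mb": 256}
    return state


def plan(current: Resources, desired: Resources) -> List[Tuple[str, str]]:
    """Declarative: diff desired vs current and derive the steps."""
    steps = []
    for name, cfg in desired.items():
        if name not in current:
            steps.append(("create", name))
        elif current[name] != cfg:
            steps.append(("update", name))
    for name in current:
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply_plan(current: Resources, desired: Resources) -> Resources:
    """Execute the planned steps against a copy of the current state."""
    state = dict(current)
    for action, name in plan(current, desired):
        if action == "delete":
            del state[name]
        else:
            state[name] = desired[name]
    return state
```

Re-running `plan` on a converged state yields no steps, which is what makes declarative tools safe to re-apply. The imperative script, by contrast, leaves behind anything it wasn't told about.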

All sorts of AWS Lambda case studies from companies like Thomson Reuters, iRobot, FINRA, Autodesk, and Netflix. One theme seems to be companies that, not surprisingly, need to quickly process a lot of information.

Ah, the good old days, which never were, at least for most of us. My experience of programming and marketing a DOS program back in the day was a costly failure. Remember print ads? Remember begging for reviews? Remember permissionful enterprise purchase pipelines? Remember renting booths? Yah, that all sucked. How the App Store Ended a Golden Era of Software. Which isn't to say the App Store doesn't have huge room for improvement. It does. But the golden era is almost always the era you're living in; it's just hard to recognize you're in it. Also, Six Reasons Why iOS 13 and Catalina Are So Buggy.

When Mark Zuckerberg positions Facebook as part of a new fifth estate, he may not have thought through the positioning. The first three estates were based on who had power, and the arrangement governed their mutual duties and obligations: priests, soldiers (which morphed into nobility; remember when their job was to protect?), and workers. As industrialization took over from the medieval world, the fourth estate was added: the press. Zuck seems to think his new estate has no duties or obligations. That's not how it works. An institution so powerful that it's part of the power structure of a culture virtually demands regulation. And unlike the press, it has no constitutional guarantees. This is what might happen: China's Study the Great Nation app 'enables spying via back door'. Grow up.

It's always about locks. As usual, the process used to find the problem is far more interesting than the eventual cause. 63 Cores Blocked by Seven Instructions: How often do you have one thread spinning for several seconds in a seven-instruction loop while holding a lock that stops sixty-three other processors from running? That’s just awesome, in a horrible sort of way...a 64-logical-processor machine laid low by a seven-instruction loop running in the system process while holding a vital NTFS lock, fixed by disabling System Restore...The original reporter disabled System Restore and suddenly their builds started running 2x to 3x as fast!
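The failure mode is easy to reproduce in miniature. Below is a minimal Python sketch, assuming a single `threading.Lock` stands in for the NTFS lock and a timed busy loop stands in for the seven-instruction spin; `holder`, `worker`, and `run` are hypothetical names. Every thread that needs the lock stalls for roughly as long as the holder spins, however many of them there are.

```python
import threading
import time

# Hypothetical stand-in for the "vital NTFS lock" in the story.
lock = threading.Lock()


def holder(hold_seconds: float, acquired: threading.Event) -> None:
    """Take the lock, then busy-spin while still holding it."""
    with lock:
        acquired.set()
        deadline = time.monotonic() + hold_seconds
        while time.monotonic() < deadline:
            pass  # no useful work, but the lock stays held the whole time


def worker(waits: list, idx: int) -> None:
    """Record how long this worker waited to get the lock."""
    start = time.monotonic()
    with lock:
        waits[idx] = time.monotonic() - start


def run(num_workers: int = 4, hold_seconds: float = 0.2) -> list:
    acquired = threading.Event()
    waits = [0.0] * num_workers
    spinner = threading.Thread(target=holder, args=(hold_seconds, acquired))
    spinner.start()
    acquired.wait()  # make sure the spinner owns the lock before workers start
    workers = [threading.Thread(target=worker, args=(waits, i)) for i in range(num_workers)]
    for w in workers:
        w.start()
    for t in [spinner, *workers]:
        t.join()
    return waits


if __name__ == "__main__":
    print([round(w, 3) for w in run()])
```

The spinner only burns one core, but the damage scales with the number of waiters: every other thread that needs the same lock sits idle for the full duration of the spin.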

Spec'ing flash is like using a uint32 for a counter. You sit around a table and guess how many writes you'll have over the life of a product. And of course you really have no idea. For a counter you should just use a uint64. For flash it's not so easy. There are all sorts of limits, so you usually get stuck with what you get stuck with. Flash Memory Wear Killing Older Teslas Due to Excessive Data Logging.
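The arithmetic behind the counter analogy is worth doing once. A minimal Python sketch, assuming a counter incremented 1,000 times per second and, for the flash side, made-up numbers (8 GB of flash, 3,000 program/erase cycles, a write amplification of 2, 4 GB of logs written per day); none of these figures come from the Tesla story. The flash estimate uses the common back-of-the-envelope formula lifetime = capacity × P/E cycles ÷ (write amplification × daily writes).

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
GB = 10**9  # decimal gigabytes


def days_until_wrap(bits: int, increments_per_second: float) -> float:
    """Days until an unsigned counter of `bits` bits wraps back to zero."""
    return 2**bits / increments_per_second / SECONDS_PER_DAY


def flash_lifetime_days(capacity_bytes: int, pe_cycles: int,
                        write_amplification: float, bytes_per_day: int) -> float:
    """Rough days until every block has used up its rated program/erase cycles."""
    total_writable = capacity_bytes * pe_cycles
    return total_writable / (write_amplification * bytes_per_day)


print(f"uint32: {days_until_wrap(32, 1000):.2f} days")
print(f"uint64: {days_until_wrap(64, 1000) * SECONDS_PER_DAY / SECONDS_PER_YEAR:.3g} years")
print(f"flash:  {flash_lifetime_days(8 * GB, 3000, 2.0, 4 * GB):.0f} days")
```

With these inputs the uint32 counter wraps in under two months, the uint64 one outlasts the hardware by hundreds of millions of years, and the hypothetical flash lasts about 3,000 days (a bit over eight years). Unlike the counter, you can't just widen the flash; doubling the logging rate simply halves its life.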

The tendency of systems to centralize and grow into great big balls of mud continually amazes. Hacking 20 high-profile dev accounts could compromise half of the npm ecosystem.
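Why so few accounts reach so far: installing a package pulls in its dependencies transitively, so a compromised maintainer's packages taint everything downstream of them. A minimal Python sketch of that reach computation as a breadth-first search over a toy reverse-dependency graph; the maintainer and package names are hypothetical, not real npm data.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def compromised_reach(maintainers: Iterable[str],
                      packages_by_maintainer: Dict[str, List[str]],
                      dependents: Dict[str, List[str]]) -> Set[str]:
    """Packages published by the given maintainers plus everything that
    depends on them, directly or transitively (breadth-first search)."""
    start = {p for m in maintainers for p in packages_by_maintainer.get(m, [])}
    seen = set(start)
    queue = deque(start)
    while queue:
        pkg = queue.popleft()
        for downstream in dependents.get(pkg, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


# Hypothetical ecosystem: edges point from a package to the packages that use it.
packages_by_maintainer = {"alice": ["tiny-util"], "bob": ["str-pad"]}
dependents = {
    "tiny-util": ["framework-a", "cli-b"],
    "framework-a": ["app-1", "app-2"],
    "str-pad": ["cli-b"],
}

print(sorted(compromised_reach(["alice"], packages_by_maintainer, dependents)))
```

One account with a single tiny package reaches five packages in this toy graph; in a real registry, the fan-out through popular utility packages is what makes a short list of maintainers so valuable to an attacker.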

A recap of Facebook's Video @Scale 2019. You might like Facebook's Video quality keynote, or YouTube on Adopting video at scale, or Live streaming at Twitch. More videos will be added later.

The median latency for [Google's] trace data collection pipeline is less than 15 seconds. Dapper — Google’s Secret Weapon: Now for tracing, Dapper doesn't rely on black-box monitoring schemes, which assume there is no additional information other than the recorded message identifiers and timestamp events for every request and RPC execution. Instead, it uses trees, spans, and traces...A tree’s nodes are spans, and the edges indicate the relationship of each span with its parent span. Dapper records a human-readable span name, ID, and parent ID...A span can also contain information from multiple hosts...The entire process of Dapper collection and logging has three stages. First, span data is written to local log files. Second, it is pulled from all production hosts by the collection infrastructure. Finally, it is written to a cell in Dapper's Bigtable...A trace is laid out as a single Bigtable row, with each column corresponding to a span.
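The span/trace model in the quote maps onto a small data structure. A minimal Python sketch, assuming integer IDs and in-memory dicts in place of Bigtable: spans are grouped into one "row" per trace ID with one "column" per span ID, and the parent IDs alone are enough to rebuild the call tree. The `Span` class and the span names are illustrative, not Dapper's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]  # None marks the root span of a trace
    name: str


def to_rows(spans: Iterable[Span]) -> Dict[int, Dict[int, Span]]:
    """One 'row' per trace ID, one 'column' per span ID (Bigtable-like layout)."""
    rows: Dict[int, Dict[int, Span]] = defaultdict(dict)
    for span in spans:
        rows[span.trace_id][span.span_id] = span
    return dict(rows)


def render_tree(row: Dict[int, Span]) -> List[str]:
    """Rebuild the call tree of one trace from parent IDs, indenting children."""
    children: Dict[int, List[Span]] = defaultdict(list)
    roots: List[Span] = []
    for span in row.values():
        if span.parent_id is None:
            roots.append(span)
        else:
            children[span.parent_id].append(span)

    lines: List[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append("  " * depth + span.name)
        for child in sorted(children[span.span_id], key=lambda s: s.span_id):
            walk(child, depth + 1)

    for root in sorted(roots, key=lambda s: s.span_id):
        walk(root, 0)
    return lines


spans = [
    Span(trace_id=7, span_id=1, parent_id=None, name="Frontend.Request"),
    Span(trace_id=7, span_id=2, parent_id=1, name="Backend.Call"),
    Span(trace_id=7, span_id=3, parent_id=1, name="Helper.Call"),
    Span(trace_id=7, span_id=4, parent_id=2, name="Storage.Read"),
]
rows = to_rows(spans)
print("\n".join(render_tree(rows[7])))
```

Storing a whole trace in one row means a single row read returns every span of a request, and the tree can be reassembled client-side without any join.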

The ecosystem wars intensify they do. Clearly Google needs to vertically integrate. Build Google homes, cars, and everything else, just so they won't have to integrate with anyone else. Builders Ditch Nest After Google Ties Devices to Digital Assistant: “What they’re doing is creating a lot of mistrust around Google and that’s then causing people to de-select Google and Nest as technology platforms,” Emigh said. “That’s happening in droves.”

In the right hands, data can tell a story. Controlling the narrative going forward will certainly require controlling data. Ryan Smith uses Backblaze’s SMART data to illustrate the power of data: Backblaze has stated that they can achieve up to 1Gbps per pod, but they are only reaching an average throughput of 521Mbps...Overall, Backblaze’s datacenters are handling over 100GB/s of throughput across all their pods, which is quite an impressive figure...What interested me when looking at Backblaze’s SMART data was the fact that drives were being retired more often than they were failing. This means the cost of failures is fairly insignificant in the scheme of things. It was actually efficiencies driven by technology improvements, such as drive and enclosure densities, that drove most of the costs. However, the benefits must outweigh the costs.
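The two throughput figures can be cross-checked with a little arithmetic. A rough Python sketch, assuming "GB/s" means gigabytes (not gigabits) per second in decimal units and that every pod runs at the quoted 521 Mbps average; the pod count it produces is an inference, not a number Backblaze published.

```python
def utilization(avg_mbps: float, peak_mbps: float) -> float:
    """Fraction of a pod's stated peak bandwidth actually used on average."""
    return avg_mbps / peak_mbps


def implied_pods(total_gbytes_per_s: float, avg_mbps_per_pod: float) -> float:
    """Pods needed at the average per-pod rate to reach the aggregate figure."""
    total_mbps = total_gbytes_per_s * 8 * 1000  # GB/s -> Gb/s -> Mb/s
    return total_mbps / avg_mbps_per_pod


print(f"per-pod utilization: {utilization(521, 1000):.1%}")
print(f"implied pods: about {implied_pods(100, 521):.0f}")
```

Each pod averages roughly half its stated peak, and 100 GB/s at that rate implies on the order of 1,500 pods; since the quote says "over" 100 GB/s, treat that as a floor.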

Very nice comparison. AWS Lambda vs. Azure Functions: 10 Major Differences:
- Microsoft takes a different approach. They separated the notion of the Azure Functions programming model from the serverless operational model. With Azure Functions, I can deploy my functions to a pay-per-use, fully-managed Consumption plan. However, I can also use other hosting options to run the same code
- Azure Functions Consumption plan is one-size-fits-all. It comes with 1.5GB of memory and one low-profile virtual core...Azure Functions Premium plan comes with multiple instance sizes, up to 14GB of memory, and four vCPUs. However, you have to pay a fixed per-hour fee for the reserved capacity
- Azure Functions has runtimes for JavaScript, Java, Python, C#, F#, and PowerShell (preview). Azure lacks Go and Ruby
- Azure Functions has a more sophisticated model based on triggers and bindings. A trigger is an event that the function listens to. The function may have any number of input and output bindings to pull and/or push extra data at the time of processing
- Azure Functions allocates multiple concurrent executions to the same virtual node. If one execution is idle waiting for a response from the network, other executions may use resources which would otherwise be wasted. However, resource-hungry executions may fight for the pool of shared resources, harming the overall performance and processing time
- If Azure Functions executions share an instance, the memory cost isn’t charged multiple times...Azure Functions comes with HTTP endpoint integration out of the box, and there is no additional cost for this integration
- Azure Functions [performance and scalability] has improved significantly in the last year or two, but Microsoft is still playing catch-up
- Both AWS and Azure have dedicated services for workflow orchestration: AWS Step Functions and Azure Logic Apps. In addition, Azure Durable Functions is a library that brings workflow orchestration abstractions to code. It comes with several patterns to combine multiple serverless functions into stateful long-running flows.
- Also, Stateless Serverless with Durable Functions.
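The Durable Functions idea of writing a workflow as ordinary code can be illustrated with a Python generator: the orchestrator yields a request to run an activity, a driver runs it, and the result is sent back in. This is a toy sketch, not the Durable Functions API; `run_orchestration`, `order_workflow`, and the activity names are hypothetical, and the real service persists the history and replays the orchestrator from it, which this in-memory version does not do.

```python
from typing import Any, Callable, Dict, Generator, List, Tuple

Step = Tuple[str, Any]  # (activity name, input)


def run_orchestration(orchestrator: Callable[[Any], Generator[Step, Any, Any]],
                      activities: Dict[str, Callable[[Any], Any]],
                      initial_input: Any) -> Tuple[Any, List[Tuple[str, Any, Any]]]:
    """Run each activity the orchestrator yields, feed the result back in,
    and keep a history of (activity, input, output) records."""
    history: List[Tuple[str, Any, Any]] = []
    gen = orchestrator(initial_input)
    to_send = None  # the first send() must be None to start the generator
    while True:
        try:
            name, arg = gen.send(to_send)
        except StopIteration as done:
            return done.value, history
        to_send = activities[name](arg)
        history.append((name, arg, to_send))


def order_workflow(order_id: str):
    """Function chaining: each step consumes the previous step's output."""
    validated = yield ("validate", order_id)
    charged = yield ("charge", validated)
    receipt = yield ("email_receipt", charged)
    return receipt


activities = {
    "validate": lambda o: f"{o}:valid",
    "charge": lambda o: f"{o}:charged",
    "email_receipt": lambda o: f"{o}:emailed",
}

result, history = run_orchestration(order_workflow, activities, "A17")
print(result)
```

Because the orchestrator only ever talks to the driver through `yield`, the driver is free to record each step, retry activities, or schedule them elsewhere; that separation between workflow logic and execution is the point of the pattern.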

Soft Stuff:

dapr/dapr: a portable, event-driven runtime that makes it easy for developers to build resilient, stateless and stateful microservices that run on the cloud and edge and embraces the diversity of languages and developer frameworks. Dapr codifies the best practices for building microservice applications into open, independent, building blocks that enable you to build portable applications with the language and framework of your choice. Each building block is independent and you can use one, some, or all of them in your application.
lorin/resilience-engineering: resilience engineering papers.
netflix.github.io/mantis: a platform to build an ecosystem of realtime stream processing applications. Similar to micro-services deployed in a cloud, Mantis applications (jobs) are deployed on the Mantis platform. The Mantis platform provides the APIs to manage the life cycle of jobs (like deploy, update, and terminate), manages the underlying resources by containerizing a common pool of servers, and, similar to a traditional micro-service cloud, allows jobs to discover and communicate with other jobs.
airbnb/MvRx (article): the Android framework Airbnb uses for nearly all of its product development.
facebookresearch/Neural-Code-Search-Evaluation-Dataset: Neural-Code-Search-Evaluation-Dataset presents an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark. We also provide the results of two code search models (NCS, UNIF) from recent work.
leeoniya/uPlot: An exceptionally fast, tiny time series chart.
ververica/stateful-functions: Stateful Functions for Apache Flink. The project aims to simplify the development of distributed stateful applications by solving some of the common challenges in those applications: scaling, consistent state management, reliable interaction between distributed services, and resource management.

Pub Stuff:

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition: We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in memory utilization, input/output bandwidth, and uses a novel Viterbi implementation designed to maximize parallelism.
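For readers who haven't met Viterbi decoding: it finds the most likely sequence of hidden states behind a sequence of observations by keeping, for every state at every time step, the best-scoring path that ends there. A minimal CPU sketch in Python on the textbook healthy/fever hidden Markov model, not the paper's WFST lattice decoder; within a time step each state's update depends only on the previous step, which is the kind of work a GPU implementation can spread across threads.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(obs: Sequence[str], states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely hidden-state path for `obs` and its probability."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:  # independent per state: depends only on step t-1
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

print(viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p))
```

Real speech decoders work in log space to avoid underflow on long utterances and prune unlikely states; the plain probabilities here are fine for a three-step toy.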
UNIX: A History and a Memoir: The fascinating story of how Unix began and how it took over the world. Brian Kernighan was a member of the original group of Unix developers, the creator of several fundamental Unix programs, and the co-author of classic books like "The C Programming Language" and "The Unix Programming Environment." Also, VCF East 2019 -- Brian Kernighan interviews Ken Thompson, The History of Unix, Rob Pike, C, the Enduring Legacy of Dennis Ritchie.