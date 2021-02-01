Hey, it's HighScalability time once again!

Amazon converts expenses into revenue by transforming needs into products. Take a look at a fulfillment center and you can see the need for Outpost, machine learning, IoT, etc, all dogfooded. Willy Wonka would be proud.

Akin: Any run-of-the-mill engineer can design something which is elegant. A good engineer designs systems to be efficient. A great engineer designs them to be effective.

@turtlesoupy: 1/n I'm an engineer who wrote Instagram's feed algorithm. Twitter needs to lower the click-probability weighting in its ranking algorithm. A thread. 2/n made you look? Seriously, that's the problem. You are infinitely more likely to "click" on something that has a little thread 🧵 icon in it as opposed to a vanilla tweet. 3/n that also applies to people posting spicy hot takes where you click into the thread to see outrage. It is rubbernecking. 4/n weighting on P(click), 🧵 posts dominate others and get ranked to the top of feed. But that encourages you to /post/ more 🧵 content so you get to the top of other people's feeds. 5/n that's a "network effect" where a small ranking change has a huge effect on online discourse. Look at me! I'm writing a thread right now and I hate them.

Geoff Huston: NATs are the reason why in excess of 20 billion connected devices can be squeezed into some 2 billion active IPv4 addresses. Applications that cannot work behind NATs are no longer useful and no longer used. However, the pressures of this inexorable growth in the number of deployed devices in the Internet mean that even NATs cannot absorb these growth pressures forever. NATs can extend the effective addressable space by up to 32 ‘extra’ bits, and they enable the time-based sharing of addresses. Both of these are effective measures in stretching the address space to encompass a larger device pool, but they do not transform the address space into an infinitely elastic resource. The inevitable outcome of this process is that either we will see the fragmenting of the IPv4 Internet into a number of disconnected parts, so that the entire concept of a globally unique and coherent address pool layered over a single coherent packet transmission realm will be foregone, or we will see these growth pressures motivate the further deployment of IPv6, and the emergence of IPv6-only elements of the Internet as it tries to maintain a cohesive and connected whole.

Robert J. Sawyer: While the first successful alchemist was undoubtedly God, I sometimes wonder whether the second successful one may not have been the Devil himself.

Michael Wayland: Ford and Google are entering into a six-year deal, which the automaker said is worth hundreds of millions of dollars. The tie-up will make Google responsible for much of the automaker’s growing in-vehicle connectivity, as well as cloud computing and other technology services. Google is expected to assist Ford with everything from in-car infotainment systems and remote, or over-the-air, updates to using artificial intelligence

@samnewman: So a few people have asked why I have this snarky response. What is my problem with this service? Well, to be clear, it’s not an issue with GraphQL, it’s an issue with direct coupling with underlying datasources

Percona: There were not many cases where the ARM instance becomes slower than the x86 instance in the tests we performed. The test results were consistent throughout the testing of the last couple of days. While ARM-based instance is 25 percent cheaper, it is able to show a 15-20% performance gain in most of the tests over the corresponding x86 based instances. So ARM-based instances are giving conclusively better price-performance in all aspects.
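As a rough check on the price-performance claim, the two quoted figures can be folded into one ratio. This is a minimal sketch, assuming price-performance is simply relative throughput divided by relative price; the function name and the normalization to an x86 baseline of 1.0 are illustrative and not Percona's methodology.

```python
def price_performance_gain(perf_gain: float, price_discount: float) -> float:
    """Relative price-performance versus a baseline of 1.0.

    perf_gain: fractional speedup over the baseline (0.15 means 15% faster).
    price_discount: fractional price cut versus the baseline (0.25 means 25% cheaper).
    """
    relative_perf = 1.0 + perf_gain
    relative_price = 1.0 - price_discount
    return relative_perf / relative_price


# Figures quoted above: 25 percent cheaper, 15-20% faster.
low = price_performance_gain(0.15, 0.25)
high = price_performance_gain(0.20, 0.25)
print(f"{low:.2f}x to {high:.2f}x the work per dollar")
```

Under that simple model the quoted numbers work out to roughly 1.5 to 1.6 times the work per dollar, which is larger than either figure suggests on its own.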

Pat Helland: The word consistent is not consistent. I pretty much think it means nothing useful. Eventual consistency means nothing both now AND later. It does, however, confuse the heck out of a lot of people.

Steren: My stack requires no maintenance, has perfect Lighthouse scores, will never have any security vulnerability, is based on open standards, is portable, has an instant dev loop, has no build step and… will outlive any other stack. It’s not LAMP, Wordpress, Rails, MEAN, Jamstack... I don’t do CSR (Client-side rendering), SSR (Server Side Rendering), SSG (Static Site Generation)... My stack is HTML+CSS. And because my sources are in git, pushed to GitHub, GitHub Pages is my host.

jandrewrogers: This implies that average data models today are a million times larger than data models a decade ago. Exabyte scale working data models have been something I've needed to consider in designs for at least a few years. We are surprisingly close to overflowing 64-bit integers in real systems.
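To put "close to overflowing 64-bit integers" in numbers, here is a small sketch of how many exabytes a 64-bit byte counter or byte offset can represent. It assumes decimal exabytes (10^18 bytes); the constant names are illustrative.

```python
EXABYTE = 10**18  # decimal exabyte, in bytes

MAX_UNSIGNED_64 = 2**64 - 1  # largest value of an unsigned 64-bit integer
MAX_SIGNED_64 = 2**63 - 1    # largest value of a signed 64-bit integer

# How many exabytes each limit corresponds to when the integer counts bytes.
unsigned_limit_eb = MAX_UNSIGNED_64 / EXABYTE
signed_limit_eb = MAX_SIGNED_64 / EXABYTE

print(f"unsigned 64-bit byte offset tops out near {unsigned_limit_eb:.1f} EB")
print(f"signed 64-bit byte offset tops out near {signed_limit_eb:.1f} EB")
```

A signed byte offset runs out below 10 EB, so a working data set of a few exabytes is already within a small factor of the limit.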

@DanRose999: I was at Amzn in 2000 when the internet bubble popped. Capital markets dried up & we were burning $1B/yr. Our biggest expense was datacenter -> expensive Sun servers. We spent a year ripping out Sun & replacing with HP/Linux, which formed the foundation for AWS. The backstory: My first week at Amzn in '99 I saw McNealy in the elevator on his way to Bezos' office. Sun Microsystems was one of the most valuable companies in the world at that time (peak market cap >$300B). In those days, buying Sun was like buying IBM: "nobody ever got fired for it". Our motto was "get big fast." Site stability was critical - every second of downtime was lost sales - so we spent big $$ to keep the site up. Sun servers were the most reliable so all internet co's used them back then, even though Sun's proprietary stack was expensive & sticky. Then something even more interesting happened. As a retailer we had always faced huge seasonality, with traffic and revenue surging every Nov/Dec. Jeff started to think - we have all this excess server capacity for 46 weeks/year, why not rent it out to other companies? Around this same time, Jeff was also interested in decoupling internal dependencies so teams could build without being gated by other teams. The architectural changes required to enable this loosely coupled model became the API primitives for AWS. These were foundational insights for AWS. I remember Jeff presenting at an all-hands, he framed the idea in the context of the electric grid. In 1900, a business had to build its own generator to open a shop. Why should a business in 2000 have to build its own datacenter? Amzn nearly died in 2000-2003. But without this crisis, it's unlikely the company would have made the hard decision to shift to a completely new architecture. And without that shift, AWS may never have happened. Never let a good crisis go to waste!

Adam Leventhal: Maybe AWS will surprise us, but I’m not holding my breath. Outposts are not just expensive but significantly overpriced — despite the big drop and by quite a bit...Outposts pricing remains at least mildly rapacious.

boulos: This is a big reason for why I cared about making GKE on-prem / bare metal a thing: I don’t believe (most) customers on-prem want to buy new hardware from a cloud provider. They mostly want to have consistent API-driven infrastructure with their hybrid cloud setup, and don’t want to burn their millions of dollars of equipment to the ground to do so.

@copyconstruc: The 2010s began with the idea that scaling infinitely wasn’t possible with an RDBMS. In 2020, it’s possible to scale “infinitely” using strong relational data models (MySQL with Vitess, Postgres with CockroachDB, Aurora, and custom solutions like Cloud Spanner), which is cool.

LeifCarrotson: If you're a veteran with Javascript, and Angular, and Node, and MySQL, you will scarcely be able to add something as simple as a "Birthday" field to the user profile pages of your new employer's SPA on your first day or first week on the job. That background will give you an idea of how you would have done that on previous applications, and give a slight speed boost as you try to skim the project for tables and functions that are related to user profiles, but the new application is almost certainly different.

Geoff Huston: The incidence of BGP updates appears to be largely unrelated to changes in the underlying model of reachability, and more related to the adjustment of BGP to match traffic engineering policy objectives. The growth rates of updates are not a source of any great concern at this point in time.

Kevin Mitchell: The inherent indeterminacy of physical systems means that any given arrangement of atoms in your brain right at this second, will not lead, inevitably, to only one possible specific subsequent state of the brain. Instead, multiple future states are possible, meaning multiple future actions are possible. The outcome is not determined merely by the positions of all the atoms, their lower-order properties of energy, charge, mass, and momentum, and the fundamental forces of physics. What then does determine the next state? What settles the matter?

Drew Firment: The verdict: cloud maturity correlates closely with stronger growth. Not surprisingly, transformational-stage companies are more than 15 percentage points ahead of tactical-stage companies. When you show an obvious commitment to cloud, you see a very real return.

@theburningmonk: Given all the excitement over Lambda's per-ms billing change today, some of you might be thinking how much money you can save by shaving 10ms off your function. Fight that temptation 🧘‍♂️until you can prove the ROI on doing the optimization.
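One way to check the ROI is to estimate the monthly saving before touching any code. The following is a back-of-the-envelope sketch: the function name and the traffic numbers are hypothetical, and the default price per GB-second is an assumption that should be replaced with the current rate for your region and architecture.

```python
def monthly_saving_usd(
    ms_saved: float,
    invocations_per_month: int,
    memory_mb: int,
    price_per_gb_second: float = 0.0000166667,  # assumed rate; check current pricing
) -> float:
    """Estimated monthly saving from shortening each invocation by ms_saved."""
    gb = memory_mb / 1024
    seconds_saved = ms_saved / 1000 * invocations_per_month
    return seconds_saved * gb * price_per_gb_second


# Hypothetical workload: 10 million invocations a month at 512 MB, 10 ms shaved off each.
saving = monthly_saving_usd(ms_saved=10, invocations_per_month=10_000_000, memory_mb=512)
print(f"~${saving:.2f} per month")
```

With these made-up numbers the saving comes to less than a dollar a month, which is hard to justify against even an hour of engineering time. The picture changes only at much higher volumes or memory sizes.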

@forrestbrazeal: "We are not close to being done investing in and inventing Graviton" - @ajassy #reinvent. Graviton is threatening to take the place of DynamoDB On-Demand Capacity as my go-to example for how the cloud gets better at no cost to you.

Cory Doctorow: Forty years ago, we had cake and asked for icing on top of it. Today, all we have left is the icing, and we’ve forgotten that the cake was ever there. If code isn’t licensed as “free,” you’d best leave it alone.

@justinkan: One of my investments is trying to decide between Google Cloud and AWS. AWS seems to have a ton of support for startups but haven't heard from anyone from Google. Does anyone at Google Cloud work on startups?

@pixprin: 1/ Our [Pixar] world works quite a bit differently than VFX in two major ways: we spend a lot of man hours on optimization, and our scheduler is all about sharing. Due to optimization, by and large we make the film fit the box, aka each film at its peak gets the same number of cores.

Small Datum: On average, old Postgres was slower than old MySQL while new Postgres is faster than new MySQL courtesy of CPU regressions in new MySQL (for this workload, HW, etc). The InnoDB clustered PK index hurts throughput for a few queries when the smaller Postgres PK index can be cached, but the larger InnoDB PK index cannot. The InnoDB clustered PK index helps a few queries where Postgres must do extra random IO to fetch columns not in the PK index.

@jevakallio: GraphQL is great because instead of writing boring APIs I can just mess about doing clever schema introspection metaprogramming all day long and get paid twice what a normal API programmer does because nobody else understands what is going on

motohagiography: The last year has illustrated a new kind of "platform risk," where I think we always tried to diversify exposure to them in architecture, security, and supply chains, but it's as though it has finally trickled down to individuals.

B.N. Frank: McDonald’s restaurants are putting cameras in their dumpsters and trash containers in an effort to improve their recycling efforts and save money on waste collection. Nordstrom department stores are doing this as well

pelle: AltaVista itself largely died, because in a misguided attempt to manage the innovator's dilemma they just tried to rebrand everything network oriented they had as AltaVista. People only remember the search engine now and for good reason. But we had AltaVista firewalls, gigabit routers, network cards, mail server (both SMTP and X400 (!!!)) and a bunch of other junk without a coherent strategy. Everything that had anything to do with networking got the AltaVista logo on it. The focus became selling their existing junk using the now hip AltaVista brand, but the AltaVista search itself was not given priority.

@randybias: I’m sorry. There is no “open cloud” future. That ship sailed. It’s a “walled garden” cloud future.

eternalban: So this is the reality of modern software development. Architecture is simply not valued at extremities of tech organizations: the bottom ranks are young developers who simply don't know enough to appreciate thoughtful design. And the management is facing incentives that simply do NOT align with thoughtful design and development.

@benedictevans: Facebook has 2bn users posting 100bn times a day. The global SMS system had 20-25bn messages a day. So is this a publisher? A platform? A telco? No. We don’t really know what we think about speech online, nor how to think about it, nor who should decide.

@brokep: The pirate bay, the most censored website in the world, started by kids, run by people with problems with alcohol, drugs and money, still is up after almost 2 decades. Parlor and gab etc have all the money around but no skills or mindset. Embarrassing. The most ironic thing is that TPBs enemies include not just the US government but also many European and the Russian one. Compared to gab/parlor which is supported by the current president of the US and probably liked by the Russian one too.

Jessitron: Complexity in software is nonlinear. It goes up way, way faster than value. This integrated architecture looks lovely in the picture, but implementing it is going to be ugly. And every modification forever is going to be ugly, too.

@Suhail: We have 1 virtual machine on GCP and Google is making us talk to sales. It's the holidays. Which sales person will care about our account against their quota?

Forrest Brazeal: Using a combination of traditional reserved instances to cover our platform costs and savings plans for the resources our learners create, we’ve been able to get our reserved instance mix up to about 80% of our total EC2 spend — a KPI that should reduce our EC2 bill by about 30% over the next 12 months at no impact to our users.

@MohapatraHemant: By 2008, Google had everything going for it w.r.t. Cloud and we should’ve been the market leaders, but we were either too early to market or too late. What did we do wrong? (1) bad timing (2) worse productization & (3) worst GTM...Common thread: engineering hubris. We had the best tech, but had poor documentation + no "solutions mindset". A CxO at a large telco once told me “you folks just throw code over the fence”. Cloud was seen as the commercial arm of the most powerful team @ Google: TI (tech infra)...My google exp reinforced a few learnings for me: (1) consumers buy products; enterprises buy platforms. (2) distribution advantages overtake product / tech advantages and (3) companies that reach PMF & then under-invest in S&M risk staying niche players or worse: get taken down.

Simon Sharwood: Amazon Web Services is tired of tech that wasn’t purpose built for clouds and hopes that the stuff it’s now building from scratch will be more appealing to you, too...AWS is all about small blast radii, DeSantis explained, and in the past the company therefore wrote its own UPS firmware for third-party products.

danluu: We've seen that, if we look at the future, the fraction of complexity that might be accidental is effectively unbounded. One might argue that, if we look at the present, these terms wouldn't be meaningless. But, while this will vary by domain, I've personally never worked on a non-trivial problem that isn't completely dominated by accidental complexity, making the concept of essential complexity meaningless on any problem I've worked on that's worth discussing.

@ben11kehoe: sigh...every year for Christmas Day operations there turns out to be a single thing we didn’t provision high enough that keeps us from handling the massive influx of new robots entirely hands off keyboard. #ServerlessProblems One year it was a Kinesis stream that needed to be upsharded. Last year it was a Firehose account limit. This year it was one DDB table’s throughput. Always so close!

Mary Poppendieck: People have become way too comfortable with backlogs. It’s just a bad, bad concept.

jamescun: This post touches on "innovation tokens". While I agree with the premise of "choose boring technology", it feels like a process smell, particularly of a startup whose goal is to innovate a technology. Feels demotivating as an engineer if management says our team can only innovate an arbitrary N times.

ormkiqmike: I've been building systems for a very long time. Managed services is a complete game changer for me and I would need some incredible reason to not use it. Especially for enterprise, for me it's a no-brainer. Given the amount of time/effort enterprises have to spend to keep their systems patched/updated, managed services are way cheaper.

qeternity: 4 years old. The world is completely different today. We run a number of HA Postgres setups on k8s and it works beautifully. Local nvme access with elections backed using k8s primitives.

ram_rar: I worked in a startup that was eventually acquired by Cisco. We had the same dilemma back then. AWS and GCP were great, but also fairly expensive until you get locked in. Oracle's bare metal cloud sweetened the deal so much that it was a no-brainer to go with them. We were very heavy on using all open source tech stuff, but didn't rely on any cloud service like S3 etc. So the transition was a no-brainer. If your tech stack is not reliant on cloud services like S3 etc, you're better off with a cloud provider who can give you those sweet deals. But you'll need in-house expertise to deal with big data.

@stuntpants: The premise here is wrong. arm64 is the Apple ISA, it was designed to enable Apple’s microarchitecture plans. There’s a reason Apple’s first 64 bit core (Cyclone) was years ahead of everyone else, and it isn’t just caches. Arm64 didn’t appear out of nowhere, Apple contracted ARM to design a new ISA for its purposes. When Apple began selling iPhones containing arm64 chips, ARM hadn’t even finished their own core design to license to others. Apple planned to go super-wide with low clocks, highly OoO, highly speculative. They needed an ISA to enable that, which ARM provided. M1 performance is not so because of the ARM ISA, the ARM ISA is so because of Apple core performance plans a decade ago.

Eric Drexler: Imagine magnifying the nanoscale world by a factor of ten million, stretching nanometer-scale objects to centimeters. In such a world, a typical atom becomes a sphere about 3 mm in diameter, the size of a small bead or a capital O in a medium-large font.
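The magnification arithmetic is easy to verify. A minimal sketch, assuming a typical atom diameter of 0.3 nm (the value implied by the 3 mm figure in the quote); the helper and constant names are illustrative. Sizes are kept as integer picometers to avoid floating-point noise.

```python
MAGNIFICATION = 10_000_000  # factor of ten million, as in the quote

PM_PER_NM = 1_000
PM_PER_MM = 1_000_000_000
PM_PER_CM = 10_000_000_000


def magnified_pm(size_pm: int) -> int:
    """Size in picometers after magnification (integer math avoids float noise)."""
    return size_pm * MAGNIFICATION


one_nm = magnified_pm(1 * PM_PER_NM)  # a 1 nm object
atom = magnified_pm(300)               # assumed typical atom diameter: 0.3 nm = 300 pm

print(one_nm // PM_PER_CM, "cm")  # 1 nm becomes 1 cm
print(atom // PM_PER_MM, "mm")    # 0.3 nm becomes 3 mm
```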

@swardley: Hmmm...DevOps, it's culture not cloud. Agile, it's culture not project methodology. Open Source, it's culture not sharing code. Cloud Native, it's culture not containers. Ok, define culture. No hand waving allowed.

Layal Liverpool: Harris Wang at Columbia University in New York and his team took this one step further, using a form of CRISPR gene editing to insert specific DNA sequences that encode binary data – the 1s and 0s that computers use to store data – into bacterial cells. By assigning different arrangements of these DNA sequences to different letters of the English alphabet, the researchers were able to encode the 12-byte text message “hello world!” into DNA inside E. coli cells. Wang and his team were subsequently able to decode the message by extracting and sequencing the bacterial DNA.
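For a sense of what "12 bytes of binary data" looks like, here is a sketch that turns the message into 1s and 0s and back. It uses plain ASCII with 8 bits per character, which is an assumption for illustration only and not the researchers' DNA encoding scheme; the function names are hypothetical.

```python
def text_to_bits(text: str) -> str:
    """Encode ASCII text as a string of 1s and 0s, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Decode a bit string produced by text_to_bits back into text."""
    chunks = (bits[i:i + 8] for i in range(0, len(bits), 8))
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")


message = "hello world!"
bits = text_to_bits(message)

print(len(message.encode("ascii")), "bytes ->", len(bits), "bits")
print(bits[:16], "...")  # bits for the first two characters, 'h' and 'e'
assert bits_to_text(bits) == message  # round trip recovers the text
```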

fxtentacle: FYI, in situations like this the best way is to: 1. Register your videos and images with the USCO. It'll cost <$100. 2. You can now file DMCA takedowns. Send one to Apple with the USCO registration ID and a copy of the image and a link to the app in question. 3. Apple will either immediately remove that fake app, or be liable for up to $350k in punitive damages for wilful infringement and lose all DMCA protection. 4. If Apple didn't react a week later, approach a lawyer. They'll likely be willing to work purely for 50% commission, because it'll be a slam dunk in court. 5. Repeat the same with Facebook / Youtube if they advertise there with your images or videos. Take Screenshots and write down the url and date and time.

@aschilling: What is necessary to run #Parler? - 40x 64 vCPUs, 512 GB, 14 TB NVMe - 70-100 vCPUs, 768 GB, 4 TB NVMe - 300-400 various other instances (8-16 vCPUs, 32-64 GB) - 300-400 GB/min internal traffic - 100-120 GB/min external traffic

@Snowden: For those wondering about @SignalApp's scaling, #WhatsApp's decision to sell out its users to @Facebook has led to what is probably the biggest digital migration to a more secure messenger we've ever seen. Hang in there while the Signal team catches up.

halfmatthalfcat: I use Akka Cluster extensively with Persistence. It's an amazing piece of technology. Before I went this route, I tried to make Akka Cluster work with RabbitMQ however I realized (like another poster here) that you're essentially duplicating concerns since Akka itself is a message queue. There's also a ton of logistics with Rabbit around binding queues, architecting your route patterns, etc that add extra cognitive overhead. I'm creating a highly distributed chat application where each user has their own persistent actor and each chatroom has their own persistent actor. At this point, it doesn't matter where the user or chatroom are in the cluster it literally "just works". All I need to do is emit a message to the cluster from a user to chatroom or vice versa, even in a cluster of hundreds of nodes, and things just work. Now there's some extra care you need to take at the edge (split-brain via multi-az, multi-datacenter) but those are things you worry about at scale.

rsdav: I've done pretty extensive work in all three major cloud providers. If you were to ask me which one I'd use for a net new project, it would be GCP -- no question. Nearly all of their services I've used have been great with a feeling that they were purposefully engineered (BigQuery, GKE, GCE, Cloud Build, Cloud Run, Firebase, GCR, Dataflow, PubSub, Data Proc, Cloud SQL, goes on and on...). Not to mention almost every service has a Cloud API, which really goes a long way towards eliminating the firewall and helps you embrace the Zero Trust/BeyondCorp model. And BigQuery. I can't express enough how amazing BigQuery is. If you're not using GCP, it's worth going multi-cloud for BigQuery alone.

d_silin: Consumer-grade Optane SSDs are not competitive with flash memory, simple as that. On performance and write endurance, flash-based SSDs are "good enough", while also much better on cost-per-GB vs Optane. This reduces available market for Optane to gaming enthusiasts and similar customers - a very small slice of the total PC market. Now, for enterprise markets, best-in-class performance is always in demand, and the product can be (over)priced much higher than on the PC market.

ckiehl: Things I now believe, which past me would've squabbled with: Typed languages are better when you're working on a team of people with various experience levels. Clever code isn't usually good code. Clarity trumps all other concerns. Designing scalable systems when you don't need to makes you a bad engineer. In general, RDBMS > NoSql.

Mary Catherine Bateson: It turns out that the Greek religious system is a way of translating what you know about your sisters, and your cousins, and your aunts into knowledge about what’s happening to the weather, the climate, the crops, and international relations, all sorts of things. A metaphor is always a framework for thinking, using knowledge of this to think about that. Religion is an adaptive tool, among other things. It is a form of analogic thinking.

General John Murray: When you are defending against a drone swarm, a human may be required to make that first decision, but I am just not sure any human can keep up. How much human involvement do you actually need when you are [making] nonlethal decisions from a human standpoint?

berthub: So in the BioNTech/Pfizer vaccine, every U has been replaced by 1-methyl-3’-pseudouridylyl, denoted by Ψ. The really clever bit is that although this replacement Ψ placates (calms) our immune system, it is accepted as a normal U by relevant parts of the cell. In computer security we also know this trick - it sometimes is possible to transmit a slightly corrupted version of a message that confuses firewalls and security solutions, but that is still accepted by the backend servers - which can then get hacked.

Jakob: At the core of these machines are spiked cylinders that determine which notes to play. Basically, a program stored in read-only memory, at a level of expression rather far from the musical notation used to describe the original piece of music...The process of going from a piece of music to the grid definitely counts as programming for me. It is a rather complicated transformation from a specification to an implementation in something that is probably best seen as an analogy to microcode or very long instruction word-style processors.

@gregisenberg: A startup idea isn't "one" idea. It's 10,000 ideas, 100,000 decisions and 1,000,000 headaches.

Neil Thompson: When you look at 3D (three-dimensional) integration, there are some near-term gains that are available. But heat-dissipation problems get worse when you place things on top of each other. It seems much more likely that this will turn out to be similar to what happened with processor cores. When multicore processors appeared, the promise was to keep doubling the number of cores. Initially we got an increase, and then got diminishing returns.

Ravi Subramanian: For some applications, memory bandwidth is limiting growth. One of the key reasons for the growth of specialized processors, as well as in-memory (or near-memory) computer architectures, is to directly address the limitations of traditional von Neumann architectures. This is especially the case when so much energy is spent moving data between processors and memory versus energy spent on actual compute.

@atlasobscura: Did your college math textbooks have moving parts? From 1524 on, a famous German cosmography book came with 5 volvelles! Peter Apian (later knighted for a different book), opted for a novel, tactile-visual teaching style for all earthly and heavenly measurements. The first 3 diagrams do all sorts of things: the boat-shaped dial relates polar height to the horizon. The second dial is static, but requires an index string to calculate the position of the sun in the zodiac; the third tells time, here with the help of an original lead weight.

Akin: Design is based on requirements. There's no justification for designing something one bit "better" than the requirements dictate.

Todd Younkin: Hyperdimensional, or HD, computing looks to tackle the information explosion facing us in the years ahead by emulating the power of the human brain in silicon. To do that, hyperdimensional computing employs much larger data sizes. Instead of 32- or 64-bit computing, an HD approach would have data containing 10,000 bits or more.
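To make "data containing 10,000 bits" a little more concrete, the sketch below builds random 10,000-bit vectors and compares them. The operations shown (Hamming distance for similarity, XOR for combining vectors) are common in the hyperdimensional computing literature but are not described in the quote, so treat them as assumptions; all names are illustrative.

```python
import random

DIMENSIONS = 10_000  # bits per hypervector, matching the figure in the quote


def random_hypervector(rng: random.Random) -> int:
    """A random DIMENSIONS-bit vector, stored as a Python int."""
    return rng.getrandbits(DIMENSIONS)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where a and b differ."""
    return bin(a ^ b).count("1")


rng = random.Random(42)  # fixed seed so the run is repeatable
apple = random_hypervector(rng)
red = random_hypervector(rng)

# Two unrelated random vectors differ in about half of their bits.
print(hamming_distance(apple, red))

# XOR "binds" two vectors into one; XOR with the same key recovers the original.
bound = apple ^ red
assert bound ^ red == apple
```

The appeal of such wide vectors is that independently drawn ones are almost always far apart, so a vector can absorb a fair amount of noise and still be closer to its original than to anything else.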

chubot: Summary: scale has at least 2 different meanings. Scaling in resources doesn't really mean you need Kubernetes. Scaling in terms of workload diversity is a better use case for it.

warent: An app called Robinhood, named after the archetype of a man who took from the rich and gave to the poor, now blocking people from taking from the rich. I hope the irony is not lost on the general public.

Charles Leiserson: Let's get real about investing in performance engineering. We can't just leave it to the technologists to give us more performance every year. Moore's Law made it so they didn't have to worry about that so much, but the wheel is turning.

Andrea Goldsmith: The next generation of wireless networks needs to support a much broader range of applications. The goal of each generation of cellular has always been getting to higher data rates, but what we're looking at now are low-latency applications like autonomous driving, and networks so far have not really put hard latency constraints into their design criteria.

linuxftw: I think the problem boils down to 'product' vs 'project.' Elastic search is very much a product, it's owned by a company, not a foundation. FOSS developers should contribute to projects and not products. Non-copyleft licenses seem to just be code for corporations to build upon, providing them free labor while getting little in return. At least with the GPL, you are getting a promise that they will make available their sources. Consider carefully your expectations when you license your software.

jacobr1: we migrated to k8s not because we needed better scaling (ec2 auto scaling groups were working reasonably well for us) but because we kept inventing our own way to do rolling deploys or run scheduled jobs, and had a variety of ways to store secrets. On top of that developers were increasingly running their own containers with docker-compose to test services talking to each other. We migrated to k8s to A) have a way to standardize how to run containerized builds and get the benefits of "it works on my laptop" matching how it works in production (at least functionally) and B) a common set of patterns for managing deployed software.

klomparce: My point of discussion is: as an organization's data needs grow over time, obviously there's no single solution for every use case, so there's a need to compose different technologies to handle the different workloads and access patterns. But is it possible to compose these systems together with a unifying, declarative interface for reading and writing data, without having to worry about them becoming inconsistent with each other, and also not putting that burden on the application that is using these systems?

Natalie Silvanovich: investigated the signalling state machines of seven video conferencing applications and found five vulnerabilities that could allow a caller device to force a callee device to transmit audio or video data. All these vulnerabilities have since been fixed. It is not clear why this is such a common problem, but a lack of awareness of these types of bugs as well as unnecessary complexity in signalling state machines is likely a factor. Signalling state machines are a concerning and under-investigated attack surface of video conferencing applications, and it is likely that more problems will be found with further research.

UseStrict: We had an (admittedly more complex) monolith application for customer contracts and billing. It wasn't ideal, and was getting long in the tooth (think Perl Catalyst and jQuery), so the powers that be wanted to build a new service. But instead of decomposing the monolith into a few more loosely integrated services, they went way overboard with 20+ microservices, every DB technology imaginable (Oracle RDBMS, Mongo, MariaDB), a full message bus via RabbitMQ, and some crazy AWS orchestration to manage it all. What could've been an effort to split the existing service into manageable smaller services and rewrite components as needed turned into a multi-year ground-up effort. When I left they were nowhere near production ready, with significant technical debt and code rot from already years out of date libraries and practices.

fishtoaster: I've talked* to a number of bootstrapped and non-profit companies who are all-in on cloud and I think there are a few use-cases you're missing beyond just "we value dev velocity over cost savings." The biggest one is ease of scaling vs something like colocation. I talked to a non-profit with incredibly spiky traffic based around whenever they get mentioned in the news. Since every dollar matters for them, being able to scale down to a minimal infrastructure between spikes is key to their survival. Another company I talked to has traffic that's reliably 8x larger during US business hours vs night time and uses both autoscaling and on-demand services (dynamodb, aurora serverless) to pay ~1/3 of what they'd have to if they needed to keep that 8x capacity online all the time.
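The "~1/3" figure in the last example is easy to reproduce. A minimal sketch, assuming cost scales linearly with provisioned capacity and that "US business hours" means 40 hours a week; both are simplifying assumptions, and the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24


def autoscaled_cost_fraction(peak_multiplier: float, peak_hours_per_week: float) -> float:
    """Capacity-hours with autoscaling divided by capacity-hours at constant peak."""
    off_peak_hours = HOURS_PER_WEEK - peak_hours_per_week
    autoscaled = peak_multiplier * peak_hours_per_week + 1.0 * off_peak_hours
    always_peak = peak_multiplier * HOURS_PER_WEEK
    return autoscaled / always_peak


# Assumed: 8x traffic for 40 business hours a week, baseline the rest of the time.
fraction = autoscaled_cost_fraction(peak_multiplier=8, peak_hours_per_week=40)
print(f"{fraction:.2f} of the always-on cost")
```

Under those assumptions the autoscaled setup needs 448 capacity-hours a week against 1,344 for always-on peak capacity, which is one third and lines up with the figure in the comment.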

bane: And that's it. I know there are teams that go all in. But for the dozen or so teams I've personally interacted with this is it. The rest of the stack is usually something stuffed into an EC2 instance instead of using an AWS version and it comes down to one thing: the difficulties in estimating pricing for those pieces. EC2 instances are drop-dead simple to price-estimate forward 6 months, 12 months or longer. Amazon is probably leaving billions on the table every year because nobody can figure out how to price things so their department can make their yearly budget requests. The one time somebody tries to use some managed service that goes over budget by 3000%, and the after action figures out that it would have been within the budget by using EC2, they just do that instead -- even though it increases the staff cost and maintenance complexity. In fact just this past week a team was looking at using SageMaker in an effort to go all "cloud native", took one look at the pricing sheet and noped right back to Jupyter and scikit-learn in a few EC2 instances. An entirely different group I'm working with is evaluating cloud management tools and most of them just simplify provisioning EC2 instances and tracking instance costs. They really don't do much for tracking costs from almost any of the other services.

