Stuff The Internet Says On Scalability For April 5th, 2019

Wake up! It's HighScalability time:

SHUTDOWN ABORT the last Oracle database running Amazon Fulfillment! pic.twitter.com/DorqTua2Lt (March 29, 2019)

How unhappy do you have to be as a customer to take so much joy in end-of-lifing a product?

Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. I wrote Explain the Cloud Like I'm 10 for people who need to understand the cloud. And who doesn't these days? On Amazon it has 44 mostly 5-star reviews (100 on Goodreads). They'll learn a lot and love you for the hookup.

  • $40 million: Fortnite World Cup prize money; 89%: of people who like Go say they like Go; 170 million: paid iCloud accounts; 533: days bacteria lived on the outside of the ISS; 95%: of BTC volume is fake; 51: LTE vulnerabilities found by fuzzing; 13,000: CRISPR edits in a single cell; 5G: 762Mbps down and a 19ms ping; 17,000: awesome Historic Blues & Folk Recordings; 3,236: satellites in Amazon's planned broadband LEO network; 5.1 million: emails sent during a 10-day spam campaign.

  • Quoteable Quotes:
    • @KimZetter: Crashed Tesla vehicles sold at junk yards and auctions found to contain invasive personal info on driver, including phonebook and calendar info from drivers' paired mobile devices; also found unencrypted video showing what happened just before accident
    • Pete Warden: There are 150 billion embedded processors out there in the world, that’s more than twenty each for every man, woman, and child on earth! Not only did that number amaze me when I first came across it, but the growth rate is an astonishing 20% annually, with no signs of slowing down. That’s much faster than smartphone usage, which is almost flat, or the growth in the number of internet users, which is in the low single digits these days...My objection to the internet of things is that the majority of embedded devices are not connected to any network, and as I’ll discuss in a bit, it’s unlikely that they ever will be, at least more than intermittently.
    • @fharper: But I’m also hurt because the people I cared about, treated me like if I was no one, like if I didn’t deserve to have someone from @npmjs tell me the bad news. People I loved, who at the moment all my accounts were turned off (which was during the meeting), never looked back.
    • Gustav Kuhn: It is only once you start thinking about some of the huge day-to-day challenges our visual system constantly faces that the true wonders of the brain start to emerge. Our brain uses a really clever and almost science-fictional trick that prevents us from living in the past: we look into the future. Our visual system is continuously predicting the future, and the world that you are now perceiving is the world that your visual system has predicted to be the present in the past
    • Young Girl: I’ve thought about deleting instagram, but then I think—what would I do with my life?
    • DSHR: the fundamental problem of digital preservation is economic; we know how to keep digital content safe for the long term, we just don't want to pay enough to have it done. The budgets for society's memory institutions - libraries, museums and archives - have been under sustained pressure for many years. For each of them, caring for their irreplaceable legacy treasures must take priority in their shrinking budget over caring for digitized access surrogates of them. 
    • Joel Hruska: The third problem, and possibly the biggest, is that Optane’s performance advantages appear to be fairly modest and potentially narrowly-tailored. Customers can run up to 30 percent more VM’s per server when using Microsoft HyperV. SAP Hana users can either restart a database 12.5x faster or save 39 percent on the cost of a large, in-memory database. Intel has said it’s shooting for a 1.2x improvement on performance-per-dollar over DRAM on target workloads.
    • @joeerl: I keep telling people ‘programming is understanding’ – the computer is a machine that tests if your ideas are correct. Once you understand a problem the program can usually be written pretty quickly. Understanding a problem can take years
    • ThePleb.Legation: its still quite a bit faster than NAND is likely to ever be, even NVMe RAID 0's best latencies are 10x slower than what Optane is offering, but optane is not competing with NAND its competing with RAM, and its only about half as fast as average ram latencies...at best....which is not great. I could see some form factor advantages if Mobos could drop other interfaces and we could all just use 8 channel DIMM slots for all storage, that would be interesting if not compelling.
    • codelord: Previously I worked at Waymo for a year on the perception module of the self driving car. Based on what I know about the state of the art of computer vision, I can pretty much guarantee that current Tesla cars will never be autonomous. This is probably a huge risk for the investors of Tesla, because currently Tesla is selling a fully-autonomous option on its car which will never happen with current hardware. We need several breakthroughs in computer vision for no-fail image based object detection, and you will need higher resolution cameras, and much much more compute power to be able to process all the images. 
    • Matthew Lamanna: Just the idea of fish with impact particles stuck in their gills from 66 million years ago, and trees with amber with impact particles, it’s so extraordinary that you do a double take for sure
    • A. Jesse Jiryu Davis: Kevin Stevens, a 55-year-old programmer, faced a similar attitude when he applied for a position at Stack Exchange six years ago. He was interviewed by a younger engineer who told him, "I'm always surprised when older programmers keep up on technology." Stevens was rejected for the job. He now works as a programmer at a hospitality company where he says his age is not an issue.
    • Rachel Green: US consumers are becoming more comfortable with going cashless, which is good news for payments firms. In 2018, 29% of US adults said they don’t make purchases using cash during a typical week, up from 24% in 2015. And for payments firms that collect per-transaction card swipe fees, it's in their best interest for retailers to go cashless.
    • @cpurdy: It took 2+ entire racks, floor to ceiling, to house a fully kitted Sun e10000. Designed by a Cray team (acquired by Sun via SGI), the server had up to 64 CPUs and 64GB RAM, and cost over $1m. You can get a dramatically more powerful server today in a 1U enclosure, for under $10k
    • lewisjoe: If there's one thing I'd take away for the rest of my career from Zoho, it would be frugality in adopting the latest of tech. I believe staying frugal in adopting the latest hype, can only be reasoned about in hindsight.
    • George Church: Probably we should be less concerned about us versus them and more concerned about the rights of all sentients in the face of an emerging unprecedented diversity of minds. We should be harnessing this diversity to minimize global existential risks, like supervolcanoes and asteroids.
    • Yevgeniy Brikman: the DevOps industry is very much in the stone ages, and I don't say that to be mean or to insult anybody. I just mean, literally, we are still new to this. We have only been doing DevOps, at least as a term, for a few years. Still figuring out how to do it. But what's scary about that is we're being asked to build things that are very modern. We're being asked to put together these amazing, cutting edge infrastructures, but I feel like the tooling we're using to do it looks something like that.
    • Max Smolaks: Sir Jonathan Thompson, chief executive and permanent secretary of HMRC, claimed AWS was working out 50 per cent cheaper than Azure. At the core of HMRC's operations is the Multi-Channel Digital Tax Platform – this Platform-as-a-Service has been live for more than five years, surviving three major iterations. It operates on a "typical microservices architecture" with 850 microservices in production, all of them stateless, and MongoDB in the backend – apart from "classic" services that run on Oracle.
    • Jamie Kim: the manufacturing sector’s adoption of ML and advanced analytics—used to improve predictive maintenance—will jump from 28% to 66% in the next 5 years.
    • @jeffbigham: The 2nd day of the month is my favorite day because it’s when I get a $9.95 bill from AWS for something I can’t figure out how to shut down.
    • @peterbourgon: "Lucet can instantiate WebAssembly modules in under 50µs, with just a few KB of memory overhead. By comparison, Chromium’s V8 engine takes about 5ms, and tens of MB of memory overhead, to instantiate JavaScript or WebAssembly programs." IMO this will be the game changer for computation at the edge, a totally different capability of service compared to anything currently on the market or in the pipeline (Lambda, FaaS, V8...)
    • @Ana_M_Medina: “Teams spend too much time reacting to outages instead of building resilient systems” -@aaronrinehart #ChaosDay19
    • Ramez Naam: Building new solar, wind, and storage is about to be cheaper than operating existing coal and gas power plants. That will change everything.
    • @QuinnyPig: S3 is over 235 distributed microservices. THIS IS NOT A CHALLENGE, SOFTWARE ENGINEERS. THEY ARE NOT POKÉMON. YOU DO NOT NEED TO CATCH THEM ALL. S3's requirements are almost certainly not yours. You aren't @awscloud. If you work at AWS please ignore this tweet. #AWSsummit
    • @dhh: In fact, it’s alarming how much of Microsoft’s cut-off-the-air-supply playbook on browser dominance that Google is emulating. From browser-specific apps to embrace-n-extend AMP “standards”. It’s sad, but sadder still is when others follow suit.
    • Anton Andrews: The 21st century is defined by uncertainty. You can find yourself, very efficiently, doing the wrong thing.
    • Clive Thompson: Everyone who was good at coding was also amazing at unbelievable levels of frustration. Because coding itself is both wildly and insanely frustrating. I actually called one of the chapters of my book “Constant failure and Bursts of Joy" because that really described what the actual flow of what a coder is actually like. And I really got this appreciation about how tenacious and persistent programmers are. It’s really interesting.
    • @DavidZipper: At a mobility event today I met an auto industry rep who told me "to enable AV's we need a period of increased urban law enforcement so pedestrians know what they can't do. Then they'll change behavior."  I was so stunned I could barely respond "I think that's a horrific idea."
    • @jasonlk: The Matrix was made in Australia because the budget came in 1/3d of Hollywood ($60M V $180M). 20 years later, same exact ratio for startups outside and inside of SF
    • @vgr: When you don’t understand algorithms they scare you because they seem soulless. When you do understand them they scare you because they hold up a mirror to *your* soullessness. “Ah shit, I’m just a branch-and-bound with pretensions to poetic grace”
    • Jordana Cepelewicz: That also implies that how we form memories — and which memories we form — depends on what we’re trying to do. Not all memories, locations and experiences are created equal (or at least they’re not encoded that way). Google doesn’t pull up a different map for me if I’m headed to Starbucks, versus if I’m going out for a walk. 
    • @ctosays: "Most of our customers are building with #serverless", said Werner Vogels today at #AWSSummit . "Serverless is a whole stack, not just compute. It is really a cloud-first strategy. Nobody will be managing infrastructure anymore."
    • @gchaslot: The YouTube algorithm I worked on heavily promoted Brexit, because divisiveness is efficient for watch time, and watch time leads to ads.
    • strangattractor: Because programming languages are practically irrelevant in making products. They are only important to developers. In the last thirty years of programming I have yet to see a project fail because of the programming language or code quality. More often than not it is because someone made something that no one wanted. From a reliability stand point Windows has always been a nightmare. It is still one of the best and most successful products of all time. It could be written in Urdu and no one would care.
    • Matt Vollrath: CBOR is a relatively new IETF draft standard extensible binary data format. Compared to similar formats like MessagePack and BSON, CBOR was developed from the ground up with clear goals: good JSON conversion
    • Darren Byler: The religious and political transgressions of these detainees were frequently discovered through social media apps on their smartphones, which Uyghurs are required to produce at thousands of checkpoints around Xinjiang. Although there was often no real evidence of a crime according to any legal standard, the digital footprint of unauthorized Islamic practice, or even an association to someone who had committed one of these vague violations, was enough to land Uyghurs in a detention center. 
    • Timothy Prickett Morgan: What MemVerge is trying to address are some of the concerns of companies wanting to adopt Optane PMMs, according to Fan. “What distributed storage software are customers supposed to use with this storage? Optane PMMs have a latency on the order of 100 nanoseconds. All of the previous generations of distributed storage software were all written for media that is media that has a latency of roughly 100 microseconds – three orders of magnitude slower. The faster NVM-Express SSDs from Intel and Samsung can get down to 10 microseconds, and it is still two orders of magnitude away. So if you are using HDFS, Ceph, or Gluster, then 95 percent of the latency will be in the storage software and the stack becomes the bottleneck. That is going to slow down the whole storage system and remove the benefits of the hardware. So a new software stack has to be created to take full advantage of this new media.”
    • Trail of Bits: Window messages are an under-appreciated and often ignored source of untrusted input to Windows programs. Even 19 years after the first open-source window message fuzzer was deployed, 93% of tested applications still freeze or crash when run against the very same fuzzer. The fact that some applications gracefully handle these malformed inputs is an encouraging sign: it means frameworks and institutional knowledge to avoid these errors exist in some organizations.
    • tunesmith: So many parallels to the music industry here [Uber dropping rates for drivers down from 80 cents/mile to 60 cents/mile]. You have a vc-funded company, operating at a loss, commoditizing the "talent" (drivers/musicians), thereby conditioning the audience (passengers/listeners) to expect an unsustainable cost for the service, using that loss leader mentality to drive other solutions out of business, until they argue or take action that the talent should expect even lower rates than they've historically gotten so the company can have a shot at being profitable. When the larger effect is that the company has captured value that used to go to the talent and then instead goes to the investors.
    • Tom Chatfield: We have introduced something exponential into the equations of planetary time – and that something is technology. [Coevolution with technology] marks humanity’s departure from the rest of life on Earth. Alone among species […] humans can consciously improve and combine their creations over time – and in turn extend the boundaries of consciousness. It is through this process of recursive iteration that tools became technologies; and technology a world-altering force.
    • Linus: For example, back in 1994, I was mostly a developer. Sure, I was the lead maintainer, but while I spent a lot of time merging patches, I was also mostly writing my own code. These days I seldom write much code, and the code I write is often pseudo-code or example patches that I send out in emails to the real developers. I'd hesitate to call myself a "manager", because I don't really do things like yearly reviews or budgets, etc. (thank God!), but I definitely am more of a technical lead person than an actual programmer, and that's been true for the last many years.
    • jakedata: I have had the opportunity to work with an Epyc server with a full load of NVMe/U.2 direct connected to all those tasty PCIe lanes. Any given drive worked wonderfully, and writing simultaneously across many drives with DD showed very good performance. Unfortunately any attempt to use Linux native software RAID showed performance barely in excess of a single drive, even with a simple stripe. I don't blame the NVMe, but something in Linux RAID just doesn't scale in performance. I spent days tweaking it, ultimately it was a great disappointment.
    • Eric Berger: It's not clear what small launch companies will reach operational status next and offer a challenge to Rocket Lab. Vector, Firefly, and Virgin Orbit have all talked about flying their rockets to space this year, and there are other, more-secretive companies such as Astra Space also nearing that threshold as well. Chinese startup OneSpace, one of several semi-private efforts in that country talking about orbit in 2019, saw its first orbital attempt fail in March.
    • @tdierks: I've been working on Google's cryptography policy (for engineers). It fits in a tweet: Don't invent your own algorithms, don't design your own protocols, don't code your own implementations, don't manage your own keys, and do ask for advice.
    • @LiorSteinberg: Amsterdam will remove 1,500 parking spots a year until 2025. 11,200 in total. Bicycle parking, sidewalks, and green will replace them. Take a moment and imagine your city with 11k less parking spaces. Have a nice weekend!
    • Shuler: The layer/API between hardware and software is becoming less generic and more specific for those kinds of use cases, solving those kinds of problems. What that means, though, is there are software guys who went to Stanford and trained on Java script and have no idea what a register is. Then there are hardware guys who have no idea what a hypervisor or object-oriented programming is.
    • @tammybutow: A few key SRE practices I strongly believe in:  🌟Do incident review (post-mortem) action items 🌟Use error budgets 🌟Dig into failures and learn from them 🌟Measure Availability & Durability 🌟Focus on business success metrics
    • @adamhj: So happy that @chef is now a 100% open source company. They are done being open core, and I have to say, I'm stoked about it. It aligns the company with its core values in a way that is so much more elegant and understandable.
    • @arungupta: Five pillars of modern apps: - App first, not infrastructure - Serverless - Automate everything - Security is everyone's job - Extract the most value from your data #AWSSummit
    • @jasongorman: My reply: "What thought did you put into exploring the possibilities?" Did they do a truth table or a decision table? Did they sketch a state transition model or a response matrix? Did they use Venn diagrams to explore the entire input space? Did they visuallse concurrency?
    • @Ana_M_Medina: Always love watching @mipsytipsy speak 🦄 🌈 #ChaosDay19 Commandments about #observability: ▪️well instrumented ▪️high cardinality ▪️high dimensionality▪️Event oriented perspective ▪️structured data ▪️software ownership ▪️sampled ▪️ tested in prod (#chaosengineering)
    • @danielbryantuk: "We decided to enforce our SLA by turning the system off for a time within our error budget. This forced consumers to adapt correctly to our SLA, and not rely on higher than advertised availability" #ChaosDay19
    • matt2000: I used to assume that the "10x engineer" was 10x faster at the programming/typing part, but it turns out it's more like identifying the 1/10th of the work that actually matters and just do that efficiently. It is also true that there's a lot of experience in both the identification of that 10%, and the efficient execution of it though.
    • Rob Adamson: Bohr was right. Our universe is more like computer data than matter separated by space and time. If nothing can traverse space faster than the speed of light, then instantaneous entanglement would be impossible.
    • @tmclaughbos: A cloud provider telling someone that moving to their platform will make their application faster means the person saying that doesn't understand cloud, is lying, or is just telling customers what they want to think.
    • Fender: We chose not to deploy the built functions using the builder function, but instead to upload the built, zipped binary to S3, and deploy that package to the microservice’s lambda function(s) from within the CircleCI pipeline. The above optimizations, for a microservice with 50 lambda functions, reduced build times from ~20 minutes to ~3 minutes.
    • Michael Feldman: Moving computational resources into storage devices not only shortens the access path to data, thus lowering latency, but also alleviates the bottleneck at the I/O ports. When it comes to really big datasets – here, we’re talking petabytes – keeping everything on the storage side can make a huge difference. For example, to transfer one petabyte of data from storage to main memory over 32 lanes of PCIe Gen3, takes a full nine hours. That time is cut in half for Gen4 and will be cut in half again for Gen5 when it arrives, but you’re still talking hours. If you have to move the data from a storage array to servers over a 100 Gbps network, you’re looking at over a day to load it. Computational storage means you only have to deal with the extremely fast busses within the device itself.
    • adriancolyer: "In Aurora, the only writes that cross the network are redo log records. No pages are ever written from the database tier, not for background writes, not for checkpointing, and not for cache eviction. Instead, the log applicator is pushed to the storage tier where it can be used to generate database pages in background or on demand. In Aurora, durable redo record application happens at the storage tier, continuously, asynchronously, and distributed across the fleet. Any read request for a data page may require some redo records to be applied if the page is not current. As a result, the process of crash recovery is spread across all normal foreground processing. Nothing is required at database startup." 
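
The Aurora design in the quote above, where the storage tier rather than the database tier turns redo log records into pages, can be sketched as a toy. This is a minimal in-memory model, assuming a single storage node, integer log sequence numbers (LSNs), and redo records that just set one key on one page; `StorageNode`, `append_redo`, `read_page`, and `background_apply` are hypothetical names, not Aurora's interface.

```python
# Hypothetical toy model of "the log is the database"; not Aurora's API.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RedoRecord:
    lsn: int       # log sequence number, increasing
    page_id: int   # page the change applies to
    key: str
    value: str


@dataclass
class Page:
    applied_lsn: int = 0                      # highest LSN already folded in
    rows: dict = field(default_factory=dict)


class StorageNode:
    """Toy storage tier: the database ships only redo records here."""

    def __init__(self) -> None:
        self.pages: Dict[int, Page] = {}
        self.pending: Dict[int, List[RedoRecord]] = {}

    def append_redo(self, rec: RedoRecord) -> None:
        # The only write that crosses the "network": a log record, never a page.
        self.pending.setdefault(rec.page_id, []).append(rec)

    def _materialize(self, page_id: int) -> Page:
        # Apply any outstanding redo records for this page in LSN order.
        page = self.pages.setdefault(page_id, Page())
        for rec in sorted(self.pending.pop(page_id, []), key=lambda r: r.lsn):
            if rec.lsn > page.applied_lsn:
                page.rows[rec.key] = rec.value
                page.applied_lsn = rec.lsn
        return page

    def read_page(self, page_id: int) -> Page:
        # A read of a stale page applies its pending records on demand.
        return self._materialize(page_id)

    def background_apply(self) -> None:
        # The same work can also run continuously in the background.
        for page_id in list(self.pending):
            self._materialize(page_id)
```

In this toy there is no separate replay step: whatever redo has arrived is applied when a page is read or when background work reaches it, which is the sense in which the quote says nothing is required at database startup.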

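Michael Feldman's transfer-time figures can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes decimal units (1 PB = 10^15 bytes), PCIe Gen3 at 8 GT/s per lane with 128b/130b encoding (doubling each generation), and no protocol overhead, so it gives a lower bound rather than a measured time.

```python
# Back-of-the-envelope transfer times for 1 PB, ignoring protocol overhead.
PETABYTE = 10**15  # bytes, decimal units


def pcie_bytes_per_sec(gen: int, lanes: int) -> float:
    """Approximate usable PCIe bandwidth (Gen3+ uses 128b/130b encoding)."""
    transfers_per_sec = {3: 8e9, 4: 16e9, 5: 32e9}[gen]  # per lane
    return transfers_per_sec * (128 / 130) / 8 * lanes   # bits -> bytes


def hours_to_move(num_bytes: float, bytes_per_sec: float) -> float:
    return num_bytes / bytes_per_sec / 3600


for gen in (3, 4, 5):
    h = hours_to_move(PETABYTE, pcie_bytes_per_sec(gen, lanes=32))
    print(f"PCIe Gen{gen} x32: {h:.1f} h")

network = hours_to_move(PETABYTE, 100e9 / 8)  # 100 Gbps link
print(f"100 Gbps network: {network:.1f} h")
```

This reproduces the roughly nine hours for Gen3 x32 and the halving per generation. The raw 100 Gbps line rate alone comes out near 22 hours, so the quoted "over a day" presumably folds in real-world network and storage overhead.
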
  • Maybe it's time to evaluate programmers using a more generative metric?
    • From Two Years at Facebook
      • Good: Having access to some fantastic Facebook tools, especially for monitoring.
      • Bad: Having to deal with some utterly insane Facebook tools, written to replace stuff that actually worked better because writing new stuff is how people get "impact" here.
    • jfasi: To understand why [Google shutting down products] keeps happening, you need to understand the product and engineering culture at Google. As a group, Google engineers and PMs are obsessed with promotion. At the heart of every conversation about system design or product proposal lies an unspoken (and sometimes spoken) question: will working on this get me promoted? The criteria for promotion at Google, especially at the higher levels like SWE III -> Senior and especially at Senior -> Staff and above, explicitly talk about impact on the organization and the business. This has consequences for the kind of teams people try to join and kind of work they choose to do. Maintenance engineering is so not-rewarded that it's become an inside joke. Any team that isn't launching products starts bleeding staff, any project that isn't going to make a big splash is going to be neglected, and any design that doesn't "demonstrate technical complexity" will be either rejected or trumped up. This is also why GMail, YouTube, Search, GCP, Android, and others aren’t going anywhere. They’re making money, they’re core to the business, and there’s plenty of opportunity to work on them and get promoted. They all also share one thing in common: deep down they’re frontends for search or advertising (GCP and Apps are an exception because they make money on their own). Measuring and proving impact on search numbers is a well-known promo narrative at Google, so those products are a safe bet for employees and users. Streaming game services, not so much.

  • Compiled code is usually faster, that's not a surprise. The surprise is you can now compile code for the web. How We Used WebAssembly To Speed Up Our Web App By 20X: This article is a case study on using WebAssembly to speed up a data analysis web tool. To that end, we’ll take an existing tool written in C that performs the same computations, compile it to WebAssembly, and use it to replace slow JavaScript calculations.

  • The Pop Music/Junk Food Connection. Great explanation of what's different in the modern world: experiences are manufactured by teams, using science and technology, to make us want to do something. Pop music makes you want to listen. Junk food makes you want to eat. Viral video makes you want to watch. Apps make you want to engage. Drugs make you want to take more drugs. Hate makes you want to hate more. This example is a pop song. Rick Beato says songs have become so sophisticated in their arrangement and melodic hooks that often a team of 10 songwriters and three producers conspire to manufacture a song. Teams of scientists come up with the right formula to addict the masses to these songs using auto-tuned voices and synthetic instruments. Songs are put on a grid and assembled by a team of people. This isn't to say they aren't well done. They are perfectly done. That's the point. But like junk food formulated to appeal directly to our lizard brain, are all these manufactured experiences good for us?

  • Why We [Rainforest] Moved from Heroku to Google Kubernetes Engine. The move from Heroku was driven by the need for better security and more database and compute scalability. Kubernetes was chosen because moving to a containerized environment was relatively easy. They chose GCP because GKE manages the Kubernetes master and nodes; GKE manages autoscaling at the cluster level and also has terrific support for horizontal pod autoscaling at the application level, including support for autoscaling on custom metrics. Their new stack: Terraform, GKE, Cloud SQL, Cloud Memorystore, Helm, Stackdriver. 
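
GKE's horizontal pod autoscaling is the Kubernetes Horizontal Pod Autoscaler, whose documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). A simplified sketch of just that rule, assuming hypothetical min/max bounds and leaving out readiness handling, missing metrics, and stabilization windows:

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified sketch of the HPA core scaling rule (bounds are hypothetical)."""
    ratio = current_metric / target_metric
    # Within the tolerance band, leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 200 against a custom-metric target of 100 scale to 8; at 105 they stay at 4 because the ratio is within tolerance; at 50 they scale down to 2.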

  • Why serverless is still in its infancy: What is and what isn’t serverless is indeed a bit of a controversial topic. In our initial vision we argue for a broad definition for serverless. A serverless service should in principle exhibit the following aspects: (1) Granular billing; (2) Minimal operational logic; (3) Event-Driven. We identify six major performance challenges: (1) Overhead; (2) Performance isolation; (3) Scheduling policies; (4) Performance prediction; (5) Engineering for Cost-Performance; (6) Evaluating and Comparing FaaS Platforms. 
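
Granular billing, the first aspect in that definition, is easy to make concrete: price one invocation by rounding its duration up to a billing increment and multiplying by allocated memory. The 100 ms increment and the prices below are assumptions modeled on AWS Lambda's pricing around the time of writing; they are parameters, so swap in current numbers.

```python
import math


def invocation_cost(duration_ms: float, memory_mb: int,
                    increment_ms: int = 100,
                    price_per_gb_second: float = 0.0000166667,
                    price_per_request: float = 0.0000002) -> float:
    """Cost of one invocation under duration-and-memory billing (assumed prices)."""
    # Round the run time up to the next billing increment.
    billed_ms = math.ceil(duration_ms / increment_ms) * increment_ms
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000)
    return gb_seconds * price_per_gb_second + price_per_request
```

Under these assumptions a 120 ms run at 512 MB is billed as 200 ms, or 0.1 GB-seconds, while a 1 ms run costs the same as a 100 ms one. The coarser the increment, the more short functions pay for time they did not use.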

  • Bill Gates on 10 Breakthrough Technologies 2019: Robot dexterity; New-wave nuclear power; Predicting preemies; Gut probe in a pill; Custom cancer vaccines; The cow-free burger; Carbon dioxide catcher; An ECG on your wrist; Sanitation without sewers; Smooth-talking AI assistants. 

  • The experience curve means that as we accumulate more experience we become more efficient. Perhaps experience is not always valued in the software industry because, with the fast cycling of platforms and frameworks, there's not much time for people to climb the experience curve before it all changes again? Much of that experience delta continuously folds into infrastructure, which removes some of the value from individual contributors. There's a value shift from individuals to infrastructure. So when you look at where money is spent on a project these days, it should not be surprising if more is going to infrastructure. Infrastructure has become a value pool.
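
The experience curve is often modeled with Wright's law: every doubling of cumulative output multiplies the unit cost by a fixed learning rate. A minimal sketch of that textbook form, assuming an illustrative 80% learning rate; the numbers are made up and say nothing about real software teams.

```python
import math


def unit_cost(n: int, first_unit_cost: float, learning_rate: float) -> float:
    """Wright's-law experience curve: each doubling of cumulative output
    multiplies the unit cost by `learning_rate` (e.g. 0.8 = 20% cheaper)."""
    b = math.log2(learning_rate)  # negative when learning_rate < 1
    return first_unit_cost * n ** b
```

With a first-unit cost of 100 and a rate of 0.8, units 2, 4, and 8 cost about 80, 64, and 51.2. The largest absolute drops come early on the curve, so a platform that changes every few years may reset people before they reach the flatter, cheaper part.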

  • Envoy performed significantly better than our existing Linkerd 1.0 setup while requiring less processing power and memory resources. We deployed Envoy Proxy to make Monzo faster: Our microservices perform tens of thousands of RPC calls per second over HTTP. However, to make a reliable and fault tolerant distributed system, we need service discovery, automatic retries, error budgets, load balancing and circuit breaking...A key reason we were able to do this with Envoy and not Linkerd was due to the significantly lower processing power and memory requirements with Envoy. We’re now running thousands of copies of Envoy across our infrastructure and this number continues to grow as we roll out Envoy as a sidecar to all service deployments.
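
Circuit breaking, one of the features on Monzo's list, can be sketched as the generic pattern: stop sending traffic to a dependency after repeated failures, then let a trial request through after a cooldown. This is not Envoy's implementation (Envoy expresses circuit breaking as limits on connections, pending requests, requests, and retries, and ejects misbehaving hosts through outlier detection). `CircuitBreaker` and its thresholds are hypothetical, and the clock is injectable so the behavior can be tested.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Generic circuit breaker (hypothetical): open after N consecutive
    failures, reject calls while open, allow trial calls after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows
        # open: allow trial requests only once the cooldown has passed
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
```

For simplicity, once the cooldown expires every caller is allowed through until a result is recorded; a stricter version would admit exactly one probe while half-open.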

  • Invoking copyright to take down videos is the real right to be forgotten. You will not be remembered if you don't let people talk about you. The Music BLOCKERS Are Back! (Rant)

  • There's a middle ground between unending config files and using the console and that's infrastructure as software: Full Stack Journey 030: Building Cloud-Native Infrastructure As Code With Pulumi
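
The difference between config files and infrastructure as software is that the desired state comes out of ordinary code, and the tool then diffs it against what actually exists. A plain-Python sketch of that idea, assuming a made-up bucket resource; it is not Pulumi's SDK, and `desired_buckets` and `plan` are hypothetical names standing in for the kind of diff a command like `pulumi preview` reports.

```python
from typing import Dict, List, Tuple

Spec = Dict[str, dict]  # resource name -> properties


def desired_buckets(env: str, regions: List[str]) -> Spec:
    # Ordinary code (a loop, a parameter) instead of copy-pasted config.
    return {f"{env}-logs-{r}": {"region": r, "versioning": env == "prod"}
            for r in regions}


def plan(desired: Spec, actual: Spec) -> List[Tuple[str, str]]:
    """Diff desired state against actual state (hypothetical sketch)."""
    steps = []
    for name in sorted(desired.keys() - actual.keys()):
        steps.append(("create", name))
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            steps.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        steps.append(("delete", name))
    return steps
```

Given an actual state containing `prod-logs-us` (versioning off) and `prod-logs-ap`, `plan(desired_buckets("prod", ["us", "eu"]), actual)` yields a create for `prod-logs-eu`, an update for `prod-logs-us`, and a delete for `prod-logs-ap`.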

  • This brings an end to the CloudFoundry Project as a standalone platform and the remnants from the project will remain for some time supporting existing customers. Lessons From The “Demise” Of CloudFoundry. Standards Matter: While Docker was gaining momentum, CloudFoundry required users to embrace a platform-specific container engine. Opinionated Platforms Are Risky: The CloudFoundry platform was more opinionated than some competing platforms in the market. Hyper-marketing doesn’t help: From day one, hyper-marketing has been the go-to-market strategy for CloudFoundry. The platform should support a continuum of services and it should not require a complete rip and replace. The higher order abstraction should be built on top of standardized lower level components so that the on-ramp to the abstraction is a continuum rather than a steep barrier. If you are doing OSS, build diversity in contribution and business around the OSS project. 

  • In EPISODE 167 — YOUTUBE AND THE END OF FRICTION it's taken as an a priori assumption that technology is inherently amoral. Is that true? Let's use a utility framework. How well is this assumption working for us? Excellent in many, if not most, areas. But as we've seen with YouTube and Facebook, it's not working well at all in others. Perhaps we need a Big M notation, where the M is for morality. It's independent of any particular moral framework; you get to choose the framework. The point is that technology decisions should be considered within some moral framework, in the same way we consider big O notation when selecting algorithms. At least then we as programmers would start considering our choices in a larger context. I know that's something I've never done, and it's something we don't do well as an industry.

  • Cloudflare wanted to run code on their 165+ edge locations. This is what they came up with, and why: Fine-Grained Sandboxing with V8 Isolates. Cloudflare runs serverless code across their entire network rather than just a few locations. Running code everywhere has implications that they handle with a tried and true technique: they impose limits. Requests are limited to 50 msecs. Isolates are stateless. Images must be small. Memory usage must be constrained. There is no SLA. There are no timers. Eval is not supported. State on the edge is not supported. Context switching becomes a problem because each machine hosts many tenants, so there's less locality. Startup time must be small because, with more tenants per machine, new code will be paged in more often. Code is copied to all locations, making startup as fast as possible. V8 was an almost perfect solution to all these problems. There's a big section on security. The big point is that a lot of work goes into making V8 secure, so it's probably more secure than anything else you're using. A thread is started for each incoming HTTP connection. Multiple isolates run on the same thread. DDoS protection happens before Workers. 
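
The locality and startup-time points can be made concrete with a toy model: a machine can keep only so many tenants' isolates warm, so with many tenants some requests hit code that has to be loaded again. The LRU eviction policy and the fixed `capacity` are assumptions for illustration, not how Cloudflare actually manages isolates; `IsolateCache` is a hypothetical name.

```python
from collections import OrderedDict


class IsolateCache:
    """Toy model (hypothetical): keep at most `capacity` warm isolates
    on a machine and evict the least recently used tenant."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.warm = OrderedDict()  # tenant -> loaded isolate (a stand-in string)
        self.cold_starts = 0

    def handle(self, tenant: str) -> str:
        if tenant in self.warm:
            self.warm.move_to_end(tenant)  # mark as most recently used
            return self.warm[tenant]
        self.cold_starts += 1  # must load and start this tenant's code
        if len(self.warm) >= self.capacity:
            self.warm.popitem(last=False)  # evict least recently used
        self.warm[tenant] = f"isolate:{tenant}"
        return self.warm[tenant]
```

With capacity 2 and requests from tenants a, b, a, c, b, four of the five requests are cold starts. As tenants outnumber warm slots the cold-start rate climbs, which is why per-isolate startup cost matters so much.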

  • As a long-time Kaiser member I've often lamented all the opportunities to collect data they just flush down the toilet. Instead of viewing every patient interaction as an opportunity to improve the system, they just ignore patient outcomes. Kaiser never asks if you got better. All that data that could help diagnose patients is never gathered. It's a shame and painfully backward-looking. Machine Learning in Medicine: The accelerating creation of vast amounts of health care data will fundamentally change the nature of medical care. We firmly believe that the patient–doctor relationship will be the cornerstone of the delivery of care to many patients and that the relationship will be enriched by additional insights from machine learning. We expect a handful of early models and peer-reviewed publications of their results to appear in the next few years, which, along with the development of regulatory frameworks and economic incentives for value-based care, are reasons to be cautiously optimistic about machine learning in health care. We look forward to the hopefully not-too-distant future when all medically relevant data used by millions of clinicians to make decisions in caring for billions of patients are analyzed by machine-learning models to assist with the delivery of the best possible care to all patients.

  • Last week we had a good talk about identity. Pat Helland has an article on that very subject: Identity by Any Other Name. As you might imagine, Pat is insightful, tackling the implications of identity at the wider system level: New emerging systems and protocols both tighten and loosen our notions of identity, and that's good! They make it easier to get stuff done. REST, IoT, big data, and machine learning all revolve around notions of identity that are deliberately kept flexible and sometimes ambiguous. Notions of identity underlie our basic mechanisms of distributed systems, including interchangeability, idempotence, and immutability.
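
Helland's point that idempotence rests on identity can be made concrete: if every request carries a client-chosen ID, a retry can be recognized and answered with the original result instead of being applied twice. A minimal in-memory sketch; `DepositService` and the request IDs are hypothetical, and a real system would persist the completed-request table in the same transaction as the balance update.

```python
from typing import Dict, Tuple


class DepositService:
    """Applies each deposit at most once, keyed by a client-chosen request ID."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.completed: Dict[str, Tuple[str, int]] = {}  # request ID -> recorded result

    def deposit(self, request_id: str, account: str, amount: int) -> Tuple[str, int]:
        # A retry that carries the same identity gets the original answer back
        # instead of depositing the money a second time.
        if request_id in self.completed:
            return self.completed[request_id]
        self.balances[account] = self.balances.get(account, 0) + amount
        result = (account, self.balances[account])
        self.completed[request_id] = result
        return result


svc = DepositService()
svc.deposit("req-1", "alice", 100)
svc.deposit("req-1", "alice", 100)  # network retry of the same request
svc.deposit("req-2", "alice", 50)   # a genuinely new request
print(svc.balances)  # {'alice': 150}
```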

  • The point here is that instead of driving extensions with webhooks, you can insert Lambda calls wherever you need specialization or customization. These AOP-style extension points are everywhere once you start looking. How to FaaS like a pro: 12 less common ways to invoke your serverless functions on AWS [Part 1]
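
A rough sketch of the extension-point idea: a workflow exposes named points, and customizations registered there run synchronously and can reshape the payload. Here the hooks are plain Python functions; in the setup the article describes, a hook body might instead call a Lambda function synchronously (for example boto3's `invoke` with `InvocationType="RequestResponse"`). `ExtensionPoints`, `place_order`, and the point names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Payload = Dict[str, Any]
Hook = Callable[[Payload], Payload]


class ExtensionPoints:
    """Named points in a workflow where customization functions can be plugged in."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def register(self, point: str, hook: Hook) -> None:
        self._hooks[point].append(hook)

    def run(self, point: str, payload: Payload) -> Payload:
        # Hooks run synchronously, in registration order, each receiving the
        # previous hook's output; with no hooks the payload passes through unchanged.
        for hook in self._hooks[point]:
            payload = hook(payload)
        return payload


def place_order(points: ExtensionPoints, order: Payload) -> Payload:
    order = points.run("before_place_order", order)
    order["status"] = "placed"
    return points.run("after_place_order", order)


points = ExtensionPoints()
# A customer-specific customization; this function body could instead be a
# synchronous call to a Lambda function.
points.register("before_place_order", lambda o: {**o, "total": o["total"] - 10})
print(place_order(points, {"sku": "tent", "total": 200}))
# {'sku': 'tent', 'total': 190, 'status': 'placed'}
```

Unlike a webhook, the hook's result flows back into the workflow before it continues, which is what makes it usable for customization rather than just notification.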

  • Fun stories from the dark side. Darknet Diaries

  • If IT isn't at least trying to be a profit center, then you aren't doing it right. Dick's Sporting Goods' 'Foolish' Software Development Move: Mr. Gaffney, who led a similar effort at Home Depot before joining Dick’s, began by having his team tackle the retailer’s inventory management system. The goal of the program, which is about halfway to completion, is to create a system that would earn at least 10-times in annual revenue the cost of the eight-person team that developed it.

  • Mobile Web Performance @ Caviar: Resizing/compressing to WebP format and using responsive images reduced our First Contentful Paint by 6 seconds. Limiting our DOM nodes reduced our Time to Interactive by 4 seconds. Optimizing our package usage resulted in a decrease of about 145kb in our vendor bundle. Code splitting and dynamically importing packages and components on HTTP/2 reduced our base bundle sizes by over 50% cumulatively, which further reduced our TTI by 10 seconds and brought it down to about 13 seconds.
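
The post doesn't show its markup, so here is a generic sketch of the responsive-image half of the first optimization: generate a `srcset` listing WebP variants at several widths and let the browser pick one via `sizes`. The `<base>-<width>w.webp` naming scheme and the widths are hypothetical; a page that must serve browsers without WebP support would typically wrap this in a `<picture>` element with a fallback format.

```python
from html import escape
from typing import Iterable, List


def webp_srcset(base_path: str, widths: Iterable[int]) -> str:
    """List WebP variants of one image, smallest first, in srcset syntax."""
    # Hypothetical naming scheme: <base>-<width>w.webp
    return ", ".join(f"{base_path}-{w}w.webp {w}w" for w in sorted(widths))


def responsive_img(base_path: str, widths: Iterable[int], sizes: str, alt: str) -> str:
    ordered: List[int] = sorted(widths)
    # src points at the smallest variant; browsers that understand srcset/sizes
    # choose the best width for the current viewport and pixel density instead.
    return (
        f'<img src="{base_path}-{ordered[0]}w.webp" '
        f'srcset="{webp_srcset(base_path, ordered)}" '
        f'sizes="{escape(sizes)}" alt="{escape(alt)}">'
    )


print(webp_srcset("/img/dish", [640, 320, 1280]))
# /img/dish-320w.webp 320w, /img/dish-640w.webp 640w, /img/dish-1280w.webp 1280w
```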

  • Good advice on Learning to build distributed systems. Under each section there are a lot of references, but the general gist is: Learn through the work of others; Get hands on; Go broad; Become an owner; It takes time.

  • The realization that UDFs [User Defined Functions] are "evil" has a long history, but the dream of executing code in the database will never die. CMU Advanced Database Systems - 16 Server-side Logic Execution (Spring 2019) (slides) shows a 500x speedup achieved with optimization techniques. And yes, you can imagine some sort of managed hybrid lambda/database/state machine runtime environment spanning datacenters. But the argument against database + code is not one of efficiency. In the end, form follows function. Data is in different forms because those formats are optimized to support specific functions. So the dream will remain a dream, like a one-world government, one diet for everyone, or the one ring.
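
Python's standard-library `sqlite3` module can register a UDF with `Connection.create_function(name, num_params, func)`, which makes it easy to see what a UDF looks like from the engine's side. The sketch contrasts a UDF with the equivalent inlined SQL expression: both return the same rows, but the engine sees the UDF only as an opaque callback invoked once per row, the kind of boundary UDF-inlining techniques aim to remove. SQLite is embedded, so this illustrates the opacity, not server-side execution cost; the table and function names are made up.

```python
import sqlite3


def discounted(price: float, pct: float) -> float:
    # Plain Python logic; the query planner can only treat it as a black box.
    return round(price * (1 - pct / 100), 2)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, price REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("ale", 5.0), ("stout", 7.5), ("porter", 6.25)])

# Register the UDF: name as seen from SQL, number of arguments, Python callable.
conn.create_function("discounted", 2, discounted)

udf_rows = conn.execute(
    "SELECT name, discounted(price, 20) FROM items ORDER BY name").fetchall()

# The same computation written as an expression the engine understands natively.
sql_rows = conn.execute(
    "SELECT name, round(price * (1 - 20 / 100.0), 2) FROM items ORDER BY name").fetchall()

print(udf_rows)  # [('ale', 4.0), ('porter', 5.0), ('stout', 6.0)]
```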

  • These techniques have gotten Kenna to the point where we can process over 350 million documents a day and we still have room to grow. Scaling Elasticsearch Part 1: How to Speed Up Indexing: Toggle your refresh interval; Bulk process documents; Route your documents.
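
Two of the three tips can be sketched as request payloads. The `_bulk` API takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body, and the action line can carry a routing key so related documents land on the same shard. The refresh interval is an index setting you can set to `-1` during a large load and restore afterwards. This only builds the payloads (no HTTP client); index names and routing keys are hypothetical, and field names follow recent Elasticsearch versions, so check the docs for the version you run.

```python
import json
from typing import Any, Dict, Iterable, Tuple

Doc = Tuple[str, str, Dict[str, Any]]  # (document id, routing key, source)


def bulk_body(index: str, docs: Iterable[Doc]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc_id, routing, source in docs:
        # Action line: which index, which id, and which shard-routing key.
        action = {"index": {"_index": index, "_id": doc_id, "routing": routing}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a final newline


# Index settings bodies: pause periodic refreshes during a load, then restore
# the default one-second interval.
PAUSE_REFRESH = {"index": {"refresh_interval": "-1"}}
RESTORE_REFRESH = {"index": {"refresh_interval": "1s"}}

body = bulk_body("assets", [("1", "client-42", {"host": "web-1"}),
                            ("2", "client-7", {"host": "db-1"})])
```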

  • A recent study by Subbu Allamaraju of Expedia showed that about two-thirds of their outages happen when something is changed. The Rise of Progressive Delivery for Systems Resilience. The following trends are under the umbrella of progressive delivery: Canary testing; Blue-green deployments; A/B testing; Feature toggling; Service meshing; Observability; Chaos engineering. 
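
Canary releases and feature toggles both need a way to expose a change to only a slice of users. A common approach, sketched here as a generic technique rather than anything from the article, hashes the feature name and user ID into a fixed number of buckets, so each user gets a stable answer and raising the percentage only ever adds users. The function name and the 10,000-bucket granularity are assumptions.

```python
import hashlib


def in_rollout(feature: str, user_id: str, percent: float) -> bool:
    """Deterministically decide whether a user sees a feature at a given rollout %."""
    # Hashing feature and user together gives each feature an independent cohort.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    # percent=5 enables buckets 0..499, i.e. roughly 5% of users.
    return bucket < percent * 100


enabled = sum(in_rollout("new-checkout", f"user-{i}", 10) for i in range(10_000))
print(enabled)  # roughly 1,000
```

Because the bucket never changes for a given user, rolling back is just lowering the percentage, which fits the "most outages come from changes" observation: changes reach a small, stable group first.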

  • Five Principles for Thinking Like a Futurist: Forget about predictions. Focus on signals. Look back to see forward. Uncover patterns. Create a community.

  • Four Things We Should Change About Networking: Stop thinking hop-by-hop, focus on flow paths; Not one network, but many interconnected customised networks; Stop using self-configuring/autonomous operations & use intentional/automated modes; SDN Federation is the next interoperability challenge.

  • jewang/gesture-demo: a gesture recognition magic wand that I built as part of a Harry Potter costume for Halloween 2018. The wand detects W (wingardium leviosa) and spiral (flippendo) gestures as inspired by the Harry Potter and the Sorcerer's Stone computer game.

  • google/tink: Using crypto in your application shouldn't have to feel like juggling chainsaws in the dark. Tink is a crypto library written by a group of cryptographers and security engineers at Google. It was born out of our extensive experience working with Google's product teams, fixing weaknesses in implementations, and providing simple APIs that can be used safely without needing a crypto background.

  • aws-samples/aws-serverless-event-fork-pipelines: an architectural pattern where an event source, such as an Amazon SNS topic, is used to send events to multiple processing pipelines. Each processing pipeline creates a separate subscription to the Amazon SNS topic. SNS Subscription Filter Policies can be applied for each subscription to ensure each pipeline only receives the messages it wants to process.
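
To see how filter policies split one topic into per-pipeline streams, here is a simplified local model of the matching rule: every key in a subscription's policy must appear in the message attributes with one of the listed values, and a subscription without a policy receives everything. This is a sketch for intuition only; real SNS policies also support numeric ranges, prefix, `anything-but`, and `exists` conditions and are evaluated by SNS itself. Pipeline names and attributes are hypothetical.

```python
from typing import Dict, List

FilterPolicy = Dict[str, List[str]]  # attribute name -> allowed string values


def matches(policy: FilterPolicy, attributes: Dict[str, str]) -> bool:
    """Simplified SNS-style match: exact string values only."""
    # A missing attribute yields None, which is never in an allowed list.
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


def fan_out(attributes: Dict[str, str], pipelines: Dict[str, FilterPolicy]) -> List[str]:
    """Names of pipelines whose subscription would receive this message."""
    # An empty policy accepts everything, like a subscription with no filter.
    return [name for name, policy in pipelines.items() if matches(policy, attributes)]


pipelines = {
    "backup": {},
    "search": {"event_type": ["order_placed"]},
    "eu-analytics": {"event_type": ["order_placed", "order_cancelled"], "region": ["eu"]},
}
print(fan_out({"event_type": "order_placed", "region": "us"}, pipelines))
# ['backup', 'search']
```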

  • facebookresearch/PyTorch-BigGraph: a distributed system for learning graph embeddings for large graphs, particularly big web interaction graphs with up to billions of entities and trillions of edges.

  • dynamodb-geo (article): This project is an unofficial port of awslabs/dynamodb-geo, bringing creation and querying of geospatial data to Node.js developers using Amazon DynamoDB.
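
The idea that makes geospatial queries possible on a key-value store is that nearby points share key prefixes, so a query can fetch a few prefix buckets and filter within them. The awslabs library derives its hash values from S2 geometry cells rather than the classic scheme, so the encoder below is a swapped-in technique: a standard base32 geohash that alternately bisects the longitude and latitude ranges. Function and parameter names are my own.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a point by alternately bisecting longitude and latitude ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count, even = 0, 0, True  # the first bit refines longitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash(42.6, -5.6, precision=5))  # ezs42
```

Truncating a hash gives the enclosing, coarser cell, which is what lets a short prefix serve as a partition key while longer hashes act as sort keys.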

  • FlyMC: Highly Scalable Testing of Complex Interleavings in Distributed Systems: We present a fast and scalable testing approach for datacenter/cloud systems such as Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability to overcome the path/state-space explosion problem in testing workloads with complex interleavings of messages and faults. We introduce three powerful algorithms: state symmetry, event independence, and parallel flips, which collectively make our approach on average 16× (up to 78×) faster than other state-of-the-art solutions. We have integrated our techniques with 8 popular datacenter systems, successfully reproduced 12 old bugs, and found 10 new bugs, all without random walks or manual checkpoints.

  • Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems: Fail-slow hardware is an under-studied failure mode. We present a study of 101 reports of fail-slow hardware incidents, collected from large-scale cluster deployments in 12 institutions. We show that all hardware types such as disk, SSD, CPU, memory and network components can exhibit performance faults. We made several important observations such as faults convert from one form to another, the cascading root causes and impacts can be long, and fail-slow faults can have varying symptoms. From this study, we make suggestions to vendors, operators, and systems designers.
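
Fail-slow devices are hard to catch because they still answer, just late. One simple way to surface them, a generic sketch rather than a method taken from the paper, is peer comparison: flag any node whose latency is well above the median of similar nodes. The 3x threshold and node names are assumptions, and a production detector would look at latency distributions over time windows rather than a single sample.

```python
from statistics import median
from typing import Dict, List


def fail_slow_suspects(latency_ms: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return nodes whose latency exceeds `factor` times the median across peers."""
    baseline = median(latency_ms.values())
    return sorted(node for node, ms in latency_ms.items() if ms > factor * baseline)


observed = {"disk-a": 5.0, "disk-b": 6.0, "disk-c": 5.5, "disk-d": 40.0}
print(fail_slow_suspects(observed))  # ['disk-d']
```

Comparing against peers rather than a fixed SLA threshold helps when the whole fleet's latency shifts with load, since only outliers relative to their neighbors get flagged.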

  • Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity: In this paper, we make a case for exploiting reliability heterogeneity to tailor redundancy settings to different device groups. We present HeART, an online tuning tool that guides selection of, and transitions between redundancy settings for long-term data reliability, based on observed reliability properties of each disk group.

  • A SPEC RG Cloud Group’s Vision on the Performance Challenges of FaaS Cloud Architectures: In this work we, the SPEC RG Cloud Group, identify six performance-related challenges that arise specifically in this FaaS model, and present our roadmap to tackle these problems in the near future. This paper aims at motivating the community to solve these challenges together.

  • Online Event Processing: we name it OLEP (online event processing) to contrast with OLTP (online transaction processing) and OLAP (online analytical processing). This article explains the reasons for the emergence of OLEP and shows how it allows applications to guarantee strong consistency properties across heterogeneous data systems, without resorting to atomic commit protocols or distributed locking. The architecture of OLEP systems allows them to achieve consistently high performance, fault tolerance, and scalability. @martinkl: My student's experimental new database beats MySQL by approximately 700% on the TPC-C benchmark in initial tests. This research is going somewhere interesting!
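
The core OLEP move is to record an operation that touches several systems as a single event in an append-only log, then let each system apply its own part at its own pace, tracking how far it has read. A minimal in-memory sketch, assuming a Python list stands in for a durable log such as Apache Kafka and that a transfer's two accounts live in two different stores; all names are hypothetical, and validation such as preventing overdrafts is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Transfer:
    transfer_id: str
    source: str
    dest: str
    amount: int


@dataclass
class PartitionView:
    """A derived store that owns some accounts and consumes the shared log."""
    owned: Set[str]
    balances: Dict[str, int] = field(default_factory=dict)
    offset: int = 0  # log position up to which events have been applied

    def catch_up(self, log: List[Transfer]) -> None:
        for event in log[self.offset:]:
            if event.source in self.owned:
                self.balances[event.source] = self.balances.get(event.source, 0) - event.amount
            if event.dest in self.owned:
                self.balances[event.dest] = self.balances.get(event.dest, 0) + event.amount
        self.offset = len(log)


log: List[Transfer] = []
# The transfer is committed by a single append; no cross-store commit protocol.
log.append(Transfer("t1", "alice", "bob", 30))

eu, us = PartitionView(owned={"alice"}), PartitionView(owned={"bob"})
eu.catch_up(log)
us.catch_up(log)
us.catch_up(log)  # re-running is harmless: the offset skips applied events
print(eu.balances, us.balances)  # {'alice': -30} {'bob': 30}
```

Each store may lag, but because both derive their state from the same ordered log, they converge on a consistent result without locking each other.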