Stuff The Internet Says On Scalability For June 22nd, 2018

Hey, it's HighScalability time:

4th of July may never be the same. China creates stunning non-polluting drone swarm firework displays. Each drone is rated with a game mechanic and gets special privileges based on performance (just kidding). (TicToc)

Do you like this sort of Stuff? Please lend me your support on Patreon. It would mean a great deal to me. And if you know anyone looking for a simple book that uses lots of pictures and lots of examples to explain the cloud, then please recommend my new book: Explain the Cloud Like I'm 10. They'll love you even more.

  • $40 million: Netflix monthly spend on cloud services; 5%: retention increase can boost profits 25%; 50+%: Facebook's IPv6 traffic from the U.S., for mobile it’s over 75 percent; 1 billion: monthly Facebook, err, Instagram users; 409 million: websites use NGINX; 847 Tbps: global average IP traffic in 2021; 200 million: Netflix subscribers by 2020; $30bn: market for artificial-intelligence chips by 2022.

  • Quotable Quotes:

    • @evacide: Just yelled “Encryption of data in transit is not the same as encryption of data at rest!” at a journalist on the car radio before slamming it off. I am a hit at parties.

    • Drako: I deal with the security industry, where more than 90% of the security cameras are manufactured in China. The chips in those cameras used to be made in lots of different places. They’ve since migrated to China, and a lot of the government customers I engage with are unwilling in any way, shape, or form to deploy those cameras. They have a huge problem sourcing cameras that are not based on those chips. There is a lot of concern about the Trojans in chips and Trojans near the chips. It’s the first situation I’ve encountered where the customer is honestly concerned about this.

    • Memory Guy: But there are even more compelling reasons for certain applications to convert from today’s leading technologies (like NAND flash, DRAM, NOR flash, SRAM, and EEPROM) to one of these new technologies, and that is the fact that the newer technologies all provide considerable energy savings in computing environments...Something consistent about all of them is that they are nonvolatile, so they don’t need to be refreshed like DRAM, they use faster and lower-energy write mechanisms than either NAND or NOR flash, and their memory cells can be shrunk smaller than current memory technologies’ scaling limits, which means that they should eventually be priced lower than today’s memory chips.

    • Charlie Demerjian: what does Intel have planned for their server roadmap? Why is it causing such consternation among OEMs, ODMs, and major customers? For the same reason the 14/10nm messaging is causing consternation among investors, but the server side is in much worse shape. How bad is it? Three major roadmap updates in 29 days with serious spec changes, and it got worse from there.

    • Xtracerx: for me the biggest value to serverless functions is how nicely they tie in to the ecosystem of a cloud provider. using them to respond to storage events on s3 or database events or auth events is super easy and powerful.

    • Daniel Lemire: So Roaring bitmaps can be thousands of times faster than a native JavaScript Set. They can be two orders of magnitude faster than FastBitSet.js. That Roaring bitmaps could beat FastBitSet.js is impressive: I wrote FastBitSet.js and it is fast!

    • Bryan William Jones: Cool. "But it turns out that the Orbiters' photos were actually super-high-rez, shot on 70mm film and robotically developed inside the orbiters, with the negs raster-scanned at 200 lines/mm and transmitted to ground stations using... lossless analog image-compression technology."

    • Bcrescimanno: I worked at Netflix when OpenConnect was introduced and I don't remember anyone internally thinking it was unnecessary (though, even at the time, the company was large enough that you didn't know everyone). Quite the contrary, this was the era of the 250GB / month cap from Comcast and we could observe clearly that they were throttling Netflix traffic. OpenConnect, the ability to deploy the CDN directly into the internal network of these ISPs, served multiple purposes--not the least of which was to expose the fact that they were holding Netflix for ransom. So, to say that was the executive foreseeing things is a bit of revisionist history. It doesn't lessen the impact or importance of OpenConnect; but, it grew out of a very real impasse with a very large ISP. Ultimately, Netflix did end up paying Comcast in 2014 and, surprise surprise, the throttling stopped.

    • Eitally: there are a few critical differences between GCP and AWS or Azure. Setting aside the network quality & performance, which is objectively superior with Google, outside of GCE almost every other GCP product is offered as a managed service. Beyond that, there are several -- like Spanner -- that don't exist anywhere else. I fully appreciate a desire to avoid vendor lock-in, but there are plenty of situations where allying with a vendor that offers a superior product/service for your specific business need is absolutely the correct decision.

    • Nefaspartim: That's a popular statement these days, "x is old and needs replacing". I've heard that about SQL, CDNs, Python, cut-through switching, BGP... mostly from folks who have only been in the industry a few years. I appreciate anyone that gets into tech because they want to make things better, but displacing good, stable technologies "just because they're old" isn't the right mentality.

    • Pavel Tiunov: Redshift is cheaper as soon as you don’t leverage multi-stage querying and don’t care a lot about SQL query structure. BigQuery can be much more cost effective if you structure your data warehouse querying very well and split it into stages.

    • Ukev: The interesting thing about serverless for me is that although it's compelling to run services without needing to "manage servers", other tooling has largely closed that gap. When I'm scripting a system deploy to AWS (for example), provisioning and deploying a Docker image to ECS, or even an AMI via Packer is a roughly equivalent amount of development work to configuring a Lambda. And, usually a container or EC2 instance will have much better performance and cost characteristics than serverless. The exception is services that can benefit from a "scale to zero" capacity, which is great for personal projects but doesn't fit very many production apps.

    • Scurvy: +1 for BGP/OSPF-based VIPs in a layer 3 network. It's 2018. No reason why you can't easily run a layer 3 network with rich routing protocols. Super easy. You don't need application level "discovery" when the IP never changes; the network will tell you where it currently lives. I'll take a layer 3 network over raft/consul any day and all night.

    • Jon Brodkin: Comcast disabled throttling system, proving data cap is just a money grab

    • Crabbone: I worked on a big (100+ K loc) project written in Go for about two years. In the end, I think the company that started it made a mistake when choosing the language. Mostly not because of some language features, but because some programmers wouldn't use Go no matter what, but would agree to use Python. So, at some point a chunk of the company's infrastructure was written in Python, and another, larger part was written in Go, but they wouldn't play well with each other.

    • Stuart Kelly: The cold answer in these novels is: to any extent. When I interviewed Banks, just before his untimely death in 2013 at the age of 59, we talked about why the Culture did not sublime like other species. He was adamantine: the Culture would stay until everything else in the universe was like them. Not exactly utopian, not exactly anarchist.

    • Threemanycats: Question: how many of you out there are using these full-fledged web servers for new projects? With the popularity of things like Node or .NET Core (with its own lightweight web server, Kestrel) it seems like the popular thing to do is have all your web server related work done in application middleware, or in a cloud-hosted API gateway. Gone are the days of a 'heavy' web server.

    • TheWix: As an aside this is probably the biggest issue with "REST". That and people thinking an HTTP-based API with pretty URIs is REST. It is a pain in the ass creating a full-fledged REST API without any standard of how to represent things like links or what HTTP verbs can be run against a resource (do I make an OPTIONS call every time??). We have so many competing specs like HAL, JSON-LD, Hydra, etc. None of which are standard (closest being JSON-LD). So, there are very, very few maintained libraries. If you want to go the REST route you either roll your own client AND server libraries OR you don't do Hypermedia which is where the real power of REST lies.

    • Danabramov: To provide some extra context on this: at FB, we can't ship any RN update (or really, any RN commit) without updating our own apps for it. No product teams at FB are going to agree to rewrite their code just because an infrastructure team came up with a new way to do something. The reason updates are easier at FB mostly has to do with atomicity of commits. Because FB uses RN from master (and in practice all code lives in the same monorepo), codemods can be applied to products together with the corresponding infrastructure changes. Since the upgrades have a commit granularity instead of the monthly stable releases we cut in open source, there are no big delays between a regression being introduced and fixed for FB products. This discrepancy is unfortunate, but I don’t really see a way around it for an actively developed library--which might be your point. Undoubtedly RN is in active development, and being a moving target, it's easier for FB teams to “follow” it. Still, large backwards-incompatible changes are just as infeasible for us as for everybody else without either an automated codemod or an opt-in strategy.

    • Guru Santiago: I build and design products in the US and I have recently found that parts that were easy to purchase are either unavailable or are now being bought by bigger companies that have more buying power. I can afford to purchase thousands of parts in hopes that I can continue building my products. Now the tariffs will make it even harder to stay in business. The only way I can overcome this is to push all my manufacturing to China to avoid the additional 25%. I don’t think that this is the way to make America Great Again.

    • Nick Halme: Take World of Warcraft's much larger sharding, in the low thousands, with guild size being about 1,000. If WoW is doing it all wrong, then either it has succeeded despite this poor social engineering, or this kind of size is either fine or better for players' group-forming capabilities. Another interesting thought is that, despite there not being a persistent world in place, briefly meeting strangers from some other part of the world is a fundamental part of most smaller session-based games, such as FPS games or MOBAs. The playerbase is nearly entirely based on this brief, strange contact, and it could be argued that people strive to attain some kind of regularity by (sometimes) hovering around a favourite server, where they are able to build a rapport with other regulars on that server.

    • David Rosenthal: Thus, as we see with Bitcoin, if the business of running P2 [cryptocurrency-based peer-to-peer storage service] peers becomes profitable, the network will become centralized. But that's not the worst of it. Suppose P2 storage became profitable and started to take business from S3. Amazon's slow AI has an obvious response, it can run P2 peers for itself on the same infrastructure as it runs S3. With its vast economies of scale and extremely low cost of capital, P2-on-S3 would easily capture the bulk of the P2 market. It isn't just that, if successful, the P2 network would become centralized, it is that it would become centralized at Amazon!

  • The tax man cometh and the platonic ideal of a perfectly friction-free internet may never be the same. Or perhaps the murder of net neutrality already accomplished that? Supreme Court Clears Way for Sales Taxes on Internet Merchants. What has brought the internet low into the grime of an imperfect and transitory real world? This: "Internet retailers can be required to collect sales taxes in states where they have no physical presence." It's interesting to note all this was started by a company whose business is managing sales tax compliance for companies. Not everyone’s incentives are aligned. Implications? Since the rules are different for every state, you’ll need to know them and potentially file a sales tax return in all 50 states. It does not matter if the state you operate in has a sales tax or not. If something is subject to sales tax, it could also be subject to income tax, so a return for each state might be needed. States could increase sales taxes to take advantage of this new revenue stream.

  • Facebook released videos from its Networking @Scale 2018 recap. Topics include: A Learning Platform for Network Intent Graph Modeling; Scaling the Facebook Backbone Through Zero Touch Provisioning; Optics Scaling Challenges; Load-Balancing at Hyperscale; Edge Fabric: Steering Oceans of Content to the World.

  • How do you deliver over 10 trillion messages to millions of subscribers at a rate of 200K messages per channel? How Pusher Channels has delivered 10,000,000,000,000 messages.

    • Clients subscribe to channels like btc-usd or private-user-jim, and then other clients publish messages to those channels.

    • Channels employs three time-honored techniques to deliver these messages at low latency: fan-out, sharding, and load balancing (a minimal sketch of the sharded fan-out follows this list). Together they ensure that the Channels system has no single central component. This property is key to the horizontal scalability that enables it to send millions of messages per second.

    • Subscribers are sharded across around 170 large edge servers, each of which holds around 20,000 connections to subscribers. Each edge server remembers the channels its clients are interested in, and subscribes to those channels in a central Redis service.

    • With fan-out alone, there is still a central Redis component which all publishes go through. Such centralization would limit the number of publishes per second. To get past this limit, the central Redis service is made up of many Redis shards.
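
    • Here is a minimal sketch of that sharded fan-out, in Python with redis-py. The shard count, host names, and hash-based shard selection are illustrative assumptions, not Pusher's actual implementation:

```python
# Hedged sketch of sharded pub/sub fan-out, loosely modeled on the Channels
# description above. Shard count, hosts, and the hash-based shard choice are
# assumptions for illustration only.
import zlib
import redis

REDIS_SHARDS = [
    redis.Redis(host="redis-shard-0"),
    redis.Redis(host="redis-shard-1"),
    redis.Redis(host="redis-shard-2"),
]

def shard_for(channel: str) -> redis.Redis:
    """Pick a shard deterministically so publishers and edge servers agree."""
    return REDIS_SHARDS[zlib.crc32(channel.encode()) % len(REDIS_SHARDS)]

def publish(channel: str, message: bytes) -> None:
    """A publish only touches the one shard that owns the channel."""
    shard_for(channel).publish(channel, message)

def edge_server_subscribe(channels):
    """An edge server subscribes once per channel on the owning shard, then
    fans each message out to its ~20,000 local client connections."""
    subscriptions = []
    for channel in channels:
        p = shard_for(channel).pubsub()
        p.subscribe(channel)
        subscriptions.append(p)
    return subscriptions
```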

  • Few things are quite as satisfying as watching packets flow over a network. Peeling back the layers by peeking deep inside a packet is the best way to learn how a protocol stack really works. Learn how in How I use Wireshark.

  • When I saw React Native at Airbnb show up on Airbnb’s engineering blog I knew it would make waves. When Airbnb says it’s dropping the bright shiny new thing—React Native—for the bright shiny old thing—native iOS and Android development—it’s big news. Wading through the oddly effusive praise for React Native, it seems it was just too hard for a large team (100 mobile devs) to mix RN with native apps, so “we will be sunsetting React Native and putting all of our efforts into making native amazing.”

    • Airbnb still wants to write code just once, it’s the means that will change: Several teams have experimented with and started to unify around powerful server-driven rendering frameworks. With these frameworks, the server sends data to the device describing the components to render, the screen configuration, and the actions that can occur. Each mobile platform then interprets this data and renders native screens or even entire flows using DLS components.

    • Facebook took note and plans on making RN easier. State of React Native 2018: We're working on a large-scale rearchitecture of React Native to make the framework more flexible and integrate better with native infrastructure in hybrid JavaScript/native apps.

    • Also, Supporting React Native at Pinterest and at Coursera.

  • Speeding up our Webhooks System 60x. Good example of how a system evolves. An approach worked and then as the system scaled it didn’t. A transaction log was used to store webhooks. MySQL stored the log and eventually became the bottleneck. They moved to AWS Kinesis using a phased rollout strategy. Result: the old system would very often take more than a minute just to process a transaction and send it. Our P90 latency was over 90 seconds. With the new system, our P90 latency sits at around 1.5 secs – that’s a 60x improvement!
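
    • The post doesn't include code, but the shape of the change is easy to sketch with boto3: instead of inserting each webhook event into a MySQL transaction log, the producer appends it to a Kinesis stream and workers consume it asynchronously. The stream name, partition key, and event shape below are assumptions for illustration:

```python
# Hedged sketch of writing webhook events to a Kinesis stream with boto3,
# replacing inserts into a MySQL transaction log. Stream name, partition key,
# and event shape are placeholder assumptions.
import json
import boto3

kinesis = boto3.client("kinesis")

def enqueue_webhook(event: dict) -> None:
    # Partitioning by customer keeps one tenant's events ordered together
    # without serializing everyone else behind a single shard.
    kinesis.put_record(
        StreamName="webhook-events",              # hypothetical stream name
        PartitionKey=str(event["customer_id"]),   # hypothetical key choice
        Data=json.dumps(event).encode("utf-8"),
    )
```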

  • Speaking of transaction logs, some great detail in MySQL 8.0: New Lock free, scalable WAL design: The new WAL design provides higher concurrency when updating data and a very small (read negligible) synchronization overhead between user threads!

  • Math can tell a story. Telling Stories About Little's Law: The law says that the mean concurrency in the system (𝐿) is equal to the mean rate at which requests arrive (λ) multiplied by the mean time that each request spends in the system (𝑊)...Telling stories about our systems, for all its potential imprecision, is a powerful way to build and communicate intuition...The system was ticking along nicely, then just after midnight a spike of requests arrived from a flash sale. This caused latency to increase because of increased lock contention on the database, which in turn caused 10% of client calls to time out and be retried. A bug in our client's backoff logic meant that this increased the call rate to 10x the normal for this time of day, further increasing contention. And so on...Each step in the story evolves by understanding the relationship between latency, concurrency, and arrival rate. The start of the story is almost always some triggering event that increases latency or arrival rate, and the end is some action or change that breaks the cycle. Each step in the story offers an opportunity to identify something to make the system more robust. Can we reduce the increase in 𝑊 when λ increases? Can we reduce the increase in λ when 𝑊 exceeds a certain bound? Can we break the cycle without manual action?
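
    • A quick worked example of the law, with numbers invented purely for illustration:

```python
# Little's Law: L = lambda * W
# mean concurrency = mean arrival rate * mean time in system
arrival_rate = 1000      # requests per second (lambda), invented for illustration
mean_latency = 0.050     # seconds each request spends in the system (W)
concurrency = arrival_rate * mean_latency
print(concurrency)       # 50 requests in flight on average

# In a story like the one above: if retries push lambda to 10x (10,000 req/s)
# and lock contention pushes W to 0.5 s, then L jumps to 10000 * 0.5 = 5000
# concurrent requests -- exactly the runaway cycle the law helps you reason about.
```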

  • Cool History of Kubernetes on a Timeline: To tell the story of how Kubernetes evolved from an internal container orchestration solution at Google to the tool that we know today, we dug into the history of it, collected the significant milestones & visualized them on an interactive timeline.

  • Pulumi is a new entrant into the multi-language and multi-cloud development platform space. Pulumi’s CEO is Joe Duffy. You might remember Joe from a fascinating series of articles he wrote on Midori, detailing the advanced platform work he did at Microsoft. In that series he said, “My biggest regret is that we didn’t OSS [Midori] from the start, where the meritocracy of the Internet could judge its pieces appropriately.” With Pulumi it looks like he’s making good on that sentiment. You can read the pitch at Hello, Pulumi! The TL;DR: “with Pulumi, 38 pages of manual instructions become 38 lines of code. 25,000 lines of YAML configuration becomes 500 lines in a real programming language.” Clearly it’s an infrastructure-as-code model, only it’s cross-cloud (a minimal example program is sketched after the quotes below). The cloud model is based on CoLaDa, or containers, lambda, and data. Pulumi wants to be the glue by providing a consistent programming model and set of management practices based on code. As with all attempts at providing a higher level model, The Law of Leaky Abstractions applies. You get ease of use until you splat flat against the abstraction wall.

    • Lindydonna: (I'm a product manager at Pulumi.) Pulumi lets you describe cloud resources using code instead of a config language. It's not like Heroku, it's more like a deployment tool (e.g. Serverless Framework, Terraform, Claudia.js, Chalice, etc). The difference compared to other deployment tools is that you use regular code, but it's turned into a declarative plan when you run `pulumi update`. So, you get the benefits of a regular programming language, while still following best practices of immutable infrastructure.

    • aChrisSmith: Each Pulumi program is run within the context of "a stack". The stack is essentially a collection of cloud resources. So when the Pulumi program runs, it will create resources that aren't in the stack, or update existing ones. So if you create any resources during dev/testing, you just need to `pulumi destroy` those stacks and all of the cloud resources will be reclaimed. This, IMHO, is one of Pulumi's best features. In that it makes it super-easy to create your own instance of a cloud application. For example, I have my own development instance of app.pulumi.com by just creating my own Pulumi stack and rerunning the same application.

    • Joeduffy: The major difference is that Pulumi does immutable infrastructure. The code describes a goal state and our system manages your environment to ensure that it matches the goal state. This means you can preview changes before making them, and that, if you do make them, you've got a full audit trail of who changed what and when. Rollbacks are trivial because you just point us at an old goal state that worked and we can chew on it until your live environment matches. We can even detect drift, so if someone manually updates something in the cloud console, we can tell you about it. As a result, we always have a full object graph of your cloud resources, and can tie it back to the source code which created it, opening up some unique diagnostics capabilities. The difference with scripting libraries, like AWS's Boto, or the Azure PowerShell SDK, is that they mutate cloud resources directly, and in an ad-hoc manner. So, you don't know what they are going to do before you run them. And in the event of failure, you're more likely to be in a corrupt state and unable to recover. Rollbacks are difficult. There's also no audit trail beyond the cloud access logs, and information like dependencies are lost, so resources end up disassociated from the code and workflow that created or updated them. Many people encounter these problems and need to build complex systems on top to address them. Or they end up using a solution like CloudFormation or Azure Resource Manager, which eschews code in favor of JSON/YAML/templates.

    • Also, Cloud Agnostic Architecture is a Myth.
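
    • For a feel of the programming model, a minimal Pulumi program in Python might look like the sketch below (Pulumi also speaks JavaScript/TypeScript). The resource and output names are placeholders, not anything from the announcement:

```python
# __main__.py -- minimal sketch of a Pulumi program in Python. Running
# `pulumi update` diffs this desired state against the stack and
# creates/updates/deletes cloud resources until they match.
import pulumi
import pulumi_aws as aws

# Declaring a resource is just constructing an object; Pulumi records it in
# the stack's goal state instead of calling the cloud API imperatively.
bucket = aws.s3.Bucket("site-assets")   # hypothetical resource name

# Outputs become part of the stack, so other stacks and tooling can consume them.
pulumi.export("bucket_name", bucket.id)
```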

  • So, you like all that event-driven automation? How about when it turns its great unblinking eye on you? Awesome story. The Machine Fired Me: Once the order for employee termination is put in, the system takes over. All the necessary orders are sent automatically and each order completion triggers another order. For example, when the order for disabling my key card is sent, there is no way for it to be re-enabled. Once it is disabled, an email is sent to security about recently dismissed employees. Scanning the key card is a red flag. The order to disable my Windows account is also sent. There is also one for my JIRA account. And on and on. There is no way to stop the multi-day long process. I had to be rehired as a new employee. Meaning I had to fill out paperwork, set up direct deposit, wait for FedEx to ship a new key card. Also, Goodbye, Mr. Chips.

  • When Androids turn to crime they’ll be flashed as punishment. "Automata" Episode 2 - DUST Exclusive Premiere.

  • There are different kinds of growth. Facebook is driven by network effects. Adding a new user increases the value of the service for everyone. Nobody goes to Facebook for content. Each new subscriber doesn’t increase the value of Netflix to other users, unless you count some marginal improvement by additional data capture. Inside the Binge Factory: Netflix operates by a simple logic, long understood by such tech behemoths as Facebook and Amazon: Growth begets more growth begets more growth. When Netflix adds more content, it lures new subscribers and gets existing ones to watch more hours of Netflix. As they spend more time watching, the company can collect more data on their viewing habits, allowing it to refine its bets about future programming...There’s no such thing as a ‘Netflix show.’ That as a mind-set gets people narrowed. Our brand is personalization...Netflix’s data allows it to be vastly more precise, giving it an enormous competitive advantage.

  • Once upon a time I wanted to shoot some hoops. Finding local courts is always spotty. So I thought: why not search the entire world for basketball courts using image classification on satellite images? A very modern way to solve the problem. It quickly became obvious asking a local would be easier. Now? Maybe not. mapbox/robosat (article): an end-to-end pipeline written in Python 3 for feature extraction from aerial and satellite imagery. Features can be anything visually distinguishable in the imagery for example: buildings, parking lots, roads, or cars.

  • Want to know how DNS works? Anatomy of a Linux DNS Lookup – Part 1, Part II. Also, Building a DNS server in Rust.

  • Why did SmugMug buy Flickr? It solves their community problem. SmugMug is a paid service. Building a community on a paid service is hard. You don't get network effects. With large vibrant communities, Flickr solves the community problem for SmugMug. Triangulation 351: SmugMug and Flickr.

  • Does your choice of programming language change how you think about solving problems? Most likely. How does our language shape the way we think?: I have described how languages shape the way we think about space, time, colors, and objects. Other studies have found effects of language on how people construe events, reason about causality, keep track of number, understand material substance, perceive and experience emotion, reason about other people's minds, choose to take risks, and even in the way they choose professions and spouses. Taken together, these results show that linguistic processes are pervasive in most fundamental domains of thought, unconsciously shaping us from the nuts and bolts of cognition and perception to our loftiest abstract notions and major life decisions. Language is central to our experience of being human, and the languages we speak profoundly shape the way we think, the way we see the world, the way we live our lives.

  • Games in the cloud? Behind the scenes with the Dragon Ball Legends GCP backend. Sure, you choose GCP for scalability, reliability, Spanner, BigQuery, Cloud Pub/Sub, and Cloud Dataflow. You might not have considered the network: One of the main reasons BNE decided to use GCP for the Dragon Ball Legends backend was the Google dedicated network. For two players from two different continents to communicate through Google’s dedicated network, players first try to communicate through P2P, and if that fails, they fail over to an open-source implementation of STUN/TURN Server called coturn, which acts as a relay between the two players. That way, cross-continent battles leverage the low-latency and reliable Google network as much as possible. Also, Google Production Environment.
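
    • The connection strategy reads roughly like the sketch below. The helpers are hypothetical stand-ins for the real matchmaking/networking code, and the coturn endpoint is a placeholder:

```python
# Hedged sketch of the strategy described above: attempt a direct P2P
# connection first, then fall back to a TURN relay (coturn) if NAT traversal
# fails. The helpers are hypothetical stubs, not BNE's implementation.

P2P_TIMEOUT_SECONDS = 3.0
TURN_RELAY = ("turn.example.net", 3478)   # placeholder coturn endpoint


def try_p2p_connection(local_candidate, remote_candidate, timeout):
    """Hypothetical: attempt ICE-style hole punching; return a connection or None."""
    return None  # stub: pretend NAT traversal failed


def connect_via_turn_relay(local_candidate, remote_candidate, relay):
    """Hypothetical: open a relayed connection through the coturn server."""
    return {"relay": relay, "peers": (local_candidate, remote_candidate)}


def connect_players(local_candidate, remote_candidate):
    conn = try_p2p_connection(local_candidate, remote_candidate,
                              timeout=P2P_TIMEOUT_SECONDS)
    if conn is not None:
        return conn   # direct path: lowest latency
    # Relayed traffic still rides Google's dedicated network between continents.
    return connect_via_turn_relay(local_candidate, remote_candidate, TURN_RELAY)
```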

  • What new gaming experiences can you design with 5g in mind? How 5G will transform mobile gaming and marketing: Higher download speed enables superior on-demand mobile gaming; Enhanced reactions driven by ‘zero’ latency; Numerous connections per square meter; Better accuracy in positioning; Immersive in-game experience; More profitable business. Also, 5G Standard Finalized in a Major Step Toward Commercialization.

  • Inter-thread comms is all about the threads, oh, and the locks, and don't forget the queues. Adventures with Memory Barriers and Seastar on Linux: This article shows that by directing all inter-core communications into a single path, that path can undergo extreme optimization...a relatively new Linux system call, membarrier(), recently gained an even newer option called MEMBARRIER_CMD_PRIVATE_EXPEDITED. The system call does exactly what you would guess (and if you didn’t, have a look here): it causes a memory barrier to be issued on all cores that are running threads that belong to this process.
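
    • For the curious, issuing that barrier from userspace looks roughly like this ctypes sketch. The syscall number is x86-64 specific and the command values are taken from my reading of the uapi header, so treat them as assumptions and check <linux/membarrier.h> before relying on them:

```python
# Hedged sketch of calling membarrier(2) with MEMBARRIER_CMD_PRIVATE_EXPEDITED
# from Python via ctypes. Syscall number and command values are assumptions;
# verify against <linux/membarrier.h> on your system.
import ctypes

libc = ctypes.CDLL("libc.so.6", use_errno=True)

NR_membarrier = 324                                   # x86-64 only (assumption)
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = 1 << 4    # assumption from uapi header
MEMBARRIER_CMD_PRIVATE_EXPEDITED = 1 << 3             # assumption from uapi header

def membarrier(cmd: int, flags: int = 0) -> int:
    ret = libc.syscall(NR_membarrier, cmd, flags)
    if ret < 0:
        raise OSError(ctypes.get_errno(), "membarrier failed")
    return ret

# A process registers once for the expedited private command...
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
# ...then each call forces a memory barrier on every core currently running one
# of this process's threads, so the fast-path threads themselves can skip fences.
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED)
```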

  • GitHub’s MySQL HA solution: orchestrator to run detection and failovers, using a cross-DC orchestrator/raft setup; Hashicorp’s Consul for service discovery; GLB/HAProxy as a proxy layer between clients and writer nodes; anycast for network routing. The new setup removes VIP and DNS changes altogether. Results: reliable failure detection, data center agnostic failovers, typically lossless failovers, data center network isolation support, split-brain mitigation (more in the works), no cooperation dependency, and between 10 and 13 seconds of total outage time in most cases.

  • Moving more than 500 containers to k8s has its problems. Scaling Kubernetes for 25M users. You should: increase the per-node container limit; set your CPU and memory requests as soon as possible; isolate critical pods using node affinities; set up instrumentation; review the rolling-update strategy.
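
    • To illustrate the "set your CPU and memory requests" advice, here's what that looks like with the official Kubernetes Python client (the same fields go in a pod manifest's resources block). The image name and the request/limit values are placeholders to tune for your workload:

```python
# Hedged sketch of declaring CPU/memory requests and limits for a container
# using the official Kubernetes Python client. Image and values are placeholders.
from kubernetes import client

container = client.V1Container(
    name="api",
    image="example/api:1.0",                          # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},  # what the scheduler reserves
        limits={"cpu": "500m", "memory": "512Mi"},    # ceiling before throttling/OOM
    ),
)
```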

  • OK, but isn’t the problem using the wrong language for the wrong job? Strings Are Evil. Reducing memory allocations from 7.5GB to 32KB.

  • Dlib.net (article): modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems.

  • NationalSecurityAgency/lemongraph:  a log-based transactional graph (nodes/edges/properties) database engine that is backed by a single file. The primary use case is to support streaming seed set expansion. The core of the graph library is written in C, and the Python (2.x) layer adds friendly bindings, a query language, and a REST service. LemonGraph rides on top of (and inherits a lot of awesome from) Symas LMDB - a transactional key/value store that the OpenLDAP project developed to replace BerkeleyDB.

  • dropbox/divans: The divANS crate is meant to be used for generic data compression. The algorithm has been tuned to significantly favor gains in compression ratio over performance, operating at line speeds of 150 Mbit/s.

  • New Memories for Efficient Computing: This paper highlights the limitations that some of these emerging memory technologies face to scale to the most advanced process nodes while preserving compelling performance at affordable manufacturing cost. Memory companies and semiconductor foundries are working in close collaboration to co-develop, and ramp embedded memory to mass production. Using standard CMOS materials and simple manufacturing processing steps and tools provides the highest chance to succeed in this competitive market.

  • Galaxy formation efficiency and the multiverse explanation of the cosmological constant with EAGLE simulations. Backreaction: In summary, the paper finds that the multiverse hypothesis isn’t falsifiable. If you paid any attention to the multiverse debate, that’s hardly surprising, but it is interesting to see astrophysicists attempting to squeeze some science out of it.

  • Comparing Languages for Engineering Server Software: Erlang, Go, and Scala with Akka: This paper investigates 12 highly concurrent programming languages suitable for engineering servers, and analyses three representative languages in detail: Erlang, Go, and Scala with Akka. We have designed three server benchmarks that analyse key performance characteristics of the languages. The benchmark results suggest that where minimising message latency is crucial, Go and Erlang are best; that Scala with Akka is capable of supporting the largest number of dormant processes; that for servers that frequently spawn processes Erlang and Go minimise creation time; and that for constantly communicating processes Go provides the best throughput.

  • Remember, there’s never a bad dog, only bad owners. Eamonn O'Brien-Strain - Serverless Gone Bad.

  • This would be more exciting if there wasn’t the distinct possibility this kind of information could be used to avoid paying for patients with the greatest chance of dying.  Scalable and accurate deep learning with electronic health records: For predicting inpatient mortality, the area under the receiver operating characteristic curve (AUROC) at 24 h after admission was 0.95 (95% CI 0.94–0.96) for Hospital A and 0.93 (95% CI 0.92–0.94) for Hospital B. This was significantly more accurate than the traditional predictive model, the augmented Early Warning Score (aEWS) which was a 28-factor logistic regression model (AUROC 0.85 (95% CI 0.81–0.89) for Hospital A and 0.86 (95% CI 0.83–0.88) for Hospital B) (Table 2). If a clinical team had to investigate patients predicted to be at high risk of dying, the rate of false alerts at each point in time was roughly halved by our model: at 24 h, the work-up-to-detection ratio of our model compared to the aEWS was 7.4 vs 14.3 (Hospital A) and 8.0 vs 15.4 (Hospital B). Moreover, the deep learning model achieved higher discrimination at every prediction time-point compared to the baseline models. The deep learning model attained a similar level of accuracy at 24–48 h earlier than the traditional models (Fig. 2).

  • ServiceFabric: a distributed platform for building microservices in the cloud: ServiceFabric (SF) enables application lifecycle management of scalable and reliable applications composed of microservices running at very high density on a shared pool of machines, from development to deployment to management. SF runs in multiple clusters each with 100s to many 100s of machines, totalling over 160K machines with over 2.5M cores.