Friday, March 1, 2019

Stuff The Internet Says On Scalability For March 1st, 2019

Wake up! It's HighScalability time:

 

10 years of AWS architecture: increasing simplicity or increasing complexity? (Michael Wittig)

 

Do you like this sort of Stuff? I'd greatly appreciate your support on Patreon. Know anyone who needs cloud? I wrote Explain the Cloud Like I'm 10 just for them. It has 39 mostly 5 star reviews. They'll learn a lot and love you forever.

 

  • 1.3 billion: npm package downloads per day; 20: honeybee communication signals used to coordinate thousands of workers; 71: average global life expectancy; 120K: max inflight SQS messages; 80%: shared code between iOS, Android, the web; 1 TB: microSD card; 20%: increase in value of wind energy using ML; 64%: respondents cite optimizing cloud spend as the top initiative; 250: drones augmenting small military units; 35,880: record robots shipped to North American companies; 50K: aerial photos of the UK; 119%: increase in demand for AI talent; 18TB: MAMR hard drive; $20 million: Pinterest paid more than expected for AWS; 100,000: MySQL connections; 19%: all requests come from Bots, APIs, and search engine crawlers

  • Quotable Quotes:
    • @evazhengll: A surgeon in #China performed world’s 1st remote operation using '#5G Surgery' on animal, removing its liver, through controlling robotic arms in a location 30 miles away. It was made possible by using a low latency of 0.1 seconds, the lower the latency, the more responsive the robot
    • @AWSonAir: .@McDonalds uses Amazon ECS to scale to support 20,000 orders per second. #AWSSummit
    • @antoniogm: Know why the European startup scene sucks? Because American startups have a huge, high-GDP, early-adopter market from day one, and they internationalize AFTER scaling. Euros have to internationalize IN ORDER TO scale, and most die in the process. GDPR makes this *worse*.
    • Ivan Ivanitskiy: Even though blockchain does not allow for modification of data, it cannot ensure such data is correct.
    • @kelseyhightower: Kubernetes is for people building platforms. If you are a developer building your own platform (AppEngine, Cloud Foundry, or Heroku clone), then Kubernetes is for you.
    • @adrianco: I think the main thing cloud native apps do that datacenter apps don’t do is scale elastically (even down to zero in some cases) and maintain high utilization, so you stop paying when you stop using the resource.
    • @kellabyte: Also almost every mention of SEDA is incorrect IMO. If you read the paper the goal of the paper was to dynamically adjust CPU resources by *CHAINED* queues where thread pools can move threads between stages so that stages who needed more compute time got more threads.
    • Michael Levin: There's a section of my group that works on synthetic morphology. We have some pretty amazing new data on the ability of cells to learn to cooperate in completely different configurations. You can make artificial living machines that are nothing like the animal from which they came. They're able to learn to cooperate through some combination of emergence and guided self assembly. 
    • Ted Kaminski: Developers seem to routinely just stick queues in the system, and then don’t take them seriously. Queues should be treated as databases by default.
    • Netscout: In the second half of 2018, we saw threat actors building crimeware that’s cheaper and easier to deploy—and more persistent once installed. At the same time, many groups applied business best practices that further extend the reach of attacks, while making it even easier for customers to access and leverage malicious software and DDoS attack tools.
    • Fred Brooks: there is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement within a decade in productivity, in reliability, in simplicity.
    • JPL: The rover experienced a one-time computer reset but has operated normally ever since, which is a good sign. We're currently working to take a snapshot of its memory to better understand what might have happened.
    • @ben11kehoe: #Kubernetes is a useful step forward for many, many organizations that have existing traditional architecture in place. But many people confuse k8s reducing friction with #k8s being *the answer* when, for greenfield applications, #serverless should be the default
    • Carlos Macián: Memory has stopped being a commodity in ASIC design, in particular in the new architectures that are emerging with AI. There's a need to change quite drastically the Von Neumann architecture that has been followed to date. You want to optimize the interconnect between memory and the processing engines to minimize the power dissipated through data movement. That's one possibility. The other possibility is to connect processing and storage as tightly as you can. Because at the end of the day this is exactly what you are going to be doing—processing data, storing data, processing data, storing data—the closer you can tie them together architecturally the more efficient you will be. 
    • Bank Eakasit: Upgrading from HTTP/1.1 with encryption to HTTP/2 with encryption will reduce both CPU usage and bandwidth of your server.
    • @antirez: There is more in the blog post, and it is that IMHO the right way to fully utilize the machine is to coordinate multiple share-nothing instances. So the goal is to improve such coordination. You can disagree but please report the whole idea, not the half you choose to.
    • @EwanToo: The number of times I see people compare API Gateway and Lambda to a single EC2 or cheap VM host, as if the availability, security, and operational capability of the 2 are remotely comparable... Yes, the $10 a month VM will be cheaper than the near-infinitely scaling mainframe!
    • NewStack: If you can physically say where a thing is that's the edge. You can't say that with the cloud.
    • @KarlHabegger: Profound quote from the CEO of CloudZero: "Serverless creates a clear path towards mapping revenue generating activities to infrastructure spending." #FinDevOps #ServerlessDays #ATX
    • Peter: isn’t it a bit bleak to suggest that we can’t trust entities that aren’t subject to the death penalty, imprisonment, or other punitive sanctions? Aren’t there other grounds for morality? Call me Pollyanna, but I like to think of future conscious AIs proving irrefutably for themselves that virtue is its own reward.
    • @juliaferraioli: We already know that measuring value with LoC (lines of code) is a terrible idea. Using number of commits instead is not any better.
    • Anastasiya Rashevskaya: Mix 3D, 2D and AR. We expect 3D elements to continue their way forward in game art. They are widely applied in different genres to make the gaming environment as realistic as possible. Combined with 2D and AR tools, it can have even a bigger impact on a recipient and evoke a variety of different emotions: starting from anger and ending up with absolute happiness. Emotional bond is essential to higher player involvement in the game world.
    • Jessica Klein: The blacklist, says Stokes, “creates these really incredible expectations in the community that [it] can somehow protect people’s property.” In reality, one top 21 block producer failing to correctly configure the blacklist would make the entire network vulnerable to bad accounts. And that’s exactly what happened with the 2.09 million EOS.
    • King Link: If somehow a game can’t be profitable at 60, raise the price, or make better decisions. The fact that I hear companies NEED more money isn’t a problem, but switching to microtransactions hurts the product, even if it makes the product more profitable.
    • @JefClaes: Cloud billing is much more like receiving the bill during an operation. While you’re under anesthesia.
    • Eric Friedman: I've been looking for opportunities to move logic out of these derived classes into components, as it greatly simplifies the ability to create new types of objects. If we don't use components then, as referenced in the article, if we wanted to create a new object, we'd either have to use a Minion or create a new class to represent that object. Using components easily allows you to build the functionality of a new object using reusable blocks of code.
    • @ShortJared: My thinking on serverless these days in order of consideration. - If the platform has it, use it - If the market has it, buy it - If you can reconsider requirements, do it - If you have to build it, own it
    • Pavel Tiunov: An unoptimized 100M-row test dataset run by Serverless MySQL Aurora can be queried in 176 seconds. A query time of less than 200ms is achievable for the same dataset using a multi-stage querying approach.
    • Ethan Banks: I used to think companies would make decisions based on which technology served them the best. Now I think companies make decisions based on whether their boss will sign off without a lot of discussion.
    • Linus Torvalds: Some people think that "the cloud" means that the instruction set doesn't matter. Develop at home, deploy in the cloud. That's bullsh*t. If you develop on x86, then you're going to want to deploy on x86, because you'll be able to run what you test "at home" (and by "at home" I don't mean literally in your home, but in your work environment)...Guys, do you really not understand why x86 took over the server market? It wasn't just all price. It was literally this "develop at home" issue. Thousands of small companies ended up having random small internal workloads where it was easy to just get a random whitebox PC and run some silly small thing on it yourself. Then as the workload expanded, it became a "real server". And then once that thing expanded, suddenly it made a whole lot of sense to let somebody else manage the hardware and hosting, and the cloud took over...Without a development platform, ARM in the server space is never going to make it. Trying to sell a 64-bit "hyperscaling" model is idiotic, when you don't have customers and you don't have workloads because you never sold the small cheap box that got the whole market started in the first place.
    • antirez: It's extremely hard to agree with Linus on that. One problem in his argument is that he believes that everybody has a kernel hacker mindset: most of today's developers don't care about environment reproducibility at the architecture level. The second problem is that he believes that every kind of development is as platform sensitive as kernel hacking, and he even makes the example of Perl scripts. The reality is that one year ago I started the effort to support ARM as a primary architecture for Redis, and all I had to do is to fix the unaligned accesses, that are anyway fixed in ARM64 almost entirely, and almost fixed also in ARM >= v7 if I remember correctly, but for a subset of instructions (double words loads/stores). Other than that, Redis, that happens to be a low level piece of code, just worked on ARM, with all the tests passing and no stability problems at all. 
    • @davecheney: Quite frankly I’m not sold on microservices. Sure, nobody should be writing a monolithic application these days but I’m pretty sure people missed the point of Bezos’ APIs or GTFO memo and distracted themselves by decomposing things ad nauseam.
    • wgjordan: Aurora splits out 'database' nodes (the server instances you provision and pay for) from 'storage' nodes (a 'multi-tenant scale-out storage service' that automatically performs massively-parallel disk I/O in the background). Instead of MySQL writing various data to tablespaces, redo log, double-write buffer, and binary log, Aurora sends only the redo-log over the network to the storage service (in parallel to 6 nodes/3 AZs for durability). No need for extra tablespace, double-write buffer, binary-log writes, or extra storage-layer mirroring, since durability is guaranteed as soon as a quorum of storage nodes receives the redo-log. The reduced write amplification results in 7.7x fewer network IOs per transaction at the 'database' layer for Aurora (vs standard MySQL running on EBS networked storage, in the benchmark described in the paper), and 46x fewer disk IOs at the 'storage' layer [1]
    • HaProxy: As expected, in all cases, relying on an external, central load balancer is better in environments where there is moderate or high contention for system resources. This is quickly followed by the Least Connections algorithm, then by Power of Two. Finally, Round Robin, which is not very good in this case, and the Random algorithm, which creates significant peaks and has a distribution that likely follows a long tail, round out the set.
    • altotrees: Stories like these are the ones that inspire me the most. People using technology in a way that helps them in their everyday lives, or in a hobbyist fashion. I feel like many of the things I pick up at work are out of career necessity: the latest framework, bits of an up and coming language that is going to be "the future", semi-fluency in a stack so I can aid in a project. I enjoy it, but I don't enjoy it in the same way I once did.
    • Brent Ozar: But can an 80-core server write any faster than a 16-core server? Well…no. You can spend five times more, but you hit a wall at 16 cores. The takeaway: we're bottlenecked on log IO. Right now, over 16 cores is a waste of money.
    • Theofilos Petsios: Our experiments indicate that Ubuntu 18.04 shows the largest adoption of OS and application-level mitigations, followed by Debian 9. On the other hand, OpenSUSE 12.4, CentOS 7 and RHEL 7 also deploy common hardening schemes, and show wider adoption of stack-clash mitigations while shipping a much more tight-knit set of packages by default.
    • @JoeEmison: And for the record, the largest general state problem at scale isn’t small amounts of latency (which we all have to accept at scale), but intermittent failure to write and read state.
    • @QuinnyPig: "EC2 and RDS are the two offerings in our stack where you have to do most of the operations yourself" says a man who's apparently never had to work with CloudFormation.
    • @kendo451: I met a guy at a social gathering today who works for the SEC. His job is archiving crypto-currency/ICO websites in a form that is admissible as evidence in court. The wheels turn slowly, but it's clear they're eventually going to work the entire list.
    • Tom Abate: The prototype [entire computer onto a single chip] is built around a new data storage technology called RRAM (resistive random access memory), which has features essential for this new class of chips: storage density to pack more data into less space than other forms of memory; energy efficiency that won’t overtax limited power supplies; and the ability to retain data when the chip hibernates, as it is designed to do as an energy-saving tactic.
    • @zander: Our fundamental model of computation is not actually how complex systems compute anymore.  This is why we are seeing whole new classes of attacks. - @rodneyabrooks @longnow
    • @_mattburman: personal site hosting evolution 2015: static site on GitHub pages 2016: site hosted on own cloud VM 2017: added cdn 2018: new ssr'd site w/ added automated deployment workflow 2019: site on cloud k8s cluster managed with terraform
    • @sadisticsystems: Now that I'm tuning sled for 96-core machines with multiple sockets & domains there really isn't a choice. I'm a big fan of lock-free within a socket, but cross-socket/domain you really need to be shared-nothing. sharding is seriously the only way forward
    • @JoeEmison: This is so very true. GraphQL means that your API data model can be the same as the data model in your front end and back end (can’t use RDBMS though). And it makes software so much more maintainable and changeable.
    • Scott Goering: What’s in: Consumer experience over brand experience. All brands, whether they’re delivering packages to your door or providing your morning coffee, are rethinking the way customers are engaged both in-store and online. What’s out: Optimization. The past decade has brought many technology advances that allowed corporate-IT departments to drive costs out of delivering service to the business. What’s back: Data. OK, data didn’t actually go anywhere—it’s just still really important. 
    • Tuna-Fish: No. The paper notes that Spectre can, and will in the future be able to defeat all programming language level techniques of isolation. With properly designed OoO, Spectre cannot defeat process isolation. The fundamental lesson that everyone must take to heart is that in the future, any code running on a system, including things like very high-level languages running on interpreters, always have full read access to the address space of the process they run in. Process isolation is the most granular security boundary that actually works. Or in other words, running javascript interpreters in the same address space as where you manage crypto is not something that can be done. Running code from two different privilege levels in the same VM is not something that can be done. Whenever you need to run untrusted code, you need to spin up a new OS-managed process for it.
    • Sabine Hossenfelder: Building larger colliders is not the only way forward in the foundations of physics. Particle physicists only seem to be able to think of reasons for a next larger particle collider and not of reasons against it. This is not a good way to evaluate the potential of such a large financial investment.
    • DSHR: I agree with all of this, except that I think that they under-estimate the synergistic cost savings available from optical media technology if it can be deployed at Facebook scale. Given that all current storage technologies are approaching physical limits in the foreseeable future, see Byron et al's Table 9, economic models should follow their example and include decreasing Kryder rates.
    • @UnlikelyLass: Don’t worry — containers are replacing [dynamic libaries] with giant modular monoliths which *contain* all the shared libraries. What could go wrong?
    • Geoff Huston: One effective form of attack on the authoritative DNS server infrastructure, including the root servers, is the so-called random name attack. If you want to target the online availability of a particular domain name, then a random name attack will attempt to saturate the domain name’s authoritative name server (or servers) with queries to resolve names in that zone, putting the server (or servers) under such a level of load that ‘legitimate’ queries are no longer answered. The result is that the name goes dark and the denial of service attack is successful.
    • foone: The Challenger disaster wasn’t a single mistake or flaw or random chance that resulted in the death of 7 people and the loss of a 2 billion dollar spaceship. It was a whole series of mistakes and flaws and coincidences over a long time and at each step they figured they could get away with it because they figured the risks were minimal and they had plenty of engineering overhead. And they were right, most of the time…Then one day they weren’t. Normalization of deviance is the idea that things are designed and limits are calculated. We can go this fast, this hard, this hot, this cold, this heavy. But we always want to optimize. We want to do things cheaper, quicker, more at once.
    • Paul Johnston: Serverless is about aiming to reduce your code to zero and use the cloud vendors services to do as much of the work as possible. Which is why when someone talks to me about “running FaaS on Kubernetes” as being serverless, I find that baffling. That is, to me, seriously increasing the amount of code, and decreasing the number of services being used, and so completely opposite to the idea of serverless. Yes, it uses “Functions" and if that was the only definition of what makes something serverless, then fine, but if you take a look at the above, this approach becomes ridiculous.
    • effbee: My impression was that many [Facebook] employees hold a self-contradictory view about the extent of their influence at the company. When asked about their jobs, they tell you that they're working hard on fixing the problem and making impact ("where better to fix it than from inside?"). But when confronted w/stories like Onavo, they get defensive because "it's a big company, I had no way of knowing." Which is fair, honestly; the problem is that they think they can fix anything in the first place. Part of the problem is that FB advertises itself internally as being super transparent but it isn't at all. (This applies mostly to product/data+ML people. The infra folks I worked with for the most part just want to make their money and go home.) 
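The HAProxy load-balancing comparison above ranks Least Connections just ahead of Power of Two. A minimal sketch of both policies, using a hypothetical in-memory `Backend` model rather than HAProxy's actual implementation: Power of Two samples only two backends per request, avoiding a full scan while still dodging the hot spots that pure Random creates.

```python
import random

# Hypothetical model: each backend tracks its active connection count.
class Backend:
    def __init__(self, name):
        self.name = name
        self.active = 0

def pick_power_of_two(backends, rng=random):
    """Sample two backends at random; send the request to the less-loaded one."""
    a, b = rng.sample(backends, 2)
    return a if a.active <= b.active else b

def pick_least_connections(backends):
    """Always pick the globally least-loaded backend (requires a full scan)."""
    return min(backends, key=lambda s: s.active)

# Toy simulation: dispatch requests while connections open and close.
backends = [Backend(f"srv{i}") for i in range(4)]
for _ in range(1000):
    chosen = pick_power_of_two(backends)
    chosen.active += 1            # a connection opens on the chosen backend
    if random.random() < 0.5:     # some connection elsewhere finishes
        done = random.choice(backends)
        if done.active > 0:
            done.active -= 1
```

Least Connections gives the tightest distribution but costs O(n) per pick and needs a global view; Power of Two gets most of the benefit from two random probes, which is why it works well for distributed load balancers that can't share exact counts.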

  • How is software developed at Amazon? Get a couple of pizzas and watch this excellent interview: DevOps at Amazon with Ken Exner, GM of AWS Developer Tools
    • Key themes: decomposition, automation, and customer focus.
    • Amazon loves decomposition. Amazon used to have a monolithic organization and software architecture (Perl/Mason/C++). They decomposed the monolith into services and decomposed the organization into two pizza teams. Teams are autonomous, independent, and have ownership. Teams own a service end-to-end. They deal with customers, development, testing, support, etc. 
    • Scaling is by mitosis. Teams split apart into smaller teams that completely own a service. EC2 started as one two pizza team. 
    • Amazon loves automation. Automate all the things. Their first tools automated the build and release process, then deployment was automated. At first it's scary that a committed change automatically flows into production, but anything you can do manually can be put in automation, so it happens the same way every single time. As part of every deployment they go through several different kinds of testing. Started with integration testing. Browser and web based testing. Load testing. They monitor and measure everything. They found they were able to push out changes more frequently and the quality was higher. They could release more and better. 
    • Deployment is a pessimistic process, constantly trying to find reasons to fail a deployment either in pre-production or in production. In production they roll out to one box in one AZ. Any problems? Rollback. Success? Fan out to the AZ, then to more AZs, and then more regions. If a problem is found then roll back to a known good state.
    • Security is managed throughout the entire process. Developers need to think like security engineers. That's part of Amazon's culture. Engineers need to be developers, operators, architects, testers, and security experts. Amazon teaches developers these skills. Robert A. Heinlein would be proud.
    • When starting a new project the first thing developers work on is an architecture and a threat model. The threat model is reviewed with a security engineer. Developers own their own security because they're closest to the problem, so they're most likely to find problems. Then development is started. Code is submitted for review. Peers give feedback before commit. Static analysis is performed. Then it goes to the build, which also has static analysis. Then it goes into the release pipeline where there are more checks. There are canary monitors that run positive and negative checks against the deployment before the code goes out. 
    • Checks are built in to the entire pipeline through a combination of local and globally mandated policies. If you can inspect a pipeline you can determine if it's following best practices. If you can describe best practices, people can create rules that govern the shape, structure, and contents of the pipeline. As an organizational leader you can have rules for your team, like every new commit must have 70% unit test code coverage before it can deploy. There are AWS-wide rules that cover every deployment, like you can't deploy to every region at the same time. That's a bad practice, and it can be stopped with a rule. Rules can be applied at the team level, organizational level, and the company level. This inspection capability makes sure people can't do bad things. Pipelines have best practices baked in from years of learning. It has been very liberating. Developers don't have to make mistakes and learn the hard way. Through automation you can ensure processes are being followed by every single team. DevOps is also DevSecOps; it's about injecting security into the process. 
    • Developers on a team are responsible for architecture; it doesn't come from architects. Once they have an architecture it's reviewed with an architect or a principal engineer. The role of a principal engineer is to review and teach, not do the architecture. Same with security. The role of a security engineer is not to create the threat model; that's done by a developer on the team. Security engineers review it. Same with testing. A team owns the entire process. A lot of time is spent teaching because you want developers to learn.
    • Leaders at Amazon are expected to model what's important. Operations is important at Amazon. You know that because leadership spends a lot of time on operations. For something to be taken seriously leadership must take it seriously. For example, any team must be able to present their dashboards at ops meetings every week. Every blip must be able to be explained. 
    • The best way to plan is bottom up. Teams closest to the product are closest to the customer. They know what the customer wants. The people closest to the customer should be telling Amazon what to do. Every year there are two docs OP1 and OP2 (Operating Plan). Every organization level writes a 6 page document about what they want to do the next year. In the plan you say what you would do if you had flat resources and incremental resources. You present your business plans in 6 pages at every level of the organization. Managers take the 6 page docs from all the teams they manage, make their own 6 page doc and present it to their management. This happens all the way up to Bezos. Resources then flow down. 
    • The layers of management arbitrate different requests and apply judgement. The ideas still come from teams closest to the customer. 
    • Teams also have goals and they are given resources to attain those goals, which are tracked. Teams are thought of as startups and management acts as a board of directors managing their different startups by reviewing goals and metrics. 
    • Teams can have specialists. They can have a mix of different skills, like a webdev, SE, PM, doc writer, marketer, etc.
    • Communication and consistency can be difficult because the teams are separate. Amazon often ends up with two of something, but it's better to have two of something rather than none of something. An accepted risk. It can be fixed afterwards. It's better not to slow things down. Consistency is solved by refactoring teams. Create another team, another service to handle that responsibility.
    • How do you convince another team to do something you need them to do? You must be convincing. Global initiatives are driven top down during the annual planning process. For example, if they're going into a new region, teams must plan for that.
    • Here you can see the roots of why AWS beat GCP to the punch. You don't need a complicated value-chain argument. Google originated self-serve fulfillment. If Google had won, you could easily have made a convincing backward-looking argument as to why. The key difference is that Amazon from the start focused on the customer. Bottom up, Amazon adaptively grows its entire organization in response to customer inputs. Google barely wants to admit it has customers. That's the root of it, not being first to market.
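The pessimistic rollout described above (one box, then the AZ, then more AZs, then more regions, rolling back to a known good state at the first sign of trouble) can be sketched roughly like this. The `deploy`, `healthy`, and `rollback` hooks are placeholders, not any real AWS tooling:

```python
def staged_rollout(stages, deploy, healthy, rollback):
    """Deploy to each host group in order, smallest first
    (one box, one AZ, more AZs, more regions).

    stages:   list of host groups to deploy to, in fan-out order
    deploy:   callable(hosts) that pushes the new version
    healthy:  callable(hosts) -> bool health check after deploying
    rollback: callable(hosts) that restores the known good state
    """
    deployed = []
    for hosts in stages:
        deploy(hosts)
        deployed.append(hosts)
        if not healthy(hosts):
            # Pessimistic: any failure rolls everything back, newest first.
            for group in reversed(deployed):
                rollback(group)
            return False
    return True

# Example fan-out: one canary box, then its AZ, then a second AZ, then a region.
stages = [["box-1"], ["az-1"], ["az-2"], ["region-2"]]
```

The point of the shape is that blast radius grows only after each smaller stage has proven itself, and the rollback path is the same well-rehearsed mechanism every time.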

  • A good list of book recommendations from Jessie Frazelle. 

  • What's different about 5G? Shahriar Discusses: The big thing for 5G networks is the use of beam forming phased arrays for communication. People talk about going beyond the Marconi era. Marconi invented wireless transmission in the late 1800s. The reason it was such a success is it used isotropic radiation. It's the easiest thing to build. You just put out an antenna, send radiation over the air, covering the atmosphere completely, and you don't care if most of that energy is wasted and goes into nothing. If there's a receiver somewhere that can collect enough energy then it will use that energy to create wireless communication. This was very successful because of its simplicity, and it was fine for frequencies up to a couple of gigahertz. Our entire consumer wireless industry has been based on that. For the last 100 years we have been doing basically the same thing. We've been increasing capacity by cramming more bits per hertz. Now we want to cram so much information into the wireless signal that we can no longer do it at the frequencies we've been using. What do we do? If you want to go from 4G LTE to 5G, something has to change, for the first time in 100 years or so. Let's go to higher frequencies. Let's go to millimeter frequencies starting at 28 gigahertz. Now you can no longer do isotropic radiation in a sphere anymore, because the efficiency of being able to transmit and the losses of those signals at those frequencies in free space are so much worse. You can't afford to transmit as much as you want over the air and wait for someone to catch it. Your link budget will be terrible. Now we need to actually target the beam at individuals instead of sending it everywhere. As soon as you start targeting you get into the phased array problem, where you have to beam form electronically so you can track people as they move around. This is a completely new way of communicating.

  • Niantic on Designing a planet-scale real-world AR platform: a central part of our platform is a real-time geospatial storage, indexing, and serving engine, which manages world-wide geospatial objects that developers can control. However, because we envisioned a world where single-world AR games that integrate and tie everyone’s reality together need to operate on a massive scale, with monthly usage measured in the billions of users, a major part of the technological investment we made was in horizontal scalability while retaining a single world instance, primarily by rethinking how server-authoritative games could be conducted on top of horizontally scalable Kubernetes container technology in conjunction with NoSQL denormalized datastores, rather than on the single instance relational SQL databases that MMOs in the past were typically built on. Consequently, Pokémon GO is built entirely on this platform, and has demonstrated concurrent real-time usage of several million players in a single, consistent game environment, with demonstrated monthly usage in the hundreds of millions...we focused our efforts on creating an intuitive API that handles the intricacies of querying and caching both map and geospatial objects as the player moves about the world, allowing developers to code a planet-scale, single-instance real-time multiplayer gameplay with ease, and thereby freeing up opportunities to focus on finding the fun in their game designs...Our technology optimizes for real-time AR that achieves peer-to-peer multiplayer latencies that are in the tens of milliseconds. To put this in perspective, with rendering at 60fps, each new image is displayed at ~16ms, so we render the actual real position of players...we created a comprehensive set of APIs for real-time multiplayer AR experiences using the phone as a control device and viewing portal into the virtual world.
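A sketch of the kind of spatial indexing a geospatial serving engine like this relies on. This is a generic Web Mercator tile key, not Niantic's actual scheme: objects are bucketed by the tile they fall in, so everything near the player can be queried and cached by key as they move.

```python
import math

def tile_key(lat_deg, lng_deg, zoom):
    """Map a lat/lng to its Web Mercator tile (zoom, x, y).
    Nearby objects share a key, so a client fetches a handful of tiles
    around the player instead of range-scanning the whole world."""
    lat = math.radians(lat_deg)
    n = 2 ** zoom                         # tiles per axis at this zoom
    x = int((lng_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return (zoom, x, y)

# A denormalized NoSQL store can then shard on the key and serve each
# tile's objects from any node, which is what makes it horizontally scalable.
```

Production systems typically use hierarchical cells (e.g. S2) rather than flat tiles, but the principle of key-per-region sharding is the same.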

  • Encrypting your MySQL database doesn't cost as much as you might think. Measuring Percona Server for MySQL On-Disk Decryption Overhead: For a high number of threads, there is no measurable difference between encrypted and unencrypted storage. This is because a lot of CPU resources are spent in contention and waits, so the relative time spent in decryption is negligible. However, we can see some performance penalty for a low number of threads: up to 9% penalty for hardware decryption. When data fully fits into memory, there is no measurable difference between encrypted and unencrypted storage.

  • TL;DR: C++20 may well be as big a release as C++11. Here's Herb Sutter's trip report: 2019-02 Kona ISO C++ Committee Trip Report. Cpp On Sea 2019 videos are now available. Plus another trip report. The big news is C++ now has coroutines! And modules!

  • Thread pools are a prank gift that keeps on giving. The real problem is that thread pools offer no mental execution model. There are no visual cues telling programmers that the code they're writing runs in a pool. So at some random later time a programmer will add local code that looks right but does something very stupid in a pool environment. And because of the stochastic nature of thread pools they may never know until it's too late. Finding those kinds of problems is a nightmare. The Unscalable, Deadlock-prone, Thread Pool: The core issue with thread pools is that the only thing they can do is run opaque functions in a dedicated thread, so the only way to reserve resources is to already be running in a dedicated thread. However, the one resource that every function needs is a thread on which to run, thus any correct lock order must acquire the thread last....My favourite approach assigns one global thread pool (queue) to each function or processing step. The arguments to the functions will change, but the code is always the same, so the resource requirements are also well understood...Complex programs are often best understood as state machines. These state machines can be implicit, or explicit. I prefer the latter. I claim that it’s also preferable to have one thread pool per explicit state than to dump all sorts of state transition logic in a shared pool. If writing functions that process flat tables is data-oriented programming, I suppose I’m arguing for data-oriented state machines.
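The failure mode is easy to reproduce. A minimal Python sketch (hypothetical names, with a timeout only so the demo terminates rather than hanging) showing the self-deadlock, followed by the one-pool-per-step remedy the article advocates:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

pool = ThreadPoolExecutor(max_workers=1)  # one worker makes the hazard obvious

def inner():
    return 42

def blocked():
    # Runs on the pool's only worker, then waits on a task queued behind
    # itself: the thread it needs is the thread it is holding. Real code
    # would hang forever; the timeout is just for demonstration.
    return pool.submit(inner).result(timeout=0.5)

try:
    pool.submit(blocked).result()
    deadlocked = False
except FutureTimeout:
    deadlocked = True  # the self-deadlock fired

# The remedy: a dedicated pool per processing step, so the "thread"
# resource is always acquired in a fixed order (outer step -> inner step).
inner_pool = ThreadPoolExecutor(max_workers=1)

def safe():
    return inner_pool.submit(inner).result(timeout=0.5)

result = pool.submit(safe).result()  # completes, since the pools differ
```

The same code that deadlocks in a shared pool runs fine with per-step pools, which is exactly why the bug class is so hard to spot in review.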

  • Jerry Hargrove with another beautiful interpretive drawing of a presentation. This one is on AWS QLDB (Quantum Ledger Database).

  • I think Fargate is naturally faster and can be more so given the number of knobs you can tune. This will likely cost you more, both in direct resource costs and in engineering time trying to fine-tune the knobs. Whether that's worth it depends on your business needs.  AWS API Performance Comparison: Serverless vs. Containers vs. API Gateway integration: Do you need high performance? Using dedicated instances with Fargate (or ECS/EKS/EC2) is your best bet. Is your business logic limited? If so, use API Gateway service proxy. In the vast majority of other situations, use AWS Lambda. Lambda is dead-simple to deploy (if you’re using a deployment tool). It’s reliable and scalable. 

  • Irony: npm, the package manager for JavaScript, now runs Rust in production. Community makes Rust an easy choice for npm: Most of the operations npm performs are network-bound, where JavaScript serves well. However, looking at the authorization service that determines whether a user is allowed to, say, publish a particular package, they saw a CPU-bound task that was projected to become a performance bottleneck. The Go rewrite took two days, but during it the team was disappointed by the lack of a dependency management solution. They found a stark contrast when they began the Rust implementation: "Rust has absolutely stunning dependency management," one engineer enthused. npm’s first Rust program hasn't caused any alerts in its year and a half in production. "My biggest compliment to Rust is that it's boring," offered Dickinson. 

  • Building a smart device? It just got easier. New – RISC-V Support in the FreeRTOS Kernel

  • Think of him as a cross between Albert Einstein and the Dos Equis guy. 10,000 Hours With Claude Shannon: How A Genius Thinks, Works, and Lives: Cull your inputs; Big picture first. Details later; Don’t just find a mentor. Allow yourself to be mentored; You don’t have to ship everything you make; Chaos is okay; Time is the soil in which great ideas grow; Consider the content of your friendships; Put money in its place; Fancy is easy. Simple is hard; The less marketing you need, the better your idea or product probably is; Value freedom over status; Don’t look for inspiration. Look for irritation. 

  • Good example of a complex project. Building A Serverless IoT FinTech App with AWS and NodeJS.

  • Here's how you can combine using an infrastructure provider with specializing your own infrastructure. Terraforming Cloud Infrastructure: About a year-and-a-half ago Picnic migrated to a new Cloud infrastructure model. We moved to declarative Infrastructure-as-Code using HashiCorp Terraform and Docker on Kubernetes. This migration allowed our team to stop worrying about the base systems on which we ran our applications and start loving/working with a true microservices-ready architecture. [We] worked on a process for scaling from 1 to N. And N here was not merely two — we wanted Production, Staging and Development environments for each of the two markets. Oh, and another shared one — we called it Global, with its own set of three environments. Let me draw a mental picture of what our application infrastructure looks like (top-down): Applications containerized with Docker. Docker containers (pods/deployments/jobs) deployed on Kubernetes, using Helm Charts. Kubernetes, provisioned as EKS clusters on AWS using Terraform.

  • inters/vita: a virtual private network (VPN) gateway you can use to interconnect your LANs. Vita acts as a tunnel between your local, private network and any number of remote Vita gateways. With it, nodes spread across your outposts can communicate with each other as if they were on the same LAN, with confidentiality and authenticity ensured at the network layer. Vita is probably more efficient at encapsulating traffic than your application servers. You can free cycles for your application by offloading your packet encryption and authentication workload to Vita.

  • rancher/k3s: Lightweight Kubernetes. Easy to install, half the memory, all in a binary less than 40mb. 

  • facebookincubator/magma: an open-source software platform that gives network operators an open, flexible and extendable mobile core network solution, allowing operators to offer cellular service without vendor lock-in with a modern, open source core network.

  • hobby-kube/guide: This guide answers the question of how to set up and operate a fully functional, secure Kubernetes cluster on a cloud provider such as Hetzner Cloud, DigitalOcean or Scaleway. It explains how to overcome the lack of external ingress controllers, fully isolated secure private networking and persistent distributed block storage. pstadler: I'm running a three node cluster on Hetzner Cloud for less than $10 a month. Comprehensive guide and automated provisioning available here

  • Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask: Scaling software systems to many-core architectures is one of the most important challenges in computing today. A synchronization scheme is said to scale if its performance does not degrade as the number of cores increases. Ideally, acquiring a lock should, for example, take the same time regardless of the number of cores sharing that lock. In the last few decades, a large body of work has been devoted to the design, implementation, evaluation, and application of synchronization schemes. Yet, the designer of a concurrent system still has little indication, a priori, of whether a given synchronization scheme will scale on a given modern many-core architecture and, a posteriori, about exactly why a given scheme did, or did not, scale.
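The paper's definition of scaling is concrete enough to measure yourself. A small sketch of the experiment in Python (the GIL makes absolute numbers unrepresentative of native code, but the shape of the measurement is the same; all names are hypothetical):

```python
import threading
import time

def contended_increments(n_threads, total=20_000):
    """Split `total` lock-protected increments across `n_threads` threads
    and return (elapsed_seconds, final_count)."""
    lock = threading.Lock()
    state = {"count": 0}

    def worker(iters):
        for _ in range(iters):
            with lock:  # the shared resource every thread fights over
                state["count"] += 1

    threads = [threading.Thread(target=worker, args=(total // n_threads,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, state["count"]

# A scheme "scales" if elapsed time stays flat as n_threads grows;
# running this for 1, 2, 4, 8 threads is the paper's experiment in miniature.
```

On most machines the contended runs cost measurably more per operation, which is exactly the degradation the paper says designers can't currently predict a priori.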

  • CMU Advanced Database Systems (Spring 2019): 11 Larger-than-Memory Databases and 12 Recovery Protocols. 

  • Small World with High Risks: A Study of Security Threats in the npm Ecosystem: Our results provide evidence that npm suffers from single points of failure and that unmaintained packages threaten large code bases. We discuss several mitigation techniques, such as trusted maintainers and total first-party security, and analyze their potential effectiveness.

  • Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs: a high performance regular expression matcher for commodity server machines. Hyperscan employs two core techniques for efficient pattern matching. First, it exploits graph decomposition that translates regular expression matching into a series of string and finite automata matching. Second, Hyperscan accelerates both string and finite automata matching using SIMD operations, which brings substantial throughput improvement. Our evaluation shows that Hyperscan improves the performance of Snort by a factor of 8.7 for a real traffic trace.
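A toy analogue of the multi-pattern idea, assuming nothing about Hyperscan's actual API: compile many patterns into one scanner so the input is traversed once, rather than once per rule. (The literals and payload below are made up; Hyperscan handles thousands of full regexes, not just literals, and adds the SIMD acceleration this sketch lacks.)

```python
import re

# Hypothetical Snort-style literal rules.
patterns = ["attack", "exploit", "overflow"]

# One combined scanner walks the input a single time for every pattern --
# the single-pass idea Hyperscan accelerates with decomposition + SIMD.
scanner = re.compile("|".join(map(re.escape, patterns)))

def scan(payload):
    """Return (offset, pattern) for every rule hit in one pass."""
    return [(m.start(), m.group()) for m in scanner.finditer(payload)]
```

With thousands of rules, the difference between one pass and one-pass-per-rule is where the 8.7x class of speedups comes from.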

  • FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds: a software-based RDMA virtualization framework designed for containerized clouds. FreeFlow realizes virtual RDMA networking purely with a software-based approach using commodity RDMA NICs. Unlike existing RDMA virtualization solutions, FreeFlow fully satisfies the requirements from cloud environments, such as isolation for multi-tenancy, portability for container migrations, and controllability for control and data plane policies. Also, Slim: OS Kernel Support for a Low-Overhead Container Overlay Network

  • Free book. Cloud Native DevOps With Kubernetes

  • Spectre is here to stay: An analysis of side-channels and speculative execution: The recent discovery of the Spectre and Meltdown attacks represents a watershed moment not just for the field of Computer Security, but also for Programming Languages. This paper explores speculative side-channel attacks and their implications for programming languages. These attacks leak information through micro-architectural side-channels which we show are not mere bugs, but in fact lie at the foundation of optimization.
