Stuff The Internet Says On Scalability For July 8th, 2016
Hey, it's HighScalability time:
Juno: 165,000mph, 1.7 billion miles, missed orbit by 10 miles. Dang buggy software. If you like this sort of Stuff then please support me on Patreon.
- $3B: damages awarded to HP from Oracle; 37%: when to stop looking through your search period; 70%: observed Annualized Failure Rate (AFR) in production datacenters for some models of SSDs;
- Quotable Quotes:
- spacerodent: After Christmas there was this huge excess capacity and that is when I first learned of the EC2 project. It was my belief EC2 came out of a need to utilize those extra Gurupa servers during the off season:)
- bcantrill: That said, I think Sun's problem was pretty simple: we thought we were a hardware company long after it should have been clear that we were a systems company. As a result, we made overpriced, underperforming (and, it kills me to say, unreliable) hardware. And because we were hardware-fixated, we did not understand the economic disruptive force of either Intel or open source until it was too late.
- @cmeik: I am not convinced the blockchain and CRDTs *work.*
- daly: Managers make decisions. Only go to management with your need for a decision and always present the options. They went to management with what was, in essence, a complaint. Worse, it was a complaint that had nothing to do with the business. Clearly they were not keeping the business uppermost in their priority queue. So management made a business decision and fixed the problem.
- @colettecello: Architect: "we should break this down into 6 microservices" Me: "you have 6 teams who hate each other?" Architect: "how did you know that?"
- Matt Stats: The differences between BSD and Linux all derive from basic philosophical differences. Once you understand those, everything else falls into place pretty neatly.
- @wattersjames: "Last year, Johnson & Johnson turned off its last mainframe"
- Allan Kelly: But in the world of software development this mindset [economies of scale] is a recipe for failure and underperformance. The conflict between economies of scale thinking and diseconomies of scale working will create tension and conflict.
- Jeff G: Today, a large part of my business is migrating companies off the monolithic Java EE containers into lightweight modular containers. Yes, even the tried and true banking and financial industries are moving away from Java EE.
- collyw: "Weeks of programming can save hours of planning" is a favorite quote of mine.
- xiongchiamiov: when I see a team responsible for hundreds of microservices, it's not at all surprising when I find they're completely underwater and struggling to keep up with maintenance, much less new features.
- Robert Plomin: We're always talking about differences. The only genetics that makes a difference is that 1 percent of the 3 billion base pairs. But that is over 10 million base pairs of DNA. We're looking at these differences and asking to what extent they cause the differences that we observe.
- @jmferdegue: Micro services as a cost reduction strategy for project delivery. Marco Cullen from @OpenCredo at #micromanchester
- J.R.R. Tolkien: I like, and even dare to wear in these dull days, ornamental waistcoats.
- @johnregehr: HN commenter has reached enlightenment: In both cases, after about a year, we found ourselves wishing we had not rewritten the network stack.
- @nigelbabu: OH: 9.9999% uptime is still five 9s.
- @jessfraz: "We are going to need a floppy and a shaman" @ryanhuber
- Peter Cohen: So, why the cloud? Because, the developer.
- @CompSciFact: 'The fastest algorithm can frequently be replaced by one that is almost as fast and much easier to understand.' -- Douglas W. Jones
- @igrigorik: Improved font loading in WebKit: http://bit.ly/29eaxV2 - tl;dr: 3s timeout, WOFF2, unicode-range, Font Loading API. hooray!
- @danielbryantuk: "I've worked on teams with 200+. We had 3 people just to make JPA work" @myfear on scaling issues #micromanchester
- AWS Origin Story: Jassy tells of an executive retreat at Jeff Bezos’ house in 2003. It was there that the executive team conducted an exercise identifying the company’s core competencies
- @sheeshee: ".. you are charged for every 100ms your code executes and the number of times your code is triggered." the 1970ies are back. (aws lambda)
- @KentBeck: accepting mediocrity as the price of scaling misunderstands the power law distribution of payoffs.
- @cowtowncoder: that is: cost efficiency from AWS et al is for SMALL deployments, and at some point it always, invariably becomes cheaper to DIY
- Exascale Computing Research priorities: Total power requirements suggest that CPUs will not be suitable commodity processors for supercomputers in the future.
- Here's how Instagram does it. Instagram + Android: Four Years Later: At the core of this principle is the idea that the Instagram app is simply a renderer of server-provided data, much like a web browser. Almost all complex business logic happens server-side, where it is easier to fix bugs and add new features. We rely on the server to be perfect, enforced through continuous integration testing, and dispense with null-checking or data-consistency checking on the client.
- Good story on how WePay is moving from a monolith to a services-based architecture on top of Kubernetes. Advantages: autoscaling, rolling updates, and a model in which software is no longer tied to specific machines. WePay on Kubernetes: ‘It Changed Our Business’.
- Julia Ferraioli with a really fun explanation of Kubernetes using LEGO.
- THE MEGAPROCESSOR. A 20kHz behemoth CPU you can actually see in action. So cool. Lonnie Veal nails it: My first thought was: "This is the Voice of the Colossus..." If anything, this reminds me of all those old Sci-Fi Movies & Shows like Star Trek where the control rooms always had those massive banks of blinking lights.
- What cost billions and was never used? Nitro Zeus, Stuxnet's bigger, badder brother. The US could have destroyed Iran's entire infrastructure without dropping a single bomb.
- All you have to do to bias an algorithm is carefully select your training data. Are Face Recognition Systems Accurate? Depends on Your Race.
- Is it possible garbage collection doesn't have to suck? Go’s march to low-latency GC: It’s the story of how improvements to the Go runtime between Go 1.4 and Go 1.6 gave us a 20x improvement in garbage collection (GC) pause time, of how we’ve gotten another 10x improvement in Go 1.6’s pauses, and of how sharing our experience with the Go runtime team helped them give us an additional 10x speedup in Go 1.7 while obsoleting our manual tuning.
- Minor. That's the answer to any question about overhead. MySQL with Docker – Performance characteristics: The key takeaway here is that heavy I/O-bound load evens out the differences which might otherwise be present, resulting in Docker performing just as well as a stock instance. When not bound by I/O, Docker imposes a minor overhead, especially when running through the bridged network.
- The Apple app store requires more testing than is required of self-driving cars. Is it time to test scenarios and edge cases before self-driving software can be deployed? A white truck turning in the road is a scenario that should have been regression tested under numerous conditions. Continuous software deployment is not OK for self-driving cars. We all know even the slightest code change can lead to disaster.
- StorageMojo on The top storage challenges of the next decade. Over the last 10 years we've seen a lot of changes: Cloud storage and computing that put a price on IT’s head; Scale out object storage; Flash. Millions of IOPS in a few RU; Deduplication; 1,000 year optical discs. For the future the fundamental driver is that we do IT for the information, not the infrastructure...the future will be won by smarter architectures, not brute force.
- From a data POV, this is the ultimate you-are-the-product play: 23andMe Sells Data for Drug Search.
- City machines predate datacenter machines by a few years. The city as a practical machine: So what do these various settlements have in common? As pointed out by Lynch, they are practical settlements, built for specific purposes, often in haste and often as a temporary settlement. They tend to have some common spatial attributes, including highly planned layouts and physical separation from other settlements.
- An excellent counterpoint, Response: CAM Table Basics, to Basics: What is Content Addressable Memory (CAM). The magic revealed.
- How do we put back the secondary indices on NoSQL databases, without compromising scalability, availability, and performance? Murat is on the case. Replex: A Scalable, Highly Available Multi-Index Data Store: The paper compares Replex to Hyperdex and Cassandra and shows that Replex's steady-state performance is 76% better than Hyperdex and on par with Cassandra for writes. For reads, Replex outperforms Cassandra by as much as 2-9x while maintaining performance equivalent with HyperDex. In addition, the paper shows that Replex can recover from one or two failures 2-3x faster than Hyperdex.
- Gil Tene with another great explanation. This time on memory barriers: The simple rule to remember is this: before a CPU ever sees the instructions you think you are running, a compiler (static, JIT, whatever) will have a chance to completely shuffle those instructions around and give them to the CPU in any order it sees fit. So what a CPU actually does with barriers has no functional/correctness implications, as it is the compiler's job to make the CPU do what it needs to. Barriers/fences/memory models are first and foremost compiler concerns, and have both correctness and (potential and profound) performance implications. The CPU concerns are only secondary, and only affect performance (never correctness). This is critical to understand and get your head around before trying to reason about what fences/barriers/etc. *mean* to program code.
- Excellent tutorial. The Life of a Serverless Microservice on AWS.
- Lessons from developing a distributed system in Haskell. Distributed Systems in Haskell: Only ever block if there are no messages whatsoever waiting for your server; Don't use interrupt-based timeouts; Separate your server logic and any networking; Try to have pure server logic; Use Monads to simplify your code as it gets bigger; Use Cloud Haskell and Lenses and other nice libraries to simplify your life and your code.
- Why do we use the Linux kernel's TCP stack?: That fast networking framework Seastar from before is written using something from Intel called DPDK. The deal with DPDK seems to be that it's a network card driver and some libraries, but instead of giving you packets through interrupts (asynchronously), it polls the network card and says "do you have a packet yet? now? now? now?".
- Looks like our solar system could be made of materials derived from at least two different supernovas. Evolution of the Solar System Inferred from Sm-Nd Isotopic Studies - Lars Borg (SETI Talks 2016)
- The cost of the async state machine: In this case, we can see that the sync benefit has effectively been eliminated. The cost of the async state machine is swallowed in the cost of the blocking waits. But unlike the sync system, when the async state machine is yielding, something else can run on this thread.
- Great 4-part series of articles on creating a video chat application. Video Conference Part 1: These Things Suck. Covers ideas like the Discrete Cosine Transform, quantization, RLE, block state formats, entropy coding, and choosing UDP over TCP for networking: "In other words, we only resend data when absolutely necessary, and only in the right order relative to other updates." Final statistics: average bits per pixel: 0.6 bits for low quality, 1.0 bits for good quality; max usable sustained packet loss: 25%; max survivable sustained packet loss: 75%.
- While showing how SQL can be used to compute trendlines to analyze data, Periscope found the average rating of the top 15 films has not changed over time. There’s virtually no correlation. IMDb vs RottenTomatoes Ratings with SQL Trendlines. A clear explanation of complex queries.
- What speeds up writes in a database? Getting durable, faster: In an internal benchmark we did, we were in 2nd place, ahead of pretty much everything else. The problem was that the database engine that was ahead of us was faster by x40 times. You read that right, it was forty times faster than we were. And that sucked. Monitoring what it did showed that it didn’t bother to call fsync, instead, it used direct unbuffered I/O (FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH on Windows). Those flags have very strict usage rules (specific alignment for both memory and position in file), but the good thing about them is that they allow us to send the data directly from the user memory all the way to the disk while bypassing all the caches, that means that when we write a few KB, we write a few KB, we don’t need to wait for the entire disk cache to be flushed to disk. That gave us a tremendous boost. Other things that we did was compress the data that we wrote to the journal, to reduce the amount of I/O, and again, preallocation and writing in sequential manner helps, quite a lot.
- On Scaling Decentralized Blockchains: Our results suggest that reparameterization of block size and intervals should be viewed only as a first increment toward achieving next-generation, high-load blockchain protocols, and major advances will additionally require a basic rethinking of technical approaches.
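About that 37% stat at the top: it's the optimal stopping ("secretary problem") rule. You look at roughly the first 37% (1/e) of your options without committing, then take the first one that beats everything seen so far. Here's a minimal Python sketch, assuming you want the single best of n candidates seen in random order. It computes the exact success probability for every possible look-only phase and picks the best one. The function names are made up for illustration.

```python
def success_probability(n: int, skip: int) -> float:
    """Chance of ending up with the single best of n candidates if you reject
    the first `skip`, then take the first one better than all seen so far."""
    if skip == 0:
        return 1.0 / n  # no look phase: you simply take the first candidate
    # Sum over the positions where the best-so-far could first appear.
    return (skip / n) * sum(1.0 / j for j in range(skip, n))


def optimal_skip(n: int) -> int:
    """Length of the look-only phase that maximizes the success probability."""
    return max(range(n), key=lambda k: success_probability(n, k))


if __name__ == "__main__":
    n = 100
    k = optimal_skip(n)
    print(k, round(success_probability(n, k), 3))  # 37 0.371
```

For n = 100 this lands on skipping 37 candidates, with about a 37% chance of ending up with the very best one; as n grows, both the skip fraction and the success probability approach 1/e.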
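On the Lambda billing quote: duration was metered in 100ms increments, rounded up. A minimal sketch of that arithmetic, assuming every invocation is billed for at least one increment. The price arguments are placeholders to fill in from the current pricing page, not real figures, and `billed_ms`/`lambda_cost` are hypothetical helpers.

```python
import math
from typing import Iterable


def billed_ms(duration_ms: float, increment_ms: int = 100) -> int:
    """Round one invocation's run time up to the next billing increment.

    Assumption: every invocation is billed for at least one increment.
    """
    return max(1, math.ceil(duration_ms / increment_ms)) * increment_ms


def lambda_cost(durations_ms: Iterable[float], memory_gb: float,
                price_per_gb_second: float, price_per_request: float) -> float:
    """Cost = billed GB-seconds * price + request count * price (placeholder prices)."""
    durations = list(durations_ms)
    gb_seconds = sum(billed_ms(d) / 1000 * memory_gb for d in durations)
    return gb_seconds * price_per_gb_second + len(durations) * price_per_request
```

Under this model a 101ms invocation is billed as 200ms, so shaving a couple of milliseconds off a function sitting just over a boundary halves its duration charge.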
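For the CAM posts: a CAM is searched by content rather than by address. You present a key and the hardware returns the matching entry. Switches use binary CAM for exact-match MAC tables and ternary CAM (TCAM), where each bit can also be "don't care", for things like route lookups. Here's a minimal software model, assuming entries are stored in priority order; real hardware compares every row in parallel, which this loop only simulates. `TCAM` and `add_route` are hypothetical names.

```python
import ipaddress
from typing import List, Optional, Tuple


class TCAM:
    """Software model of a ternary CAM: each row is (value, mask, result)."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, int, str]] = []  # index 0 = highest priority

    def add(self, value: int, mask: int, result: str) -> None:
        # Bits where mask is 0 are "don't care". A binary CAM is the special
        # case where every mask bit is 1 (exact match).
        self.rows.append((value & mask, mask, result))

    def lookup(self, key: int) -> Optional[str]:
        # Hardware checks all rows at once and a priority encoder picks the
        # lowest-index hit; a sequential scan gives the same answer.
        for value, mask, result in self.rows:
            if (key & mask) == value:
                return result
        return None


def add_route(tcam: TCAM, cidr: str, port: str) -> None:
    """Store an IPv4 prefix; insert longer prefixes first for longest-prefix match."""
    net = ipaddress.ip_network(cidr)
    tcam.add(int(net.network_address), int(net.netmask), port)


if __name__ == "__main__":
    table = TCAM()
    for cidr, port in [("10.1.0.0/16", "port2"), ("10.0.0.0/8", "port1"), ("0.0.0.0/0", "uplink")]:
        add_route(table, cidr, port)
    for ip in ["10.1.2.3", "10.9.9.9", "192.168.1.1"]:
        print(ip, table.lookup(int(ipaddress.ip_address(ip))))
```

Ordering rows from longest to shortest prefix is what turns "first match wins" into longest-prefix match.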
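The core idea behind Replex, per the paper, is to replace plain replicas with "replexes": each copy of the data is partitioned on a different key, so the copies you keep for availability double as your secondary indexes. Here's a toy in-memory model of just that idea, assuming hash partitioning and ignoring the paper's chain replication, hybrid replexes, and failure recovery. `ToyReplexStore` is a made-up name, not Replex's API.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


class ToyReplexStore:
    """Toy model: one full copy of every row per indexed column ("replex"),
    each copy hash-partitioned on its own column."""

    def __init__(self, index_keys: List[str], partitions: int = 4) -> None:
        self.partitions = partitions
        self.replexes: Dict[str, List[Dict[Any, List[Row]]]] = {
            key: [defaultdict(list) for _ in range(partitions)] for key in index_keys
        }

    def _partition(self, value: Any) -> int:
        return hash(value) % self.partitions

    def insert(self, row: Row) -> None:
        # A write lands in exactly one partition of every replex, and those
        # copies also serve as the row's replicas for fault tolerance.
        for key, parts in self.replexes.items():
            parts[self._partition(row[key])][row[key]].append(row)

    def query(self, key: str, value: Any) -> List[Row]:
        # A lookup on any indexed column goes to a single partition of the
        # replex partitioned on that column: no scatter-gather across nodes.
        return list(self.replexes[key][self._partition(value)].get(value, []))
```

The trade-off the toy makes visible: every write fans out to one partition per index, but every indexed read touches exactly one partition.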
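The "separate your server logic from networking" and "pure server logic" lessons from the Haskell post translate to other languages too. Here's a minimal Python sketch, assuming a toy key-value protocol invented for illustration. `step` is a pure function from (state, message) to (new state, outgoing messages), and the only code that touches I/O is the thin `run` loop, so the logic can be replayed and tested without sockets.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Store = Dict[str, str]
Message = Tuple[str, ...]               # ("put", client, key, value) or ("get", client, key)
Outbound = Tuple[str, Tuple[str, ...]]  # (destination, payload)


def step(store: Store, msg: Message) -> Tuple[Store, List[Outbound]]:
    """Pure server logic: no sockets, no clocks, never mutates its input."""
    kind, client = msg[0], msg[1]
    if kind == "put" and len(msg) == 4:
        key, value = msg[2], msg[3]
        return {**store, key: value}, [(client, ("ok", key))]
    if kind == "get" and len(msg) == 3:
        key = msg[2]
        return store, [(client, ("value", key, store.get(key, "")))]
    return store, [(client, ("error", "bad request"))]


def run(inbox: Iterable[Message], send: Callable[[str, Tuple[str, ...]], None]) -> Store:
    """Impure shell: feed messages through step() and hand replies to the network."""
    store: Store = {}
    for msg in inbox:
        store, outbound = step(store, msg)
        for dest, payload in outbound:
            send(dest, payload)
    return store
```

In a test, `send` can just append to a list, which is exactly the kind of replayable, deterministic behavior the post is after.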
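On the async state machine item: the payoff it describes is that while an async request waits, something else can run on the same thread. Here's a rough Python asyncio analogue of that yielding behavior (not the quoted post's code or runtime). An `await` hands the thread back to the event loop, while a blocking call inside a coroutine (`time.sleep` here) holds the thread until it returns. The `handle`/`serve` names are hypothetical.

```python
import asyncio
import time
from typing import List


async def handle(name: str, log: List[str], delay: float, blocking: bool) -> None:
    log.append(f"{name} start")
    if blocking:
        time.sleep(delay)           # blocks the whole thread; nothing else runs
    else:
        await asyncio.sleep(delay)  # yields to the event loop while waiting
    log.append(f"{name} done")


async def serve(blocking: bool) -> List[str]:
    log: List[str] = []
    await asyncio.gather(
        handle("slow", log, 0.05, blocking),
        handle("fast", log, 0.0, blocking),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(serve(blocking=False)))  # fast finishes while slow waits
    print(asyncio.run(serve(blocking=True)))   # fast can't start until slow is done
```

Both runs do the same total waiting; only the non-blocking one lets the fast request finish while the slow one is parked.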
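For the video conferencing series: the DCT, quantization, and run-length steps it covers form a classic transform-coding pipeline. Here's a minimal one-dimensional sketch, assuming an orthonormal DCT-II and a single uniform quantizer step. Real codecs work on 2-D blocks, reorder coefficients (for example in zigzag order), and finish with entropy coding, none of which is shown here, and the series' exact scheme may differ.

```python
import math
from typing import List, Sequence, Tuple


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal 1-D DCT-II: packs a smooth signal's energy into the first few coefficients."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return coeffs


def quantize(coeffs: Sequence[float], step: float) -> List[int]:
    """Uniform quantizer: bigger steps throw away more detail (and need fewer bits)."""
    return [round(c / step) for c in coeffs]


def run_length(values: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse repeats into (value, count) pairs; post-quantization zeros compress well."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


if __name__ == "__main__":
    flat_row = [10.0] * 8
    print(run_length(quantize(dct_ii(flat_row), step=4)))  # [(7, 1), (0, 7)]
```

A flat run of eight pixels collapses to one nonzero coefficient followed by seven zeros, which the run-length pass stores as two pairs. That energy compaction is what makes rates below one bit per pixel plausible.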
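The trendline in the Periscope post is an ordinary least-squares fit, which needs nothing beyond SQL aggregates. Here's a minimal sketch using Python's built-in sqlite3, assuming a two-column table of (x, y) points. The table name and query are illustrative, not Periscope's. Pearson's r is finished in Python because core SQLite may be built without a square-root function.

```python
import math
import sqlite3
from typing import Iterable, Tuple


def trendline(points: Iterable[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fit y = slope * x + intercept with SQL aggregates; also return Pearson's r."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ratings (x REAL, y REAL)")
        conn.executemany("INSERT INTO ratings (x, y) VALUES (?, ?)", points)
        slope, intercept, cov, var_x, var_y = conn.execute(
            """
            WITH s AS (
              SELECT COUNT(*) AS n, SUM(x) AS sx, SUM(y) AS sy,
                     SUM(x * x) AS sxx, SUM(y * y) AS syy, SUM(x * y) AS sxy
              FROM ratings
            )
            SELECT (n * sxy - sx * sy) / (n * sxx - sx * sx),   -- slope
                   (sy * sxx - sx * sxy) / (n * sxx - sx * sx), -- intercept
                   n * sxy - sx * sy,                           -- scaled covariance
                   n * sxx - sx * sx,                           -- scaled variance of x
                   n * syy - sy * sy                            -- scaled variance of y
            FROM s
            """
        ).fetchone()
    finally:
        conn.close()
    r = cov / math.sqrt(var_x * var_y)  # fails if x or y is constant
    return slope, intercept, r
```

An r close to 0 is what "virtually no correlation" means in practice; the slope alone can't tell you that.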
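The durable-writes story combines three tricks: write-through I/O instead of waiting on a full cache flush, compressing journal entries, and preallocating the file so writes stay sequential. Here's a minimal POSIX Python sketch of the same shape, with one swap made plainly: it opens the journal with O_DSYNC (each write should return only once its data is on stable storage) rather than the Windows FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH combination, because true unbuffered I/O needs aligned buffers that are awkward from Python. The `Journal` class, its record framing, and `replay` are hypothetical.

```python
import os
import struct
import zlib
from typing import List

# Frame header: compressed payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")
DSYNC = getattr(os, "O_DSYNC", 0)  # 0 where the platform lacks O_DSYNC


class Journal:
    """Append-only journal; assumes a fresh file and a POSIX os.pwrite."""

    def __init__(self, path: str, prealloc_bytes: int = 1 << 20) -> None:
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT | DSYNC, 0o644)
        self.offset = 0
        if hasattr(os, "posix_fallocate"):
            try:
                # Reserve space up front so appends don't keep growing the file.
                os.posix_fallocate(self.fd, 0, prealloc_bytes)
            except OSError:
                pass  # some filesystems refuse; the journal still works

    def append(self, record: bytes) -> None:
        payload = zlib.compress(record)  # less data to push through the disk
        frame = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
        os.pwrite(self.fd, frame, self.offset)  # strictly sequential offsets
        if not DSYNC:
            os.fsync(self.fd)  # fall back to an explicit flush
        self.offset += len(frame)

    def close(self) -> None:
        os.close(self.fd)


def replay(path: str) -> List[bytes]:
    """Read frames back until the zero-filled tail or a torn/corrupt frame."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if length == 0 or len(payload) < length or zlib.crc32(payload) != crc:
            break
        records.append(zlib.decompress(payload))
        pos = start + length
    return records
```

The CRC plus the zero-filled preallocated tail gives replay a clean stopping point, so a torn final write is dropped rather than misread.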
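The "reparameterization" in the blockchain paper means changing block size and block interval, and those two knobs cap throughput through simple arithmetic. Here's a minimal sketch; the 250-byte average transaction is an assumption for illustration, while 1 MB blocks roughly every 10 minutes were Bitcoin's parameters at the time.

```python
def max_tps(block_bytes: float, avg_tx_bytes: float, block_interval_s: float) -> float:
    """Upper bound on transactions per second from block size and interval."""
    return block_bytes / avg_tx_bytes / block_interval_s


if __name__ == "__main__":
    base = max_tps(1_000_000, 250, 600)  # 1 MB blocks, 10-minute interval
    print(round(base, 2))                # 6.67
    print(round(max_tps(4_000_000, 250, 600) / base, 2))  # 4x bigger blocks -> 4.0
```

Throughput is linear in both knobs, so tuning them buys only a constant factor. The paper's argument is that bigger or more frequent blocks also take longer to propagate across the network, which is why it treats reparameterization as only a first increment.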