Stuff The Internet Says On Scalability For December 11th, 2015

High Scalability

Dec 11, 2015 — 14 min read

Hey, it's HighScalability time:

Cheesy Star Trek graphics? Nope. It's hot gas streaming into Pandora’s Cluster.

If you like Stuff The Internet Says On Scalability then please consider supporting me on Patreon.

100 million: John Henry as played by a conventional computer loses to a quantum computer; 400,000: cores in PayPal's OpenStack deployment; 10TB: max size of Google Cloud SQL database; 9%: Kickstarter projects that don't deliver; $2.3 trillion: worth of The Forbes 400 members; billions: worth of Spanish treasure ship;

Quotable Quotes:
- Pandalicious: I actually expect that down the road most large open source projects will start distributing a standardized build environment via docker containers.
- @glasnt: "Optimise for speed flexibility & evolution" "Whoever is iterating faster has a huge advantage" - @adrianco #yow15
- @erikbryn: LIDAR goes from $75K to $500, leaves Moore's Law in the Dust
- Henry Miller: One has to believe wholeheartedly in what one is doing, realize that it is the best one can do at the moment—forego perfection now and always!—and accept the consequences which giving birth entails.
- @jedws: "uber is way more reliable on Saturday and Sunday because there are no engineers working on the system" #yow15
- @samkottle: "Waffles are like kubernetes on a dish" -@rbranson
- @brian_klaas: No server is easier to manage than no server, but are we moving all the complexity to the front-end?
- @Carnage4Life: Death of #unbundling part 2: Facebook shutting down lab which shipped side apps like Hello, Rooms & Slingshot
- @carlosfairgray: Efforts to drive uncertainty out of development have only driven innovation out of development. #yow15 @DReinertsen
- @quinnnorton: “Let’s legislate secure cryptographic backdoors” is the 21st century’s “let’s pass a law to make π = 3”
- @jessitron: To call an API, or just grab it from the database? Don't tap into another team at the spine. Talk to their faces.
- Brian Chesky: One of the keys to get to scale, is to do things that don’t scale. One other important lesson within this lesson is — 100 customers who love you > 1,000,000 users.
- IbanezDavy: The areas of where we expect quantum computers to be faster are roughly known. There are cases where classical computers will still perform better than a quantum computer. But D Wave has been criticized of not truly having a quantum computer, so I think they are motivated in just demonstrating that they do indeed have one.
- @tiagogriffo: "We developed the product so fast that marketing had not time to change the requirements" said a PM. From @DReinertsen talk at #yow15
- @xaprb: push 10,000 metrics/sec at 1-sec resolution for 1000 servers for a year and see if it scales forever ;-)

Apple has open sourced Swift for reals, not just a code dump months too late to be of use. Swift is on GitHub, you can look at the code, see the entire version history from the very first check-in, see what's changing, contribute, file bugs, etc. So it's a real open source project. Apple is even porting key frameworks like their Foundation libraries over to Swift. If you are looking for the one language to rule them all, that can run fast enough on the server, be used for web apps, and run on mobile, Swift is making the case for being that language, which is no doubt what Apple also wants it for. Incentives align. Expect developers to quickly fill out the tool chain. How does Swift compare? Go vs Node vs Rust vs Swift. Swift is fast, but lacks language primitives for parallelism.

Ruby can be much faster. 25,000+ Req/s for Rack JSON API with MRuby: MRuby is a minimal version of Ruby, that can be embedded in any system that supports C...There is a new HTTP web server called H2O, which is really, really fast...When H2O is compiled, it embeds a MRuby interpreter that can be used to run Ruby code. The result: an astonishing 28,000+ requests per second.

Fox guarding the chickens. U.S. states pass laws backing Uber’s view of drivers as contractors.

In the same way there's always a tradeoff between ASIC and white box solutions, there's also an ebb and flow between domain specific languages and general purpose languages. Google replaced Sawzall, a DSL for performing powerful, scalable analysis, with a software ecosystem built around Go. Replacing Sawzall — a case study in domain-specific language migration. The result: we’ve found that with carefully designed libraries we can get most of the benefits of Sawzall in Go while gaining the advantages of a powerful general-purpose language. The overall response of analysts to these changes has been extremely positive. Today, logs analysis is one of the most intensive users of Go at Google, and Go is the most-used language for reading logs through the logs proxy.

There's a new data mining Barbie. The new talking Hello Barbie doll has the mind of Siri: "Equipped with Siri-like voice-recognition software and a wi-fi connection, Hello Barbie can respond to questions from kids about everything from her favorite color to career goals." Unfortunately I can't take credit for the data mining comment, I heard it on TWiT.

If you have 70 data caching stations around the world connected with fast links and you are already expert at caching your own content, starting your own CDN makes a lot of sense. So that's what Google did. Cloud CDN. Interestingly, Google may be trying to turn these caching stations into datacenters, so says Google's Secret Plan to Catch Up to Amazon and Microsoft in Cloud. If you could use Kubernetes to place work on the edge and combine that with some kind of multi-datacenter database, you would have yourself very low latency access to a lot of mobile devices.

So, what happens when your computer gets hungry and you are the only food around? Engineers build biologically powered chip: Columbia Engineering researchers have, for the first time, harnessed the molecular machinery of living systems to power an integrated circuit from adenosine triphosphate (ATP), the energy currency of life. They achieved this by integrating a conventional solid-state complementary metal-oxide-semiconductor (CMOS) integrated circuit with an artificial lipid bilayer membrane containing ATP-powered ion pumps, opening the door to creating entirely new artificial systems that contain both biological and solid-state components.

MySQL isn't dead yet. Wix thinks of MySQL as the better NoSQL. They have an active-active setup across 3 datacenters that handles 200K requests per minute with an average latency of 1.0-1.5 msec. Scaling to 100M: MySQL is a Better NoSQL: "we’ve found that MySQL, when used creatively as a key/value store, can do a better job compared to MySQL with a normalized data model (like the one above)—and to most NoSQL engines. Simply use MySQL as a NoSQL engine. Our existing system has scaling / throughput / concurrency / latency figures that are impressive for any NoSQL engine." To accomplish this, however, they give up transactions in the database, do not normalize data, do not use foreign keys, query only on a primary key or index, and do not use joins or aggregations.
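To make the key/value pattern concrete, here is a minimal sketch of the idea: one table, one primary key, one serialized document, and no joins. It is not Wix's schema. The table name `sites`, the columns `site_id` and `doc`, and the helpers `put` and `get` are all made up for illustration, and SQLite from the Python standard library stands in for MySQL so the example runs anywhere.

```python
import json
import sqlite3

# SQLite is only a stand-in here so the sketch is runnable; the article is about MySQL.
conn = sqlite3.connect(":memory:")

# One table, one primary key, one opaque document column. No foreign keys.
conn.execute("CREATE TABLE sites (site_id TEXT PRIMARY KEY, doc TEXT NOT NULL)")


def put(site_id, doc):
    """Store the whole document under its key, replacing any previous version."""
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO or
    # INSERT ... ON DUPLICATE KEY UPDATE instead.
    conn.execute(
        "INSERT OR REPLACE INTO sites (site_id, doc) VALUES (?, ?)",
        (site_id, json.dumps(doc)),
    )
    conn.commit()


def get(site_id):
    """Fetch a document by primary key only: no joins, no aggregations."""
    row = conn.execute(
        "SELECT doc FROM sites WHERE site_id = ?", (site_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

The trade-off is the one listed above: the database no longer understands the document's structure, so anything beyond a key lookup has to happen in application code.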

Videos from Streaming Media West Conference are now available.

dougbarrett: For example, I just spun up some new analytics software I wrote last week in golang and it peaks around 50k req/min, averages around 23k req/min, I'm using 4 nodes which realistically could probably handle 100k req/min without issue since in peak it's consuming about 50% of available resources to us as it is right now, but $100/mo is well worth it for the peace of mind, then combined with a $20/mo percona mysql server on digitalocean that averages around 4% cpu load, we're paying $120/mo to handle on the low end 927,360,000 http requests/month and all I have to do is a git push and the code automatically sets up the DB structure and DB indexes when booting the app up.
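The monthly figure in that quote checks out if you assume a 28-day month, which would explain the "on the low end" wording. The 28-day assumption is mine, not the commenter's. A quick back-of-the-envelope check:

```python
# Figures taken from the quote above.
AVG_REQ_PER_MIN = 23_000
MONTHLY_COST_USD = 100 + 20  # app nodes plus the database server

# Assumption: "on the low end" means the shortest month, 28 days.
MINUTES_PER_DAY = 60 * 24
DAYS_IN_MONTH = 28

requests_per_month = AVG_REQ_PER_MIN * MINUTES_PER_DAY * DAYS_IN_MONTH
cost_per_million = MONTHLY_COST_USD / (requests_per_month / 1_000_000)

print(requests_per_month)          # 927360000
print(round(cost_per_million, 3))  # 0.129
```

That works out to roughly 13 cents per million requests.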

So much pain! What I Learned from Working in Failed Platforms: Sometimes it feels as though I was deeply involved in every failed technology during the past 20 years...Being open minded, means being civilized in our discussion of technologies...keep you aware of how the landscape around you may be changing...The most important lesson I’ve learned is to not let your career and your identity become too tied to a single technology.

Fried has put together The Ultimate Guide to Online Privacy, it's a massive 10,000 word post listing over 150 tools. Nice coverage and range of topics. Though that we have or need 150+ tools is probably part of the problem.

Dang. Fermilab Experiment Finds No Evidence That We Live in a Hologram. We may still live in a simulation, so all is not lost.

Hans Rosling giving hope to the hopeless. Are we better off than we think? Despite global inequalities, most of the world is better off than you think - and better off than it has ever been before.

We got a Heisenberg situation here. From RUM to Robot Crawl Experience. Up to 75% of the traffic to a site can be from bots crawling the site. So if Google hammers a site by crawling it, and if the site is slow, its ranking will tank. The reason a site could be slow is Google is killing it with SEO kindness.

Like lists? Wikipedia has a whole List of lists of lists. I like the Lists of Middle-earth articles.

Spotify shows off their latency based load balancer. They have some cool videos showing round-robin and Join the Shortest Queue approaches. Both approaches lose replies when machines are failing fast. To help, circuit breakers are used to monitor machine health and take machines out of rotation when the metrics go bad. As a fix they created Expected Latency Selector (ELS), a probabilistic load balancer, which performs better when machines are failing. Interesting stuff.
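To get a feel for the general idea, here is a toy latency-weighted balancer with a simple circuit breaker. It is not Spotify's ELS algorithm, whose details are not given here. The class name, the moving-average smoothing, and the "three consecutive failures" breaker rule are all assumptions picked for illustration.

```python
import random


class LatencyWeightedBalancer:
    """Toy balancer: prefer backends with lower observed latency, skip broken ones."""

    def __init__(self, backends, alpha=0.2, initial_latency_ms=50.0,
                 max_failures=3, rng=None):
        self.alpha = alpha                # smoothing factor for the moving average
        self.max_failures = max_failures  # consecutive failures before the breaker trips
        self.latency = {b: initial_latency_ms for b in backends}
        self.failures = {b: 0 for b in backends}
        self.rng = rng or random.Random()

    def healthy(self):
        """Backends whose circuit breaker has not tripped."""
        return [b for b in self.latency if self.failures[b] < self.max_failures]

    def pick(self):
        """Choose a healthy backend with probability proportional to 1 / latency."""
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy backends")
        weights = [1.0 / max(self.latency[b], 0.001) for b in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def record_success(self, backend, latency_ms):
        """Fold a successful reply into the exponentially weighted latency estimate."""
        self.failures[backend] = 0
        old = self.latency[backend]
        self.latency[backend] = (1 - self.alpha) * old + self.alpha * latency_ms

    def record_failure(self, backend):
        """Count a failure; fast failures never improve the latency estimate."""
        self.failures[backend] += 1
```

Note that failures are tracked separately from latency. A machine that fails fast looks wonderfully quick if you only measure response time, which is exactly the trap described above.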

The dreaded triple fault. Yesterday’s message latency issues & what we’re doing about it. HipChat was hit by a thundering herd as the rapid adoption of Windows 4.0 overloaded their platform. In preparation they had tripled the number of servers and tripled it again when the problems popped up. As a fix they've upgraded their connection handling infrastructure, though it's not clear what this means. A bug may have made it impossible for users to decline the upgrade, which may account for the larger than expected number of upgrades.

Should you use Google Analytics or roll your own? SpiderOak ditched Google and spent the time and resources to build their own analytics infrastructure. Why?: "by using Google Analytics, we are furthering the erosion of privacy on the web." Great discussion on HackerNews. Lots of debate surrounding the value of building it yourself versus using someone else's service. Interesting how privacy was not the hot topic.

If you've ever wanted to learn the Wolfram Language there's now a book to help. It's free on the web and looks good.

Videos from Strata + Hadoop World - Singapore 2015 are available.

Asynchronous and non-blocking IO. There is a difference: Asynchronous IO refers to an interface where you supply a callback to an IO operation, which is invoked when the operation completes.
Non-blocking IO refers to an interface where IO operations will return immediately with a special error code if called when they are in a state that would otherwise cause them to block.
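The two definitions are easy to see side by side in a small sketch using only the Python standard library. The function names are made up for this example. The first function shows non-blocking IO: the read returns immediately with an error when no data is ready. The second shows asynchronous IO: a callback is attached to the read and fires when the operation completes.

```python
import asyncio
import socket


def nonblocking_read():
    """Non-blocking IO: the call returns at once with an error instead of waiting."""
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        a.recv(1024)  # nothing has been sent yet
        result = "data"
    except BlockingIOError:
        result = "would block"
    finally:
        a.close()
        b.close()
    return result


async def async_read():
    """Asynchronous IO: supply a callback that runs when the read completes."""
    a, b = socket.socketpair()
    a.setblocking(False)
    loop = asyncio.get_running_loop()
    received = []

    # Start the read and register a completion callback on it.
    task = asyncio.ensure_future(loop.sock_recv(a, 1024))
    task.add_done_callback(lambda t: received.append(t.result()))

    b.send(b"hello")
    await task
    await asyncio.sleep(0)  # give any pending callbacks a chance to run
    a.close()
    b.close()
    return received[0]


if __name__ == "__main__":
    print(nonblocking_read())
    print(asyncio.run(async_read()))
```

In the first case the caller has to decide what to do next, typically poll or wait for readiness. In the second, the event loop does the waiting and hands over the result.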

AI will replace smartphones within 5 years, Ericsson survey suggests. If an app on a smart phone is like consuming information with a spoon, using an AI is consuming information through a straw. And if you only eat through a straw, you are going to starve.

Will the price of SSDs fall enough to make DAWN, a Durable Array of Wimpy Nodes, a reality? Or is it still too early in the morning for DAWN? In SSDs: Cheap as Chips? David Rosenthal says not yet. Though SSD prices have fallen, they need to drop to about 3x the price of disk for DAWN to rise. And as SSDs still command a price premium and supply doesn't look like it will increase, we'll just have to wait.

Here's a free ebook for learning modern Perl.

Varnish Foo - Working With HTTP caching. Great section on how to use the browser to dig into cache control headers. The section on how to build HTTP commands on the command line using HTTPie is also excellent, as are the descriptions of various headers.

Moore's Law isn't the only source of performance improvements, algorithms are also getting better. Algorithms vs Moore’s Law.

Good Lessons Learned In Big App Development, A Hawaiian Airlines Case Study. Use a lightweight front-end sandbox when backend APIs are not available. Using static JSON files as input data worked for a while, but failed as things got more complex. The solution: "PouchDB handled localStorage, which also allowed us to sync up to a lightweight CouchDB server so that we could share data states between team members". Better yet, mock the endpoints. Work directly with the backend team to integrate the UI. Throwing code over the fence didn't work. All teams should use the same tools and processes. Use @mixins instead of :extends. Better yet, OOCSS and BEM would have been ideal frameworks to approach a project of this size. AngularJS was a great solution for a big project because of the flexibility it gives you in creating UI components, but it was laggy on tablets and slow desktop computers. The single most painful thing they did was the custom form controls. They decided to scrap these custom form controls in favor of their native counterparts.

Fascinating notes on fast grep and why the Linux grep is faster than the grep on FreeBSD: got data into memory with zero-copy; didn't parse newlines first; used mixed DFA and NFA for regex; used Boyer-Moore instead for simple patterns.
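The Boyer-Moore trick is that a mismatch lets you skip ahead by more than one byte, so most of the input is never examined. Here is a sketch of the simpler Horspool variant, not grep's actual implementation. The function name is made up, and the window comparison uses a plain slice rather than the right-to-left scan a real implementation would use.

```python
def horspool_find(text: bytes, pattern: bytes) -> int:
    """Return the index of the first match of pattern in text, or -1 if absent."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # For each byte in the pattern (except the last position), record how far
    # the window may slide when that byte sits under the window's last position.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}

    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Bytes that never occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    print(horspool_find(b"hello world", b"world"))  # 6
```

The longer the pattern, the bigger the skips, which is why searching for a long literal string can be faster than searching for a short one.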

Videos from the Ada Lovelace Symposium are available.

Journalling Revisited: Since moving to using xfs as our journalling file-system, we have observed no re-occurrence of the periodic latency spikes seen when running on ext4. This improvement is highly workload-dependent, and the findings presented here will not suit all cases.

Wireless remote brain interfaces. So do you still think back doors are a good idea? If we establish the precedent of back doors now then there will be back doors into your mind.

I don't find them radical, but if you want the web to win, here are some Radical Statements about the Mobile Web: The web isn't close to competing with higher-end native apps; Users don't care about the web; The DOM is slow; The DOM makes JavaScript slow; Maybe the DOM is not the answer. Prediction #3: By 2020, A special mode for shared memory multi-threading becomes available in asm.js, with a fallback to a single-threaded cooperative mode if asm.js isn't specially recognized.

For a beginner or even an old hand, Alvaro Videla has put together a great introduction to the main concepts of distributed systems. What We Talk About When We Talk About Distributed Systems (video). The main concepts covered are: Timing Model; Interprocess Communication; Failure Modes; Failure Detectors; Leader Election; Consensus; Quorums; Time In Distributed Systems; A Quick Look At FLP.

Here are some transcripts from the Scaling Bitcoin workshops.

By open-sourcing their AI hardware design Facebook is taking the Open Compute Project in an unexpected direction. From the specs it seems like it will be a spendy box, but hey, you don't have to do the heavy lifting on the design, so you've saved a lot.

Twitter has open sourced Finatra 2.0: the fast, testable Scala services framework that powers Twitter. They use it to run hundreds of services.

Snapdeal on How we’re building a system to scale for billions of requests per day. Lots of great details, but there's an interesting observation: Real-time Bidding (RTB) is killing Ad-tech. The Real-time bidding (RTB) protocol is bloated; Large players invest in extremely large amount of hardware to offset the complexity introduced by the RTB spec; Small players can’t compete with large players; All of the above is hurting competitiveness on the whole and limiting advertiser choices.

Why Percentiles Don’t Work the Way you Think: If you want to compute percentiles at intervals and then store the results in a time series database -- as some extant databases currently do -- you might not be getting what you think you are.
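A tiny made-up example shows one way this goes wrong: averaging per-interval percentiles does not give you the percentile of the combined data. The two "minutes" of latency samples below are invented, and the nearest-rank definition of p99 is one of several in common use.

```python
def p99(samples):
    """99th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


# Invented data: latencies in milliseconds for two one-minute intervals.
quiet_minute = [10] * 1000                # 1,000 requests, all fast
busy_minute = [10] * 9000 + [500] * 1000  # 10,000 requests, 10% slow

average_of_p99s = (p99(quiet_minute) + p99(busy_minute)) / 2
true_p99 = p99(quiet_minute + busy_minute)

print(average_of_p99s)  # 255.0
print(true_p99)         # 500
```

The averaged figure understates the real tail by almost half, because the quiet minute gets the same weight as the busy one even though it carried a tenth of the traffic.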

It began with a rebel monk. Earliest known piece of polyphonic music discovered: Typically, polyphonic music is seen as having developed from a set of fixed rules and almost mechanical practice. This changes how we understand that development precisely because whoever wrote it was breaking those rules. It shows that music at this time was in a state of flux and development, the conventions were less rules to be followed, than a starting point from which one might explore new compositional paths.

That crazy wabbit has published another wonderful chapter, this time on Modular Architecture: Client-Side. On Debugging Distributed Systems, Deterministic Logic, and Finite State Machines: If you don’t design your distributed system for testing and post-mortem analysis, you will find yourself in lots of trouble.

This is cool. Jeroen Domburg Implementing the Tamagotchi Singularity.

Perhaps humans will purposefully give AIs consciousness so that they have skin in the game so they will pause when making the decision to destroy the world? That time I was nearly burned alive by a machine-learning model and didn’t even notice for 33 years: one of the major themes in Red Plenty is the tension between Kantorovich’s vision of a decentralised, instantly responsive socialist economy, and the Party’s discretionary power – between communism and the Communists, if you like. The RYAN story flips this on its head. This time, it wasn’t the bureaucrats’ insistence on clinging to power that was the problem. It was the solution. The computer said “War”; only fundamentally political, human discretion could say “Peace”. As Joseph Weizenbaum put it, a computer can decide but it cannot choose.

How many servers can one person manage? Brent Ozar says it depends what you are doing with the servers. If you are just managing the hardware then 1000's; if you are responsible for the OS then 100's; if you are responsible for standalone SQL Server instances then 50-100; if you are responsible for high availability and disaster recovery then 10-50; if you are responsible for performance then 1-5.

Why Hyperconverged Infrastructure is so Hot: There’s pressure on IT departments to be able to provision resources instantly; more and more applications are best-suited for scale-out systems built using commodity components; software-defined storage promises great efficiency gains; data volume growth is unpredictable; and so on.

Examining IPv6 Performance: The measurements are within 10ms of each other 60% of the time...The current connection failure rate for IPv4 connections was seen to be some 0.2% of all connection attempts, while the equivalent connection failure rate for unicast IPv6 is nine times higher, at 1.8% of all connection attempts.

Here's in detail how Netflix performs High Quality Video Encoding at Scale: We designed for automated quality control checks throughout so that we fail fast and detect issues early in the processing chain. Video is processed in parallel segments. This decreases end-to-end processing delay, reduces the required local storage and improves the system’s error resilience. We have invested in integrating video quality metrics into the pipeline so that we can continuously monitor performance and further optimize our encoding.

OpenFastPath: an open source implementation of a high performance TCP/IP stack that provides features that network application developers need to cope with today’s fast-paced network.

Apache Myriad: a project which aims to have Hadoop jobs run inside the same physical infrastructure that's managed by Mesos. Typically in a data center, you want to run multiple workloads. Hadoop is one of the workloads that you would want to run inside a data center.

Serverless (formerly JAWS): The serverless application framework – Use bleeding-edge AWS services to redefine how to build massively scalable (and cheap) apps!

go-metrics: Go port of Coda Hale's Metrics library. And here's how they can be used: Metrics for Microservices.

Kong: a scalable, open source API Layer (also known as a API Gateway, or API Middleware). Kong runs in front of any RESTful API and is extended through Plugins, which provide extra functionalities and services beyond the core platform.

Information Effects: Computation is a physical process which, like all other physical processes, is fundamentally reversible. From the notion of type isomorphisms, we derive a typed, universal, and reversible computational model in which information is treated as a linear resource that can neither be duplicated nor erased. We use this model as a semantic foundation for computation and show that the “gap” between conventional irreversible computation and logically reversible computation can be captured by a type-and-effect system.

I have a new short story on Amazon: The Uncommonly Devine Supper Club. If you are squeamish then I'd stay away, but if not you might find it interesting.
