Stuff The Internet Says On Scalability For September 18th, 2015

High Scalability

18 Sep 2015 — 8 min read

Hey, it's HighScalability time:

This is how you blast microprocessors with high-energy beams to test them for space.

terabits: Facebook's network capacity; 56.2 Gbps: largest extortion DDoS attack seen by Akamai; 220: minutes spent using apps per day; $33 billion: 2015 in-app purchases; 2334: web servers running in containers on a Raspberry Pi 2; 121: startups valued over $1 billion

Quotable Quotes:
- A Beautiful Question: Finding Nature's Deep Design: Two obsessions are the hallmarks of Nature’s artistic style: Symmetry (a love of harmony, balance, and proportion) and Economy (satisfaction in producing an abundance of effects from very limited means)
- @Carnage4Life: ad blocking Apple has done to Google what Google did to MSFT. Added a feature they can't compete with without breaking their biz model
- @shellen: FWIW - Dreamforce is a localized weather system that strikes downtown SF every year causing widespread panic & bad slacks.
- @KentBeck: first you learn the value of abstraction, then you learn the cost of abstraction, then you're ready to engineer
- @doctorow: Arab-looking man of Syrian descent found in garage building what looks like a bomb
- @kixxauth: Idempotency is not something you take a pill for. -- ZeroMQ
- @sorenmacbeth: Alice in Blockchains
- Sebastian Thrun: BECAUSE of the increased efficiency of machines, it is getting harder and harder for a human to make a productive contribution to society
- Coding Horror: Getting the details right is the difference between something that delights, and something customers tolerate.
- @mamund: "[92% of] all catastrophic failures are the result of incorrect handling of non-fatal errors."
- Charles Weitz: Almost every cell in our body has a circadian clock. It helps every cell figure out when to use energy, when to rest, when to repair DNA, or to replicate DNA.
- @kfury: Web development skills are like cells in your body. Every 7 years they're completely replaced by new ones.
- Alexey Gorshkov: We’re learning how to build complex states of light that, in turn, can be built into more complex objects.
- @BenedictEvans: Ad blocking = taking money away from people whose work you read. Everyone has reasons, or excuses. But it remains true
- Gaffer on Games: I swear you guys are like the f*cking climate change deniers of network programming..not just a rant, also deeply informative.
- @anoemi: I don't use emojis because when I use smiley faces, I like to stay close to the metal.
- @neil_conway: in practice, basically no app logic gets retry logic right (esp. for read-only xacts, which can abort under serializable).
- @xaprb: All roads lead to Rome. All queueing theory studies lead to Agner Erlang. All scalability studies lead to Neil Gunther.

Why doesn't Google use git? Here's why. Stats on the Google source code repository: 1 billion files, 9 million source files, 2 billion lines of code, 35 million commits, 86 terabytes, 45 thousand commits per workday, 25,000 Googlers from all over the world, billions of file read requests per day (800K QPS peak). All in one single repository. The rate of change is on an exponential growth curve. Of note: robots commit 30K times per day, humans only 15K. From a talk by Rachel Potvin: The Motivation for a Monolithic Codebase.

The problem is as soon as Medium becomes everything it also becomes nothing. Medium's Evan Williams To Publishers: Your Website Is Toast.

If you appreciate the technical aspects of the intricate bot games Ashley Madison is said to have played then you might enjoy Darknet, a book that takes the same idea to chilling extremes. AI driven Distributed Autonomous Corporations use bitcoin and anonymous markets to take the world to the brink. Only a gambit worthy of Captain Kirk saves the day.

Points to ponder. Why I wouldn’t use rails for a new company: I worry now that rails is past its zenith, and that starting a new company with rails today might be like starting a company using Java Spring in 2007...Everyone knows that ruby is slow...over time other frameworks simply picked up those innovations [Rails]...If you want to future-proof your web application, you have to make a bet on what engineers will want to use in three years.

Do we have a data crisis? Yes says Adrian Colyer: most of the applications that we write that work with persistent data are broken, and the situation seems to be getting worse...Even applications that work with a traditional RDBMS are prone to integrity violations...Building a better datastore without also building a better way for applications to interact with the datastore isn’t going to solve the data crisis...If you reflect on some of the papers we’ve been looking at, an interesting pattern starts to emerge that is very application-centric.

Thoughtful approach for Rendering 12,000 Image Albums at Imgur: Our new album rendering code uses what I'm going to call biphasal rendering...The render of an album page is now broken down into 2 phases as the name suggests...This is achieved by rendering a large blank div above and below the images that are currently in the viewport buffer rather than rendering each placeholder individually.
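
The blank-div trick in that last sentence can be sketched in a few lines. This is a minimal illustration and not Imgur's code: the function name `visible_window`, its parameters, and the assumption that every image height is known up front are all hypothetical. It only covers the spacer arithmetic, not the two rendering phases themselves.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def visible_window(heights, scroll_top, viewport_height, buffer_px):
    """Return (first, last, top_spacer, bottom_spacer) for a long image list.

    heights: pixel height of each image, top to bottom (assumed known).
    Images first..last-1 are rendered; everything else is replaced by two
    blank spacers whose heights keep the total page height unchanged.
    """
    # offsets[i] is the y position where image i starts; offsets[-1] is the total height.
    offsets = [0, *accumulate(heights)]
    total = offsets[-1]

    # The buffered window, clamped to the page.
    lo = max(0, scroll_top - buffer_px)
    hi = min(total, scroll_top + viewport_height + buffer_px)

    # Last image that starts at or before the top of the window.
    first = bisect_right(offsets, lo) - 1
    first = max(0, min(first, len(heights)))

    # One past the last image that starts before the bottom of the window.
    last = bisect_left(offsets, hi)
    last = max(first, min(last, len(heights)))

    return first, last, offsets[first], total - offsets[last]


# Ten 100px images, a 200px viewport scrolled to y=300, 100px of buffer.
first, last, top_spacer, bottom_spacer = visible_window([100] * 10, 300, 200, 100)
```

Only images `first` through `last - 1` would go into the DOM, with one blank element of `top_spacer` pixels above them and one of `bottom_spacer` pixels below, so the scrollbar behaves as if the whole album were rendered.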

Doesn't this turn work into another version of High School? *shiver* @codinghorror: "If you tolerate even one developer whom the other developers think is a problem, you'll hurt the morale of the good developers."

Adding a level of indirection is simply algorithm-washing. All you've done is shift the elitism to the algorithm maker, caretakers, data harvesters, and UI builders. So Eric Schmidt is a bit self-serving when he says Apple Music's human curation is 'elitist': As a bonus, it's a much less elitist taste-making process - much more democratic - allowing everyone to discover the next big star through our own collective tastes and not through the individual preferences of a select few.

Anatomy of a Modern Production Stack. A modern stack should be: Self healing and self managing; Supports microservices; Efficient; Debuggable. The parts: a simplified and manageable Linux distribution; something has to be able to bootstrap those machines and get them running as productive members of the cluster; a system for setting up and managing containers; an efficient way to capture, name and distribute the set of files that make up a container at runtime; a central place to store and load Container Image; a system for structuring what is running inside of a container; need to get containers running across multiple hosts; Orchestration Config; Network Virtualization; Container Storage Systems; Discovery Service; Production Identity and Authentication; Monitoring; Logging; Deep Inspection and Tracing.

Videos from the Golang UK Conference 2015 are now available.

ZitchDog framing Facebook's Relay in an interesting perspective: Between relay and react-native, we have the potential for a game-change in application architecture. A single repository could easily host frontends for each application target, (iOS, Android, web) along with a backend written in Relay/GraphQL. What's more, static analysis using flow/TypeScript could provide type checking across all applications, all the way back to the storage backend. Exciting stuff.

More Eric Schmidt. Intelligent machines: Making AI work in the real world: Something changed in those last few years, an inflection point, a final push over the line from 'This could work' to 'Wow, this works better than anything else we've come up with!' Indeed, deep learning really took off when it got an infusion of computing at immense scale, using networks of thousands of computers working together.

What are the odds of me seeing this and then posting this link? Probability Cheatsheet.

The problem is someone will take all your component parts, generalize them a bit, package them up, and presto, you have another framework. Are distributed frameworks necessary?: perhaps what we really need is documentation on how to stand up multiple processes and how to tie them together. I believe we have most if not all of the technology to do it. Mesos and Kubernetes can be used to execute processes on a cluster of machines. Queueing technology like Kafka and NSQ can be used to pass messages between processes. Processes can be written in many different languages and can be packaged in containers using Docker or similar to manage dependencies.

Here's how LinkedIn implemented faceted search: We chose to use inverted index posting lists to count facet values. Using the inverted index for facet counting allows us to guarantee exact counts for low cardinality, while at the same time allowing us to estimate the counts for high cardinality values, providing a significant performance boost. Our approach also retains the option for us to early terminate when scoring documents.
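
To make the posting-list idea concrete, here is a minimal sketch of exact facet counting. It is not LinkedIn's implementation: the function names, the in-memory dict layout, and the sample `industry` field are assumptions for illustration, and the estimation path for high cardinality values and early termination are not shown.

```python
from collections import defaultdict


def build_facet_index(docs, facet_field):
    """Map each facet value to a sorted posting list of document ids."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        postings[docs[doc_id][facet_field]].append(doc_id)
    return dict(postings)


def intersect_count(a, b):
    """Count ids present in both sorted posting lists using a linear merge."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def facet_counts(facet_postings, matching_ids):
    """Exact count of matching documents per facet value (zero counts dropped)."""
    matches = sorted(matching_ids)
    counts = {value: intersect_count(plist, matches)
              for value, plist in facet_postings.items()}
    return {value: c for value, c in counts.items() if c > 0}


# Hypothetical documents and a query that matched documents 1, 3 and 4.
docs = {
    1: {"industry": "software"},
    2: {"industry": "finance"},
    3: {"industry": "software"},
    4: {"industry": "health"},
}
index = build_facet_index(docs, "industry")
counts = facet_counts(index, [1, 3, 4])
```

Each facet value costs one merge against the result set, which is cheap when there are only a handful of values and is presumably why exact counts are reserved for the low cardinality case.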

In a war when a new weapon is developed a strategy to counter it is also developed. Apple’s Support of Ad Blocking May Upend How the Web Works. Look for the battle to continue.

States are losing money to online services. They are fighting back with Byzantine rules for defining when a company has nexus in a state. Here are nearly 400 pages on a mind-numbing subject that could impact your bottom line. Taxing Cloud-Community Transactions: Receipts from cloud computing are characterized as services in 12 states; the sale, lease, license or rental of intangible personal property in 5 states and the sale of tangible personal property in 1 state. Cloud computing is an area ripe for state legislatures to address how it is sourced for both sales and income tax purposes.

Amazing! NASA Engineers and Scientists-Transforming Dreams Into Reality: At the start of the Apollo program, the onboard flight software needed to land on the moon didn’t exist. Computer science wasn’t in any college curriculum. NASA turned to mathematician Margaret Hamilton, of the Massachusetts Institute of Technology, to pioneer and direct the effort. With her colleagues, she developed the building blocks for modern “software engineering,” a term Hamilton coined.

Here's how Twitter built their high-performance replicated log service: Logs are a building block of distributed systems and once you understand the basic pattern you start to see applications for them everywhere...By first writing all updates to a log, and then having each replica read and apply those updates in order, we are able to ensure that a compare-and-set operation will leave all replicas in the same state.
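
The compare-and-set claim is easy to see in a toy model. The sketch below is not Twitter's service: the `Replica` class, the tuple layout of a log entry, and the in-memory list standing in for the log are all hypothetical. The point is only that replicas applying the same entries in the same order make the same accept or reject decision for every compare-and-set.

```python
class Replica:
    """A toy key-value replica that applies log entries strictly in order."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def catch_up(self, log):
        """Apply every entry not yet seen; each entry is (key, expected, new)."""
        for key, expected, new in log[self.applied:]:
            # Compare-and-set: write only if the current value is the expected one.
            if self.state.get(key) == expected:
                self.state[key] = new
            self.applied += 1


log = []            # stand-in for the shared, ordered, append-only log
a, b = Replica(), Replica()

log.append(("x", None, 1))   # succeeds: x is unset
a.catch_up(log)              # replica a reads early
log.append(("x", 1, 2))      # succeeds: x is 1
log.append(("x", 1, 3))      # fails everywhere: x is already 2
a.catch_up(log)              # replica a reads the rest
b.catch_up(log)              # replica b reads everything at once
```

Replica `a` reads the log in two steps and replica `b` in one, yet both end with `x` set to 2, because the stale third update is rejected identically on each.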

Need a network design for your highly scalable web application? Greg Ferro has you covered in great detail. A Segmented Front End Web Network Architecture.

Interesting reading after the post on Uber. Under the Hood: A Look at Sidecar’s On-demand Logistics Infrastructure. Many of the same issues. Sidecar makes use of cars, bikers and walkers, which requires an even further generalization of the dispatch algorithm. They say the combination yields a much lower cost than a car only model for hyper local deliveries. Instead of cells they break up the grid into .25 mile square chunks and predict demand in each chunk every 15 minutes. A bit like weather prediction I imagine.
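
The chunking step might look something like the sketch below. Nothing here comes from Sidecar: the names are made up, coordinates are assumed to be already projected onto a flat plane measured in miles, and a plain count per bucket stands in for whatever prediction model they actually run.

```python
from collections import Counter

CELL_MILES = 0.25        # side length of one square chunk
SLOT_SECONDS = 15 * 60   # one demand bucket every 15 minutes


def bucket(x_miles, y_miles, t_seconds):
    """Map a request to its (cell_x, cell_y, time_slot) bucket."""
    return (
        int(x_miles // CELL_MILES),
        int(y_miles // CELL_MILES),
        int(t_seconds // SLOT_SECONDS),
    )


def demand_histogram(requests):
    """Count requests per bucket; requests are (x_miles, y_miles, t_seconds) tuples."""
    return Counter(bucket(*r) for r in requests)


# Hypothetical requests: position in miles from a city origin, time in seconds.
requests = [(0.1, 0.1, 0), (0.2, 0.05, 899), (0.3, 0.1, 0), (0.1, 0.1, 900)]
histogram = demand_histogram(requests)
```

Historical counts per bucket would then be the input to a forecast for the same chunk in the next 15 minute slot.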

Why use Event Sourcing instead of relational model? PHPun answers: CQRS is not the only way to scale, it can't even be described as the best way to scale per se, it's just one way to decouple two aspects of a model, and remove any bottleneck resulting from having both in sync. In some cases, it's the most natural approach ever (like the realtime MMORPG example - there really isn't any better way to do this I can think of) in other cases it'll be more of a tradeoff.

Good look into what happened Inside @Scale 2015. There are presentations from Twitter, Facebook, Pinterest, Microsoft, and Google. And here are all the videos.

Awesome exploration of Counting unique items fast – Better intersections with MinHash: getting intersection counts through sketching techniques do carry a reasonable error percentage that needs to be considered when exposing analytics using them. KMinHash is an effective sketching technique that gives more accurate results for intersection counts compared to the Inclusion/Exclusion method of HLLs.
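
For a feel of how a k-minimum-values MinHash sketch estimates intersections, here is a small sketch under stated assumptions. It is not the code from the linked post: the function names are invented, SHA-1 truncated to 64 bits is just a convenient hash, and the union size is assumed to be supplied by the caller (from an HLL, for example).

```python
import hashlib


def hash64(item):
    """Deterministic 64-bit hash of an item's string form."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def kmv_sketch(items, k):
    """Keep the k smallest distinct hash values of the set."""
    return sorted({hash64(x) for x in items})[:k]


def jaccard_estimate(sketch_a, sketch_b, k):
    """Estimate |A n B| / |A u B| from two sketches built with the same k."""
    union = sorted(set(sketch_a) | set(sketch_b))[:k]
    if not union:
        return 0.0
    both = set(sketch_a) & set(sketch_b)
    # Fraction of the union's k smallest hashes that appear in both sets.
    return sum(1 for v in union if v in both) / len(union)


def intersection_estimate(sketch_a, sketch_b, k, union_size):
    """Scale the Jaccard estimate by a union size supplied by the caller."""
    return jaccard_estimate(sketch_a, sketch_b, k) * union_size


# Two sets of 60 items that overlap on 20 of them.
k = 128
a = kmv_sketch(range(0, 60), k)
b = kmv_sketch(range(40, 100), k)
estimate = intersection_estimate(a, b, k, union_size=100)
```

In this tiny example the sets are smaller than `k`, so the sketches hold every hash and the estimate is effectively exact. With real data the sets are far larger than `k` and the result carries the kind of error the post says needs to be considered.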

Seven Microservices Anti-patterns: Cohesion Chaos; Not taking Automation Seriously; Layered Services Architecture; Relying on Consumer Sign-off; Manual Configurations Management; Versioning Avoidance; Building a gateway in every service.

Curt Monash has product updates for MongoDB and for DataStax (Cassandra).

Some nicely detailed Notes from 2015 WebRTC Conference Co-Hosted by Google.

Murat with a paper summary: One Trillion Edges, Graph Processing at Facebook-Scale.

LASP: a Language for Distributed, Eventually Consistent Computations. (video)

Terrapin: a low latency serving system providing random access over large data sets, generated by Hadoop jobs and stored on HDFS clusters. From Pinterest, which has been using it for over a year to serve data for applications like Pinnability and discovery data, which in turn helps create great experiences for mobile & web Pinners. In production, Terrapin handles: Hundreds of nodes (servers/computers); Hundreds of terabytes with replication; Millions of QPS.
