
Stuff The Internet Says On Scalability For July 5, 2013

Hey, it's HighScalability time:

(Dolls nerds can nest with)

Quotable Quotes:
- @Carnage4Life: "Google uses Bayesian filtering the way Microsoft uses the if statement" - http://www.joelonsoftware.com/items/2005/10/17.html … <= finally at the point where I get this
- @etherealmind: You can dramatically improve blog performance by blocking Amazon IP address ranges. Tells you how much information mining is occurring.
- Randy Bias: Choice is possible only when there’s architectural consistency between public and private cloud infrastructure. Those who focus only on API compatibility are either confused or intentionally misleading people. There is NO API COMPATIBILITY without architectural compatibility.
- Nassim Nicholas Taleb: Everything that is fragile and still in existence (that is, unbroken), will be harmed more by a certain stressor of intensity X than by k times a stressor of intensity X/k, up to the point of breaking.
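
Taleb's definition of fragility is a statement about nonlinearity: harm grows faster than the stressor does. A minimal sketch in Python, assuming a purely hypothetical quadratic harm curve (the quote does not specify one), compares a single shock of intensity X with k shocks of intensity X/k:

```python
def harm(intensity: float) -> float:
    """Hypothetical convex harm curve: damage grows with the square of stress."""
    return intensity ** 2


def is_fragile_at(harm_fn, x: float, k: int) -> bool:
    """True if one stressor of intensity x does more harm than k stressors of x/k."""
    return harm_fn(x) > k * harm_fn(x / k)


one_big = harm(10.0)        # a single stressor of intensity 10
ten_small = 10 * harm(1.0)  # ten stressors of intensity 1

print(one_big, ten_small)               # the single large shock dominates
print(is_fragile_at(harm, 10.0, 10))    # convex harm: fragile
print(is_fragile_at(lambda x: 3 * x, 10.0, 10))  # linear harm: not fragile
```

With a linear harm curve the two sides are equal, so the inequality in the quote only holds when the response to stress is convex.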

First Law of Parallelism: all instructions are in parallel until acted on by a serializing force.

Murat Demirbas takes on Spanner: Google's Globally-Distributed Database with a helpful summary of what it is (scalable, multi-version, globally-distributed, and synchronously-replicated database) and an insightful look into what TrueTime is good for (snapshot reads (reads in the past)).

Sean Hull with the secret of life: Scalability Happiness – A Quiet Query Log: In 17 years of consulting that is the single largest cause of scalability problems. Fix those queries and your problems are over. By and large, if scalability is our goal, we should work to quiet the activity in the slow query log. This is an active project for developers & DBAs. Keep it quiet and your server will run well.

Joyent released Manta, a highly scalable, distributed object storage service with integrated compute. Dave Pacheco wrote a nice overview on fault tolerance in Manta. Data is saved in multiple AZs in a single region.

The Lindy Effect: the lifetimes of intellectual artifacts follow a power law distribution. Also, There's Just No Getting around It: You're Building a Distributed System.

Simwood is using Redis to implement anti-fraud and security features within microseconds. Redis can do 100k reads or writes per second on commodity hardware and over 1m on workstation grade hardware. They switched from an architecture based on MySQL using SSD and memory tables. Offline map-reduce was replaced with real-time.

Nice summary by w300x of Johan Bergström's "What, Where And When Is Risk In System Design": Good talk. What I found most enlightening is the fact that developers tend to think in terms of complex systems, and sysadmins tend to think of infrastructures as machinery pieced together. Hence why in systems it's often the case that you increase redundancy of components and in development you often reuse reliable code.

How We Built Filmgrain, Part 2 of 2. In the follow-up they went against type and built an API on TCP instead of HTTP, but still chose JSON as the message format, which is an interesting choice. They chose client-side load balancing from a list of endpoints served from S3. Python and gevent are used to process messages concurrently.
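
To make the JSON-over-TCP and client-side load balancing idea concrete, here is a rough sketch, not Filmgrain's actual code. The newline-delimited framing, the round-robin policy, the `EndpointPicker` class, and the endpoint addresses are all assumptions made for illustration; in the article the endpoint list comes from S3, while here it is hard-coded.

```python
import itertools
import json


def encode_frame(message: dict) -> bytes:
    """Serialize one message as a single line of JSON (assumed framing)."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete messages plus leftover bytes."""
    *complete, rest = buffer.split(b"\n")
    return [json.loads(line) for line in complete if line], rest


class EndpointPicker:
    """Hypothetical round-robin picker that skips endpoints marked as down."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._down = set()
        self._cycle = itertools.cycle(self._endpoints)

    def mark_down(self, endpoint):
        self._down.add(endpoint)

    def pick(self):
        # Try each endpoint at most once per call before giving up.
        for _ in range(len(self._endpoints)):
            candidate = next(self._cycle)
            if candidate not in self._down:
                return candidate
        raise RuntimeError("no healthy endpoints")


picker = EndpointPicker(["10.0.0.1:9000", "10.0.0.2:9000"])
wire = encode_frame({"op": "ping"}) + b'{"op":"par'  # second frame is incomplete
messages, leftover = decode_frames(wire)
```

The client keeps `leftover` and prepends it to the next chunk read from the socket, which is the bookkeeping HTTP would otherwise do for you.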

Useful old technologies: ASN.1. Agreed, ASN.1 rocks, it was both expressive and efficient. JSON, not so much.

A very easy way to get To Know a Garbage Collector: Teaching the basics of 8 technical terms, 2 algorithms, and comparing and contrasting them in that short a period of time is an extremely fun challenge.

Excellent explanation of Normal vs. Fat-tailed Distributions. Why the difference matters: Let's say people deposit their money in your bank, and you use it to place bets. If you think the outcomes of the bets are normal, but they're actually fat-tailed, the bets will still pay off most of the time. But sometimes you'll be very, very wrong.
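
A quick way to see how wrong "very, very wrong" can be is to compare tail probabilities directly. The sketch below uses the standard normal and, as one convenient fat-tailed stand-in chosen here (not taken from the linked article), the standard Cauchy distribution. Both tails have closed forms, so only the standard library is needed.

```python
import math


def normal_tail(x: float) -> float:
    """P(X > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def cauchy_tail(x: float) -> float:
    """P(X > x) for a standard Cauchy variable (a fat-tailed example)."""
    return 0.5 - math.atan(x) / math.pi


for x in (1, 3, 5):
    n, c = normal_tail(x), cauchy_tail(x)
    print(f"x={x}: normal={n:.3g}  cauchy={c:.3g}  ratio={c / n:.3g}")
```

Near the center the two look similar, but five scale units out the fat-tailed distribution still puts several percent of its mass beyond that point while the normal puts well under one in a million. A bank that assumes the former when the latter is true will treat a routine event as practically impossible.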

Cloud server showdown: Amazon AWS EC2 vs. Linode vs. DigitalOcean. And the winner is benchmark controversy...Linode. Their 8 core server really rocks in this case.

Man Invents New Language for Turning Graphics Chips Into Supercomputers: the trick is that you have to build new software that’s specifically designed to tap into these chips. He just released a new programming language called Harlan dedicated to building applications that run on GPUs. “GPU programming still requires the programmer to manage a lot of low-level details that often distract them from the core of what they’re trying to do,” says Eric Holk. “We wanted a system that could manage these details for the programmer, letting them be more productive and still getting good performance from the GPU.”

The King is dead, long live the web server: Nginx just became the most used web server among the top 1000 websites.

Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components: So why not take this approach (where possible) with infrastructure? If you absolutely know a system has been created via automation and never changed since the moment of creation, most of the problems I describe above disappear. Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.

Vanilla Java shows how to profile for micro jitter on a new machine: Using thread affinity, without isolating the CPU doesn't appear to help much on this system. I suspect this is true of other versions of Linux and even Windows. Where affinity and isolation helps, it may still make sense to busy wait as it appears the scheduler will interrupt the thread less often if you do.

ScalaDays 2013 Presentations are now available online.

With the App Store becoming a digital Thunderdome, Agant is going back to a one programmer shop. Good discussion on Hacker News.

Cool Transaction Library for DynamoDB.

Afraid of SSD? The story goes on: which database operation do you think causes most random IO operations? Of course it’s our old friend the join: it is the sole purpose of joins to gather many little data fragments from different places and combine them into the result we want. Joins can also greatly benefit from SSDs. SSDs actually void one of the arguments often brought up by NoSQL folks against relational databases: with SSD it doesn’t matter that much if you fetch data from one place or from many places.

You like me, you really like me. It turns out Basho would select Erlang all over again. True love. They love its "let it crash" and many "small heaps" approach to resilience.

If you are writing the equivalent of a scheduler in your application then Go 1.1's new scheduler may be of some help. Writing schedulers is definitely hard. Bypassing the OS for a user space scheduler is the key.

Large Scale Document Clustering: Clustering and Searching 50 Million Web Pages: The final result is that this approach is able to search 13 fold less documents than the previous best reported approach on the ClueWeb09 Category B 50 million document collection. The theoretical clustering evaluation at INEX suggested that fine grained clusters allow better ranking of clusters for collection selection.

Not what you think it is. FluxCapacitor - a Java-based reference application demonstrating the use of many Netflix Open Source components. Looks amazing.

Native Code Performance and Memory: The Elephant in the CPU: Want to create fast, fluid apps? This talk provides an overview of new hardware and how C++ lets you, the developer, take advantage of it. Learn more about compiler optimizations and CPU performance; auto-vectorization and scalar optimizations will be highlighted!

A comprehensive study of Convergent and Commutative Replicated Data Types: Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types.
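
For a feel of what those "simple formal conditions" buy you, here is a minimal sketch of a grow-only counter, a commonly used state-based (convergent) example. The class and method names are illustrative choices, not an API from the paper. Each replica only increments its own slot, and merge takes the per-replica maximum, which makes merging commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica."""

    def __init__(self, replica_id, counts=None):
        self.replica_id = replica_id
        self.counts = dict(counts or {})

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Return a new counter holding the element-wise max of both states."""
        replicas = self.counts.keys() | other.counts.keys()
        merged = {
            rid: max(self.counts.get(rid, 0), other.counts.get(rid, 0))
            for rid in replicas
        }
        return GCounter(self.replica_id, merged)


a, b = GCounter("a"), GCounter("b")
a.increment(2)   # replica a sees two events
b.increment(3)   # replica b sees three, with no coordination
print(a.merge(b).value(), b.merge(a).value())
```

Because the merge result does not depend on the order or number of times states are exchanged, replicas can gossip whenever convenient and still converge.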

Janus: Optimal Flash Provisioning for Cloud Storage Workloads: Using measurements from production workloads in multiple data centers using these recommendations, as well as traces of other production workloads, we show that the resulting allocation improves the flash hit rate by 47–76% compared to a unified tier shared by all workloads.

Don't forget to put Geek Reading by Robert Diana in your infoplex. Always a good collection of links.
