Stuff The Internet Says On Scalability For February 6th, 2015

Hey, it's HighScalability time:


What a beautiful example of Moore's law visualized through the evolution of Lara Croft! (from @silenok)

  • $1 million: per day gross of Clash of Clans
  • Quotable Quotes:
    • @dancow: In 45 minutes, the largest trader in U.S. equities went bankrupt because of bad devops
    • @bmdhacks: How to be a 10x engineer: Incur technical debt fast enough to appear 10x as productive as the ten engineers tasked with cleaning it up.
    • @CompSciFact: Scaling poorly: Performance degrades with problem size
      Poorly scaled: Things change far more rapidly in one direction than others
    • @mikiobraun: Before scaling out, a machine learning person would always try some approximation shortcut to achieve speed up. #cheating #orisit
    • @cshirky: 3/4 If your organization has ever made a significant and unpleasant change based on something you measured, you can probably use more data.
    • @PatrickMcFadin: Service Discovery Overview: ZooKeeper vs. Consul vs. Etcd vs. Eureka 
    • @jaykreps: TIL: Dequeuing a single item in RabbitMQ requires traversing every single item in the queue. Oh my.
    • @Carnage4Life: No single recipe 4 success. Great companies had bad habits; Apple micromanagement, Google random side projects & Facebook used fricking PHP
    • Stubbornly Persistent: although life would persist in the absence of microbes, both the quantity and quality of life would be reduced drastically.

  • At inflection points change the world must. Netflix: In the early days of Netflix streaming, circa 2008, we manually tracked hundreds of metrics, relying on humans to detect problems.  Our approach worked for tens of servers and thousands of devices, but not for the thousands of servers and millions of devices that were in our future.  Complexity and human-reliant approaches don’t scale; simplicity and algorithm-driven approaches do.

  • IBM is turning Watson into a platform, offering 5 new services: Speech to Text, Text to Speech, Visual Recognition, Concept Insights, Tradeoff Analytics. GA probable next month. Good discussion on Hacker News. Most of the services allow for training through feedback. Some question the quality of the services, but it's early days. Pricing is not set. Hopefully it won't suffer from what these next gen deep learning services tend to suffer from: expensivitis. Who can afford $1.00 per 1000 API calls for a mobile app that needs to acquire users? IBM, make it cheap, try for ubiquity. Cool stuff will happen.

  • Looking for that next step in distributed reliability? Look at TLA+. Murat has several articles on TLA+ and is using it in his distributed systems class. Oh, TLA stands for Temporal Logic of Actions. Leslie Lamport has many papers on TLA. James Hamilton wrote up Amazon's experiences using TLA+ in Challenges in Designing at Scale: Formal Methods in Building Robust Distributed Systems: TLA+, a formal specification language invented by ACM Turing Award winner Leslie Lamport. TLA+ is based on simple discrete math, basic set theory and predicates with which all engineers are quite familiar. A TLA+ specification simply describes the set of all possible legal behaviors (execution traces) of a system.
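
    To make "the set of all possible legal behaviors" concrete, here's a minimal sketch in TypeScript, not TLA+ itself: a toy spec (my own, hypothetical) with an initial state, a next-state relation, and an invariant, explored the way a model checker like TLC walks every reachable state.

```typescript
// A toy "spec": a counter two processes may increment, with the
// invariant that it never exceeds 2. This is an illustrative stand-in
// for a TLA+ Init/Next/Invariant, not real TLA+.

type State = { count: number };

const init: State = { count: 0 };

// Next-state relation: all legal successors of a state.
function next(s: State): State[] {
  return s.count < 2 ? [{ count: s.count + 1 }] : [];
}

function invariant(s: State): boolean {
  return s.count <= 2;
}

// Breadth-first search over the state graph: enumerate every reachable
// state (i.e., every legal behavior's states) and check the invariant.
function check(): void {
  const seen = new Set<string>();
  const queue: State[] = [init];
  while (queue.length > 0) {
    const s = queue.shift()!;
    const key = JSON.stringify(s);
    if (seen.has(key)) continue;
    seen.add(key);
    if (!invariant(s)) throw new Error(`Invariant violated in ${key}`);
    queue.push(...next(s));
  }
  console.log(`${seen.size} reachable states, invariant holds`);
}

check();
```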

  • Which model will win? Google is cloud oriented, which uses more data, which uses more bandwidth. Apple is more client based, which puts less pressure on bandwidth. On the Gillmor Gang.

  • CPU Backdoors. This is fact: who really knows what's on a chip? You think testing software is hard? What's the test for additional functionality silently added to a chip? How do you prove a negative?

  • APIs are not dead yet. Prismatic announced their Interest Graph API that automatically identifies the thematic content of a piece of text: each interest is implemented as a learned, statistical model that can automatically analyze the content of an article holistically.

  • Our new multiplatform world really sucks for programmers. Remember when HTML was everywhere? Ah, good times. So this is impressive: How Google Inbox shares 70% of its code across Android, iOS, and the Web. Google has a history of cross-compilation tools, which is the approach they used: "the real enabler for Inbox is called J2ObjC, which, as the name implies, converts Java code meant for Android into iOS-ready Objective-C code." Google has more details at Going under the hood of Inbox. Good discussion on reddit, where there's a warning from experience that all these levels of abstraction eventually collapse and you end up with more work and worse results than if you had just done separate efforts to begin with. Beware of the One Ring.

  • 10,000 Hours with Reid Hoffman: What I Learned: When there’s a complex list of pros and cons driving a potentially expensive action, Reid seeks a single decisive reason to go for it—not a blended reason. For example, we were once discussing whether it’d make sense for him to travel to China. There was the LinkedIn expansion activity in China; some fun intellectual events happening; the launch of The Start-Up of You in Chinese. A variety of possible good reasons to go, but none justified a trip in and of itself. He said, “There needs to be one decisive reason. And then the worthiness of the trip needs to be measured against that one reason. If I go, then we can backfill into the schedule all the other secondary activities. But if I go for a blended reason, I’ll almost surely come back and feel like it was a waste of time.”

  • Where should you spend your upgrade dollar? Maybe not on faster memory. RAMing speed: Does boosting DDR4 to 3200MHz improve overall performance? DDR4 may be slightly slower than DDR3, but offers higher capacities. Excellent discussion in the comments. Is this the era of diminishing returns? Will memory move to the CPU? Who uses the bandwidth we have now? 

  • Pinterest tells us How we made JavaScript testing 15x faster. Using C is not the correct answer. Nor, surprisingly enough, was parallelism. The secret was jsdom: a node.js command-line utility that implements the WHATWG DOM and HTML standards and isn’t concerned with rendering, painting, and the other tasks that make a browser CPU and memory hungry. Internal benchmarks showed a remarkable 5-20x increase in speed for most of our tests. DOM-heavy tests had the biggest performance improvements.
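
    A minimal sketch of the idea, using the current jsdom API (which has changed since this was written, and is not necessarily what Pinterest used): you get a standards-compliant DOM in plain Node.js, with nothing rendered or painted.

```typescript
import { JSDOM } from "jsdom";

// Build a DOM from an HTML string; no browser, no layout, no paint.
const dom = new JSDOM(`<ul id="pins"><li>one</li><li>two</li></ul>`);
const { document } = dom.window;

// DOM queries work exactly as they would in a browser, so DOM-heavy
// tests run against this instead of a full (CPU- and memory-hungry)
// browser instance.
const items = document.querySelectorAll("#pins li");
console.log(items.length); // 2
```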

  • Designing a datacenter?  Ivan Pepelnjak has a book for you. It's called: Data Center Design Case Studies. Browsing the table of contents it looks like a lot of dark magic is covered. You might like the attitude as well:  the networking technology part tends to be way easier to solve than the oft-ignored application-level challenges.

  • Change and stasis are both hard to deal with, just in different ways. Evernote is telling the story of change in The Great Shard Migration, Part I, their effort to move multiple sharding schemes to a new standard. It's well written and full of helpful details. 

  • Etsy found their most important pages were becoming their slowest. Not an uncommon evolution. Rebuilding the Foundation of Etsy’s Seller Tools. Their approach: refactor, don't rewrite; establish project-wide standards in the form of style guides and page architectures; ship early so sellers can drive the process with their feedback. A Bootstrap-like approach to CSS was instituted. They went thick client, with a backend that would work for any client. They went with Marionette instead of Backbone. The slowness was due in part to the amount of data that needed to be accessed from across a cluster of database shards. They put in an orchestration layer, but it's not clear how that improved speed.

  • The Internet History Podcast does a great job recording the history of our era. Most recently there's an interesting talk with Amazon’s Technical Co-Founder and Employee #1, Shel Kaphan. I know from doing research in the past that there's probably more written about any one of Napoleon's battles than there is about this defining era of history.

  • Can you ever go back to the garden? Build Static Websites.

  • It's a question that comes up often and here's a good resource: Basic Rules of Cassandra Data Modeling. Don't be afraid of writes. Don't be afraid of denormalization. Spread data evenly. Minimize the number of partitions read. Figure out the queries you need to satisfy and model accordingly, then "Try to create a table where you can satisfy your query by reading (roughly) one partition."
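
    A minimal sketch of query-first modeling (table, keyspace, and query names are mine, not from the article, and assume a keyspace named demo already exists): each user's posts are denormalized into a single partition on write, so the query the table was designed for reads roughly one partition.

```typescript
import { Client } from "cassandra-driver";

const client = new Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
});

// Partition key user_id spreads data evenly across nodes; clustering
// column posted_at keeps each user's posts sorted inside one partition.
const schema = `
  CREATE TABLE IF NOT EXISTS demo.posts_by_user (
    user_id   uuid,
    posted_at timestamp,
    body      text,
    PRIMARY KEY ((user_id), posted_at)
  ) WITH CLUSTERING ORDER BY (posted_at DESC)`;

async function main() {
  await client.connect();
  await client.execute(schema);
  // The query this table exists to serve: one partition, already sorted.
  const result = await client.execute(
    "SELECT body FROM demo.posts_by_user WHERE user_id = ? LIMIT 10",
    ["62c36092-82a1-3a00-93d1-46196ee77204"],
    { prepare: true }
  );
  console.log(result.rows.map((r) => r.body));
  await client.shutdown();
}

main().catch(console.error);
```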

  • Stream processing, Event sourcing, Reactive, CEP… and making sense of it all. Good introduction to the topic.

  • A great read if you are running a message board site. Mike Schwartz shares the thinking behind his process of Scaling XenForo on Digital Ocean’s IaaS. Very useful.

  • PayPal on Implementing a Fast and Light Weight Geo Lookup Service. Great details on a capability that is so necessary yet surprisingly hard to do.

  • Are we really so simple? Collision avoidance predicts pedestrians’ behavior: So, what makes this different from other models of crowd behavior? The main difference is the simplicity of the model. The strength of the reaction of a pedestrian to another is only given by a single factor: the time to collision.
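
    A minimal sketch of the model's key quantity (my own derivation, not the authors' code): treat two pedestrians as discs and solve |dx + t·dv| = R for the smallest positive t, where dx is relative position, dv relative velocity, and R the sum of the radii.

```typescript
type Vec = { x: number; y: number };

// Smallest positive root of the quadratic (dv·dv)t² + 2(dx·dv)t + (dx·dx − R²) = 0,
// or Infinity if the two pedestrians are never on a collision course.
function timeToCollision(dx: Vec, dv: Vec, R: number): number {
  const a = dv.x * dv.x + dv.y * dv.y;
  const b = 2 * (dx.x * dv.x + dx.y * dv.y);
  const c = dx.x * dx.x + dx.y * dx.y - R * R;
  const disc = b * b - 4 * a * c;
  if (a === 0 || disc < 0) return Infinity; // paths never meet
  const t = (-b - Math.sqrt(disc)) / (2 * a);
  return t > 0 ? t : Infinity; // collision already passed
}

// Two pedestrians 4 m apart, closing at 1 m/s, radii 0.3 m each:
// they touch in (4 - 0.6) / 1 = 3.4 seconds.
console.log(timeToCollision({ x: 4, y: 0 }, { x: -1, y: 0 }, 0.6)); // 3.4
```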

  • Today's paper: WANalytics: Analytics for a geo-distributed, data intensive world.

  • Time-series storage design (part one): Akumuli takes a different approach: all data is stored in one large WAL that consists of many volumes. Each volume is allocated beforehand and has a fixed size (4GB). The WAL is append-only.
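
    A minimal sketch of that design (file names and sizes are mine, not Akumuli's): one logical write-ahead log made of fixed-size volumes whose space is reserved up front, with writes that only ever append, rolling to a fresh volume when the current one fills.

```typescript
import { openSync, writeSync, closeSync, ftruncateSync } from "node:fs";

const VOLUME_SIZE = 4 * 1024; // tiny stand-in for Akumuli's 4GB volumes

class Wal {
  private fd = -1;
  private offset = 0;
  private volume = 0;

  // Close the full volume and open the next one, reserving its full
  // size up front (a sparse stand-in for real disk preallocation).
  private roll(): void {
    if (this.fd >= 0) closeSync(this.fd);
    this.fd = openSync(`wal.${this.volume++}`, "w");
    ftruncateSync(this.fd, VOLUME_SIZE);
    this.offset = 0;
  }

  // Append-only: records are only ever written past the previous one.
  append(record: Buffer): void {
    if (record.length > VOLUME_SIZE) throw new Error("record too large");
    if (this.fd < 0 || this.offset + record.length > VOLUME_SIZE) this.roll();
    writeSync(this.fd, record, 0, record.length, this.offset);
    this.offset += record.length;
  }
}

const wal = new Wal();
wal.append(Buffer.from("ts=1423180800 value=42\n"));
```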

  • The wise choice? Sophia: an embeddable, transactional key-value database. It has a unique architecture, created by researching and reconsidering the primary algorithmic constraints of log-file based data structures such as the LSM-tree. Sophia is designed for fast writes (append-only) and fast reads (range-query-optimized, adaptive architecture) of small to medium-sized key-values.

  • Greg Linden has another tasty set of Quick links.