Stuff The Internet Says On Scalability For November 21st, 2014

High Scalability

21 Nov 2014 — 7 min read

Hey, it's HighScalability time:

Sweet dreams brave Philae. May you awaken to a bright-throned dawn for one last challenge.

80 million: bacteria xferred in a juicy kiss;
Quotable Quotes:
- James Hamilton: Every day, AWS adds enough new server capacity to support all of Amazon's global infrastructure when it was a $7B annual revenue enterprise.
- @iglazer: What is the test that could most destroy your business model? Test for that. @adrianco #defragcon
- @zhilvis: Prefer decoupling over duplication. Coupling will kill you before duplication does by @ICooper #buildstufflt
- @jmbroad: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." ~George Box
- @RichardWarburto: Optimisation may be premature but measurement isn't.
- @joeerl: Hell hath no version numbers - the great ones saw no need for version numbers - they used port numbers instead. See, for example, RFC 821,
- JustCallMeBen: tldr: queues only help to flatten out burst load. Make sure your maintained throughput is high enough.
- @rolandkuhn: «the event log is a database of the past, not just of the present» — @jboner at #reactconf
- @ChiefScientist: CRUD is dead. -- @jboner #reactconf
- @fdmts: 30T of flash disks cabled up and online. Thanks @scalableinfo!
- monocasa: Immutable, statically linked, minimal system images running micro services on top of a hypervisor is a very old concept too. This is basically the direction IBM went in the 60's with their hypervisors and they haven't looked back.
- Kiril Savino: Scaling is the process of decoupling load from latency.
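
JustCallMeBen's point about queues can be made concrete with a toy simulation. This is only a sketch under a simple discrete-time model; the function name `simulate_queue`, the workloads, and the service rates are all made up for illustration and come from none of the quoted sources.

```python
def simulate_queue(arrivals, service_rate):
    """Return the queue depth at the end of each tick.

    arrivals: items arriving per tick (hypothetical workload).
    service_rate: maximum items the consumer can process per tick.
    """
    depth = 0
    history = []
    for arriving in arrivals:
        depth += arriving
        depth -= min(depth, service_rate)  # process as much as capacity allows
        history.append(depth)
    return history


# A steady load of 8 items per tick with a single burst of 50.
burst = [8, 8, 50, 8, 8, 8, 8, 8, 8, 8]

# Sustained throughput above the steady load: the burst backs up, then drains.
absorbed = simulate_queue(burst, service_rate=15)

# Sustained throughput below the steady load: the queue only ever grows.
overloaded = simulate_queue([12] * 10, service_rate=10)
```

In the first run the backlog peaks right after the burst and returns to zero. In the second it grows by two items every tick, so no queue size would ever be enough.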

Perhaps they were controlled by a master AI? Google and Stanford Built Similar Neural Networks Without Knowing It: Neural networks can be plugged into one another in a very natural way. So we simply take a convolutional neural network, which understands the content of images, and then we take a recurrent neural network, which is very good at processing language, and we plug one into the other. They speak to each other—they can take an image and describe it in a sentence.

You know how you never really believed the view in MVC was ever really separate? Now this is MVC. WatchKit apps run on the iPhone as an extension, only the UI component runs on the watch. XWindows would be so proud.

Shopify shows how they Build an Internal Cloud with Docker and CoreOS: Shopify is a large Ruby on Rails application that has undergone massive scaling in recent years. Our production servers are able to scale to over 8,000 requests per second by spreading the load across 1700 cores and 6 TB RAM.

Machine learning isn't just about creating humavoire AIs. It's a technology, like electricity, that will transform everything it affixes with its cyclops gaze. Here's a seemingly mundane example from Google, as discussed on the Green (Low Carbon) Data Center Blog. Google has turned inward, applying Machine Learning to its data center fleet. The result: Google achieved from 8% to 25% reduction in its energy used to cool the data center with an average of 15%. Who wouldn’t be excited to save an average of 15% on their cooling energy costs by providing new settings to run the mechanical plant? < And this is how the world will keep those productivity increases reaching skyward.

Does anyone say "I love my water service"? Or "I love my garbage service"? Then why would anyone say "I love Facebook"? That's when you've arrived. When you are so much a part of the way things are that people don't even think of loving them or not. They just are. The Fall of Facebook.

How Nike thinks about app development: Lots of micro services: Nike's plan: Build a series of services that do little things like checkout and reading data and then bring them together into larger apps that'll be easier to tweak in the future.

Some slides and videos from React San Francisco 2014 are now available.

Cross platform development sucks, hard, but here's something interesting. Google Inbox shares 70% of its code across Android, iOS, and the Web using J2ObjC, which converts Android Java code to iOS-ready Objective-C code. How Google Inbox shares 70% of its code across Android, iOS, and the Web.

Dominic Umbeer with a nice set of notes on GOTO Berlin 2014 Day 1 and Day 2.

If I said a post on Hacker News had 892+ comments, what would be the topic? Docker? iOS vs Android? Nope. How about .NET. Microsoft takes .NET open source and cross-platform. Some very good comments, but is it too late? .NET/C#/Visual Studio is an excellent platform, so maybe not. But at least now there's a chance.

Networking is still the bottleneck and Intel wants to pop the top off. The Omni-Path architecture will come to market next year, offering 100 Gb/sec links on switches that are denser and zippier than InfiniBand gear.

This. trhway: their schematics [Fabric, the next-generation Facebook data center network] of datacenter reminds about schematics of a big server 15 years ago. Server racks instead of CPU-boards. "The datacenter is the computer."

Marc Gravell: one thing that I've learned over and over again is: at the application level, sure: do what you want - you're not going to upset the runtime / etc - but at the library level (in these areas, and others) you often need to do quite a bit of work to minimize CPU and memory (allocation / collection) overhead.

How a Memory Is Made: On the other hand, he said, these experiments are “limited, because in the real world, real memory is not about single strong memories.” Rather, said Silva, we remember events as “strings” of individual sensory memories.

Surprising performance boost by pinning single threaded application to a core? No.

Autoscaling, welcome to Google Compute Engine: Autoscaler can respond to a number of different metrics such as CPU load, QPS on a HTTP Load Balancer and metrics...Autoscaler performs well even in unexpected scenarios such as sudden traffic spikes...an application could scale from zero to handling over 1.5 million requests per second using Autoscaler.

How beautiful...Memex #001 Final. Is there a science fiction book where Vannevar Bush was able to realize his vision? That would be interesting.

A lost art, but this stuff really makes a difference. Coding for Performance: Data alignment and structures: This article collects the general knowledge and Best-Known-Methods (BKMs) for aligning of data within structures in order to achieve optimal performance.
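
To see why field order matters, here is a small sketch using Python's `ctypes`, which follows the platform's native C layout rules. The struct and field names are invented for illustration and are not taken from the article; exact sizes depend on the platform ABI.

```python
import ctypes


class Padded(ctypes.Structure):
    # Small fields interleaved with larger ones: the native layout rules
    # may insert padding so each larger field starts on its alignment boundary.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("total", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
        ("count", ctypes.c_uint32),
    ]


class Reordered(ctypes.Structure):
    # The same four fields, ordered from largest to smallest.
    _fields_ = [
        ("total", ctypes.c_uint64),
        ("count", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


def layout(struct_type):
    """Map each field name to its byte offset within the structure."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}


padded_size = ctypes.sizeof(Padded)
reordered_size = ctypes.sizeof(Reordered)
print("Padded:", padded_size, layout(Padded))
print("Reordered:", reordered_size, layout(Reordered))
```

Both structs carry the same 14 bytes of payload. On a typical 64-bit platform the interleaved version usually comes out larger than the reordered one, because padding is inserted before `total` and `count`.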

If you like the JVM tool chain, but not the Java language, then consider Clojure at Scale: Why Python Just Wasn’t Enough for AppsFlyer: 2 Billion events per day...We started to encounter issues like one of the critical Python processes taking too long to digest the incoming messages...We’ve been toying around with the idea of introducing Functional Programming into the company for some time...the entire system is based on micro-services...Clojure provides its own approach to concurrency and it might take some time to adjust to it...This is a huge advantage: coding is more focused on the logic itself, rather than the plumbing around locks...We experienced a significant performance boost when we moved AppsFlyer to Clojure. In addition, using functional programming allows us to have a really small code base with only a few hundred lines of code for each service. The effects of working in Clojure dramatically speed up the development time and allow us to create a new service in days.

When might you want to replace TCP? When you discover Netflix is chewing up 9.5% of upstream traffic on the North American Internet with ACKs. Since connections are usually asymmetric, meaning upstream connections often suck, relatively speaking, ACK drops on the upstream can cause throttling and degradation on the downstream. Replace with what? A UDP based protocol that doesn't use ACKs.

Here's how Pinterest built Pinterest News. The problem: rank millions of events a day and use that to construct a feed for each individual Pinner. The process: decide if creating the service on multiple platforms for 10s of millions of users has the required ROI; decide it doesn't need to be real-time; build out an infrastructure that could scale to 10 percent of users rather than 100 percent; initially build out the feature on iOS. They used two internal services on the backend: Zen and PinLater. They built a queuing system on top of Redis.

ScaleScale shows there's a lot more to DNS than a simple lookup. A lot more. Global Routing with Anycasted DNS: It’s common for us to do something like: pick a region (e.g. US-WEST or US-EAST), then within the region, send 95% of traffic to colo and 5% to AWS, and make that sticky so most of the time the same 5% of users go to AWS to ensure good cache locality, and if colo infrastructure gets overloaded, start shifting weight so more traffic goes to AWS with auto-scaling enabled, and on and on. There’s a lot going on in our stack. On the delivery side, we’re touching stuff from the hardware/NIC level (crazy packet filtering), doing deep traffic engineering in BGP, leveraging low level kernel features to get as precise as routing DNS queries to specific cores to maximize cache locality, and hitting a totally custom written nameserver that executes complex routing algorithms for every single request. At a higher level, what we’ve built is a big globally distributed real-time system, and we’ve tried to use the right tools for the right jobs.
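
The sticky 95/5 split described in the quote can be approximated by hashing a stable user key into one of 100 buckets. This is a minimal sketch and not the nameserver from the article: the function `pick_backend`, the backend names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def pick_backend(user_key, weights):
    """Pick a backend for a user; same key and weights give the same answer.

    user_key: any stable identifier (hypothetical, e.g. a resolver address).
    weights: dict of backend name -> integer percentage, summing to 100.
    """
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    upper = 0
    for backend, weight in weights.items():
        upper += weight
        if bucket < upper:
            return backend
    raise ValueError("weights must sum to 100")


normal = {"aws": 5, "colo": 95}      # steady state: 5% of users on AWS
shifted = {"aws": 20, "colo": 80}    # colo overloaded: shift weight to AWS

users = ["user-%d" % i for i in range(10000)]
on_aws_normal = {u for u in users if pick_backend(u, normal) == "aws"}
on_aws_shifted = {u for u in users if pick_backend(u, shifted) == "aws"}
```

Listing the growing backend first means its bucket range only widens when weight shifts, so users already on AWS stay there and keep their cache locality.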

Here's LinkedIn Operating Apache Samza at Scale: Apache Samza is a framework that gives a developer the tools they need to process messages at an incredibly high rate of speed while still maintaining fault tolerance. It allows us to perform some interesting near-line calculations on our data without major architectural changes. Samza has the ability to retrieve messages from Kafka, process them in some way, then output the results back to Kafka.

Murat with more of his patent pending paper summaries. Paper Summary: Calvin, Distributed transactions for database systems. Paper Summary: Granola, Low overhead distributed transaction coordination.

Adventures in Encodings: Ideally, we could have small ziplists linked together to allow for quick single-ziplist operations while still not limiting the number of elements we can store in an efficient, pointer-free data structure.
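
The idea of small ziplists linked together can be sketched as a chain of bounded chunks. This is an illustration only: Python lists stand in for the pointer-free ziplists, and the class `ChunkedList` and its `max_chunk` parameter are hypothetical names, not Redis internals.

```python
from collections import deque


class ChunkedList:
    """A sequence stored as a chain of small, bounded chunks."""

    def __init__(self, max_chunk=4):
        self.max_chunk = max_chunk
        self.chunks = deque()  # the chain linking the small chunks together

    def push_tail(self, value):
        # Start a new chunk only when the last one is full.
        if not self.chunks or len(self.chunks[-1]) >= self.max_chunk:
            self.chunks.append([])
        self.chunks[-1].append(value)

    def pop_head(self):
        if not self.chunks:
            raise IndexError("pop from empty ChunkedList")
        head = self.chunks[0]
        value = head.pop(0)  # cost is bounded by max_chunk, not total length
        if not head:
            self.chunks.popleft()  # drop the chunk once it is empty
        return value

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

    def __iter__(self):
        for chunk in self.chunks:
            yield from chunk


items = ChunkedList(max_chunk=3)
for n in range(10):
    items.push_tail(n)
```

Each operation touches a single small chunk, so the per-operation cost stays bounded by the chunk size while the total number of elements is unlimited.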

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox: the data management community has focused heavily on the design of systems to support training complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the deployment and serving of models at scale.
