
Stuff The Internet Says On Scalability For April 25th, 2014

High Scalability

25 Apr 2014 — 8 min read

Hey, it's HighScalability time:

New World Record BASE jumping from World's Tallest Building. #crazy

30 billion: total Pinterest pins; 500,000,000: WhatsApp users (700 million photos and 100 million videos every single day); 1 billion: Facebook active users on phones and tablets.
Quotable Quotes:
- @jimplush: Google spent 2.3 billion on infrastructure in Q1. Remember that when you say you want to be "the Google of something"
- Clay Shirky: I think one of the things that happened to the P2P market for infrastructure is that users preference for predictable pricing vs resource-sensitive pricing is so overwhelming that they will overpay to anyone who can promise flat prices. And because the logic of centralization vs decentralization is so price sensitive, I don't think there is any logical reason to assume a broadly stable class of apps, separate from current pricing data for energy, cycles, and storage.
- @chipchilders: Stop freaking making new projects just for the sake of loose coupling. Damn it people.
- Benedict Evans (paraphrased): A startup 15 years ago raised 10 million dollars, had 100 people, and a million users. Now you raise a million dollars, have 10 people, and 100 million users.
- @francesc: "Go was created for the cloud infrastructure, when we used to call it servers" - @rob_pike at #gophercon
- @postwait: distributed system: an arbitrarily large state machine w/ "unknown" & "f*cked" states wherein you can't observe the movement between states.
- @enneff: "In Ruby regular expressions are actually very fast... compared to all the other things you can do." --@derekcollison LOL #gophercon
- Steve Jobs: This needs to be like magic. Go back, this isn’t magical enough!
- @jamesurquhart: The complexity isn’t in the tech, it is in the interconnected apps and comps in systems. Managing interconnectedness is managing complexity.
- Alex Pentland: Put another way, social physics is about how human behavior is driven by the exchange of ideas—how people cooperate to discover, select, and learn strategies and coordinate their actions—rather than how markets are driven by the exchange of money.

Steve Jobs with the carrot: This is important, this needs to happen, and you do it. And now this stick: Guess what, you’re Margaret from now on.

A fantastic look at Uplink Latency of WiFi and 4G Networks by Ilya Grigorik: WiFi can deliver low latency first hop if the network is mostly idle. By contrast, 4G networks require coordination between the device and the radio tower for each uplink transfer. First off, latency aside, and regardless of wireless technology, consider the energy costs of your network transfers! Periodic transfers incur high energy overhead due to the need to wake up the radio on each transmission. Second, same periodic transfers also incur high uplink coordination overhead - 4G in particular. In short, don't trickle data. Aggregate your network requests and fire them in one batch: you will reduce energy costs and reduce latency by amortizing scheduling overhead.
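The "don't trickle data" advice can be sketched as a tiny batcher. This is a minimal illustration, not anything from Ilya's post: `UplinkBatcher`, the `send` callback, and the `max_items` threshold are all hypothetical names, and each call to `send` stands in for one radio wake-up.

```python
from typing import Callable, List


class UplinkBatcher:
    """Buffers small payloads and hands them to the transport in one call.

    Hypothetical helper: `send` is an assumed transport callback, where one
    call represents one uplink transfer (one radio wake-up).
    """

    def __init__(self, send: Callable[[List[bytes]], None], max_items: int = 20):
        self._send = send
        self._max_items = max_items  # flush threshold, an arbitrary illustrative value
        self._pending: List[bytes] = []

    def add(self, payload: bytes) -> None:
        """Queue a payload; flush automatically once the batch is full."""
        self._pending.append(payload)
        if len(self._pending) >= self._max_items:
            self.flush()

    def flush(self) -> None:
        """Send everything queued so far as a single transfer."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


# Record each transfer instead of touching a real network.
transfers: List[List[bytes]] = []
batcher = UplinkBatcher(send=transfers.append, max_items=20)
for i in range(50):
    batcher.add(f"event-{i}".encode())
batcher.flush()  # send the remainder

print(len(transfers))  # 3 transfers instead of 50
```

A real client would probably also flush on a timer so a half-full batch doesn't sit around forever; that part is left out here.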

As someone who embraced HTML because it did away with that whole design thing in favor of the doing thing, please let this be so, not because I have any animosity to any of the products, but because I suck at design: Designer Duds: Losing Our Seat at the Table: It’s now 2014, and I doubt seriously whether I’m alone in feeling a sense of anxiety about how “design” is using its seat at the table. From the failure of “design-oriented” Path [1] to the recent news that Square is seeking a buyer [2] to the fact that Medium is paying people to use it [3], there’s evidence that the luminaries of our community have been unable to use design to achieve market success. More troubling, much of the work for which we express the most enthusiasm seems superficial, narrow in its conception of design, shallow in its ambitions, or just ineffective.

It's like Sherlock for programmers. How to detect bank loan fraud with graphs : part 2. A fun way to use graph databases, finding criminals by analyzing patterns using graph algorithms. Also, Building a Graph-based Analytics Platform: Part I.

Cache Invalidation Strategies With Varnish Cache. Good explanation of different techniques: purging, bans, tagging, grace, TTL. It covers the issue of distributing cache invalidations to multiple caches, but it doesn't seem fault tolerant.
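To make TTL, grace, and tag-based purging concrete, here is a toy in-memory cache. It is only a sketch of the concepts, assuming nothing about Varnish's real configuration or API: `TaggedCache` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TaggedCache:
    """Toy cache illustrating TTL, grace, and purge-by-tag (not Varnish itself)."""

    def __init__(self, ttl, grace):
        self.ttl = ttl        # seconds an entry counts as fresh
        self.grace = grace    # extra seconds a stale entry may still be served
        self._entries = {}    # key -> (value, stored_at, tags)

    def set(self, key, value, tags, now):
        self._entries[key] = (value, now, frozenset(tags))

    def get(self, key, now):
        """Return (value, state) where state is 'fresh', 'stale', or 'miss'."""
        entry = self._entries.get(key)
        if entry is None:
            return None, "miss"
        value, stored_at, _ = entry
        age = now - stored_at
        if age <= self.ttl:
            return value, "fresh"
        if age <= self.ttl + self.grace:
            return value, "stale"  # within grace: serve old content while refreshing
        del self._entries[key]     # past TTL + grace: drop it
        return None, "miss"

    def purge_tag(self, tag):
        """Invalidate every entry carrying the tag; return how many were removed."""
        doomed = [k for k, (_, _, tags) in self._entries.items() if tag in tags]
        for k in doomed:
            del self._entries[k]
        return len(doomed)


cache = TaggedCache(ttl=60, grace=30)
cache.set("/article/1", "<html>1</html>", tags={"article", "author:42"}, now=0)
cache.set("/article/2", "<html>2</html>", tags={"article"}, now=0)

fresh = cache.get("/article/1", now=10)
stale = cache.get("/article/1", now=75)
removed = cache.purge_tag("author:42")
after_purge = cache.get("/article/1", now=75)
expired = cache.get("/article/2", now=200)
```

Note this does nothing about the multi-cache distribution problem mentioned above; each cache instance would still need to receive the purge.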

To be smarter the brain had to get more social. A general pattern for intelligence at different scales? Finding turns neuroanatomy on its head: Researchers present new view of myelin: The fact that it is the most evolved neurons, the ones that have expanded dramatically in humans, suggests that what we're seeing might be the "future." As neuronal diversity increases and the brain needs to process more and more complex information, neurons change the way they use myelin to "achieve" more. It is possible that these profiles of myelination may be giving neurons an opportunity to branch out and 'talk' to neighboring neurons. These long myelin gaps may be needed to increase neuronal communication and synchronize responses across different neurons. < For more on the amazing ways human social networks improve problem solving take a look at Social Physics: How Good Ideas Spread-The Lessons from a New Science.

Videos are now available from the YOW! Conferences and Workshops held in Brisbane, Melbourne. Quite a variety of topics. Some titles: Explorations in Next Generation Web Languages, Living in a Post-Functional World, and A Little Graph Theory for the Busy Developer.

Humans, tribal to the end. Maintaining open source projects is hard: This is one of the frustrating parts of open source. It's hard to team up with others across perceived boundaries. Yet another example I saw recently is Chef vs. SaltStack. What's the difference? They both do automation. Sure, they have different customers and different architectures. But the obvious difference is Chef is for Ruby people and SaltStack is for Python people. That's all there is to it sometimes.

How does Stack Exchange work around the lack of Redis clustering? Simply: We only solve things that are actually problems, but we have plenty of options. To explain our setup - we currently have only 4 physical redis servers for the production environment (servers for dev/QA/etc are irrelevant to this discussion); 2 in each DC. We have 8 redis instances on each redis box currently, that each serve fundamentally different roles - one for our core cache etc (where "core" here means "not careers"), one for machine-learning, one for pub/sub distribution, etc. These 8 instances are then replicated (master/slave) between the servers for HA / DR, persistence and performance reasons (noting that SE.Redis allows individual commands to be targeted, i.e. "demand master", "prefer master", "prefer slave", "demand slave").

Adrian Cockcroft on Disrupting the Storage Industry. What's Next? Cassandra with local disk replaces Oracle/MySQL, SAN, Array, etc; AWS S3 and Google Data Store replace tape for backup/archive; Cloud prices halve every two years; Epic Google and AWS price war (everyone else dies); SSD moves to the memory channel, for even lower latency.

Yes, he is everywhere. Adrian Cockcroft on Migrating to Microservices. The key theme is that microservices are one of the strategies for satisfying your need for speed to market. Teams own service groups; one verb per single function micro-service; size doesn't matter; one developer independently produces a micro-service; each micro-service is its own build; stateless business logic; stateful cached data access layer. Microservices feed into particular development and deployment models that enable faster time to market along with high availability.

Ticks a lot of boxes on the database checklist. Symas Lightning Memory-Mapped Database (LMDB): LMDB is an ultra-fast, ultra-compact key-value embedded data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases, and is only limited to the size of the virtual address space, (it is not limited to the size of physical RAM).
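For a feel of what "memory-mapped" buys you, here is a minimal sketch using Python's standard `mmap` module. This is not LMDB's API; the file name and the `key=value` layout are made up for illustration. The point is only that the file on disk is read through ordinary memory operations, with the OS paging data in on demand.

```python
import mmap
import os
import tempfile

# Persist some data to disk the ordinary way.
path = os.path.join(tempfile.mkdtemp(), "store.bin")
with open(path, "wb") as f:
    f.write(b"key1=alpha\nkey2=beta\n")

# Map the file read-only and search it as if it were a bytes object in memory.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        start = view.find(b"key2=") + len(b"key2=")
        end = view.find(b"\n", start)
        value = bytes(view[start:end])

print(value)
```

Because the mapping lives in virtual address space, the file can be larger than physical RAM, which is the property the LMDB description calls out.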

Iron.io explains how they use Docker to keep containers up to date with new software versions. Nice details. The advantages: Building separate and isolated environments for each runtime/language; Obtaining support for CoW filesystem (which translates into a more secure and efficient image management approach); Having a reliable way to switch between different runtimes on the fly.

Horizontal and vertical scaling should work together. Five reasons why vertical scalability matters: Having more cores offers more consistent performance; Simplified debugging and performance characteristics; Good insurance for the unknown; Increased efficiency at scale; An alternative consolidation strategy to virtualization.

Sweet post on The anatomy of connection pooling. Graphs and everything! Connections are expensive, so the goal is to reduce creation/deletion churn. So you create a pool of pre-opened connections and share them amongst apps. The impact: The connection pooling is 600 times faster than the no pooling alternative.
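The idea fits in a few lines. A minimal sketch, assuming a hypothetical `ConnectionPool` class and a fake connection factory in place of a real database driver; it is not the implementation from the linked post.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once up front and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pay the connection cost only here

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection, blocking until one is idle, and always return it."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


opened = []


def fake_connect():
    """Stand-in for an expensive connect call; records each connection created."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run a query here

print(len(opened))  # 2 connections served 100 requests
```

A production pool would also validate connections before handing them out and replace broken ones, which this sketch skips.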

Robert Engels: This is why really low-latency systems (even with Java), use separate processes in order to better isolate cache usage, but even here, unless your complete data set fits in the unshared L2 cache, you're going to have a problem as the Level 3 cache is going to be blown by other "enterprise" processes running on the other cores. You can isolate these processes onto different machines and pay the network penalty, or you need to not just do core isolation, but CPU isolation, and that gets expensive real quick...

So don't bet on cheap flash in your pricing models just yet. Henry Newman on HD vs. SSD Economics: Flash can do everything that hard disk can. The supply of both flash and hard disk is constrained. Thus flash will command a premium over hard disk prices so that the market directs the limited supply of flash to those applications, such as tablets, smartphones, and high-performance servers, where its added value is highest.

Some excellent live blogging is happening for the first annual GopherCon in Denver, Colorado. Looks like a lot of good info being shared. 700 is a lot of attendees for a first time conference. A lot of developers are on the Go.

Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 3. I love the conclusion here: 2 way communication between services is the root of many problems and these problems are not getting any smaller by making our services smaller (microservices). We have seen that asynchronous communication can break the temporal coupling between our services, but ONLY if it takes place as a true one-way communication.

Notes On Concurrent Ring Buffer Queue Mechanics: Having recently implemented an MPMC queue for JAQ (my concurrent queue collection) I have been bitten several times by the quirks of the underlying data structure and its behaviour. This is an attempt at capturing some of the lessons learnt. I call these aspects or observations 'mechanics' because that's how they feel to me (you could call them Betty).
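If you haven't met the underlying data structure, here is a single-threaded sketch of a bounded ring buffer queue. It is not JAQ's code and has none of the concurrency control an MPMC queue needs; the `RingBuffer` class and its `offer`/`poll` names are chosen for illustration. It only shows the basic mechanics: a power-of-two array, ever-increasing head and tail counters, and a mask to wrap them.

```python
class RingBuffer:
    """Bounded FIFO over a fixed array (single-threaded illustration only)."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1  # index & mask wraps around the array
        self._head = 0             # count of items consumed so far
        self._tail = 0             # count of items produced so far

    def offer(self, item):
        """Append an item; return False instead of overwriting when full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1
        return True

    def poll(self):
        """Remove and return the oldest item, or None when empty."""
        if self._head == self._tail:
            return None
        idx = self._head & self._mask
        item = self._slots[idx]
        self._slots[idx] = None  # clear the slot so the item can be collected
        self._head += 1
        return item


rb = RingBuffer(4)
accepted = [rb.offer(i) for i in range(5)]  # the fifth offer finds the buffer full
first = rb.poll()
wrapped = rb.offer(99)                      # reuses the slot freed by poll()
drained = [rb.poll() for _ in range(4)]
```

The quirks the post talks about show up once several producers and consumers race on those head and tail counters, which is exactly the part this version leaves out.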

Lots of videos from Carnegie Mellon on Computer Architecture. Learn the deeper magic behind the lesser magic.

The Impact of the IaaS Price Wars: I truly believe that the basic IaaS services of compute, network, and disk are essentially a commodity now, and it is the PaaS capabilities that will set these players apart in the long haul. That is also why I am closely watching the PaaS pure players out there to see if companies like IBM, Oracle, HP, and others purchase them and add them to their IaaS offerings so they can catch up to AWS.

Sirius - A distributed system library for managing application reference data. From Comcast, it uses Scala, Paxos, and Akka to maintain distributed, low-latency, in-memory reference datasets. Looks like an interesting approach.

Bridging the Gap: Opportunities in Coordination-Avoiding Databases: Weakly consistent systems surface a controversial tension between, on the one hand, availability, latency, and performance, and, on the other, programmability. We propose the concept of coordination avoidance as a unifying, underlying principle behind the former and discuss lessons from our recent experiences mitigating the latter.

So how do I actually program using causal consistency? Don't Settle for Eventual Consistency: To demonstrate the scalability of Eiger, we ran the Facebook TAO workload on N client machines that fully loaded an N-server cluster that is replicating writes to another N-server cluster (i.e., the N=128 experiment involves 384 machines). This experiment was run on PRObE's Kodiak testbed, which provides an Emulab with exclusive access to hundreds of machines. Figure 12 shows the throughput for Eiger as N scales from eight to 128 servers/cluster. The bars show throughput normalized against the throughput of the eight-server cluster. Eiger scales out as the number of servers increases; each doubling of the number of servers increases cluster throughput by 96 percent on average.
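What does 96 percent per doubling add up to? A quick back-of-the-envelope calculation, assuming the quoted average compounds evenly across the four doublings from 8 to 128 servers (the paper's per-step numbers may differ):

```python
growth_per_doubling = 1.96  # "96 percent on average", from the quoted passage
doublings = 4               # 8 -> 16 -> 32 -> 64 -> 128 servers per cluster

measured = growth_per_doubling ** doublings  # throughput relative to 8 servers
ideal = 2 ** doublings                       # perfectly linear scaling
efficiency = measured / ideal

print(f"{measured:.2f}x of a possible {ideal}x ({efficiency:.0%} of linear)")
```

So under that assumption a 16x increase in servers buys roughly 14.8x the throughput, about 92 percent of perfectly linear scaling.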

Same question as above. How do we build real programs using CRDTs? Strong Eventual Consistency and Conflict-free Replicated Data Types: In this talk, we present sufficient conditions for SEC and a strong equivalence between the conditions. We show that SEC is incomparable to sequential consistency. We present some basic CRDTs, and study in depth advanced CRDTs such as sets, graphs, and a sequence CRDT for co-operative text editing.
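To get a feel for what programming with a CRDT looks like, here is a grow-only counter (G-Counter), one of the standard basic examples. It is a minimal sketch, not code from the talk; the `GCounter` class and replica names are illustrative. Each replica increments only its own slot, and merge takes the per-replica maximum, which makes merging commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter: a state-based CRDT (illustrative sketch)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        """Record a local increment; a replica only ever touches its own slot."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking the element-wise maximum."""
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment(3)  # updates happen independently, with no coordination
b.increment(2)

a.merge(b)
b.merge(a)
a.merge(b)      # merging the same state again changes nothing

print(a.value(), b.value())  # both replicas converge on 5
```

Replicas that have seen the same set of updates end up in the same state regardless of merge order, which is the strong eventual consistency property the talk is about.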
