Stuff The Internet Says On Scalability For February 21st, 2014

Hey, it's HighScalability time (a particularly bountiful week):


The Telephone Wires of Manhattan in 1887

  • $19 billion: you know what it is; $46 billion: cost of Sochi Olympics; 400 gigabytes: data transmitted during the Sochi opening ceremony; 26.9 million: Stack Overflow community monthly visitors; 93 million: Candy Crush daily active users; 200-400 Gbps: The New Normal in DDoS Attacks
  • Quotable Quotes:
    • @brianacton: Facebook turned me down. It was a great opportunity to connect with some fantastic people. Looking forward to life's next adventure.
    • @BenedictEvans: Flickr: $35m. Youtube: $1.65bn Whatsapp: $19bn. Mobile is big. And global. And the next computing platform. Paying attention?
    • @taziden: On the Internet, worst cases will become common cases #fosdem #postfix
    • Brian Hayes: Any quantum program must have a stovepipe architecture: Information flows straight through.

  • So you think Verizon is stealing your Netflix bandwidth? Not so fast, says Dan Rayburn in Netflix’s Streaming Quality Is Based On Business Decisions by Netflix & ISPs, Not Net Neutrality: it's both. Netflix makes its own business choices, but ISPs also don't want to spend the money to improve the end result for the customer.

  • DIDO is back! Many moons ago I wrote How Will DIDO Wireless Networking Change Everything? DIDO is a low-latency, high-bandwidth wireless networking technology that promises each user the full data rate of the shared spectrum, simultaneously with all other users, by eliminating interference between users sharing the same spectrum. That was three years ago. Does it work yet? Yes, quite possibly: Perlman’s pCell: The super-fast future of wireless networking, or too good to be true? Exciting stuff.

  • Great 100+ comment thread on reddit about coding for SSDs, covering a series of six SSD-related articles: Coding for SSDs – Part 1: Introduction and Table of Contents. A raging debate over which optimization strategies are real and which are just legend.
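
One of the less contested recommendations in that series is to batch small writes into page-sized, aligned chunks instead of dribbling them out. A toy sketch of the idea (the 4KB page size and class name are my own illustration, not from the articles):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Batch small writes into one NAND-page-sized buffer so the SSD sees
// aligned 4KB writes instead of many sub-page updates.
// Assumes individual records are smaller than one page.
public class PageAlignedWriter {
    static final int PAGE = 4096;
    private final ByteBuffer buf = ByteBuffer.allocateDirect(PAGE);
    private final FileChannel out;

    PageAlignedWriter(String path) throws IOException {
        out = new FileOutputStream(path).getChannel();
    }

    void write(byte[] record) throws IOException {
        if (buf.remaining() < record.length) flush();
        buf.put(record);                              // accumulate in memory
    }

    void flush() throws IOException {
        while (buf.hasRemaining()) buf.put((byte) 0); // pad to a full page
        buf.flip();
        while (buf.hasRemaining()) out.write(buf);    // one aligned 4KB write
        buf.clear();
    }
}
```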

  • If you want to know how those spooky ads follow you all over the place, popping up on every site you visit like a bad hallucination, then give Dare Obasanjo's article How Facebook Knows What You Looked at on Amazon a read. Facebook uses the Facebook Exchange along with the exchange of identity tokens in collusion with other ad networks and sites.

  • This has implications way beyond sports. Small Data in Sports: Little Differences that Mean Big Outcomes: Strata 2014: The gap between legend and anonymity in elite sports is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones.

  • Chip Overclock with a fascinating historical tour of DTMF (dual-tone multi-frequency) signalling, which is apparently still used in the aeronautics industry. Plus useful coverage of Fourier Transforms and Spectral Analysis.
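
DTMF detection is also a nice, concrete application of the spectral analysis he covers: you test a sample buffer for energy at each of the eight tone frequencies. A minimal sketch of the Goertzel algorithm, the classic single-bin DFT trick used for this (my illustration, not Chip Overclock's code):

```java
// Minimal Goertzel filter: measures signal energy at one target frequency.
// A DTMF detector runs 8 of these, one per row/column tone.
public class Goertzel {
    // Returns the squared magnitude of the DFT bin at targetHz.
    static double power(double[] samples, double sampleRateHz, double targetHz) {
        double omega = 2.0 * Math.PI * targetHz / sampleRateHz;
        double coeff = 2.0 * Math.cos(omega);
        double sPrev = 0.0, sPrev2 = 0.0;
        for (double x : samples) {
            double s = x + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
    }

    public static void main(String[] args) {
        // Synthesize the DTMF digit '1' (697 Hz + 1209 Hz) at 8 kHz.
        double fs = 8000;
        double[] buf = new double[205]; // the standard DTMF block size
        for (int n = 0; n < buf.length; n++)
            buf[n] = Math.sin(2 * Math.PI * 697 * n / fs)
                   + Math.sin(2 * Math.PI * 1209 * n / fs);
        // Strong energy at 697 Hz, almost none at the absent 770 Hz tone.
        System.out.printf("697 Hz: %.1f, 770 Hz: %.1f%n",
                power(buf, fs, 697), power(buf, fs, 770));
    }
}
```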

  • I think of a type system as a kind of physics for a programming environment, encoding the physical laws of computation. Michael Bernstein has a much more nuanced explanation in What is a Type System for?: The idea that the purpose of a type system is to prevent undesirable application states eluded me for some time, and I believe this is the case for many other people as well.
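
A toy example of "preventing undesirable states" (mine, not the article's): encode a protocol's states as distinct types, and the invalid transition won't even compile.

```java
// The type system rules out "send after close" at compile time:
// there is simply no send() on the closed-socket type.
final class OpenSocket {
    void send(String msg) { /* ... write bytes ... */ }
    ClosedSocket close() { return new ClosedSocket(); }
}

final class ClosedSocket {
    // No send() here: sending on a closed socket is unrepresentable.
}

class Demo {
    public static void main(String[] args) {
        OpenSocket s = new OpenSocket();
        s.send("hello");
        ClosedSocket c = s.close();
        // c.send("oops");  // does not compile: undesirable state prevented
    }
}
```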

  • Capers Jones lays down the laws, that is: Programming Laws and Reality: Do We Know What We Think We Know? You may not know all these laws, but feel free to follow them, as they are supported by data gathered from 20,000 projects. But I wonder if some of these laws don't apply to web projects, where faster iteration, frequent deployment, and frequent addition of new designs and features is the rule? The laws: Boehm's Second Law; Brooks' Law; Conway's Law; Cunningham's Law of Technical Debt; Hartree's Law; Jones's Law of Programming Language Utility #3; Jones's Law of Software Defect Removal; Lehman/Belady Laws of Software Evolution; Senge's Law; Wirth's Law; Yannis' Law.

  • Ethereum Scalability and Decentralization Updates: The Bitcoin blockchain is currently over 12 GB in size, requiring a period of several days for a new bitcoind node to fully synchronize, the UTXO set that must be stored in RAM is approaching 500 MB, and continued software improvements in the source code are simply not enough to alleviate the trend. With every passing year, it becomes more and more difficult for an ordinary user to locally run a fully functional Bitcoin node on their own desktop, and even as the price, merchant acceptance, and popularity of Bitcoin have skyrocketed, the number of full nodes in the network has essentially stayed the same since 2011.

  • Move All The Things to the Cloud. James Hamilton on Energy Efficiency of Cloud Computing: Moving all office workers in the United States to the cloud could reduce the energy used by information technology by up to 87%. These energy savings are mainly driven by increased data center efficiency when using cloud services (email, calendars, and more). The cloud supports many products at a time, so it can more efficiently distribute resources among many users. That means we can do more with less energy.

  • Ivan Pepelnjak says use virtual servers for firewalls, load balancers, and IPS/IDS systems, but they don't have enough horsepower for L2/L3 switching. Software hasn't eaten everything...yet.

  • The Pre-Socratics argued the impossibility of change. So Why Aren't All Data Immutable?:  It can be hard to get decent transaction processing performance based on append-only methods. One reason for the popularity of update-in-place approaches is simple: storage used to be really expensive. This is no longer the case. 
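
The append-only idea in miniature (an illustrative sketch, not from the article): never overwrite a record; append events and derive current state by folding over the history. The fold on every read is exactly where the transaction-processing cost shows up.

```java
import java.util.ArrayList;
import java.util.List;

// Append-only account ledger: state is derived, never overwritten.
class Ledger {
    private final List<Long> deposits = new ArrayList<>(); // history only grows

    void append(long amountCents) {
        deposits.add(amountCents);  // the only mutation is an append
    }

    long balance() {
        long sum = 0;               // current state = fold over full history
        for (long d : deposits) sum += d;
        return sum;
    }
}
```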

  • Harish Ganesan is a fountain of useful AWS information, so you might like 24 Best Practice Tips for architecting your Amazon VPC. Not a simple process, but very powerful in practice.

  • Not a simple trick, but here's how Soundcloud creates a nice visual effect when "displaying images that greatly reduces the perceived load time of the image, and it is especially effective for long-lived, single-page applications."

  • The Fallacy of First Mover Advantage: Don't worry if someone else has done your idea. First Mover advantage is not writ in stone. You can still succeed as have Apple, Facebook, Google, etc.

  • Tim Bray finds Google doesn't want remote workers. Then what's all that fancy conferencing equipment for?

  • Multi-core scaling: it’s not multi-threaded: What I’m trying to show you here is that “multi-core” doesn’t automatically mean “multi-threaded”. Snort is single-threaded, but a multi-core product. It doesn’t actually use memory-mapping to share data among processes, and therefore lacks some features, but it probably will in the future.
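
The pattern, sketched (a hypothetical launcher; Worker is a made-up class): one single-threaded OS process per core, each with a private address space, instead of one thread per core.

```java
import java.io.IOException;

// Multi-core without multi-threading: launch one single-threaded worker
// process per core, Snort-style. Separate address spaces mean no locks,
// but also no shared state unless you add memory-mapping later.
public class Launcher {
    public static void main(String[] args) throws IOException {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < cores; i++) {
            new ProcessBuilder("java", "Worker", String.valueOf(i))
                    .inheritIO()
                    .start();
        }
    }
}
```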

  • It's not plastics any more. Storage Mojo says it's plasmonics & metamaterials: Basically, metamaterials and plasmons enable new ways of writing and reading. While I’ve stressed the optical options, plasmons can also be magnetic, perhaps making them more applicable to today’s hard drives. Also, Where does ReRAM fit? Also also, StorageMojo’s Best Papers of FAST ’14.

  • Snapchat Hired Away One Of Google's Top Cloud Engineers. Maybe Snapchat is not so Dirt Cheap to Run after all?

  • "It is possible to drastically reduce the capital cost of building a data centre network" says Greg Ferro in Why Cheap Network Equipment Makes a Better Data Centre. Some lessons: ECMP network designs are awesome; Cheap hardware changes the way you build networks; Cheap also means replaceable; Low Cost Got Business Attention; We bought some spare equipment because it was cheap; Features in the network are “missing”; Go multi-vendor.

  • Pron with a clear and concise definition. Designing scalable software for multicore processors: I think the distinction between concurrency and parallelism is apt here. Parallelism is the cooperative use of multiple cores to accelerate a single computation by splitting the data (data parallelism), while concurrency is about regulating competitive access to resources by concurrent, distinct, computations. Commutativity and what you call set semantics are relevant for parallelism, while queues are more appropriate for concurrency.
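
To make the distinction concrete, here's a sketch (mine, not pron's): parallelism splits one computation's data across cores, while concurrency uses a queue to regulate distinct computations competing for one resource.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.IntStream;

public class ParVsConc {
    public static void main(String[] args) throws InterruptedException {
        // Parallelism: cores cooperate on ONE computation by splitting data.
        long sum = IntStream.rangeClosed(1, 1_000_000)
                            .parallel().asLongStream().sum();
        System.out.println("parallel sum = " + sum);

        // Concurrency: a queue regulates DISTINCT computations competing
        // for a single resource (here, one consumer drains requests).
        BlockingQueue<String> requests = new ArrayBlockingQueue<>(16);
        Thread consumer = new Thread(() -> {
            try {
                for (;;) System.out.println("handling " + requests.take());
            } catch (InterruptedException e) { /* shut down */ }
        });
        consumer.start();
        requests.put("req-1");
        requests.put("req-2");
        Thread.sleep(100);
        consumer.interrupt();
    }
}
```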

  • If this sounds familiar, "I spend more than half of my time integrating, cleansing and transforming data without doing any actual analysis. Most of the time I’m lucky if I get to do any ‘analysis’ at all." then you'll love this presentation Skills of the Agile Data Wrangler. Great commiseration and insights.

  • The case for JMH when benchmarking Java
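
For the unfamiliar: a JMH benchmark is just an annotated method, and the harness handles warmup, JVM forking, and defeating dead-code elimination, which are exactly the things hand-rolled timing loops get wrong. A minimal sketch (annotations as in current JMH releases):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import java.util.concurrent.ThreadLocalRandom;

// Build and run via the JMH archetype; the harness does the measurement.
@State(Scope.Thread)
public class StringConcatBench {
    // Non-constant inputs so the JIT can't fold the result at compile time.
    String a = "foo", b = String.valueOf(ThreadLocalRandom.current().nextInt());

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    public String concat() {
        return a + b; // returning the value lets JMH black-hole it
    }
}
```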

  • AppNexus on How Bad Can 1GB Pages Be? In short, manufacturers are adding huge pages because translating virtual addresses to physical ones is slow. The address translation table is (mostly) stored in normal memory and is too large to fit in cache. Thus, translating a virtual address requires 4 reads, any of which can hit uncached memory.
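
Where do the 4 reads come from? On x86-64 with 4KB pages, a virtual address is decoded through four levels of page tables, each indexed by 9 bits. A worked decomposition (standard x86-64 layout; the example address is made up):

```java
// x86-64 4-level paging with 4KB pages: 9+9+9+9 index bits + 12 offset bits.
// Each 9-bit index selects an entry in one level's table = one memory read.
public class PageWalk {
    public static void main(String[] args) {
        long va = 0x00007f1234567abcL;       // example 48-bit virtual address
        long offset = va & 0xfff;            // bits 0-11: byte within 4KB page
        long pt   = (va >> 12) & 0x1ff;      // read 4: page table
        long pd   = (va >> 21) & 0x1ff;      // read 3: page directory
        long pdpt = (va >> 30) & 0x1ff;      // read 2: page-directory-pointer
        long pml4 = (va >> 39) & 0x1ff;      // read 1: top-level table
        System.out.printf("PML4=%d PDPT=%d PD=%d PT=%d offset=%d%n",
                pml4, pdpt, pd, pt, offset);
        // With 1GB pages the walk stops after the PDPT level: 2 reads instead
        // of 4, and one TLB entry then covers 1GB of memory instead of 4KB.
    }
}
```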

  • CrowdProcess shows the power of running parallel jobs in browsers (and by implication JavaScript). They see linear speedups with some jobs 288x faster than when run on a single machine. Yes, that's what happens when you parallelize something, but this is in a browser. 

  • Interesting how, when gigahertz can no longer be the marketing strategy, the message has to move up the value chain to the business value of the new technology. Intel introduces new 15-core server chips to handle big data from the internet of things. Intel’s data center group said that the new chips have set 20 new world records in mission-critical performance.

  • Motivated by a massive storm at his home in Slovenia, we have more Ivan Pepelnjak (always a good thing), this time on disaster recovery (part 1, part 2): Things that look great might actually do more harm than good. Technologies that look great in PowerPoint might bring down your network; Find the simplest possible technology that will meet your recovery time objectives and stick with it; Go for easy wins that solve the most pressing problems; Recovery isn’t instantaneous; Untested recovery solutions are useless. Relying on them is stupid; Documentation is mandatory; Don’t trust subcontractors without verifying their work at least a few times; Know when to give up; Learn from your errors; Update your plan with anything new. And my favorite: Badly implemented redundant design is sometimes worse than a non-redundant one.

  • A very good look at how one of the most popular messaging layers was designed and implemented. ZeroMQ: The Design of Messaging Middleware

  • Excellent article on Video Processing at Dropbox. Lessons: The combination of pre-transcoding, shorter segments at the beginning, and lowered buffering time in the video processing pipeline allowed us to reach our goal of 2-3 seconds startup time for a client on a good connection; pre-transcoding everything would be nice and would make things much easier to implement, but it’s too expensive at our scale; fast-starting is required to handle video files generated by mobile devices if you want the transcoder to progress while you feed data into it; HTTP Live Streaming is a great solution to enable streaming over heterogeneous networks/devices and allows for flexible and creative solutions in the way you structure your output; load balancing is not to be underestimated. It’s a tricky problem and can easily trash your system if done wrong; experimenting with ffmpeg parameters lets you explore the tradeoff between quality and latency that is appropriate for your application.

  • Detailed information on Scaling Elasticsearch with Indexing from WordPress.com, who really should know. They create one index for every 10 million blogs, with 25 shards per index, and use index templates so that when the system indexes into a non-existent index, it is created dynamically.
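
An index template is just a stored settings/mappings pattern applied to any newly created index whose name matches. A hedged sketch of registering one over the REST API (the URL, index pattern, and template name are illustrative; the 25 shards are from the post):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Register an index template: any index whose name matches "blogs-*" is
// created on first write with these settings, so writers never have to
// create indexes explicitly.
public class PutTemplate {
    public static void main(String[] args) throws Exception {
        String body = "{ \"template\": \"blogs-*\","
                    + "  \"settings\": { \"number_of_shards\": 25 } }";
        HttpURLConnection c = (HttpURLConnection)
                new URL("http://localhost:9200/_template/blogs").openConnection();
        c.setRequestMethod("PUT");
        c.setDoOutput(true);
        try (OutputStream out = c.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + c.getResponseCode());
    }
}
```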

  • Distributed Semaphores with RabbitMQ: In this blog post we are going to address the problem of controlling access to a particular resource in a distributed system.
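
The mechanism, roughly: pre-publish N permit messages to a queue; holding an unacknowledged message is holding a permit, and requeueing it releases the permit. If a holder crashes, the broker requeues its permit automatically. A sketch with the standard RabbitMQ Java client (queue name and setup are illustrative):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.GetResponse;

// Distributed semaphore sketch: the queue holds N permit messages.
// An unacked message = a held permit; requeueing it = releasing it.
public class RabbitSemaphore {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel ch = conn.createChannel();
        ch.queueDeclare("resource.semaphore", true, false, false, null);
        // One-time setup elsewhere: publish N permit messages, e.g.
        // ch.basicPublish("", "resource.semaphore", null, "permit".getBytes());

        GetResponse permit = ch.basicGet("resource.semaphore", false); // no auto-ack
        if (permit != null) {
            try {
                // ... exclusive access to the shared resource ...
            } finally {
                // Release: requeue the permit for the next contender.
                ch.basicReject(permit.getEnvelope().getDeliveryTag(), true);
            }
        }
        ch.close();
        conn.close();
    }
}
```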

  • Ars Technica with a wonderful exploration of the tech behind the Olympics: Forget its hotels, Sochi’s tech has been up for the Olympic challenge. Once you get past the panopticon level of surveillance it's quite impressive: 2,500 wireless access points to handle 120,000 concurrent device connections; the first Olympics to offer LTE (4G) coverage, with 270 LTE-capable base stations; over 137 miles of fiber for the LTE network; an LTE-Advanced (LTE-A) trial in Sochi with downlink speeds reaching 271.75Mbps; more than 621 miles of fiber lines to provide 35 Olympic venues with Internet connectivity (up to 140Gbps total throughput); a separate 110Gbps link for 30 broadcasters; 5,600 computers and 400 servers at the Olympics.

  • A Different View of Hadoop Performance: The lesson here is, don’t just blindly tune to your data set; break out your tools and figure out what’s actually going wrong. A lot of people misunderstand, or simply ignore, how network performance can affect your MapReduce speed. We can rely on Hadoop’s resilience to make sure the job finishes, but we’re explicitly masking correctness issues with performance issues. Just eliminating retries and TCP retransmission, we can get a speedup of 80%+. That’s the real problem to solve.

  • Caffe: aims to provide computer vision scientists with a clean, modifiable implementation of state-of-the-art deep learning algorithms.

  • What should you use for session storage? So many options. Session Handling for 1 million requests per hour: All these seem like much smaller issues when compared to other alternatives. So we went with DynamoDB. Over the last few months, as our traffic increased from 10,000 requests per hour to 1 million requests per hour, all we had to do to scale was increase read IOPS from 100 to 1000 and write IOPS from 10 to 100.
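
For flavor, here's roughly what a DynamoDB-backed session store looks like with the AWS Java SDK (the table and attribute names are made up; the provisioned IOPS they mention are a table setting, not code):

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

import java.util.HashMap;
import java.util.Map;

// Sessions keyed by id; scaling is just raising the table's provisioned
// read/write IOPS (100 -> 1000 reads, 10 -> 100 writes in the post).
public class SessionStore {
    private final AmazonDynamoDB db = AmazonDynamoDBClientBuilder.defaultClient();

    void put(String sessionId, String payload) {
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("session_id", new AttributeValue(sessionId)); // hash key
        item.put("data", new AttributeValue(payload));
        db.putItem("sessions", item);
    }

    String get(String sessionId) {
        Map<String, AttributeValue> key = new HashMap<>();
        key.put("session_id", new AttributeValue(sessionId));
        Map<String, AttributeValue> item = db.getItem("sessions", key).getItem();
        return item == null ? null : item.get("data").getS();
    }
}
```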

  • Square on creating a Faster RSA in Java with GMP: Unfortunately, generating the certificate was slower than I expected. RSA signing is computationally expensive; performing a 2048-bit private key operation takes several solid milliseconds on a modern CPU. The operation can't be parallelized either, so while adding CPU power enables us to sign hundreds of requests per second, additional CPUs can't make an individual operation run any faster. I needed a faster implementation.
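
You can reproduce the baseline they set out to beat with stock java.security (a sketch; Square's actual speedup came from swapping in GMP's modular exponentiation, which this does not do):

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Time 2048-bit RSA signing with the stock JCE provider: expect several
// milliseconds per private-key operation on a 2014-era CPU.
public class RsaTiming {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair kp = gen.generateKeyPair();
        byte[] msg = "hello".getBytes("UTF-8");

        Signature sig = Signature.getInstance("SHA256withRSA");
        int n = 200;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sig.initSign(kp.getPrivate());
            sig.update(msg);
            sig.sign();
        }
        System.out.printf("%.2f ms/sign%n", (System.nanoTime() - t0) / 1e6 / n);
    }
}
```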

  • Speed is not just nice, it enables new ways of doing things. rcthompson: I'm reminded of a recent article that crossed the front page of HN, about a group writing some sort of fluid layout engine for their mobile app. They were able to optimize it from tens of layouts to thousands of layout per second, which enabled their strategy of trying a bunch of different layouts and picking the best one. This wouldn't have been possible if they stuck with the naive solution. Making it fast enabled a new feature that simply couldn't exist with the slow solution.

  • Nice breakdown of game interactions. Replication in networked games: Latency (Part 2): In this article we surveyed three different techniques for dealing with latency in networked games, though our review was by no means exhaustive. Also, some of these methods are not mutually exclusive. For example, it is possible to combine optimistic replication with local perception filters to offset some of the drawbacks of a purely optimistic approach.
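
A classic instance of the optimistic approach is dead reckoning: render a remote entity at a position extrapolated from its last authoritative update instead of waiting a round trip. A minimal sketch (1D, illustrative):

```java
// Dead reckoning: extrapolate a remote player's position from the last
// replicated state, so rendering doesn't stall on network latency.
public class DeadReckoning {
    double lastX, lastVx;   // last authoritative position/velocity
    long lastUpdateNanos;

    void onServerUpdate(double x, double vx, long nowNanos) {
        lastX = x; lastVx = vx; lastUpdateNanos = nowNanos;
    }

    double renderX(long nowNanos) {
        double dt = (nowNanos - lastUpdateNanos) / 1e9;
        return lastX + lastVx * dt; // optimistic guess, corrected next update
    }
}
```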

  • Eventually Consistent: Not What You Were Expecting?: This article looks at methods of quantifying consistency (or lack thereof) in eventually consistent storage systems. These methods are necessary for meaningful comparisons among different system configurations and workloads. First, the article defines eventual consistency more precisely and relates it to other notions of weak consistency. It then drills down into metrics, focusing on staleness, and surveys different techniques for predicting and measuring staleness. Finally, the relative merits of these techniques are evaluated, and any remaining open questions are identified.
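
The simplest of those measurement techniques is a probe: write a unique value through one client, poll another replica until it appears, and record the lag. A hedged sketch (the KvClient interface is hypothetical):

```java
// Crude staleness probe: elapsed time between a write becoming durable at
// one endpoint and visible at another approximates observed staleness.
interface KvClient {                    // hypothetical minimal store client
    void put(String key, String value);
    String get(String key);
}

public class StalenessProbe {
    static long stalenessMillis(KvClient writer, KvClient reader, String key)
            throws InterruptedException {
        String marker = "v" + System.nanoTime();  // unique per probe
        long start = System.currentTimeMillis();
        writer.put(key, marker);
        while (!marker.equals(reader.get(key))) Thread.sleep(1);
        return System.currentTimeMillis() - start;
    }
}
```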

  • jailhouse: a partitioning Hypervisor based on Linux. It is able to run bare-metal applications or (adapted) operating systems besides Linux. For this purpose it configures CPU and device virtualization features of the hardware platform in a way that none of these domains, called "cells" here, can interfere with each other in an unacceptable way.

  • To make use of all that bare metal you need a high level method of scheduling and running applications. Apps built on Mesos: Mesos makes it easy to develop distributed systems by providing high-level building blocks. This is a list of applications that take advantage of its scalability, fault-tolerance, and resource isolation.

  • Building fast Bayesian computing machines out of intentionally stochastic, digital parts: The brain interprets ambiguous sensory information faster and more reliably than modern computers, using neurons that are slower and less reliable than logic gates. But Bayesian inference, which underpins many computational models of perception and cognition, appears computationally challenging even given modern transistor speeds and energy budgets. The computational principles and structures needed to narrow this gap are unknown. Here we show how to build fast Bayesian computing machines using intentionally stochastic, digital parts, narrowing this efficiency gap by multiple orders of magnitude. We find that by connecting stochastic digital components according to simple mathematical rules, one can build massively parallel, low precision circuits that solve Bayesian inference problems and are compatible with the Poisson firing statistics of cortical neurons. We evaluate circuits for depth and motion perception, perceptual learning and causal reasoning, each performing inference over 10,000+ latent variables in real time — a 1,000x speed advantage over commodity microprocessors. These results suggest a new role for randomness in the engineering and reverse-engineering of intelligent computation.
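
The theme in miniature (my toy example, nothing like the paper's scale): wire Bernoulli "stochastic gates" together according to the model's conditional probabilities, condition by rejection, and posterior inference falls out of counting samples.

```java
import java.util.Random;

// Rejection sampling with two stochastic "gates" estimates p(cause | effect)
// for a tiny two-variable network. The numbers are made up for illustration.
public class StochasticInference {
    public static void main(String[] args) {
        Random rng = new Random();
        double pCause = 0.1, pEffectGivenCause = 0.8, pEffectGivenNot = 0.1;
        int kept = 0, causeCount = 0;
        for (int i = 0; i < 1_000_000; i++) {
            boolean cause = rng.nextDouble() < pCause;          // gate 1: prior
            boolean effect = rng.nextDouble() <                 // gate 2: likelihood
                    (cause ? pEffectGivenCause : pEffectGivenNot);
            if (effect) { kept++; if (cause) causeCount++; }    // condition on E=1
        }
        // Exact posterior: 0.1*0.8 / (0.1*0.8 + 0.9*0.1) ~= 0.47
        System.out.printf("p(cause|effect) ~= %.3f%n", (double) causeCount / kept);
    }
}
```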