Stuff The Internet Says On Scalability For August 23, 2013

High Scalability

23 Aug 2013 — 6 min read

Hey, it's HighScalability time:

(Parkour is to terrain as programming is to frameworks)

5x: AWS vs combined size of other cloud vendors; Every Second on The Internet: Why we need so many servers.
Quotable Quotes:
- @chaliy: Today I learned that I do not understand how #azure scaling works, instance scale does not affect requests/sec I can load.
- @Lariar: Note how crazy this is. An international launch would have been a huge deal. Now it's just another thing you do.
- smacktoward: The problem with relying on donations is that people don't make donations.
- @toddhoffious: Programming is a tool built by logical positivists to solve the problems of idealists and pragmatists. We have a fundamental mismatch here.
- @etherealmind: Me: "Weird, my phone data isn't working" Them: "They turned the 3G off at the tower because it interferes with the particle accelerator"
- John Carmack: In computer science, just about the only thing that’s really science is when you’re talking about algorithms. And optimization is an engineering. But those don’t actually occupy that much of the total time spent programming.
- @gappy3000: Ideas are assets. Code is a liability. So maximize ideas/code.
How can spiders and flies walk up walls? See for yourself with a fun DIY on How to: test Galileo's scaling laws. An idea that is simple yet profound in its implications: "the width of an object is doubled, the surface area is squared and the volume is cubed." Put concretely, doubling the width multiplies the surface area by four and the volume by eight. It means size matters. Elephants can't dance and jump, and insects can walk on water. The reason is that the ratio of area to volume governs everything we do. You get to drop stuff from great heights and watch things explode (or not). What could be better?
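To make the scaling law concrete, here is a minimal sketch that uses a cube as a stand-in for "an object". The function name `cube_metrics` and the choice of a cube are assumptions for illustration, not something taken from the linked article.

```python
def cube_metrics(width):
    """Return (surface_area, volume, area_to_volume_ratio) for a cube."""
    area = 6 * width ** 2      # six square faces, so area grows with width squared
    volume = width ** 3        # volume grows with width cubed
    return area, volume, area / volume


small = cube_metrics(1.0)
big = cube_metrics(2.0)        # same shape, twice as wide

area_growth = big[0] / small[0]      # factor by which the area grew
volume_growth = big[1] / small[1]    # factor by which the volume grew
ratio_change = big[2] / small[2]     # how the area-to-volume ratio changed
```

The ratio is the interesting part: every doubling of width halves the surface area available per unit of volume, which is why small things and big things live by different rules.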

When one support call can cost you all the profit you'll get from a customer for a year, minimizing support calls is key. Here's an interesting idea: Scaling Support: Why We Removed Our Phone Number From Our Website. They also installed a chat widget and rely on email. Results are good: We've seen a boost in speed of conversion, as potential customers don't have to wait as long to get their questions answered.

Maybe they should have thrown the drives in water to see if they float? NSA files: why the Guardian in London destroyed hard drives of leaked files. #witchcraft

This is one way to expand your addressable market...Facebook Leads an Effort to Lower Barriers to Internet Access

Nux has a good question on the news that Nginx is now offering a relatively high priced pro version: can anyone name one open core product that is really successful? Figuring out how to make a living is hard.

Great Packet Pushers explanation of tunneling and underlay networks. Traditional hierarchical switched datacenter architectures are fading in favor of switchless point to point tunnels. Tunnels are an overlay network on top of the underlying topology. vSwitches are being transformed from digital patch panels into full networking devices, routing packets directly into tunnels. Tunnels are set up directly between VMs. The core just sees point to point IP tunnels. Native packets are not switched at each hop. Packets just go down the tunnel, making us completely free of the physical network. Topologies will become equal cost multipath Clos trees so packets are load balanced across all paths at L2. Dependencies between switches fade away, so switches can be upgraded/removed/etc without impacting the network. Much easier to manage and change. You don't have to worry about spanning tree problems, routing flaps, etc. This is the future of networking. Changes can be made in minutes, which means you can get home to play with your kids by 5 PM. Also, The OpenFlow Book.

I will not do a tech interview. Is there a moral dimension to the interviewing strategy described by ngoel36?: I used to work at Google. I saw a lot of good candidates get rejected. I myself was rejected multiple times before I got an offer. I was talking to my manager who was on the Hiring Committee about this dilemma, and at the end of the day the fact is that good companies don't give a sh*t about their false negative rate - only their net positives.

Great example of changing circumstances causing bit rot: 100x faster Postgres performance by changing 1 line.

Nick Zadrozny: Sharding and replication are a good line of defense. That’s because elasticsearch maintains enough state when it’s running normally that if you lose a node, something happens with an index or a shard on one machine, it can just rebuild from another copy. That’s a great first line of defense and it’s a huge selling factor for when you’re going from staging or internal use case to actual live production. When you’re live and in production you definitely want replication as your first line of defense for maintaining data integrity.

Whenever there's a large enough population, there will emerge different distributions that can be taken advantage of. Performance variance between AWS zones: What do we do with this? Well, if you want, you can optimize in which zone you create new instances. Since the difference in performance is noticeable it’s worth considering it. But please be careful of not overdoing it, uptime comes first. Skewing your instances heavily on a subset of zones can make you more failure prone, there is always a sweet-spot, and in case of doubt, be conservative, sacrifice performance for reliability.
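One way to act on that advice without overdoing it is to prefer the fastest zone but cap how much of the fleet any single zone may hold. The sketch below is an assumption-laden illustration: `pick_zone`, the `max_share` cap, the zone names, and the benchmark scores are all hypothetical and do not come from the linked post or from any AWS API.

```python
def pick_zone(scores, counts, max_share=0.5):
    """Choose the zone for the next instance.

    scores: zone -> benchmark score (higher is faster), hypothetical numbers.
    counts: zone -> instances already running there.
    max_share: largest fraction of the fleet a single zone may hold.
    """
    total_after = sum(counts.values()) + 1
    eligible = [
        zone for zone in scores
        if (counts.get(zone, 0) + 1) / total_after <= max_share
    ]
    if not eligible:
        # Every zone would exceed the cap (typical for a tiny fleet):
        # fall back to the least-loaded zones so reliability wins.
        fewest = min(counts.get(zone, 0) for zone in scores)
        eligible = [zone for zone in scores if counts.get(zone, 0) == fewest]
    # Among the zones that keep the fleet spread out, take the fastest.
    return max(eligible, key=lambda zone: scores[zone])


# Hypothetical benchmark results for three zones.
scores = {"zone-a": 1.0, "zone-b": 1.3, "zone-c": 0.9}
next_zone = pick_zone(scores, {"zone-a": 1, "zone-b": 1, "zone-c": 1})
```

Lowering `max_share` trades performance for spread, which is the conservative direction the quote recommends when in doubt.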

We aren't that far from a factory on your desk. OpenFab: A Programmable Pipeline for Multi-Material Fabrication: MIT has developed OpenFab, a programmable pipeline for synthesis of multi-material 3D printed objects that is inspired by RenderMan and modern GPU pipelines. The pipeline supports procedural evaluation of geometric detail and material composition, using shader-like fablets, allowing models to be specified easily and efficiently. They describe a streaming architecture for OpenFab; only a small fraction of the final volume is stored in memory and output is fed to the printer with little startup delay.

How the light gets out: Even insects and crustaceans have a basic version of this ability to focus on certain signals. Over time, though, it came under a more sophisticated kind of control — what is now called attention. Attention is a data-handling method, the brain’s way of rationing its processing resources.

Cultivating Hybrids: 4 Key Data Architectures for Scaling Infinitely. Good explanation of horizontal scale and why scaling out is hard. Looks at disk IO and network IO bottlenecks. The solutions are different combinations of columnar storage, MapReduce, MPP, and in-memory databases.

Behind the humanistic curtain at HuffPost: Huffington Post Chooses MongoDB, Scala and Angular JS. They are moving away from their MySQL/PHP stack for their new content delivery system. Scala was chosen for its Java roots, plus it encourages developers to write good code. Angular JS requires less code. MongoDB handles the complex data types associated with their content.

An Open Source Watson? That would be cool. Contains lots of architectural and algorithmic detail.

Yep, TCP is UNreliable. You always need an app-level protocol over TCP. Always.
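The usual reasoning behind that claim: a successful send only means the bytes were handed to the local network stack, not that the peer application received and processed them. A minimal sketch of an app-level acknowledgement, assuming a made-up wire format (4-byte message id, 2-byte length, payload) and hypothetical helper names; `socket.socketpair()` stands in for a real TCP connection so the example is self-contained.

```python
import socket
import struct

HEADER = struct.Struct("!IH")  # assumed format: message id, payload length


def send_msg(sock, msg_id, payload):
    sock.sendall(HEADER.pack(msg_id, len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may return fewer per recv call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    msg_id, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return msg_id, recv_exact(sock, length)


def handle_one(sock):
    """Receiver side: process one message, then acknowledge it by id."""
    msg_id, payload = recv_msg(sock)
    # ... application-specific processing would happen here ...
    send_msg(sock, msg_id, b"ACK")
    return payload


def send_with_ack(sock, msg_id, payload, timeout=2.0):
    """Sender side: only report success once the peer has acknowledged."""
    sock.settimeout(timeout)
    send_msg(sock, msg_id, payload)
    try:
        ack_id, body = recv_msg(sock)
    except socket.timeout:
        return False  # no ACK in time: the caller decides whether to retry
    return ack_id == msg_id and body == b"ACK"
```

A real protocol would also need retries and duplicate detection on the receiver, since a message can be processed and the ACK still be lost.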

John Carmack discusses the art and science of software engineering: We know our code is living for, realistically, we’re looking at a decade. I tell people that there’s a good chance that whatever you’re writing here, if it’s not extremely game specific, may well exist a decade from now and it will have hundreds of programmers, looking at the code, using it, interacting with it in some way, and that’s quite a burden. I do think that it’s just and right to impose pretty severe restrictions on what we’ll let past analysis and what we’ll let into it, but there are large scale issues at the software API design levels and figuring out things there, that are artistic, that are craftsman like on there. And I wish that there were more quantifiable things to say about that. And I am spending a lot of time on this as we go forward.

Bitsets Match Regular Expressions, Compactly: This post describes how graph and automata theory can help compile a regular expression like “ab(cd|e)*fg” into the following asymptotically (linear-time) and practically (around 8 cycles/character on my E5-4617) efficient machine code. The technique is easily amenable to SSE or AVX-level vectorisation, and doesn’t rely on complicated bit slicing tricks nor on scanning multiple streams in parallel.
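For a feel of the bitset idea, here is a rough Python sketch that matches the same expression by keeping the set of active pattern positions in an integer. It is not the post's generated machine code: the position-automaton (Glushkov-style) construction, the hand-written follow table, and all names below are assumptions for illustration. It also relies on each letter appearing only once in this particular pattern.

```python
# Positions in "ab(cd|e)*fg": 0:a 1:b 2:c 3:d 4:e 5:f 6:g
POSITIONS = "abcdefg"


def bits(*positions):
    """Build a bitset (an int) with the given position bits set."""
    mask = 0
    for p in positions:
        mask |= 1 << p
    return mask


FIRST = bits(0)          # positions that can start a match
LAST = bits(6)           # positions that can end a match
LOOP = bits(2, 4, 5)     # after b, d, or e: start another (cd|e) round, or f
# FOLLOW[i] = positions that may come right after position i.
FOLLOW = [bits(1), LOOP, bits(3), LOOP, LOOP, bits(6), 0]
# Each letter occurs once in this pattern, so its mask is a single bit.
CHAR_MASK = {ch: 1 << i for i, ch in enumerate(POSITIONS)}


def follow_union(state):
    """OR together the follow sets of every active position."""
    reachable = 0
    for i, mask in enumerate(FOLLOW):
        if (state >> i) & 1:
            reachable |= mask
    return reachable


def matches(text):
    """Return True if the whole text matches ab(cd|e)*fg."""
    reachable = FIRST
    state = 0
    for ch in text:
        # Keep only the reachable positions labelled with this character.
        state = reachable & CHAR_MASK.get(ch, 0)
        if state == 0:
            return False
        reachable = follow_union(state)
    return (state & LAST) != 0
```

Each input character costs one AND plus a small fixed number of ORs, independent of how many positions are active, which is where the linear-time behaviour comes from.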

Tracing the Dynabook: A Study of Technocultural Transformations: This dissertation is a historical treatment of the Dynabook vision and its implementations in changing contexts over 35 years. It is an attempt to trace the development of a technocultural artifact: the Dynabook, itself partly an idealized vision and partly a series of actual technologies. It is thus a work of cultural history.
