Stuff The Internet Says On Scalability For October 17th, 2014

High Scalability

17 Oct 2014 — 5 min read

Hey, it's HighScalability time:

What could this be? Swarms of drones painting 3D light sculptures against the night sky!

Quotable Quotes:
- Visnja Zeljeznjak: Steve Jobs' product pricing formula: cost of materials x 3 + 33%
- Benedict Evans: We now have over 2bn iOS and Android devices on earth, and this will grow in the next few years to well over 3bn.
- @ClearStoryData: It's true! Avg beer drinker attracts 4.4% more Mosquitos than water drinker #Strataconf
- Leslie Lamport: The core idea of the problem of that notion of causality came about because of my familiarity with special relativity...where one event could causally affect another depended on whether or not information from one could physically reach the other.
- @laurelatoreilly: Fascinating session about cargo ships going dark to shift market prices #IoT #strataconf "your decisions are only as good as your data"
- @muratdemirbas: Distributed/decentralized coordination is expensive & hard to scale. Centralized coordination is cheap & scales easily using hierarchies.
- @froidianslip: "Kafka is awesome. We heard it cures cancer." -- @gwenshap #Strataconf
- @timoreilly: RT @grapealope: The self-driving car has 6000 sensors, and takes readings at 4Hz. That's a lot of data. @MCSrivas #strataconf #MapR
- @froidianslip: Love the paraphrase borrowed from Ray Bradbury, "Any sufficiently complex configuration is indistinguishable from code." #Strataconf
- @matei_zaharia: Spark shatters MapReduce's 100 TB and 1 PB sort records... with 10x fewer nodes
- @msallstr: “Synchronous calls in this environment are the crystal meth of programming” @mjpt777 on the new reactive manifesto
- @postwait: “If you put them under enough stress, perfectly rational people will panic and start believing in science” #priceless
- Ilya Grigorik: It's great to see access from mobile is around 30% faster compared to last year.
- @ryandotsmith: Recently migrated an async system to SQS. Much simple. Tiny latency. Here is the code (maybe a gem?)
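
Lamport's remark about causality is the intuition behind logical clocks: an event can only influence another if information could have travelled between them. Here is a minimal sketch, assuming the usual Lamport clock rules (tick on every local event and send, jump to max + 1 on receive). The `Process` class and its method names are illustrative and do not come from the quoted interview.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A process carrying a Lamport logical clock (hypothetical helper)."""
    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Any local event advances the clock by one tick.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is itself an event; the timestamp travels with the message.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # Move past both our own clock and the sender's timestamp, so the
        # receive is always ordered after the send that caused it.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


a, b = Process("A"), Process("B")
a.local_event()                # A ticks once
ts = a.send()                  # A ticks again and stamps the message
b.local_event()                # B ticks independently
received_at = b.receive(ts)    # B jumps past the message timestamp
print(ts, received_at)
```

The guarantee only runs one way: if one event could have caused another, its timestamp is smaller. A smaller timestamp by itself does not prove causality.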

People just don't appreciate the power of messy. The problematic culture of "Worse is Better". There's an implied notion here that people can't recognize better when they see it. Better is not a platonic ideal. It can't be proved by argument. Better, like evolution, is something that works itself out in practice. Like evolution, Worse is Better is an algorithm for stepping through a possibility space by jumping from one working phenotype to the next more adapted working phenotype. And for many, that's better. Not Ideal, but Better.

The Times They Are a-Changin'. Docker and Microsoft partner to drive adoption of distributed applications. What's the goal? nickstinemates: Package your Windows app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Windows host. Package your Linux app in a docker container, use same tooling you would otherwise use to deploy to a docker engine running on a Linux host.

Leandro Pereira writes a fine autobiography in Life of a HTTP request, as seen by my toy web server. All the stages of life are there. Socket creation. Acceptance. Scheduling. Coroutines. Reading requests. Parsing requests. All the way to the reply and the death of the connection. A lot to learn if you want to look at the simplified internals of a service.
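
The stages listed above map onto only a few socket calls. What follows is a rough Python sketch, not the code from the article. It assumes a thread can stand in for the coroutine scheduler and that a single `recv` holds the whole request. The names `parse_request_line`, `handle_connection`, `serve_one`, and `demo` are made up for illustration.

```python
import socket
import threading


def parse_request_line(raw: bytes) -> tuple:
    """Split the first line of an HTTP request into (method, path, version)."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = first_line.split(" ", 2)
    return method, path, version


def handle_connection(conn: socket.socket) -> None:
    """Read one request, parse it, reply, then close the connection."""
    with conn:  # leaving this block closes the socket: death of the connection
        raw = conn.recv(4096)                        # reading the request
        method, path, _ = parse_request_line(raw)    # parsing the request
        body = f"{method} {path}\n".encode("ascii")  # echo what was asked for
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        ).encode("ascii")
        conn.sendall(head + body)                    # the reply


def serve_one(listener: socket.socket) -> None:
    """Accept a single connection and handle it."""
    conn, _addr = listener.accept()                  # acceptance
    handle_connection(conn)


def demo() -> bytes:
    """Run one request through the server over loopback and return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket creation
    listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]
    # Scheduling: a thread plays the role a coroutine would play in a real server.
    worker = threading.Thread(target=serve_one, args=(listener,))
    worker.start()
    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /hello HTTP/1.1\r\nHost: localhost\r\n\r\n")
        while True:
            chunk = client.recv(4096)
            if not chunk:             # empty read: the server closed the socket
                break
            chunks.append(chunk)
    worker.join()
    listener.close()
    return b"".join(chunks)
```

Calling `demo()` sends one request through every stage. A real server would loop on `accept`, cope with partial reads, and reject malformed requests.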

Wonderful talk: Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions. Kyle Kingsbury takes a detailed look at different partition problems in different databases. There are split brains. Masters dying. Lost data. General network mayhem. It's great. The lesson: what's written down in the marketing documentation is not always what you get. Test your application and see what really happens. The world is not simple. A dumb solution where you understand the failure modes can be a good choice.

Good look at Cloud Foundry’s Container Technology: A Garden Overview.

If you are looking for tools to help run your site better, take a look: Cassandra Summit Recap: Diagnosing Problems in Production. A rich list with helpful explanations that's not just for Cassandra. Some of the tools referenced: OpsCenter, Metrics plugins, Munin, Nagios, Icinga, Statsd, Graphite, Grafana, Logstash, iostat, htop, iftop & netstat, dstat, strace, pcstat. Some of the problems addressed: clock skew, unreclaimed disk space, compaction, problems adding nodes, garbage collection, histograms, query tracing.

USENIX Technical Sessions are now available. Lots of good stuff: Arrakis: The Operating System is the Control Plane, IX: A Protected Dataplane Operating System for High Throughput and Low Latency, Salt: Combining ACID and BASE in a Distributed Database.

StorageMojo says: Scale and intelligence: lessons from warehouse-scale computing: Because of their scale Amazon, Google, Azure and Facebook can afford to put PhDs on problems that no enterprise would see a return for. The right answer for enterprises is to implement resilient scale-out architectures from committed vendors, rather than attempt to reinvent a warehouse-sized wheel. Focus scarce resources on evaluating best-of-breed solutions and the cultural changes required to best implement them.

Mathias Lafeldt has three good rules of infrastructure automation: Don’t blindly automate all the things, Use whatever tool works for you or your company, Care about your work.

It depends. Docker Containers Performance in VMware vSphere: Redis running in a VM has slightly lower performance than on a native OS because of the network virtualization overhead introduced by the hypervisor. When Redis is run in a Docker container on native, the throughput is significantly lower than native because of the overhead introduced by the Docker bridge NAT function. In the VM-Docker case, the performance drop compared to the Native-Docker case is almost exactly the same small amount as in the VM-Native comparison, again because of the network virtualization overhead. However, when Docker runs using host networking instead of its own internal bridge, near-native performance is observed for both the Docker on native hardware and Docker in VM cases, reaching 98% and 96% of the maximum throughput respectively.

2014 Complex Systems Summer School Proceedings are now available. You might like: Network and Conversation Analyses of Bitcoin, Disease Spreading on Ecological Multiplex Networks.

How to Promote Scalability with PF_RING ZC and n2disk: Nowadays even low-end CPUs come with at least 4/8 cores and people want to exploit all of them before buying a new machine. It is not uncommon to see people trying to squeeze multiple applications (n2disk, nProbe, Snort, Suricata, etc.) that all need to analyze the same traffic onto the same machine, which also saves money on network equipment for traffic mirroring (TAPs, etc.) while reducing complexity. Both PF_RING ZC and n2disk have been designed to fully exploit the resources provided by multiple cores, using zero-copy packet fanout/distribution across multiple threads/processes/VMs in the former, scattering packet processing on multiple cores in the latter.

Krishna Sankar with some good notes from Jeff Dean's talk on scalable predictive deep learning. Some highlights: Build a system with simple algorithms and then throw lots of data at it, letting the system build the abstractions; Google is interested in algorithms that get better with data; An effective recommendation system requires context, i.e., an understanding of the user’s surroundings, the previous behavior of the user, the previous aggregated behavior of many other users, and finally textual understanding.

Boot2Docker: a lightweight Linux distribution made specifically to run Docker containers. It runs completely from RAM, is a small ~24MB download and boots in ~5s (YMMV).

Kademlia: A Peer-to-peer Information System Based on the XOR Metric: We describe a peer-to-peer system which has provable consistency and performance in a fault-prone environment. Our system routes queries and locates nodes using a novel XOR-based metric topology that simplifies the algorithm and facilitates our proof. The topology has the property that every message exchanged conveys or reinforces useful contact information. The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
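
The XOR metric is easier to see with small numbers. This is a minimal sketch, assuming node IDs are plain integers (4-bit here for readability; real deployments use much longer IDs). The function names are illustrative, and the bucket rule reflects my reading of the paper's k-bucket layout rather than anything stated in the abstract above.

```python
def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def closest_nodes(target: int, known_ids: list, k: int) -> list:
    """Return the k known IDs nearest to target under the XOR metric."""
    return sorted(known_ids, key=lambda node_id: xor_distance(node_id, target))[:k]


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket for other_id: the position of the highest bit where the IDs differ.

    Assumption: bucket i holds contacts at distance in [2**i, 2**(i + 1)).
    """
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not keep itself in a bucket")
    return distance.bit_length() - 1


known = [0b0001, 0b0100, 0b0111, 0b1100, 0b1110]
target = 0b0101

nearest = closest_nodes(target, known, k=3)
print([format(n, "04b") for n in nearest])
print(bucket_index(target, 0b0100), bucket_index(target, 0b1100))
```

Because XOR is symmetric, a node that hears from a peer learns a contact at exactly the distance the peer sees in return. That is one way to read the claim that every message conveys or reinforces useful contact information.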
