Stuff The Internet Says On Scalability For October 25th, 2013

Hey, it's HighScalability time:

Test your sense of scale. Is this image of something microscopic or macroscopic? Find out.

  • $465m: Amount lost in 45 minutes due to a software bug. Where? Where else...the finance industry.
  • Quotable Quotes:
    • FCC: Fiber-to-the-home, on average, has the best performance in terms of latency, with 18 ms average during the peak period, with cable having 26 ms latency and DSL 44 ms latency.
    • @CompSciFact: "About 1,000 instructions is a reasonable upper limit for the complexity of problems now envisioned." -- John von Neumann, 1946
    • @anildash: got 20M unique visitors in 20 days, faster than Google+ launch. Took Pinterest 2 years & BuzzFeed 4 years to hit 20M.
    • Thomas A. Edison: I start where the last man left off.
    • @brycebaril: I've never had a tech conference toy with my emotions like this year's #realtimeconf

  • Great explanation of the Netflix people don't know: their CDN. Chaos Kong is Coming: A Look At The Global Cloud and CDN Powering Netflix: Netflix sees about 2 billion requests per day to its API, which serves as the “front door” for devices requesting videos, and routes the requests to the back-end services that power Netflix. That activity generates about 70 to 80 billion data points each day that are logged by the system.

  • Didn’t Work in Tests, Launched Anyway. This is just getting silly. When have projects built and released like this ever worked? Especially under huge initial loads. Never (or close to it). This stuff is complicated for many reasons on every level. To compare such a product with a website is the height of technical ignorance. I recall Captain Renault: I'm shocked, shocked to find that gambling is going on in here!

  • Filled with ennui? Wonder what's next? Reprogrammed bacterium speaks new language of life: "We now have an organism that has a new code, and we can reliably and efficiently open up the chemical diversity of proteins by introducing a whole new array of amino acids using UAG as the codon"

  • If the previous entry doesn't do it for you, how about this? UW engineers invent programming language to build synthetic DNA. It's called Chemical Reaction Networks, or DNA CRNs for short. That's a programming language for programming DNA, people. We live in the future. One of the commenters, Johny, finally brings truth to the old saying it's not a bug it's a feature: IANAMB(molecular biologist). Given that limitation to my knowledge, I'd say a bug today could be a feature tomorrow. A bug in the Arctic is a feature in the Sahara. Everything can mutate randomly. Therefore, constant rechecking of simulated systems by subjecting to pseudo-random mutations would have to be an essential function of any programmed bio-chemical molecular simulation.

  • Vampires are always about the blood. The very symbol of life. It appears IBM's computers have gone vampire, which is an interesting genre branding play, with their 'electronic blood' fed computers:  The human brain is 10,000 times more dense and efficient than any computer today. That's possible because it uses only one - extremely efficient - network of capillaries and blood vessels to transport heat and energy - all at the same time. Its new "redox flow" system pumps an electrolyte "blood" through a computer, carrying power in and taking heat out.

  • If the first HFT article didn't scare you, here are a few companion pieces you might like, Online Algorithms in High-frequency Trading and Barbarians at the Gateways: High-frequency Trading and Exchange Technology by Jacob Loveless on ACM Queue: The reality is that automated trading is the new marketplace, accounting for an estimated 77 percent of the volume of transactions in the U.K. market and 73 percent in the U.S. market. As a community, it's starting to push the limits of physics. Today it is possible to buy a custom ASIC (application-specific integrated circuit) to parse market data and send executions in 740 nanoseconds (or 0.00074 milliseconds). (Human reaction time to a visual stimulus is around 190 million nanoseconds.) < Really great details. It's an in-depth look at the space.

  • Wonderful exploration of different strategies for solving the Traveling Salesperson Problem by Peter Norvig. You look at many different algorithms, how to plot the data, a process for improving results, and there's test data and code. As good as it is, I was disappointed there's no quantum computer approach :-)
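
To give a flavor of the kind of strategy such an exploration compares, here is a minimal sketch of the greedy nearest-neighbor heuristic for the TSP. It is not taken from Norvig's write-up; the function names (distance, tour_length, nearest_neighbor_tour) and the choice of (x, y) tuples for cities are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(distance(tour[i], tour[i - 1]) for i in range(len(tour)))


def nearest_neighbor_tour(cities, start=None):
    """Greedy heuristic: always travel to the closest unvisited city.

    Fast, but generally not optimal. Cities are hashable (x, y) tuples;
    duplicate points are treated as a single city.
    """
    cities = list(cities)
    if not cities:
        return []
    current = cities[0] if start is None else start
    tour = [current]
    unvisited = set(cities) - {current}
    while unvisited:
        # Pick the unvisited city closest to where we are now.
        current = min(unvisited, key=lambda c: distance(c, current))
        tour.append(current)
        unvisited.remove(current)
    return tour
```

Calling nearest_neighbor_tour on a list of points returns one visiting order, and tour_length scores it, which makes it easy to compare the greedy result against other strategies on the same data.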

  • Interesting architecture choice: Yandex, the largest tech company in Russia, is using Docker for infrastructure virtualization and app isolation of its open-source PaaS system called Cocaine. To see why, take a look at Lightweight Virtualization with Linux Containers and Docker. Containers have many advantages over VMs: they are lightweight and easier to manage and could become the perfect format for software delivery. Along the same lines, Rackspace is using ZeroVM.

  • From the Twitter stream it seems like a good time was had by all at #realtimeconf. Rare to see such gushing praise about a conference. Lives were changed. Videos are now available.

  • Apple now uses hardware to subsidize free software. Google uses ads to fund free services. Amazon sells products to fund free software and cheaper hardware. Which strategy is best? Are there any other strategies?

  • Wonderful dive into the internals of low level queue algorithms. Multi-Producer Single Consumer Queues - From 2M ops/sec (6 producers, ABQ) to 50Mops/sec (lock free MPSC + backoff): This time we are looking at a slightly more common pattern, many producing threads put stuff on a queue which funnels into a single consumer thread.
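
The pattern itself, many producers funneling into one consumer, can be sketched in a few lines. Note the assumptions: this uses Python's lock-based queue.Queue, not the lock-free MPSC algorithm the article benchmarks, so it illustrates the shape of the problem rather than the performance technique. The names run_mpsc and _DONE are made up for this sketch.

```python
import queue
import threading

# Sentinel each producer enqueues once to signal it has finished.
_DONE = object()


def run_mpsc(num_producers=6, items_per_producer=1000):
    """Run several producer threads feeding one consumer thread.

    Returns the (producer_id, sequence_number) items in the order the
    consumer received them.
    """
    q = queue.Queue()  # thread-safe FIFO, internally lock-based
    received = []

    def producer(pid):
        for i in range(items_per_producer):
            q.put((pid, i))
        q.put(_DONE)

    def consumer():
        finished = 0
        while finished < num_producers:
            item = q.get()
            if item is _DONE:
                finished += 1
            else:
                received.append(item)  # only this thread touches the list

    threads = [threading.Thread(target=producer, args=(p,))
               for p in range(num_producers)]
    consumer_thread = threading.Thread(target=consumer)
    consumer_thread.start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    consumer_thread.join()
    return received
```

Items from different producers interleave arbitrarily, but each producer's own items arrive in the order it sent them, which is the ordering guarantee an MPSC queue is usually expected to give.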

  • Post-mortem of a Dead-on-Arrival SaaS Product: Launching without an audience means nobody shows up. It's a very true statement. One of my favorite sayings now, from Jason Cohen, is to get 30 people to fully commit to pay for your product before you even start coding it. (from bluedevil2k)

  • Publications by Barbara Liskov. It's all distributed systems all the time.

  • SSYNC:  a cross-platform synchronization suite; it works on x86_64, SPARC, and Tilera processors. SSYNC contains libslock, a library that abstracts lock algorithms behind a common interface and libssmp, a library with fine-tuned implementations of message passing for each of the supported platforms. SSYNC also includes microbenchmarks for measuring the latencies of the cache coherence, the locks, and the message passing, as well as ssht, i.e., a cache efficient hash table and TM2C, i.e., a highly portable transactional memory. 

  • Use those primary keys. InnoDB scalability issues due to tables without primary keys: This scalability issue is caused by the usage of tables without primary keys. This issue typically shows itself as contention on the InnoDB dict_sys mutex.

  • What instance types should you use? Here's a thorough look into that question that tests 175 AWS VMs across current instance types, in all three U.S. regions. IaaS Performance Benchmarks Part 2: AWS. Some findings: there's not a big difference between EBS-backed instances and instance-store-backed instances; there are differences between AZs, so you'll have to performance-test everything you launch to get the best performance; performance across regions is generally a wash, though US-East is the winner for high-performance instances; the m3 family did really well in these benchmarks; the m3.2xlarge ($1/hour), c1.xlarge ($0.68/hour) and m3.xlarge ($0.50/hour) are best; it would be hard to justify using the m1.xlarge in almost any circumstance.

  • Copying data is bad, isn't it? Not in Erlang. Though I wonder if copying on virtual machines gives the same result. Erlangs message passing: The secret to the success of the BEAM VM is its choice of copying messages around. First of all, modern computers are good at copying data. Second, there is one type of data in the BEAM VM which is passed by reference: binaries.
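
The copy-versus-reference distinction can be mimicked in a toy model. This is Python, not Erlang, and says nothing about how the BEAM VM is actually implemented; the Mailbox class is a hypothetical stand-in that copies ordinary messages on send and passes immutable bytes (playing the role of binaries) by reference.

```python
import copy
import queue


class Mailbox:
    """Toy mailbox: messages are copied on send, except immutable bytes."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, message):
        if isinstance(message, bytes):
            # Immutable blob: safe to share, so pass the reference.
            self._q.put(message)
        else:
            # Everything else is deep-copied, so sender and receiver
            # never share mutable state.
            self._q.put(copy.deepcopy(message))

    def receive(self):
        return self._q.get()
```

Because the receiver gets its own copy, the sender can keep mutating its data after sending without any locking, while large immutable blobs avoid the copying cost.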

  • Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket – uniform and nonuniform – to multi-socket – directory and broadcast-based – many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.

  • Tango: Distributed Data Structures over a Shared Log: Tango provides developers with the abstraction of a replicated, in-memory data structure (such as a map or a tree) backed by a shared log. Tango objects are easy to build and use, replicating state via simple append and read operations on the shared log instead of complex distributed protocols; in the process, they obtain properties such as linearizability, persistence and high availability from the shared log.
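
The core idea, a data structure whose state is just the result of replaying a shared log, can be sketched as follows. This is an illustration under heavy assumptions, not Tango's API: SharedLog here is a plain in-process list (a real shared log would be distributed and durable), and the class and method names are invented for the sketch.

```python
class SharedLog:
    """In-process stand-in for a shared log: append and read only."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1  # position of the new entry

    def read_from(self, position):
        return self._entries[position:]


class LogBackedMap:
    """A map replica whose state is derived by replaying the shared log."""

    def __init__(self, log):
        self._log = log
        self._state = {}
        self._applied = 0  # number of log entries already replayed

    def _sync(self):
        # Catch up by applying every entry appended since the last sync.
        for op, key, value in self._log.read_from(self._applied):
            if op == "put":
                self._state[key] = value
            elif op == "delete":
                self._state.pop(key, None)
            self._applied += 1

    def put(self, key, value):
        # Writes never touch local state directly; they only append.
        self._log.append(("put", key, value))

    def delete(self, key):
        self._log.append(("delete", key, None))

    def get(self, key, default=None):
        self._sync()
        return self._state.get(key, default)
```

Two LogBackedMap instances built over the same SharedLog see each other's writes on their next read, because both derive their state from the same ordered sequence of entries rather than from a replica-to-replica protocol.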