Stuff The Internet Says On Scalability For January 17th, 2014

Hey, it's HighScalability time:

From the stunning Scale of the Universe - Interactive Flash Animation

  • $7 trillion: US spending on patrolling oil sea-lanes; 82 billion: files served by MaxCDN in 5 months
  • Quotable Quotes: 
    • @StephenFleming: "Money doesn’t solve scaling problems, but the actual solutions to scaling problems always cost money."
    • David Rosenthal: Robert Putnam in Making Democracy Work and Bowling Alone has shown the vast difference in economic success between high-trust and low-trust societies.
    • @kylefox: That's a huge advantage of SaaS businesses: you can be liberal with refunds & goodwill credits w/o impacting the bottom line much.
    • Thomas B. Roberts: That’s the essence of science: Ask the impertinent question, and you are on your way to pertinent science.
    • Benjamin K. Bergen: Simulation is an iceberg. By consciously reflecting, as you just have been doing, you can see the tip—the intentional, conscious imagery. But many of the same brain processes are engaged, invisibly and unbeknownst to you, beneath the surface during much of your waking and sleeping life. Simulation is the creation of mental experiences of perception and action in the absence of their external manifestation.

  • Urbane apps are the future. 80% of the world's population will be in cities by 2045.

  • Knossos: Redis and linearizability. Kyle Kingsbury delivers an amazingly in-depth, model-based analysis of "a hypothetical linearizable system built on top of Redis WAIT and a strong coordinator." The lesson: don't get Kyle mad.

  • If a dead startup had a spirit, this is what it would look like: About Everpix. A truly fine memorial. 

  • This is a strange thought from David Rosenthal on implementing DAWN (Durable Array of Wimpy Nodes) using MicroSD card hardware. Flash memory is cheap, but making it in bulk requires putting more software on top to mask the unreliability that comes with scaled-up processes. It turns out these cards require some non-trivial processing power, so much so that they could be used as a simple, reliable, self-managing archival storage system with a low total cost of operation.

  • Bit Twiddling Hacks by Sean Eron Anderson. A very extensive list of operations with code for when your bits need twiddling.
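
    For a taste of the genre, here is a minimal Python sketch of two classic tricks of the kind that list collects: testing for a power of two and counting set bits by clearing the lowest set bit each round. The function names are made up for this example, and the code assumes non-negative integers.

```python
def is_power_of_two(v: int) -> bool:
    """True when exactly one bit is set in v."""
    # v & (v - 1) clears the lowest set bit; a power of two has only one.
    return v > 0 and (v & (v - 1)) == 0


def count_set_bits(v: int) -> int:
    """Count set bits by clearing the lowest set bit until none remain."""
    if v < 0:
        # Python ints are unbounded, so a negative value would never reach 0.
        raise ValueError("count_set_bits expects a non-negative integer")
    count = 0
    while v:
        v &= v - 1  # drop the least significant set bit
        count += 1
    return count
```

    The loop in count_set_bits runs once per set bit rather than once per bit position, which is the whole point of the trick.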

  • Nimrod: A New Approach to Metaprogramming: a statically typed programming language supporting unhygienic/hygienic and declarative/imperative AST-based macros. Good discussion on Hacker News. One concern is that Rust already occupies this niche, but others think it is a clean language with a lot of promise. Randallsquared says "It's a lot more complicated than Go, but in return you get generics, templates, and AST macros." TylerE says it's "Ocaml with better metaprogramming, an actually helpful compiler, and syntax that doesn't feel like having your eyes gouged out by a rusty spoon." Rayiner says compared to Dylan "It's much more flexible. Dylan's macros are based on pattern matching substitutions."

  • Lightweight Indexing for Small Strings. Scott Vokes explores how to improve data compression for embedded systems where memory is scarce. His solution: It builds a list of offsets into the data, so that the scanner can jump from each byte to its previous occurrence (if any), skipping over everything along the way that couldn’t possibly match. 
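
    The offset-list idea described above can be sketched roughly as follows. This is an illustration in Python, not Scott Vokes's actual implementation: the names build_prev_index and longest_match are hypothetical, and the sketch ignores the window limits and memory budget a real embedded compressor would impose.

```python
from __future__ import annotations


def build_prev_index(data: bytes) -> list[int]:
    """prev[i] is the closest j < i with data[j] == data[i], or -1 if none."""
    last_seen: dict[int, int] = {}
    prev = [-1] * len(data)
    for i, byte in enumerate(data):
        prev[i] = last_seen.get(byte, -1)
        last_seen[byte] = i
    return prev


def longest_match(data: bytes, pos: int, prev: list[int]) -> tuple[int, int]:
    """Return (start, length) of the longest earlier match for data[pos:].

    Only earlier positions holding the same first byte are visited, by
    following the prev chain backwards from pos. Returns (-1, 0) if the
    byte at pos has not occurred before.
    """
    best_start, best_len = -1, 0
    candidate = prev[pos]
    while candidate != -1:
        length = 0
        while (pos + length < len(data)
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_start, best_len = candidate, length
        candidate = prev[candidate]  # jump to the previous occurrence
    return best_start, best_len
```

    For b"abcabcabd", the scanner at position 3 jumps straight to position 0 and finds a five-byte match, never looking at positions 1 or 2 because they cannot start with the right byte.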

  • Sweet description of one possible future. Unikernels: Rise of the Virtual Library Operating System: The goal of MirageOS is to restructure entire VMs—including all kernel and user-space code— into more modular components that are flexible, secure, and reusable in the style of a library operating system. 

  • Remember when choosing software wasn't like pledging allegiance to a nation state? From Google Apps to Office 365: Why my company ditched Google: Forget Spec Sheets: This is a Battle of the Ecosystems. 

  • asm.js AOT compilation and startup performance. Luke Wagner shows how the simple idea of asm.js is not so simple at all in practice when implemented in Firefox. Wonderful and interesting details on the compilation process. Ahead-of-time compilation, parallel compilation, async compilation, and caching all help to improve performance.

  • Aleksandar Bradic in Caveat emptor: Amazon EC2 Reserved Instance Marketplace shows reserved instances are not quite the slam dunk you hoped they would be. There are issues: RI Marketplace problems are not limited to simply not having enough buyers and sellers around. The market is highly fragmented: you can only buy and sell instances within the same region and can't really convert instance types. It is really an ecosystem of region-specific marketplaces, all of which are highly illiquid, with each unique (region, instance type, duration, offering type) tuple representing a different commodity. There is no market maker either. You would expect that Amazon might want to buy back your unused reservations at a discounted price, but that's not an option, at least not for now. After all, why give someone a fraction of their money back when you can keep all of it?

  • If you are wondering what to build next, Chetan Sharma has a good summary of CES. He walked 8 miles a day so you don't have to: The new growth areas are connected devices, tablets, wellness devices, connected auto, 3D printers, etc. Smart watch sales are expected to double in revenues in 2014. The wearables are expected to grow 25%.

  • The Exceptional Performance of Lil' Exception. Tremendous exploration by Aleksey Shipilёv of Java exceptions. The conclusion: if you don't want exceptions to ruin your performance you shouldn't have used them for regular control flow. All backed by science. 

  • So You Got Yourself a Load Balancer. Short and to the point on what you need to consider when using a load balancer: static image management, environment based URLs, session management, detecting who the client is, SSL termination, logging.

  • Scaling-out SQL Server disks and data files on Windows Azure Virtual Machines…a real-world example. Nice example of eliminating disk I/O bottlenecks: Scaling out your SQL Server data file reads and writes applies just as much to SQL Server on Windows Azure Virtual Machines as it does to your on-premises configurations.

  • Cache Rules Everything Around Me by Andy Pavlo, CMU: In this talk, I will present a new DBMS architecture, called “anti-caching,” that reverses the traditional hierarchy of disk-oriented systems to overcome this limitation. With an anti-caching system, all data initially resides in memory, and when memory is exhausted, the least-recently accessed records are collected and written to disk. We have implemented a prototype of our anti-caching proposal in the H-Store DBMS and compared it to a well-tuned disk-based DBMS optionally fronted by a distributed main memory cache. Our experiments show that as the size of the database increases, the anti-caching DBMS maintains a significant performance advantage over the disk-based systems. Based on these results, we contend that our anti-caching architecture is preferable over traditional, disk-oriented systems for any front-end application.
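
    The eviction policy in that abstract can be pictured with a toy sketch. This is not H-Store code: the AntiCache class and its method names are invented for illustration, a plain dict stands in for the on-disk store, and the memory budget is counted in records rather than bytes.

```python
from collections import OrderedDict


class AntiCache:
    """Toy key-value store: memory is primary, 'disk' holds only cold records."""

    def __init__(self, max_in_memory: int):
        if max_in_memory < 1:
            raise ValueError("max_in_memory must be at least 1")
        self.max_in_memory = max_in_memory
        self.memory = OrderedDict()  # ordered from coldest to hottest
        self.disk = {}               # stand-in for an on-disk block store

    def put(self, key, value):
        self.disk.pop(key, None)     # a record lives in exactly one place
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key]
        value = self.disk.pop(key)   # raises KeyError if never stored
        self.memory[key] = value     # bring the cold record back into memory
        self._evict_if_needed()
        return value

    def _evict_if_needed(self):
        # When memory is exhausted, push the least recently used records out.
        while len(self.memory) > self.max_in_memory:
            cold_key, cold_value = self.memory.popitem(last=False)
            self.disk[cold_key] = cold_value
```

    Unlike a disk-based DBMS with a cache in front, a record here is never in both places at once: memory is the home, and disk only receives what memory can no longer hold.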

  • What is the ARINC653 Scheduler?: The primary goal of the ARINC 653 specification is the isolation or partitioning of domains.  The specification goes out of its way to prevent one domain from adversely affecting any other domain, and this goal extends to any contended resource, including but not limited to I/O bandwidth, CPU caching, branch prediction buffers, and CPU execution time.

  • Using Dust Clouds to Enhance Anonymous Communication: In this position paper, we introduce the concept of a dust cloud: a dynamic set of short-lived VMs that run on cloud computing platforms and act as Tor nodes. Users can join a dust cloud when they require anonymous communication, and incur charges only while participating.

  • Bitcloud: A Decentralized Application for Cloud Services Based on Proof of Bandwidth: The Bitcloud protocol is a decentralized application that provides the services of cloud storage and bandwidth sharing. Users will interact with this service in a variety of different ways, but the main idea behind the protocol is that people will be able to store data in the cloud in a way that limits censorship, surveillance, and centralization.