Stuff The Internet Says On Scalability For March 29, 2013

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)

  • 44.6 billion - Tumblr posts; 300 Gb/s - DDoS DNS amplification attacks; 100 million - Eventbrite tickets processed.
  • Quotable Quotes:
    • @tveskov: Alan Kay: “The past 30 years have been completely mundane. It’s all been scaling (of old technology) and Angry Birds” 
    • @phrawzty: OH "Complexity is accelerating. We must augment our ability to manage it." #monitorama
    • @stallent: new wave of apps that are bringing the server doing actual legit valuable work back into vogue know scaling is more than hard
    • @calvdee: Fending off a 300Gb/s DDoS attack would constitute a feat of #highscalability 
    • @solarce: "A spider farted in Finland and screwed up my IOPS!" -- @lusis #monitorama
    • @cra: Listening to Shawn Pearce talk about scaling #git at Google with JGit... Android AOSP repos: 19.4GB, 2.5Mreq/day, 5.0TB/day #eclipsecon
    • @codemonkeyism: Sometime in the future Erlang people will realize scalability comes from async not actors and then hell freezes over.
    • George Dyson: Von Neumann and Morgenstern demonstrated how to arrive at reasonable solutions among otherwise hopeless combinatorics by means of a finite but unbounded series of coalitions that progressively simplify the search. 

  • InfoQ in China has graciously offered to translate some of my articles into Chinese. So here's Iron.io Moved From Ruby to Go: 28 Servers Cut and Colossal Clusterf**ks Prevented in the original Chinese.

  • To justify high-margin devices you need user data to sync automagically between all the devices a user owns; otherwise you could just leave everything in the cloud and stream it to much cheaper devices. But Apple is finding that device syncing in a complex, error-prone world is not so simple: The promise of iCloud’s Core Data support is that it will solve all of the thorny issues of syncing a database by breaking up each change into a transaction log. Except it just doesn’t work. 

  • Food trucks are the cloud of the restaurant world. Dream friendly, flexible hours, low startup costs, making good money requires selling alcohol, and you can just pack up and move whenever you want. Oh, well, maybe not.

  • Heard this on the radio: Be responsible to the element; accountable to the mission - Daryl Woods, NASA. It so applies to software teams. NASA uses this motto to remind the many thousands of people who work on tiny parts of an immensely complex shuttle project that their part counts, but no part is the most important part. So keep your morale up. Your part matters. A lot. It's important. But don't get a big head and destroy the overall project with your pride. Your part doesn't matter more. Without all the parts there's no mission. Which sounds exactly like building software.

  • Mind blown: there will be billions of machine learning models and data will stream directly into the models. No storage necessary. Models will build and continually update themselves and immediately act on the data. From Building Brains to Understand the World's Data by Jeff Hawkins. A wonderfully clear and informative presentation. Another good one: the future of intelligent machines is sparse distributed representations. This is how the brain does it.

  • Concurrency Models: Go vs Erlang. Nice summary by David Lehmann: IMO the intrinsic difference between channels (Golang) and actors (Erlang) is that actors are a physical model of concurrency and channels are not. In a physical model you have an invisible medium that transports and stores your information and as a sender you need to know the address of your receiver. In Golang the sender doesn't know the address of the receiver but the channel that transports and stores information is known by both ends. I wouldn't call this similar at all.
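
    To make the distinction concrete, here's a minimal Go sketch (my illustration, not from the article): sender and receiver never learn each other's identity; both only hold the channel, which is the medium that transports and buffers the messages.

    ```go
    package main

    import "fmt"

    func main() {
        // The channel is the shared, named medium: it transports and
        // buffers messages between anonymous parties.
        msgs := make(chan string, 2)

        // The sender doesn't know who (if anyone) will receive;
        // it only knows the channel.
        go func() {
            msgs <- "hello"
            msgs <- "world"
            close(msgs)
        }()

        // The receiver likewise never learns the sender's identity.
        for m := range msgs {
            fmt.Println(m)
        }
    }
    ```

    In an actor system the sender would instead hold the receiver's address (a PID in Erlang) and the mailbox would belong to that receiver, which is exactly the physical-model distinction Lehmann is drawing.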

  • Interesting shift: Paypal To Drop VMware From 80,000 Servers and Replace It With OpenStack. Good to see the players firm up and get some wood behind those arrows. 

  • PeerCDN is using WebRTC to build a peer-to-peer CDN from browsers visiting a site. Peers store static resources. When another user accesses one of those resources, a peer is selected as the source for the content. More peers means more parallelism. A globally-distributed central tracker server coordinates which peers have which files. WebRTC is then used to make peer-to-peer connections between a site's visitors. If all the peers disappear, transfer falls back to an HTTP origin server using range requests. Issues with privacy, security, fairness, bandwidth caps, and parasitic resource usage abound, but it's interesting.
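
    The fallback path is ordinary HTTP. Below is a hedged Go sketch (my illustration, not PeerCDN's code) of fetching a single byte range from an origin; the URL and offsets are made up.

    ```go
    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    // fetchChunk pulls one byte range over plain HTTP, the way a
    // peer-to-peer CDN might fall back to the origin when no peer
    // holds the chunk.
    func fetchChunk(url string, start, end int64) ([]byte, error) {
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        // 206 Partial Content means the origin honored the range request.
        if resp.StatusCode != http.StatusPartialContent {
            return nil, fmt.Errorf("expected 206, got %s", resp.Status)
        }
        return io.ReadAll(resp.Body)
    }

    func main() {
        // Hypothetical asset; a real client would try peers first.
        chunk, err := fetchChunk("https://example.com/static/app.js", 0, 1023)
        if err != nil {
            fmt.Println("origin fallback failed:", err)
            return
        }
        fmt.Printf("fetched %d bytes from origin\n", len(chunk))
    }
    ```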

  • OmniTI with an excellent article series on Our Experiences with Chef: Adoption Challenges. Think of users in four different camps: black-box, bespoke, shared, bootstrappers...Flexibility is the enemy of standardization, and automation thrives in highly standardized environments. 

  • Cute trick: a web app that needs no backend at all through the magic of HTML5 local storage and static pages served from S3. 

  • Common Pitfalls in Writing Lock-Free Algorithms: Concurrent threads can modify the same object, and even if some thread or set of threads stalls or stops completely in the middle of an operation, the remaining threads will carry on as if nothing were the matter. Any interleaving of operations still guarantees forward progress. It seems like the holy grail of multi-threaded programming. The biggest drawbacks to using lock-free approaches are: lock-free algorithms are not always practical; writing lock-free code is difficult; writing correct lock-free code is extremely difficult. 
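
    The heart of most lock-free code is a compare-and-swap (CAS) retry loop. Here's a minimal Go sketch of the easy case, a lock-free counter (anything fancier gets hard fast, which is the article's point):

    ```go
    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    // lockFreeAdd retries until its CAS wins: read the current value,
    // compute the new one, and commit only if nobody changed it in the
    // meantime. A stalled goroutine never blocks the others.
    func lockFreeAdd(v *int64, delta int64) {
        for {
            old := atomic.LoadInt64(v)
            if atomic.CompareAndSwapInt64(v, old, old+delta) {
                return // our update committed
            }
            // Another goroutine got there first; retry with fresh state.
        }
    }

    func main() {
        var counter int64
        var wg sync.WaitGroup
        for i := 0; i < 8; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    lockFreeAdd(&counter, 1)
                }
            }()
        }
        wg.Wait()
        fmt.Println(counter) // always 8000, with no locks taken
    }
    ```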

  • Excellent LandsofAmerica.com – Elastic Data Warehouse – A Case Study of cost-effective analytics on 1 billion rows of data using Hive over EMR. Rent vs buy for the win. It's hard to overestimate how important a low-stress, easy-to-use solution is as a driver for adoption.

  • I love Easter..eggs: Use Wolfram Alpha to Convert Obscure Technical Measurements Into Layman’s Terms

  • Stronger Semantics for Low-Latency Geo-Replicated Storage: The primary contributions of this work are enabling scalable causal consistency for the complex column family data model, as well as novel, nonblocking algorithms for both read-only and write-only transactions. Our evaluation shows that our system, Eiger, achieves low latency (single-ms), has throughput competitive with eventually-consistent and non-transactional Cassandra (less than 7% overhead for one of Facebook’s real-world workloads), and scales out to large clusters almost linearly (averaging 96% increases up to 128 server clusters).

  • Jeff Darcy asks Is Eventual Consistency Useful? And concludes with another question: Is non-eventual consistency useful? That might well be the more interesting question. The logic: Once you start thinking about how the real world works, eventual consistency pops up everywhere. It’s not some inferior cousin of strong consistency, some easy way out chosen only by lazy developers. It’s the way many important things work, and must work if they’re to work at all. 

  • Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services. Nicely detailed, will give you a very good feel for all of Azure's capabilities. It definitely has a more business oriented vibe compared to AWS. Some best practices: Scale-out not up; Everything has a Limit: Compose for Scale; Design for Availability; Design for Business Continuity; Density is Cost of Goods. It goes on to describe Azure's different capabilities and costs.

  • Security concerns are the biggest argument for why computations can't be run on shared resources. That may be changing: Multi-Party Computation: From Theory to Practice: In this talk I will present the latest, practical, protocol called SPDZ (Speedz), which achieves much of its performance advantage from the use of Fully Homomorphic Encryption as a sub-procedure.

  • Ben Stopford contemplates The Return of Big Iron? Ability to model data is much more of a gating factor than raw size. A Late Bound Schema combines structured and unstructured approaches in a layered, more nimble fashion. The approach: grid of machines, late bound schema, shared immutable data, low latency and high throughput use cases, all data is observable, standardized or raw interfaces. 

  • Compressed Sensing and Big Data. You can combine compression and measurement in one step, leading to reductions in data collection time and energy use, and it's prepping for the coming datapocalypse.
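
    For the curious, the textbook formulation (standard compressed sensing, not specific to the talk) shows why acquisition and compression collapse into one linear step:

    ```latex
    % x is the signal, k-sparse in some basis; \Phi is an m x n
    % measurement matrix with m \ll n. y is all we store or transmit.
    y = \Phi x, \qquad \Phi \in \mathbb{R}^{m \times n}, \quad m \ll n

    % Recovery finds the sparsest signal consistent with the
    % measurements by solving a convex program:
    \hat{x} = \arg\min_{x} \|x\|_{1} \quad \text{subject to} \quad \Phi x = y
    ```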

  • cassandra performance: Even though Cassandra has tables and looks like an RDBMS, it's not. Queries with multiple secondary index clauses will not perform as well as those with none.

  • Optimistic Cuckoo Hashing for concurrent, read-intensive applications: we substantially improve the throughput of the memcached distributed DRAM cache.  One of the core techniques we use is a new multiple-reader, single-writer concurrent variant of Cuckoo hashing that we call optimistic cuckoo hashing. It combines the refinement of a technique we introduced in SILT ("partial-key cuckoo hashing"), a new way of moving items during cuckoo insertion, and an optimistic variant of lock-striping to create a hash table that is extremely compact and supports extremely high read throughput, while still allowing one thread to update it at high speed (about 2M updates/second in our tests).
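
    A much-simplified Go sketch of the optimistic-read idea (mine, not the paper's implementation: the real design stripes version counters per key and relocates entries on insert; this uses one seqlock-style counter and skips displacement entirely):

    ```go
    package main

    import (
        "fmt"
        "hash/fnv"
        "sync/atomic"
    )

    const numBuckets = 1024

    type entry struct {
        key, val string
        used     bool
    }

    type table struct {
        version atomic.Uint64 // odd = write in progress (seqlock-style)
        buckets [numBuckets]entry
    }

    // hash2 derives the two candidate buckets every key may live in.
    func hash2(key string) (uint64, uint64) {
        h := fnv.New64a()
        h.Write([]byte(key))
        sum := h.Sum64()
        return sum % numBuckets, (sum >> 32) % numBuckets
    }

    // Get is lock-free: read both candidate buckets, then validate that
    // no writer was active, retrying otherwise. (Go's race detector
    // would flag these unsynchronized reads; the paper's per-key striped
    // counters and atomic slots make this safe in the real thing.)
    func (t *table) Get(key string) (string, bool) {
        i, j := hash2(key)
        for {
            v := t.version.Load()
            if v%2 == 1 {
                continue // writer in progress; spin
            }
            var val string
            var ok bool
            for _, b := range [2]uint64{i, j} {
                if t.buckets[b].used && t.buckets[b].key == key {
                    val, ok = t.buckets[b].val, true
                }
            }
            if t.version.Load() == v {
                return val, ok // no writer interfered; read is consistent
            }
        }
    }

    // Put assumes a single writer, matching the paper's multiple-reader,
    // single-writer design. A real cuckoo table would displace ("kick")
    // a resident key to its alternate bucket instead of failing.
    func (t *table) Put(key, val string) bool {
        i, j := hash2(key)
        for _, b := range [2]uint64{i, j} {
            if !t.buckets[b].used || t.buckets[b].key == key {
                t.version.Add(1) // odd: readers will retry
                t.buckets[b] = entry{key, val, true}
                t.version.Add(1) // even: table stable again
                return true
            }
        }
        return false // would need a cuckoo displacement chain here
    }

    func main() {
        var t table
        t.Put("answer", "42")
        fmt.Println(t.Get("answer"))
    }
    ```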

  • Debugging a race condition with client-side load balancing. A jaw-rattling whodunit. A realistic portrayal of state being mutated from two different parts of the system at the same time. So don't do that.

  • RyanZAG:  I've had this exact same experience before - you build a neat core product that gets a decent core following, but then your team isn't sure where to go next. The first thing that always happens is that everyone has ideas for new features to bolt on that they've seen elsewhere. The discussion quickly moves from 'How do we make this better?' to 'How do we implement this feature?'. Engineering logic kicks in - this is a great challenge! We need to show how well our design works - during the design phase, we kept in X and Y which we can now leverage to complete this new feature! Once you've moved from thinking out what the customer wants to use to 'how do I make this thing work? should I cut a corner?', then you have already lost the customer's attention. The article has a great take away and I agree with it completely. Don't add features until you understand how and why the customer is going to use those features, and most importantly, if the customer will pay (or increase retention) for that feature.

  • kamaal: Let me tell you how real work happens and how productive people work. I am currently working with a very senior electronics guy in the night for some side projects. Watching him work shows me the path on how we programmers should be working. He will first build the base PCB with all his reusable designs and additional ones, reading the documentation. I rarely saw him Google anything in the past 15 days. If he ever has to get down to googling it's generally to find some data sheet. Once that is done, he gets the components and meticulously builds the entire circuit on the PCB module by module. Every time he builds a module he tests if the inputs and outputs to it are as he designed on the PCB. It took a whole month with each sitting spanning hours to get the whole PCB working. When he was done, the entire PCB worked like magic. The manufactured ones too. And there wasn't a single problem/bug. It was so spotlessly done. It looked like art work.