Stuff The Internet Says On Scalability For October 19, 2012

It's HighScalability Time:

  • @davilagrau: Youtube, GitHub, ... Are cloud services facing an entropic limit to scalability?
  • Async all the way down? The Tyranny of the Clock: The cost of logic and memory dominated Turing's thinking, but today communication, rather than logic, should dominate our thinking. A clock-free design uses less than half as much energy per addition (about 40%) as its clocked counterpart. We can regain the efficiency of local decision making by revolting against the pervasive beat of an external clock.
  • Why Google Compute Engine for OpenStack. Smart move. Having OpenStack work inside a supercharged cloud, in private clouds, and as a bridge between the two ought to be quite attractive to developers looking for some sort of ally for independence. All it will take is a few victories to cement new alliances.
  • 3 Lessons That Startups Can Learn From Facebook’s Failed Credits Experiment. I thought this was a great idea too. So what happened? Facebook did not encourage sharing (if consumers don’t have a reason to share, they won’t); Facebook never made a case for caring about Credits; and Facebook discouraged its partners (developers) from supporting Credits.
  • Some patterns for fast Python by Guido van Rossum: Avoid overengineering data structures; Built-in data types are your friends; Be suspicious of function/method calls; Don't write Java (or C++, or JavaScript, ...) in Python; Are you sure it's too slow? Profile before optimizing!; The universal speed-up is rewriting small bits of code in C. Do this only when all else fails. Great discussion in the comments. A quick sketch of two of these patterns appears after the list.
  • Twitter lets another little birdy free from the nest. Twitter Open Sources Clutch: Clutch is an easy-to-integrate library for native iOS applications designed to help you develop faster, deploy instantly and run A/B tests.
  • Scalability is a systemic anomaly inherent to the programming of the matrix: The solution to scalability is like the spoon-boy story from The Matrix. It’s impossible to fix the scalability problem until you realize the truth that it’s you who bends. More hardware won’t fix scalability problems; instead you must change your software. Instead of writing web-apps using Apache’s threads, you need to write web-apps in an “asynchronous” or “event-driven” manner. A tiny event-driven handler sketch appears after the list.
  • Even Stranger than Expected: a Systematic Look at EC2 I/O: RAID offered a substantial benefit for small operations (especially reads), but — surprisingly — not much for bulk transfers; For ephemeral storage, m1.medium was hardly better than m1.small, but m1.large and m1.xlarge show a substantial benefit; if you’re doing bulk writes on EBS, you probably need to worry about bad instances.
  • Scaling PostgreSQL at Braintree: Four Years of Evolution: A story of moving from MySQL to PostgreSQL using DRBD, introducing sharding, and then moving to streaming replication with PostgreSQL 9.1. There's a database cluster per shard, automated failover within a datacenter, and manual failover between datacenters. A toy shard-routing sketch appears after the list.
  • Slides from the Strange Loop 2012 conference are now available.
  • So the decentralized web will never fail? Has that been anyone's experience?
  • And all the Great Dreamforce ’12 Developer Content at Your Fingertips.
  • Lucene/Solr 4.0 comes with a new distributed indexing architecture and can offer real-time results.
  • Great story on Symbian, a post-mortem: The S60 platform clearly tells the story of a great hardware company struggling to become (even a good) software company. I think it also tells a story of how hard it is to build expertise in software: without a critical mass of people, companies, products, and projects in an area (in this case, specifically UI libraries and compilers) you just can't make it. The Bay Area has that mass; Finland doesn't.
  • Orbitz Takes Off with Couchbase: Orbitz will share how they replaced their caching tier with Couchbase, using Couchbase Server extensively across their site to achieve astronomical results in terms of scalability, reliability, performance, and cost savings.
  • In short, R now scales: We have done principal components analyses via singular value decomposition with matrices as large as 100,000x100,000 in under an hour. We have solved systems of 70,000 equations in 70,000 unknowns. We have scaled computations from 2 to, true to the title, 12,000 cores. A small PCA-via-SVD sketch appears after the list.
  • Dropbox Caching in theory and practice. LRU is the winner. A minimal LRU cache sketch appears after the list.
  • Foursquare is Open-sourcing their dashboard for Apache Oozie.
  • The coming crisis in disk drives. StorageMojo sees difficult times ahead for disk drive manufacturers as consumers move away from PCs to mobile, which has cut off a major market for rust.
  • Google Throws Open Doors to Its Top-Secret Data Center, wherein we learn their coloring scheme is brought to you by Skittles.
  • Parallel In-Place Merge: Merging sorted arrays in parallel and in place can be done very efficiently, using this algorithm. Comparisons with the performance of similar STL functions are included. A sketch of the underlying divide-and-conquer structure appears after the list.
  • Generalized Resource Allocation for the Cloud: In this paper, we present a novel approach to resource allocation that permits the problem specification to evolve with ease. We have built Wrasse, a generic and extensible tool that cloud environments can use to solve their specific allocation problem. Wrasse provides a simple yet expressive specification language that captures a wide range of resource allocation problems.
  • The Potential Dangers of Causal Consistency and an Explicit Solution: Causal consistency results in a tension between visibility latency and throughput. Peak throughput remains constant as more datacenters are added to a convergent causally consistent system, and scaling throughput with datacenters requires quadratic hardware provisioning. To alleviate these concerns, we can decrease per-operation storage and processing costs via application-level explicit causality.
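On the fast-Python item above, a minimal sketch (mine, not from Guido's slides) of two of the patterns: prefer built-in data types over hand-rolled bookkeeping, and profile before optimizing. The word list is just filler data.

    import cProfile
    from collections import Counter

    words = ["spam", "eggs", "spam", "ham"] * 250000

    def count_java_style(words):
        # "Don't write Java in Python": explicit bookkeeping the runtime
        # can do for you.
        counts = {}
        for w in words:
            if w in counts:
                counts[w] += 1
            else:
                counts[w] = 1
        return counts

    def count_pythonic(words):
        # Built-in data types are your friends: Counter does the counting
        # in C on CPython.
        return Counter(words)

    if __name__ == "__main__":
        # "Are you sure it's too slow? Profile before optimizing!"
        cProfile.run("count_java_style(words)")
        cProfile.run("count_pythonic(words)")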
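On the asynchronous/event-driven item, a minimal sketch using Python's asyncio (my choice of library, not the article's): one thread, one event loop, and handlers that suspend while waiting on I/O rather than parking an OS thread per connection.

    import asyncio

    async def handle_client(reader, writer):
        # Waiting on the socket suspends only this coroutine, not a thread.
        await reader.readline()
        await asyncio.sleep(0.1)  # stand-in for a slow backend call
        writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: 6\r\n\r\nhello\n")
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def main():
        server = await asyncio.start_server(handle_client, "127.0.0.1", 8080)
        async with server:
            # Thousands of idle connections cost file descriptors, not threads.
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())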
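On the Braintree item, a toy shard-routing sketch. The shard key, shard count, and connection strings are hypothetical, not Braintree's scheme; the point is only that one deterministic function maps each record to the single database cluster that owns it.

    import hashlib

    # Hypothetical connection strings, one per shard (database cluster).
    SHARD_DSNS = [
        "postgresql://shard0.db.internal/app",
        "postgresql://shard1.db.internal/app",
        "postgresql://shard2.db.internal/app",
        "postgresql://shard3.db.internal/app",
    ]

    def shard_for(merchant_id):
        """Deterministically map a merchant to the cluster holding its rows."""
        digest = hashlib.md5(merchant_id.encode("utf-8")).hexdigest()
        return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

    print(shard_for("merchant-42"))  # always routes to the same shard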
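On the R item, a toy version of the computation it describes, principal components via singular value decomposition, here with NumPy on a small matrix rather than R on thousands of cores.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 50))    # rows = observations, cols = variables
    Xc = X - X.mean(axis=0)                # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                         # principal component scores
    explained = (s ** 2) / (s ** 2).sum()  # fraction of variance per component
    print(explained[:5])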
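On the Dropbox caching item, a minimal sketch of the LRU policy it crowns the winner: an ordered dict tracks recency, and the least-recently-used entry is evicted when the cache is full.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self._data = OrderedDict()

        def get(self, key):
            if key not in self._data:
                return None
            self._data.move_to_end(key)        # mark as most recently used
            return self._data[key]

        def put(self, key, value):
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict least recently used

    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")         # touch "a", so "b" is now the eviction candidate
    cache.put("c", 3)      # evicts "b"
    print(cache.get("b"))  # None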
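On the parallel in-place merge item, a sketch of the divide-and-conquer structure such algorithms use (not the article's C++ code): pick a pivot from the longer run, binary-search it in the other run, rotate the middle block, and recurse on the two independent halves; a parallel version runs the two recursive calls concurrently. The slice-assignment "rotation" below allocates temporaries for brevity, where a true in-place version would rotate by triple reversal.

    from bisect import bisect_left, bisect_right

    def merge_in_place(a, lo, mid, hi):
        """Merge the adjacent sorted runs a[lo:mid] and a[mid:hi]."""
        if lo >= mid or mid >= hi:
            return
        if mid - lo >= hi - mid:
            p = (lo + mid) // 2                   # pivot from the longer (left) run
            cut = bisect_left(a, a[p], mid, hi)   # right-run elements < pivot
            a[p:cut] = a[mid:cut] + a[p:mid]      # rotate them in front of the pivot
            q = p + (cut - mid)                   # pivot's new, final position
        else:
            p = (mid + hi) // 2                   # pivot from the longer (right) run
            cut = bisect_right(a, a[p], lo, mid)  # left-run elements <= pivot
            a[cut:p + 1] = a[mid:p + 1] + a[cut:mid]
            q = cut + (p - mid)                   # pivot's new, final position
            p, cut = cut, p + 1                   # rename to left cut / right cut
        # Everything left of q is <= a[q] and everything right of it is >= a[q],
        # so the two sub-merges are independent and can run in parallel.
        merge_in_place(a, lo, p, q)
        merge_in_place(a, q + 1, cut, hi)

    data = [1, 4, 7, 9, 2, 3, 8, 10]
    merge_in_place(data, 0, 4, len(data))
    print(data)  # [1, 2, 3, 4, 7, 8, 9, 10]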