Stuff The Internet Says On Scalability For December 20th, 2013

Hey, it's HighScalability time (with so much cool info this week it will blow your mind):


Amazing microscope image of a carnivorous bladderwort

  • How many drones would it take to replace Santa? With a fleet of some 80 million or so F-16 drones the entire worldwide delivery could be completed in just over eight hours. Impressive, but a world without Rudolph is not a world I wish to contemplate.
  • Quotable Quotes:
    • @Loh: Always wanted to travel back in time to try fighting a younger version of yourself? Software development is the career for you!
    • @mraleph: often devs still approach performance of JS code as if they are riding a horse cart but the horse had long been replaced with fusion reactor
    • @peakscale: "The c3.large is 40% faster and has more than double the memory than the c1.medium but costs about the same"
    • @techmilind: Conversation with an ex-Yahoo, now at a Telecom company. Replaced $22M of Teradata by $450K of Hadoop in AWS. It's the economics, stupid!
    • Brett Slatkin: But Doom was a huge success anyways. Those things didn't matter. Carmack's team made the right vastly simplifying assumptions. It was worth it. This is how you apply the ethos of "worse is better" and not let the perfect be the enemy of the good.

  • OK, I understand why Gregor Rothfuss doesn't like how PHP chose function names to minimize hash collisions. It makes for strange names. But come on, isn't it sort of geeky cool too? Why should we let some arbitrary aesthetic sense create a world of baseless conformity? I love that this choice comes from such a pure technical ontology that it has transformed into artistic expression.

  • Et tu, Brute? NASA Drops OpenStack For Amazon Cloud. Ouch, that has to hurt. When a major contributor to an alliance defects, it's not a good sign for the strength of the alliance. Of course, paying off key players is a time-honored way of crushing alliances.

  • Parallelism in other fields. Kevin Kelly, founding executive editor of Wired magazine, former editor/publisher of the Whole Earth Catalog, and author of the recently published "Cool Tools: A Catalogue of Possibilities", shared on an episode of Triangulation how he used Elance to help self-publish a huge 5 pound book in under a month. Elance is a tool for bidding out work at a reasonable cost and getting professional results back quickly. The design of the book was done by 6-8 Elancers throughout the world in 25 days. 200 potential designers were pared down to 30. Out of the 30, the best were fed more and more pages until a small group formed to complete the work. Using the same strategy, Elance was used to proof the book in 48 hours. He also kickstarted a graphic novel and is crowdsourcing that as well. Illustrators were from DeviantArt. The colorist was from Argentina. A screenwriter and other consultants were called in. It's a different world. Interesting guy and a great interview.

  • Sound familiar? Cortex evolving: New models of human whole-brain organization: A distributed network-type structure makes more sense than a strictly hierarchical system. They call their framework the “tethering hypothesis”, in which over time, cortical expansion changed mammalian brains from small, tightly knit and hierarchical structures to expanded parallel networks of large association areas “tethered” to one another. 

  • It's all about the logs. Jay Kreps with an amazing article extolling their virtues and uses: The Log: What every software engineer should know about real-time data's unifying abstraction. It's divided into four parts: What Is a Log?, Data Integration, Logs & Real-time Stream Processing, and System Building. Logs are used in databases, distributed systems, big data, and in the data flow between systems. There's way too much to summarize. A must read. I think with all the talk about virtualization, logs have long been a way to impose a virtualization layer between code and a messy real world.
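To make the log idea concrete, here is a minimal sketch of an append-only log feeding two independent subscribers. It is an illustration only, not code from Kreps' article: the `Log` and `Subscriber` classes and the key/value record shape are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Log:
    """Append-only, totally ordered sequence of records (illustrative only)."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, Any]]:
        """Return up to max_records (offset, record) pairs starting at offset."""
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


@dataclass
class Subscriber:
    """A consumer that remembers how far into the log it has read."""
    name: str
    position: int = 0
    state: Dict[str, Any] = field(default_factory=dict)

    def poll(self, log: Log) -> None:
        """Apply the next batch of unread records, in log order."""
        for offset, (key, value) in log.read(self.position):
            self.state[key] = value
            self.position = offset + 1


log = Log()
log.append(("user:1", "alice"))
log.append(("user:2", "bob"))
log.append(("user:1", "alicia"))  # a later record overrides an earlier one

cache = Subscriber("cache")
search = Subscriber("search")
cache.poll(log)
search.poll(log)
```

Because both subscribers replay the same records in the same order, they end up with identical state, and each one can fall behind or catch up at its own pace.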

  • Michael Stonebraker predicts: One size fits none (sorry Cinderella), each vertical has a best fit database; There’s room for a lot of winners, legacy relational will shrink with a few winners in each category; NoSQL will come back down to earth, SQL is the clear winner; Oracle will feel the squeeze from SAP, don't care; Facebook will keep searching, possibly fruitlessly, for a MySQL replacement, Facebook doesn't care.

  • Netflix Presentation Videos from AWS Re:Invent 2013 are now available. 

  • Doing it for themselves. Tech Firms Push to Control Web's Pipes. Companies like Facebook and Google don't need telcos. They have the money, the technology, the expertise, and the vision for what they really need. Good to see product-driven innovation and not a strategy of maximizing revenues from already installed plant.

  • The Future of JavaScript MVC Frameworks: If you treat the browser as a remote rendering engine and stop treating it as a place to query and store crap, everything gets faster. Sound like something familiar? Yeah, computer graphics programming.

  • The revolution will not be monetized: We were promised jet packs and we got Facebook. Well, Kleiner Perkins tried to create jetpacks and it got fracked.

  • Primeburbia. Zach Klein observes the changes Amazon's fulfillment centers are making in the landscape: I save a satellite photo to commemorate each time I receive a package from a new Amazon facility. I’m following the development of fulfillment clusters that occupy hundreds of acres of what was often crop land in the exurban halo of our metropolises. Their scale compounded by their out-in-the-middle-of-nowhereness is a surreal landscape to behold, and they’re emerging as a new pattern of community-building in the U.S.

  • Sex in Title, and Other Stories: A LSHS (large-scale hierarchical system) tries to scale by scaling its "global mutable state," in other words data that everyone agrees on. It is a natural pattern for one, or two people. I don't send myself emails. As you add more people, however, the cost of changing shared state increases horribly. The only way to scale is to in fact ban the concept of global mutable state, and switch to a top-down hierarchical state. That leads to amazingly slow and inaccurate decision making, where essential knowledge can take so long to filter back up to the top that the whole LSHS is dead before a necessary decision can be taken. Bye-bye Nokia!

  • How do you scale up in the meat world? Robots. Interesting thoughts on why Google is getting into robots. Of course it's also a huge market that exploits algorithms and machine learning. 

  • Interesting idea, SMR, the stealth drive revolution, specialized purpose built disk drives for cold storage. 

  • Something you probably need to do: Welcome to the Show of CDN Monitoring: Act 2- How and How Not to Monitor CDNs: I am always surprised and shocked when I hear things like “We have a major problem. 25% of our CDN requests are really slow” based on data collected by 3 or 4 locations. Such an analysis is fundamentally wrong!

  • Netflix creates an ISP index where you can see the performance ranking of ISPs in Netflix's network. Google Fiber is the leader. Keep in mind that with primetime viewing schedules Netflix sees a burst mode effect. Your results may vary.

  • Antirez on Some fun with Redis Cluster testing. Great write-up on the bring-up and testing process of a complicated new feature. It's important to understand Redis Cluster does not feature strong consistency, but it tries to minimize lost writes by looking at data age when electing a slave. And I think there's nothing like automated deployment of systems tests to improve testing. Doing it by hand means a lot less testing gets done.

  • Great example of how to deal with a vulnerability and educate others about it: eBay:remote-code-execution.

  • If you want to understand ZooKeeper better there's now a book. Good interview: Meet the Book Authors: Flavio Junqueira and Benjamin Reed on ZooKeeper.

  • A quick overview of EZTABLE's architecture: AWS, one region, second AZ to recover from a region crash, S3, NFS, Glacier, Route 53 by now, Cloudfront with SSL, EMT, Mailgun and SES for email, Percona's MySQL, Redis, php-resque, Solr, Scribe, Node.js, Redis pub/sub for real-time messaging, Nginx, PHP: Apache2 with mod-php5, Static Files: Nginx, S3, CDN. For database scalability: change the model, don't use a database, High IO EBS and RAID 10, more powerful instance type, table partitioning, use a different server for vertical database, sharding, use NoSQL.

  • If you've ever done some stderr, stdout, stdin plumbing in C or from the shell, where did that brilliant idea come from? Here's the story: The Birth of Standard Error. It was modeled after an early typesetting machine. Text input was driven by a one way connection to a PDP-11. Output was rendered onto film. If there were any errors they would show up on the film, so they split the errors out into their own file.

  • Google on Best practices for App Engine: memcache and eventual vs. strong consistency: use the atomic Memcache functions where possible, including incr() and decr(), and use the cas() function for coordinating concurrent access; Your application can get higher scalability and performance by leveraging eventual consistency, because your application won't have to wait for all the writes to complete before returning results.
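The compare-and-set advice can be sketched as a read, modify, retry loop. This is a self-contained illustration under stated assumptions: `FakeMemcache` is a hypothetical in-memory stand-in written for this example, not the App Engine client, and the real client's method signatures may differ.

```python
import threading


class FakeMemcache:
    """Hypothetical in-memory stand-in for a cache with compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, version)

    def set(self, key, value):
        """Unconditionally store value and bump the version."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            self._data[key] = (value, version + 1)

    def gets(self, key):
        """Return (value, version); the version acts as the CAS token."""
        with self._lock:
            return self._data[key]

    def cas(self, key, value, version):
        """Store value only if nobody has written since `version` was read."""
        with self._lock:
            _, current = self._data[key]
            if current != version:
                return False  # lost the race; caller should re-read and retry
            self._data[key] = (value, current + 1)
            return True


def append_tag(cache, key, tag, max_retries=1000):
    """Read-modify-write with retries instead of holding a lock across calls."""
    for _ in range(max_retries):
        tags, version = cache.gets(key)
        if cache.cas(key, tags + [tag], version):
            return True
    return False


cache = FakeMemcache()
cache.set("tags", [])
workers = [threading.Thread(target=append_tag, args=(cache, "tags", i))
           for i in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
result, _ = cache.gets("tags")
```

A plain get followed by set would let concurrent writers silently overwrite each other; with the version check, a writer that loses the race simply re-reads and tries again.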

  • Detecting Reddit Voting Rings Using This Weird Little Data Trick. Shortcuts are hackable. There's always a strategy.

  • There's a value in ignoring what people have done before and going your own way. That's what the Wright Brothers did: He suggested that Lilienthal's lift and drag tables were wrong. Once Wilbur had publicly stated they believed that Lilienthal's data was wrong, they had to find a way to determine the correct data. Setting out to collect accurate information with a wind tunnel constructed in the back room of their bicycle shop, they designed and operated wind tunnel balances...

  • Nice details on How I made a scalable social network in a bunch of hours with PHP + Redis. Includes sign-up, post wall status, private messages, comments, and friendship. 

  • eVOL Monkey describes scaling Postgres with pgpool-II, middleware that sits between PostgreSQL servers and a PostgreSQL database client and provides lots of goodies: connection pooling, replication, load balancing, connection limiting, parallel queries. Nicely explained.

  • If you are having a hard time envisioning world peace, take a break and visualize data structures at a fascinating site: Data Structure Visualizations. Nicely done and very helpful.

  • An Alternative Multi-Producer Approach. Michael Barker finds merit in the idea "that if you knew the number of producers that you have at initialisation time you can build a structure that significantly reduces contention."
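One way to picture the known-producer-count idea is to give each producer its own queue, so producers never write to shared state, and let a single consumer poll the queues round-robin. This is a structural sketch only, not Barker's implementation: the `PerProducerQueues` class is hypothetical, and Python will not show the performance effect being discussed.

```python
from collections import deque
from typing import Any, Optional


class PerProducerQueues:
    """One queue per producer, sized at construction; single consumer."""

    def __init__(self, num_producers: int):
        # The producer count is fixed up front, which is the key assumption.
        self._queues = [deque() for _ in range(num_producers)]
        self._next = 0  # where the consumer resumes its round-robin scan

    def publish(self, producer_id: int, item: Any) -> None:
        """Each producer appends only to its own queue."""
        self._queues[producer_id].append(item)

    def poll(self) -> Optional[Any]:
        """Return the next item in round-robin order, or None if all are empty."""
        n = len(self._queues)
        for i in range(n):
            index = (self._next + i) % n
            if self._queues[index]:
                self._next = (index + 1) % n
                return self._queues[index].popleft()
        return None


queues = PerProducerQueues(3)
queues.publish(0, "a1")
queues.publish(0, "a2")
queues.publish(2, "c1")

drained = []
while True:
    item = queues.poll()
    if item is None:
        break
    drained.append(item)
```

Ordering is preserved within each producer but not across producers, and `None` cannot be used as a payload in this sketch because it signals that every queue is empty.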

  • A key enlightenment ideal: usefulness as a source of authority. Not lords, antiquity, or religion. Still a founding ethic to this day. It allows progress to bootstrap itself into change.

  • Tabular: A Schema-Driven Probabilistic Programming Language: We propose a new kind of probabilistic programming language for machine learning. We write programs simply by annotating existing relational schemas with probabilistic model expressions. 

  • Reliable Massively Parallel Symbolic Computing: Fault Tolerance for a Distributed Haskell: As the number of cores in many core systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. 

  • Damn this is cool. Message Passing Inference with Chemical Reaction Networks: Recent work on molecular programming has explored new possibilities for computational abstractions with biomolecules, including logic gates, neural networks, and linear systems. In the future such abstractions might enable nanoscale devices that can sense and control the world at a molecular scale.