Stuff The Internet Says On Scalability For March 8, 2013

Hey, it's HighScalability time:

  • Quotable Quotes:
    • @ibogost: Disabling features of SimCity due to ineffective central infrastructure is probably the most realistic simulation of the modern city.
    • antirez: The point is simply to show how SSDs can't be considered, currently, as a bit slower version of memory. Their performance characteristics are a lot more about, simply, "faster disks".
    • @jessenoller: I only use JavaScript so I can gain maximum scalability across multiple cores. Also unicorns. Paint thinner gingerbread
    • @liammclennan: high-scalability ruby. Why bother?
    • @scomma: Problem with BitCoin is not scalability, not even usability. It's whether someone will crack the algorithm and render BTC entirely useless.
    • @webclimber: Amazing how often I find myself explaining that scalability is not magical
    • @mvmsan: Flash as Primary Storage - Highest Cost, Lack of HA, scalability and management features #flash #SSD #CIO
    • @pneuman42: Game servers are the *worst* scalability problem. Most services start small and scale up over time, solving problems along the way
    • @jeffsussna: OH: "Amazon outages involve server auto-scaling failures. Microsoft outages involve credit card auto-renewal failures"
    • carlosthecharlie: Writing Map/Reduce jobs is like making debt payments on technical debt you don't yet owe
    • Thomas Fuchs: client-side processing and decoupling is detrimental to both the speed of development, and application performance
    • anonymous: *eye twitches* You maintain secondary indexes in dynamo db fields, managed in application code? Dude. DUDE!
  • LinkedIn: Secrecy Doesn't Scale. Winston Churchill: Truth is so precious that she should always be attended by a bodyguard of lies.

  • So eternal vigilance really can be crowdsourced: Bill Introduced to Re-Legalize Cellphone Unlocking.

  • Engaging discussion with George Dyson: Turing’s Cathedral and the Dawn of the Digital Universe. Template-based addressing. DNA is searched by template. You don't have to know the exact location of a protein and the match doesn't have to be exact. Google is template searching for data. He thinks this template idea is a third revolution in computing. Much more flexible and robust. Because of errors you have to build architectures that are more flexible and can deal with ambiguity, which is what nature does. Google as an Oracle Machine. Alan Turing said machines will never be intelligent unless they are allowed to make mistakes. Deterministic computing is limited; a non-deterministic element, an Oracle, is required. Machines need to learn by making mistakes, tolerating mistakes, and learning from mistakes. Google is made up of deterministic machines. We humans are in Google's loop to act as the non-deterministic signal, as Oracle Machines.

  • What will we do with millions of 8+ core mobile devices? There's WebRTC, true peer-to-peer in the browser. Take a look at PeerJS and SimpleWebRTC.js. It seems to me we can start to think of a sustainable model of computing using cooperating mobile devices rather than expensive backend servers.

  • Ted Nyman (GitHub) - Scaling Happiness talk: no managers. Culture and freedom perks aren't enough. To change culture you have to change structure. The structure is the absence of structure. Culture arises naturally. 146 people work in 67 cities for GitHub. Without a mechanism for making decisions many good ideas will never happen at GitHub because consensus can't be reached. That's part of the tradeoff. Teams form naturally based on interests. The implication is you have to hire the right people. Technology creates order. Internal tooling creates structure and grooves process. You can't make anyone do anything. Consistency comes from things people want to use in the form of libraries and tooling. You have to accept mistakes. Autonomy is priceless. No one has quit in 5 years. Policies are set by everyone. Good things do die and that's the price you have to pay.

  • ukd1: Scale all the things: 'Setup a caching layer (Memcached, Redis, Varnish)'...if only it were just that easy. Prep for scaling should be something like: 1) test to see when you'll need to; 2) monitor so you know when; 3) plan, before you reach the date/time from #2, how you'll do it; 4) implement; 5) test that you've done #4 properly; 6) release; 7) repeat the whole process. Work out a methodical way that makes sense. "Setup a caching layer" is not that.
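Part of why "setup a caching layer" isn't that easy: even the simplest cache-aside pattern has sharp edges around TTLs and invalidation. A minimal sketch, with a plain dict standing in for Memcached/Redis and all names illustrative:

```python
import time

cache = {}   # stand-in for Memcached/Redis; values are (data, stored_at)
TTL = 60     # seconds; picking this number is itself a design decision

def get_user(user_id, db):
    """Cache-aside read: check the cache, fall back to the database."""
    entry = cache.get(user_id)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL:
            return value          # cache hit
        del cache[user_id]        # expired entry
    value = db[user_id]           # cache miss: go to the "database"
    cache[user_id] = (value, time.time())
    return value

def update_user(user_id, value, db):
    """Every write path must invalidate, or readers see stale data."""
    db[user_id] = value
    cache.pop(user_id, None)
```

Any write that forgets to call the invalidating path happily serves stale data until the TTL expires, which is exactly the kind of thing steps 1, 2, and 5 of the list above are meant to catch.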

  • Like the parallel in Disfluency where effort acts as a metacognitive alarm when something is more difficult than it should be. Works in programs too.

  • Sustainability in digital systems is interesting to think about...Scaling Up Systems to Make Cities More Sustainable: scaling up savings into an entire “ecosystem,” so that buildings could leverage each other’s savings, was the way to go; there are no linear patterns for a building’s energy behavior; the interconnectedness of the buildings was more critical than the age or height or other characteristics of a building; if we increase density by 50 percent, we could reduce energy use by 20 percent.

  • Speaking of SimCity, ask them if scaling issues are a good problem to have. DRM-related scaling issues at launch are causing very unsimulated unhappiness for users. They are trying to compensate for the scaling issues by adding more servers and removing features, which doesn't seem to be working.

  • Which brings up Complex systems, oh how might ye fail? Complex systems are intrinsically hazardous, Complex systems are heavily and successfully defended against failure, Catastrophe requires multiple failures – single point failures are not enough, Complex systems contain changing mixtures of failures latent within them, Complex systems run in degraded mode.

  • TechOps Pre-launch Checklist for Web Apps. A handy list of tasks to complete before releasing an application. The major categories are: prepare for disaster, run basic security checks, prep for scaling. 

  • This can't be good - Hard Power Off Is Dead But Not Buried: experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, meta-data corruption, and total device failure.

  • The great leap forward - How Basecamp Next got to be so damn fast without using much client-side UI. Excellent in-depth discussion of some advanced website acceleration techniques: temporarily cache each page you’ve visited and simply ask for a new version in the background when you go back to it, cache TO THE MAX, Thou shall share a cache between pages, Thou shall share a cache between people, use infinite scrolling pages to send smaller chunks.
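The "cache TO THE MAX" idea rests on key-based cache expiration: the cache key embeds the record's version, so an update naturally misses the old entry instead of requiring explicit invalidation, and because the key is built from the record (not the viewer) the same entry is shared between people. A minimal sketch of the pattern, with illustrative names that are not Basecamp's actual code:

```python
cache = {}  # stand-in for a shared memcached

def render_todo(todo):
    """Render a todo fragment, keyed by (id, version)."""
    key = ("todo", todo["id"], todo["updated_at"])
    if key not in cache:
        # The "expensive" work: building the HTML fragment.
        cache[key] = "<li>%s</li>" % todo["name"]
    return cache[key]
```

When the record changes, its `updated_at` changes, the key changes, and the stale fragment is simply never read again; it ages out of the cache on its own.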

  • Amir Salihefendic with a good talk about Redis and how it can be used to implement queues and other common programming tasks. Particularly interesting is the use of bitmapist for advanced analytics, saving $2K a month.

  • Spent weeks tweaking JVM flags? This will ring so true...JVM performance optimization, Part 5: Is Java scalability an oxymoron?: it's JVM technology that limits the scalability of enterprise Java applications. 

  • There's a new O'Reilly book on Graph Databases. Written by Ian Robinson, Jim Webber, and Emil Eifrém. So far looks really good, as I would expect. And right now it's free!

  • Scaling Node.js Applications using the cluster module to create a network of processes which can share ports on one machine. Node-http-proxy or nginx are used to load balance across machines. Good explanation with example configuration files that are useful beyond node.js.
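For the across-machines half of that setup, a minimal nginx reverse-proxy sketch might look like the following; the hostnames, ports, and upstream name are illustrative, not from the article:

```nginx
# Pool of machines, each running a Node cluster on port 3000.
upstream node_app {
    server app1.example.com:3000;
    server app2.example.com:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

nginx round-robins across the upstream pool by default, while the cluster module handles distributing connections among worker processes on each box.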

  • Ooyala on Scaling Cassandra for Big Data. They use it for fast Map/Reduce, Storm, resume enhancement, machine learning, and metrics.

  • High Availability at Braintree. Braintree is a payment gateway so uptime is important. The world is divided into planned and unplanned failures. For planned failures: reduce maintenance windows, pre- and post-migrations, rolling deploys, PostgreSQL for fast DDL, and a proxy layer to pause traffic. For unplanned failures: built their own load balancer, LVS/IPVS, Pacemaker, BigBrother, LitmusPaper, BGP routes traffic through multiple ISPs and data centers, Pingdom to monitor, connect to many networks, ISP outages are usually partial, use processor proxies so they can load balance over these proxies and route around ISP connection issues, Mallory, let the system heal and retry, automate everything. Nice presentation.
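"Let the system heal and retry" usually means wrapping idempotent operations in retries with backoff so transient network blips don't become failed payments. A minimal sketch of the pattern (not Braintree's actual code; the parameters are illustrative):

```python
import time

def with_retries(op, attempts=5, base_delay=0.1):
    """Retry an idempotent operation with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

The idempotency requirement is the important part: retrying a charge that isn't idempotent is how you bill a customer twice.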

  • Solving Problems Using Data Structures: a data structure can be used to encapsulate implementation details and lead to nice clean code...the main motivation for object-oriented code in the first place is encapsulation.
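A tiny illustration of that point (my own example, not from the article): hide the underlying representation behind a small class so callers never touch it directly, and the representation can change without rippling through the codebase.

```python
class TagIndex:
    """Maps tags to items; callers never see the underlying dict of lists."""

    def __init__(self):
        self._by_tag = {}  # private by convention; could become a trie later

    def add(self, tag, item):
        self._by_tag.setdefault(tag, []).append(item)

    def items_for(self, tag):
        # Return a copy so callers can't mutate our internals.
        return list(self._by_tag.get(tag, []))
```

Compare that with `index.setdefault(tag, []).append(item)` scattered across twenty call sites: the data structure is the same, but only one of the two versions can change representation safely.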

  • Details of the February 22nd 2013 Windows Azure Storage Disruption. Lesson: keep those security certificates updated. It's always something. Also, Azure overtakes Amazon's cloud in performance test

  • The cycle continues. Amazon is reducing DynamoDB pricing by 85%. Release an innovative product. Gain some experience. Drive down costs through economies of scale, hardware improvements, and algorithm improvements.