Stuff The Internet Says On Scalability For September 27, 2013

Hey, it's HighScalability time:


(The WINLAB at Rutgers, with software defined radios tied into GENI.)

  • 384 cores & 32TB of RAM: Oracle's SPARC M6 
  • Quotable Quotes:
    • @jennyinc: 2003: "I replaced you with a set of very small shell scripts." 2013: "I replaced your scripts with a six-figure enterprise DevOps platform."
    • @tomdale: OH: “Redis is so fast, why don’t we replace RAM with Redis?”
    • @petrillic: OH "Promises/futures are the one-night stands of architectural constructs" nice #strangeloop
    • @TwitterEng: "Java and Scala let Twitter readily share and modify its enormous codebase across a team of hundreds of developers." 

  • Lots of juicy numbers revealed at Structure:Europe: Netflix streams 114,000 years of video every month; Netflix custom-builds boxes for its content-delivery network that contain between 100 and 150 terabytes of storage apiece; Netflix accounts for 35 percent of all web traffic; An unnamed social network recently purchased 85 petabytes of storage from EMC; Blogging platform WordPress now underpins nearly 20 percent of the world’s blogs; Felix Baumgartner’s record-breaking jump last year generated 8 million concurrent views on YouTube and consumed almost a quarter of the world’s bandwidth.

  • 6 hostage negotiation techniques that will get you what you want. Useful for any project. After all, aren't meetings hostage situations? The steps: active listening, empathy, rapport, influence, behavioural change. Be aware, if other hostages are also using these techniques then the world will explode.

  • This is how you know you have a lot of customers: By My Estimates, Apple’s iOS 7 Download Business Is Worth About $10-$12M To Akamai: Apple’s new iOS 7 download, on my phone at least, is 754MB. If Akamai delivered 500M downloads at 754MB they would have delivered a total of 377,000,000,000 MB, which converts to 377,000,000 GB. If Apple was paying $0.015 (one and a half cents) per GB, the total value to Akamai would be $5.65M.

  • Oliver Kennedy with an interesting programming version of the Sapir-Whorf hypothesis: "A text editor encourages people to think serially about their code. For parallel programs, however, this is a horrible idea." In the end, program runtimes are so complex that the only real simulator is in your head. I'm not sure changing the program representation would change this, but it would be an interesting experiment.

  • Erlang all the things! Carlos Becker with a passionate defense of Ruby. You aren't Twitter and Twitter never said Ruby sucked anyway. If you want performance so bad why aren't you programming in C? If scalability is everything why aren't you using Erlang? The key is productivity. Think for yourself.

  • GPU acceleration is coming to Java: The speedups are phenomenal — ranging from 2x to 48x faster! And these benefits are possible in Java JDK 8 by taking advantage of existing CUDA libraries to accelerate the Java libraries for parallel operations.

  • Will security become the new reason given for building the walled garden even higher? Google to Encrypt ALL Keyword Searches: Say Goodbye to Keyword Data.

  • Oracle comes out of the row and disk closet, with a new in-memory column store: Oracle Announces an In-Memory Technology, At an Ungodly Speed…And Cost. Curt Monash with some interesting Thoughts on in-memory columnar add-ons: "this technology should be viewed as applying to traditional business transaction data, much more than to — for example — web interaction logs, or other machine-generated data." Also, Back-to-Basics Weekend Reading - A Decomposition Storage Model

  • They never expect the dual faults inquisition. More On Gmail’s Delivery Delays. Dual network failure at the root of recent Gmail problems. Fix will be more capacity in the face of these errors (good luck with that) and better handling by the client when these problems happen.

  • Oleksiy (aka Alexey) Kovyrin put together an excellent reading list of Interesting Resources for Technical Operations Engineers, with suggestions on books, project management, conferences, sites, blogs, and podcasts. Relieved to see HighScalability made the cut.

  • Blippex tells Why we moved away from AWS. Less money and more speed, maybe better privacy. Good discussion on Hacker News.

  • So your neighbor isn't taking your fruit, your branches are in their yard. EC2 Neighbour Caught Stealing CPU: The steal time means you're trying to go over your allocated resources. AWS doesn't give you a dedicated amount of resources, it gives you access to more, so that when the host machine isn't under heavy load you can use more resources, which would otherwise be wasted.

  • Kyle Kingsbury with a great in-depth look at NuoDB in Call me maybe: NuoDB: if you are considering using NuoDB, be advised that the project’s marketing and documentation may exceed its present capabilities. Try to enable the liveness detection code, and set up your own client timeouts to avoid propagating high latencies to other systems. Try to build back pressure hints into your clients to reduce the requests against NuoDB during failure; the latency storm which persists after the network recovers is proportional to the backlog of requests. Finally, be aware of the operational caveats mentioned earlier: monitor your nodes carefully, restart their storage and transaction managers as appropriate, and verify that newly started nodes have indeed joined the cluster before exposing them to clients.

  • Please password the Salt. Great discussion on a confusing aspect of security: Secure Salted Password Hashing. If such a simple thing has so many nuances then it should be clear why security isn't.

  • LinkedIn's Auto-Scaling with Apache Helix and Apache YARN: YARN automates service deployment, resource allocation, and code distribution. However, it leaves state management and fault-handling mostly to the application developer; Helix focuses on service operation, state management, reconfiguration and fault-handling. However, it relies on manual hardware provisioning and service deployment.

  • The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next: Testing has shown that the predicted ratings aren’t actually super-useful, while what you’re actually playing is. We’re going from focusing exclusively on ratings and rating predictions to depending on a more complex ecosystem of algorithms.

  • Michael Bernstein with a trip to the deep side in Formalizing Concurrency, Distribution, and Mobility: Layers and layers of software currently represent what could be shed if assumptions were baked into the design of systems from the outset. Distributed systems practitioners could benefit from researching the various process calculi and their predecessors, to appreciate the deep connection between their work and the theoretical underpinnings which sometimes seem so distant.

  • Sophia: database based on Fractional Cascading ideas and a B-Tree. Sophia was specifically designed to improve the situation and get fast reads while still benefiting from an append-only design. Sophia's architecture combines a region in-memory index with an in-memory key index.

  • An Insider’s View of Mobile-First Design: Don’t Make These Mistakes: Fake it 'Til You Make it; Indicating Progress On Mobile Can Slow Things Down; Don’t Divert the Mobile Attention Train.

  • Nitsan Wakart showing why optimization is not for sissies. Diving Deeper into Cache Coherency: Fast moving data should be in M state as much as possible. A read/write miss by any other thread competing for that line will lead to reverting to S/I which can have significant performance implications. The above example demonstrates how this can be achieved by caching stale but usable copies locally to another thread.

  • One of the biggest hurdles of sharing compute power at a low level of granularity, something below the VM, is security. Now there could be a way around that. Detecting program-tampering in the cloud: For small and midsize organizations, the outsourcing of demanding computational tasks to the cloud — huge banks of computers accessible over the Internet — can be much more cost-effective than buying their own hardware. But it also poses a security risk: A malicious hacker could rent space on a cloud server and use it to launch programs that hijack legitimate applications, interfering with their execution.
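
For anyone who wants to poke at the Akamai estimate in the iOS 7 item above, here is a minimal sketch of the quoted arithmetic. The function name is made up for illustration, and it assumes decimal units (1 GB = 1,000 MB), since that is the conversion the quoted figures imply.

```python
def cdn_delivery_value(downloads, size_mb, price_per_gb):
    """Return (total_mb, total_gb, value_usd) for a bulk download event.

    Hypothetical helper: assumes 1 GB = 1,000 MB, matching the conversion
    used in the quoted estimate.
    """
    total_mb = downloads * size_mb
    total_gb = total_mb / 1000
    value_usd = total_gb * price_per_gb
    return total_mb, total_gb, value_usd


# Inputs taken from the quoted estimate: 500M downloads, 754 MB each,
# $0.015 per GB delivered.
total_mb, total_gb, value_usd = cdn_delivery_value(500_000_000, 754, 0.015)

print(f"{total_mb:,} MB delivered")
print(f"{total_gb:,.0f} GB delivered")
print(f"roughly ${value_usd / 1e6:.2f}M at $0.015 per GB")
```

Change the download count or the per-GB price to see how sensitive the total is to either guess.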
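
The steal time mentioned in the EC2 item above is something you can measure yourself on a Linux guest. What follows is a rough sketch, not a monitoring tool: it parses the aggregate `cpu` line of `/proc/stat`, where steal is the eighth counter, and computes the steal share between two samples. The two sample lines are invented for illustration, and the code assumes a kernel that reports the steal field.

```python
# Counters on the aggregate "cpu" line of /proc/stat, in order.
# Guest time is already counted inside user time, so it is left out
# of the total to avoid double counting.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Percentage of elapsed CPU ticks spent in steal between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return 100.0 * deltas["steal"] / total if total else 0.0


# Hypothetical samples, as if taken a few seconds apart.
before = parse_cpu_line("cpu 1000 0 500 8000 100 0 0 400 0 0")
after = parse_cpu_line("cpu 1600 0 700 8500 100 0 0 600 0 0")

print(f"steal: {steal_percent(before, after):.1f}%")
```

On a real instance you would read the first line of `/proc/stat` twice with a short sleep in between instead of using the canned strings.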
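
As a companion to the salted password hashing item above, here is a small sketch using only the Python standard library. It is one reasonable way to do it, not the method from the linked discussion: PBKDF2-HMAC-SHA256 with a random per-user salt, where the iteration count and salt length are illustrative assumptions to tune for your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative assumption, not a recommendation
SALT_BYTES = 16       # illustrative assumption


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Store the salt next to the digest; the salt is not a secret.
salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

The per-user salt means two users with the same password get different digests, and the constant-time comparison avoids leaking how many leading bytes matched.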