Stuff The Internet Says On Scalability For February 28th, 2014

Hey, it's HighScalability time:

Plus ça change, plus c'est la même chose (full)

  • Quotable Quotes:
    • @ML_Hipster: A machine learning researcher, a crypto-currency expert, and an Erlang programmer walk into a bar. Facebook buys the bar for $27 billion.
    • OH: Network effects don't happen on toll roads.
    • Benedict Evans: Google is a vast machine learning engine... and it spent 10-15 years building that learning engine and feeding it data.
  • Mining Experiment: Running 600 Servers for a Year Yields 0.4 Bitcoin. Yes, this is a far superior way of doing things. Chew up the commons for marginal gain. It's like old times.
  • Game designers, forget the sardines and go hunt some whale. Swrve found: half of free-to-play games’ in-app purchases came from 0.15 percent of players. Only 1.5 percent of players of games in the Swrve network spent any money at all.
  • Google has a beta version of their cloud pricing calculator. The interface is a little funky with separate "Add to Estimate" sections, but the prices look good. 5 servers, with 2 cores, 7.5GB RAM, 24x7, 3TB storage, 100 million IOPS, 1TB snapshot storage, 1TB light Cloud SQL operations, 4TB cloud storage, all for $1,559.24 a month.
  • So scalability doesn't matter? After the WhatsApp acquisition here's a tweet from Telegram Messenger: 4 million users joined Telegram within the last 18 hours. We're doing our best, but the service is getting unstable due to high [load. It]'ll take some time to transport and install the new equipment.
  • Maybe content can make money rather than being cheap commodity chum for aggregators. Financial Times’ CTO John O’Donovan: We make more money from our content than from advertising which is a really interesting shift – we are pushing boundaries in terms of how we are getting our content into these different services and platforms.
  • The military wants to be a MoT (Military of Things), not people. Pentagon Plans to Shrink Army to Pre-World War II Level.
  • The Netflix and Comcast deal is not about net neutrality. The media got it wrong. To learn why read an article by Dan Rayburn: Here’s How The Comcast & Netflix Deal Is Structured, With Data & Numbers, and listen to Dan on a recent This Week in Google episode. This stuff is so complicated all will not become clear, but it will help you go that last mile to understanding: This is where a lot of the confusion starts as many are under the impression that ISPs like Comcast are suppose to allow any transit provider to push an unlimited amount of traffic into their network without any compensation. This isn’t a Comcast specific policy, but rather one that is standard for all ISPs.
  • A very well done 13 minute video on the Wolfram Language is the future of programming. Does it change everything? Think of WL as a programmable cross between Google's knowledge graph and IBM's Watson. Only Google's knowledge graph is closed and only profits Google. WL is something you can use. It ups the abstraction level for programming. It's: "Model Driven, Functional, Mathematical, Declarative, clean and simple, with lots of Knowledge. All in models." It's a symbolic knowledge based language where knowledge about computations and about the world is built into the language. You can process images, lay out networks, look up stock prices, create interfaces, solve optimization problems. All these kinds of features work together in a coherent way. It has a REPL loop on steroids. From an interactive prompt over the web. And it's fast. Computations, graphs, and networks just fly across the screen. It has features like how to get your friend network from Facebook and build a graph automagically. The graph isn't just a picture, it can be queried. You can ask it how your friends break into groups. What is the most connections any friend has? You can also ask it to get the current image from your computer's camera. And again it's not just a pretty picture. You can run edge detection on the image, animate it, and much more. You can ask when the sun will set today. It also has a natural language interface. It solves the traveling salesman problem. Built-in maps. Parallelism is built into the language. The fact that it is symbolic means all its capabilities are composable. And much much more. But how much does it cost? Is it open? Can we crowdsource world building and data sourcing? Cool != success.
  • Fantabulous article explaining in great detail How Steve Perlman's "Revolutionary" Wireless Technology Works - and Why its a Bigger Deal than Anyone Realizes from Imran Akbar. It offers a solution to the spectrum crunch. The technique is not completely unique, but they've made it work before everyone else, solving many tricky engineering issues. The Shannon limit has not been broken, just sidestepped. Industry adoption is always an issue. But the most revolutionary aspect of the technology is that it can be used to fulfill Tesla's idea of wireless power transmission. Let that sink in for a moment.
  • ZooKeeper Resilience at Pinterest: Here, you’ll learn how Pinterest uses ZooKeeper, the problems we’ve dealt with, and a creative solution to benefit from ZooKeeper in a fault-tolerant and highly resilient manner.
  • Something to consider. Ultimate cloud speed tests: Amazon vs. Google vs. Windows Azure. Google is fastest, Azure is slowest, Amazon is priciest.
  • Brendan D. Gregg has the Valhalla of Linux performance information on his home page. An amazing set of resources.
  • Crittercism: Scaling To Billions Of Requests Per Day On MongoDB. Good look at the topology changes necessary in the transition point from a large to very large system. Chunk balancing took a long time to propagate as the number of connections between mongos routers and mongod shard servers went into the thousands. The solution was to consolidate mongos routers onto a few hosts.
  • Communications as a bottleneck. Increase ZeroMQ performance by up to 2400%. The trick is a distributed memory system over a fast network (1GbE, InfiniBand, Universal Fast Sockets).
  • Ah, memories. 64 Terabyte RAM computer from SGI. Good discussion of possible issues with high latencies due to a deep memory hierarchy when accessing non-local RAM, given that it's composed as a cluster of blades with "Numalink 6 interconnect." A flat address space isn't the same as a non-distributed space.
  • Is this a thing now, making API keys readable only once and forever? Hashing API keys to improve security. But I lose my keys all the time!
  • App Engine 1.9.0 now available. Now with modules! Modules look like a cool structuring mechanism: a feature that lets developers factor large applications into logical components that can share stateful services and communicate in a secure fashion.
  • If you want a good overview of cell phone technology then this page is excellent: It would be useful to give an overview of the cell phone technology here as this is quite inline with our installation. Let's see how a cell phone works? What makes it different from a regular phone? What do all those confusing terms like PCS, GSM, CDMA and TDMA mean?
  • Delightful look at the past glories of protocols vs present day centralization by Brian Graham in The lost art of telnet: What would I really like to see? I wish for more hacker-inspired innovation in the protocol arena, where people just invent things only to push boundaries and see how something else might work.
  • On replication strategies, or the return of the long article by Ayende @ Rahien is a very good overview of the subject. Covers master/slaves, primary/secondary, multi master, multi write partners, log shipping, oplog, divergent writes.
  • Cute and informative slide deck from Peter Bailis on Availability, Consistency, and Horizontally Scalable Data Management. This is different. The key to database correctness and scalability is the avoidance of coordination. Only coordinate when necessary. How do you know? By using invariants to prevent anomalies. The result is linear scaling. More here and here.
  • Long-Term Storage: There is one long-term storage medium that might eventually make sense. DNA is very dense, very stable in a shirtsleeve environment, and best of all it is very easy to make Lots Of Copies to Keep Stuff Safe.
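
On the API keys item above: the idea of keys that are readable only once usually comes down to storing a hash of the key rather than the key itself. Here is a minimal sketch of that pattern, not the linked article's implementation. The names issue_api_key and lookup_owner, and the in-memory dict standing in for a database, are hypothetical and exist only for illustration.

```python
import hashlib
import secrets


def _digest(key: str) -> str:
    # SHA-256 is a reasonable choice here because the key is long and random,
    # unlike a human-chosen password (assumption of this sketch).
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def issue_api_key(store: dict, owner: str) -> str:
    """Generate a key, keep only its digest, and return the plaintext once."""
    key = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    store[_digest(key)] = owner      # the plaintext key is never stored
    return key


def lookup_owner(store: dict, presented_key: str):
    """Return the owner for a presented key, or None if it is unknown."""
    return store.get(_digest(presented_key))


if __name__ == "__main__":
    store = {}  # hypothetical stand-in for a database table
    key = issue_api_key(store, "alice")
    print("show this to the user once:", key)
    print("owner on later request:", lookup_owner(store, key))
```

The trade-off is the one the item jokes about: if the user loses the key, the service cannot show it again and has to issue a new one.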