Stuff The Internet Says On Scalability For October 10th, 2014

Hey, it's HighScalability time:


Social climber: Instagram explorer scales to new heights in New York.

  • 11 billion: world population in 2100; 10 petabytes: size of the Netflix data warehouse on S3; $600 billion: the loss when a trader can't type; 3.2 seconds: 0-60 mph time of probably not my next car.
  • Quotable Quotes:
    • @kahrens_atl: Last week #NewRelic Insights wrote 618 billion events and ran 237 trillion queries with 9 millisecond response time #FS14
    • @sustrik: Imagine debugging on a quantum computer: Looking at the value of a variable changes its value. I hope I'll be out of business by then.
    • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Every cell contains thousands of such nanomachines, each of them dedicated to a different chemical reaction. And all their complex activities take place in a tiny space where the molecular building blocks of life are packed more tightly than a Tokyo subway at rush hour. Amazing.
    • Eric Schmidt: "The simplest outcome is we're going to end up breaking the Internet," said Google's Schmidt. Foreign governments, he said, are "eventually going to say, we want our own Internet in our country because we want it to work our way, and we don't want the NSA and these other people in it."
    • Antirez: Basically it is neither a CP nor an AP system. In other words, Redis Cluster does not achieve the theoretical limits of what is possible with distributed systems, in order to gain certain real world properties.
    • @aliimam: Just so we can fathom the scale of 1B vs 1M: 1,000,000 seconds is 11.5 days. 1,000,000,000 seconds is 31.6 YEARS
    • @kayousterhout: 92% of catastrophic failures in distributed data-intensive systems caused by incorrect error handling https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf #osdi14
    • @DrQz: 'The purpose of computing is insight, not numbers.' (Hamming) Sometimes numbers ARE the insight, so make them accessible too. (Me)
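
A quick check of @aliimam's arithmetic above, as a minimal Python sketch:

```python
# Verify the scale gap between a million and a billion seconds.
million, billion = 1_000_000, 1_000_000_000

print(million / 86_400)             # seconds -> days: ~11.6
print(billion / (86_400 * 365.25))  # seconds -> years: ~31.7
```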

  • Robert Scoble on the Gillmor Gang said that because of the crush of signups, ello had to throttle invites. Their single PostgreSQL server couldn't handle it, captain.

  • Containers are getting much larger with new composite materials. Not that kind of container. Shipping containers. High oil costs have driven ships carrying 5,000 containers to evolve. Now they can carry 18,000 and soon 19,000 containers!

  • If you've ever wanted to make a networked game, this is a great start. Making Fast-Paced Multiplayer Networked Games is Hard: Fast-paced multiplayer games over the Internet are hard, but possible. First understanding your constraints, then building within them, is essential. I hope I have shed some light on what those constraints are and some of the techniques you can use to build within them. No doubt there are other ways out there and ways yet to be used. Each game is different and has its own set of priorities. Learning from what has been done before could help a great deal.
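
To make one of those techniques concrete, here is a minimal sketch (my own illustration, not from the article) of client-side prediction with server reconciliation: the client applies inputs immediately, remembers them, and replays any the server hasn't yet acknowledged on top of each authoritative update:

```python
SPEED = 5.0  # units moved per input step (illustrative)

def apply_input(x, direction):
    """Advance a 1-D position by one input step."""
    return x + SPEED * direction

class PredictedClient:
    def __init__(self):
        self.x = 0.0        # predicted position shown to the player
        self.pending = []   # (seq, direction) inputs not yet acked

    def local_input(self, seq, direction):
        # Predict immediately instead of waiting a full round trip.
        self.pending.append((seq, direction))
        self.x = apply_input(self.x, direction)

    def server_update(self, acked_seq, server_x):
        # Rewind to the server's authoritative state, drop acked
        # inputs, then replay the rest to re-predict.
        self.pending = [(s, d) for (s, d) in self.pending if s > acked_seq]
        self.x = server_x
        for _, d in self.pending:
            self.x = apply_input(self.x, d)

client = PredictedClient()
client.local_input(1, +1)     # moves immediately to 5.0
client.local_input(2, +1)     # 10.0
client.server_update(1, 5.0)  # server confirms input 1; input 2 replays
print(client.x)               # 10.0 -- no visible snap-back
```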

  • Arrival of the Fittest: Solving Evolution's Greatest Puzzle: Environmental change requires complexity, which begets robustness, which begets genotype networks, which enable innovations, the very kind that allow life to cope with environmental change, increase its complexity, and so on, in an ascending spiral of ever-increasing innovability...is the hidden architecture of life.

  • DynamoDB spins in with some interesting new features: an expanded free tier, native JSON support, global secondary indexes, larger item sizes, and more usable scaling targets. The logical first choice for a database? No, but now a more attractive choice.
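
For a taste of the native JSON support, a minimal sketch using the current boto3 SDK; the "users" table, its "id" hash key, and the attributes are all hypothetical, and credentials/region are assumed to be configured:

```python
import boto3

table = boto3.resource("dynamodb").Table("users")  # hypothetical table

# Store a JSON-like document directly; nested maps and lists are now
# first-class attribute types, no manual serialization needed.
table.put_item(Item={
    "id": "u123",
    "name": "Ada",
    "addresses": [
        {"city": "London", "zip": "EC1"},
    ],
})

# Fetch it back; the nested structure round-trips intact.
item = table.get_item(Key={"id": "u123"})["Item"]
print(item["addresses"][0]["city"])  # -> "London"
```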

  • Fascinating analysis of the economics of bitcoin and P2P networks in the face of hardware innovation. Economies of Scale in Peer-to-Peer Networks: if the network is to gain mass participation, the majority of participants cannot contribute significant resources to it; they don't have suitable resources to contribute. They will have to contribute cash. This in turn means that there must be exchanges, converting between the rewards for contributing resources and cash, allowing the mass of resource-poor participants to buy from the few resource-rich participants.

  • What good is a consistent snapshot? It prevents gremlins from messing with your pictures. Nasty little gremlins. 

  • No instant scaling, but if you don't need that, here are some step-by-step instructions on how to Create your own Heroku on EC2 with Vagrant, Docker, and Dokku. Good discussion on HN. It's not really a full Heroku clone, but it has some good points.

  • A good question. How did you maintain data integrity between microservices? The complicated answer: a range of strategies.
    • Jobs with retries, so that transient failures affected the syncing process as little as possible.
    • Monitoring, to flag when a longer-term problem was preventing data from being synced.
    • Using change events as a trigger to go back to the original source and retrieve the current data, rather than using the value carried in the change event.
    • Being clear about which system was the source of truth for each piece of data, and only allowing people to update the data in the source system, not the destination system (no two-way sync for the same data).
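
A minimal sketch (all names hypothetical, stubs in place of real services) of two of those strategies combined: retrying transient failures, and treating the change event as a trigger to re-read the source of truth rather than trusting the event's payload:

```python
import time

def fetch_from_source(user_id):
    """Read the current record from the system of record (stubbed)."""
    return {"id": user_id, "email": "ada@example.com"}

def write_to_destination(record):
    """Upsert into the read-only destination copy (stubbed)."""
    print("synced", record)

def sync_on_change(event, retries=3, backoff=0.5):
    # The event only tells us *what* changed; the source tells us the
    # current truth, so stale or reordered events are harmless.
    for attempt in range(retries):
        try:
            record = fetch_from_source(event["user_id"])
            write_to_destination(record)
            return
        except IOError:
            time.sleep(backoff * 2 ** attempt)  # transient: back off, retry
    raise RuntimeError("sync failed; alert monitoring")  # long-term problem

sync_on_change({"user_id": "u123", "field": "email"})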

  • What does Netflix use for data analytics? Magic. Using Presto in our Big Data Platform on AWS: A small subset of the ETL output and some aggregated data is transferred to Teradata for interactive querying and reporting. On the other hand, we also have the need to do low latency interactive data exploration on our broader data set on S3. These are the use cases that Presto serves exceptionally well. 

  • Do you have a need for speed? How about Remote Direct Memory Access at under 2 microseconds? Benchmarking GPUDirect RDMA on Modern Server Platforms. Quite a nice benchmark and analysis.

  • To think it began with a tweet. History of Apache Storm and lessons learned: building a successful project requires a lot more than just producing good code that solves an important problem. Documentation, marketing, and community development are just as important. Especially in the early days, you have to be creative and think of clever ways to get the project established. Examples of how I did that were making use of the Twitter brand, starting the mailing list a few months before release, and doing a big hyped up release to maximize exposure. Additionally, there's a lot of tedious, time-consuming work involved in building a successful project, such as writing docs, answering the never-ending questions on the mailing list, and giving talks.

  • forestdb: A Fast Key-Value Storage Engine Based on Hierarchical B+-Tree Trie (HB+-Trie).

  • Recommended papers from RecSys 2014, the ACM Conference Series on Recommender Systems.

  • Eventually-Serializable Data Services: We present a new specification for distributed data services that trade off immediate consistency guarantees for improved system availability and efficiency, while ensuring the long-term consistency of the data. An eventually-serializable data service maintains the operations requested in a partial order that gravitates over time towards a total order.
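
A toy illustration of that idea (my own sketch, not the paper's algorithm): reply immediately from a tentative order, then converge once a total order has been agreed:

```python
class EventuallySerializableSet:
    def __init__(self):
        self.tentative = []  # ops in local arrival order (a guess)
        self.stable = []     # prefix fixed in the agreed total order

    def submit(self, op):
        self.tentative.append(op)  # respond without coordination

    def view(self):
        state = set()
        for kind, x in self.stable + self.tentative:
            if kind == "add":
                state.add(x)
            else:
                state.discard(x)
        return state

    def stabilize(self, total_order):
        # The agreed total order replaces the tentative guesses.
        self.stable = total_order
        self.tentative = [op for op in self.tentative
                          if op not in total_order]

r = EventuallySerializableSet()
r.submit(("add", 1)); r.submit(("del", 1))
print(r.view())                        # set() under the tentative order
r.stabilize([("del", 1), ("add", 1)])  # agreed order turns out to differ
print(r.view())                        # {1} once serialized
```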

  • Avoiding catastrophic failure in correlated networks of networks: The stability of a system of networks relies on the relation between the internal structure of a network and its pattern of connections to other networks. Specifically, we demonstrate that if interconnections are provided by network hubs, and the connections between networks are moderately convergent, the system of networks is stable and robust to failure. We test this theoretical prediction on two independent experiments of functional brain networks (in task and resting states), which show that brain networks are connected with a topology that maximizes stability according to the theory.

  • The Browsemaps: Collaborative Filtering at LinkedIn: This paper presents LinkedIn’s horizontal collaborative filtering infrastructure, known as browsemaps. The platform enables rapid development, deployment, and computation of collaborative filtering recommendations for almost any use case on LinkedIn.
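
The heart of a browsemap is co-occurrence: people who viewed this also viewed that. A minimal sketch (not LinkedIn's implementation) over hypothetical browsing sessions:

```python
from collections import Counter, defaultdict

sessions = [  # hypothetical: one list of viewed items per member
    ["jobA", "jobB", "jobC"],
    ["jobA", "jobB"],
    ["jobB", "jobC"],
]

cooccur = defaultdict(Counter)
for items in sessions:
    for x in items:
        for y in items:
            if x != y:
                cooccur[x][y] += 1  # x and y viewed in the same session

def browsemap(item, k=2):
    """Top-k 'viewers of this also viewed' recommendations."""
    return [y for y, _ in cooccur[item].most_common(k)]

print(browsemap("jobA"))  # -> ['jobB', 'jobC']
```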

  • Arrakis: The Operating System is the Control Plane: We have designed and implemented a new operating system, Arrakis, that splits the traditional role of the kernel in two. Applications have direct access to virtualized I/O devices, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation. We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing improvements of 2-5x in latency and 9x in throughput for a popular persistent NoSQL store relative to a well-tuned Linux implementation.