Stuff The Internet Says On Scalability For July 25th, 2014

Hey, it's HighScalability time:


It's systems all the way down. Bugs That Call Us Home.

  • 1 million users in just 4 days: Yo;  30 billion: Pinterest Pins
  • Quotable Quotes:
    • @GlennF: Amazon still dreams it is a startup, like a dog dreaming of chasing rabbits, twitching its legs while asleep.
    • @mfdii: Nobody knows how git works. We all just type in commands like monkeys trying to write Shakespeare. #devopsdays 
    • Benedict Evans: When you pull these strands together, smartphones don't just increase the size of the internet by 2x or 3x, but more like 5x or 10x. It's not just how many devices, but how different those devices are, that has the multiplier effect.
    • @Aaronontheweb: @codinghorror I broke this rule for myself last week. Spent 3 days fixing a problem that we finally solved by a $0.06/hour AWS bill increase
    • Physicist George Ellis: Barring something very unforeseen – the possible tests of the very large and the very small are coming towards the limits of whatever will be possible.
    • The Master Switch: Once the industry had concluded that its profits could be maximized if more people listened to fewer stations, the government, acting as if the business of America were only business, did the industry’s bidding, showing only the most feeble awareness of its consequences for the American ideal of free expression.

  • Ex-Googlers try to recreate Spanner with CockroachDB (awesome name!), which is A Scalable, Geo-Replicated, Transactional Datastore. The design is here and looks good. There's an article on Wired. Good discussion on HackerNews. A globally distributed transactional database it's not, yet, but it's early days. After all, they can only work in the dark.

  • Useful post on Handling 1 Billion requests a week with Symfony2. Symfony2 provides good performance and a nice development environment. HAProxy distributes requests across the application servers. Varnish runs on every application server to keep availability high without a single point of failure (SPOF). Redis and MySQL store the data. MySQL is mostly used as a third-tier cache layer (Varnish > Redis > MySQL) for non-expiring resources.

  • The truest form of the Interest Graph on the net? Details on how Pinterest scales their data infrastructure to create a personalized discovery engine. 20 terabytes of new data each day. 10 petabytes of data in S3. 100 regular MapReduce users run over 2,000 jobs each day through Qubole. 6 standing Hadoop clusters comprised of over 3,000 nodes.

  • Just like Captain Kirk. Shifts In Algorithm Design: Now today, in the 21st century, we have a better way to attack problems. We change the problem, often to one that is more tractable and useful. In many situations solving the exact problem is not really what a practitioner needs. If computing X exactly requires too much time, then it is useless to compute it. A perfect example is the weather: computing tomorrow’s weather in a week’s time is clearly not very useful. The brilliance of the current approach is that we can change the problem.

  • Wet Computing Could Put a Terabyte in a Tablespoon: Researchers from the University of Michigan and New York University demonstrated how plastic nanoparticles, deposited in a liquid, can form a one-bit cluster—the essential building block for information storage. It's called "wet computing," and the technique mimics other biological processes found in nature, like DNA in living cells.

  • Daniel Eloff: The world is not just going massively multicore, it's going heterogeneous core. The one core fits all model of programming is going away. Big performance and efficiency gains can be had from splitting your application among different types of specialized processors. Programmable hardware with FPGAs seems like a natural extension of this trend.

  • All the Queues right here for your perusal. 

  • Messages in the Deep. The fascinating story of underwater internet. We take for granted that we can reach across oceans with our internet connections. How did that happen? It's a great story of adventure and innovation.

  • Videos from Flocon 13 are now online.

  • Another case of the double faults. A few details on the BBC Online Outage, nothing specific unfortunately. Metadata servers and caching layers both failed at the same time. Note, they did have an emergency broadcast home page in case of failure. 

  • AWS SECURITY, FAILURE, AND POSTMORTEM. Eric Hotinger with an excellent list of strategies for keeping AWS secret and safe: Don’t just flip the switch to public; Never hardcode your credentials into the code; Use Users for fine-grained control; Use Groups to manage permissions of Users; Grant the least amount of privileges at all times; Use roles to share access between AWS components; Rotate passwords and credentials regularly; Use CloudWatch, CloudTrail, and SES.

  • Looks like a good data architecture reading list. I won't really know until I run some algorithms on it.

  • Why you should build an Immutable Infrastructure: Immutable infrastructure is comprised of immutable components that are replaced for every deployment, rather than being updated in-place. Those components are started from a common image that is built once per deployment and can be tested and validated. < Yields predictability, scalability, automated recovery.

  • DevOpsDays 2014 are coming online.

  • Great description of Using GNU's GDB Debugger Memory Layout And The Stack. For those who want to go beyond a simple stack trace or dumping variables.

  • The Myth of the Sole Inventor: The point can be made more general: surveys of hundreds of significant new technologies show that almost all of them are invented simultaneously or nearly simultaneously by two or more teams working independently of each other. Invention appears in significant part to be a social, not an individual, phenomenon. Inventors build on the work of those who came before, and new ideas are often "in the air," or result from changes in market demand or the availability of new or cheaper starting materials. And in the few circumstances where that is not true – where inventions truly are "singletons" – it is often because of an accident or error in the experiment rather than a conscious effort to invent.

  • More quick links with Greg Linden. Fun stuff as usual.
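
A parting sketch for the Symfony2 item above: the Varnish > Redis > MySQL ordering reads as a tiered lookup, fastest layer first. The code below is only an illustration of that idea, not the post's implementation. The Tier class, the tiered_get function, and the key names are hypothetical, plain dicts stand in for the real stores, and copying a hit back into the faster tiers is an assumption that the summary does not confirm.

```python
from typing import Dict, List, Optional


class Tier:
    """One cache layer. A plain dict stands in for the real store (assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value


def tiered_get(tiers: List[Tier], key: str) -> Optional[str]:
    """Check tiers fastest-first; on a hit, copy the value into the faster tiers that missed."""
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            # Backfill is an assumed policy, not something the source describes.
            for faster in tiers[:i]:
                faster.put(key, value)
            return value
    return None  # every tier missed


# Hypothetical stand-ins for Varnish, Redis, and MySQL, in lookup order.
http_cache, kv_cache, durable = Tier("varnish"), Tier("redis"), Tier("mysql")
tiers = [http_cache, kv_cache, durable]

durable.put("avatar:42", "cat.png")
print(tiered_get(tiers, "avatar:42"))
```

After the first lookup the value also sits in the two faster tiers, so a repeat request never reaches the slowest one. That is the usual payoff of ordering layers this way for resources that do not expire.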