Stuff The Internet Says On Scalability For April 22, 2011
Submitted for your reading pleasure on the day, deep breath, before Dr. Who invades the USA...
- The Great SkyNet Day Amazon Downtime roundup: Detailed thread on Hacker News; Who is affected by EC2?; Amazon Web Services Starting to Come Back Online but Problems Persist and Questions Unfold; AWS is down: Why the sky is falling; Amazon confirms the cause is not SkyNet, Mayan prophecy still in play; Major Amazon Outage Ripples Across Web; Amazon Server Troubles Take Down Reddit, Foursquare & HootSuite; Working around the EC2 outage; Many AWS Sites Recover, Some Face Longer Wait; Amazon.com’s real problem isn’t the outage, it’s the communication; Developer pain revealed on the Amazon Developer Forum.
- Poll results are in for How Much Do You Consider Scalability When Building a New Application? 15.6% say they have a year's worth of food at home and a bugout location selected, 34.86% say they spend a few spare brain cycles, 26.61% say work first, scale later, 19.27% are happy-go-lucky, and 3.67% trust in the cloud for safety. Where are you?
- Unlocked Achievements: Dropbox Hits 25 Million Users, 200 Million Files Per Day; Lucene's FuzzyQuery is 100 times faster; SimpleGeo frees 20 Million Places from carbonite; IRS says One Billion tax returns filed electronically.
- Quotable Quotes:
- @DrErnie: Scalability is the ability to use abstraction; efficiency is the ability to avoid abstraction. Performance comes from knowing which to use.
- @iPadSweet: Scalability is about building wider roads, not about building faster cars. – Steve Swartz
- @Simon_Sara: Can you add sales without adding employees? That’s a good clue to scalability.
- 'This kind of "I broke things, so now I will jiggle things randomly until they unbreak" is not acceptable.' - Linus Torvalds
- @colsonwhitehead: The problem, then, is scalability. (I just had a happy thought.)
- @SNagdy: Scalability (both ways) and replicability are key in a sound business model.
- You know economic incentives are strange when it makes financial sense to drill through the center of the earth to link together financial trading systems. Jim Stogdill explores this idea in Quantum trading! And tunnels through the Earth!
- Excellent discussion on Cassandra Consistency Models in the NoSQL Databases list. When does a reader really read what a writer wrote? More goodness in this Cassandra Database Modeling thread.
- Robert Scoble with a beautiful Photo tour of Facebook’s new datacenter. Impressive in that way highly rational designs tend to be. Clear, ordered, symmetrical. Looking at all the caging I can't help but think those computers would like a jail break...
- Google App Engine or Amazon Elastic Beanstalk or CloudBees RUN@Cloud? Java PaaS shootout by Michael Yuan and Ringful Health, with a useful comparison between the platforms. If you are developing a new application and can live with GAE's constraints, GAE is an excellent and free choice. RUN@Cloud and Elastic Beanstalk are interchangeable runtimes at the application level.
- Node.js beats tornado and django. At least that's what Swizec Teller found in his Benchmarking node, tornado and django for concurrency. Conclusion: node is fast but a bit unpredictable, django fails a lot, tornado is mediocre.
- An early adopter found HBase is not ready for Primetime. Ryan Rawson with a poignant wrap-up: To bring it back to the original point and a high level view, the fact is that HBase is not Oracle, nor MySQL. It doesn't have multiple decades, and furthermore distributed systems are inherently more difficult (more failure cases) than single node DBs.
- High-performance Timing on Linux / Windows. Tom Distler is right, accurate high-performance timing is hard. Tom helps with a good look at the APIs on how to properly handle interval timing (i.e. accurately measuring the duration between 2 events). This is different than synchronizing different clocks or maintaining accurate wall time.
- How do large-scale sites and applications remain SQL-based? by Michael Rys. Scale-out applications with SQL are being built using similar architectural principles as scale-out applications using NoSQL while providing more mature infrastructure for declarative query processing, optimizations, indexing, and data storage/high availability.
- Great story on how Lucene's FuzzyQuery became 100 times faster. It did not involve a butler, in the pantry with a candlestick, but it did involve a mystery paper describing a radical new algorithm, an obscure code base, and a race to make it all work.
- FourSquare's Kushal Dave with MongoDB strategies for the disk-averse. In the end, we settled on building locality at the application level. Every time we need to record the data for a given venue at a given hour, we align it to a five-hour period. Also learn about Building a recommendation engine, foursquare style. Plus some MongoDB Database Internals.
- The never ending thread and threads being evil: Why might threads be considered “evil”? Is the evil in the soul of the coder or the code itself?
- A new NoSQL Database: Orderly - a row key schema system (composite keys, etc) for use with HBase. The goal of this project is to produce extremely space efficient byte serializations for common data types while ensuring that the resulting byte array sorts correctly. More discussion here.
- A comprehensive study of Convergent and Commutative Replicated Data Types. This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case.
- Charles Humble summarizes Coherence's New Elastic Data Feature: There's basically no performance delta between in memory and flash for us, thanks to some R&D we did around flash devices and Java's NIO capabilities. So we're able to drive solid state storage devices at a very, very high rate.
- Deploying a massively scalable recommender system with Apache Mahout. Sebastian Schelter shows us how to deploy Amazon's Item Based Recommender system for yourself on commodity infrastructure. Good exploration of the scalability issues involved in the solution: Increase of data in the similarity computation; Increase of requests to the live machines; Increase of data and users; Increase of items.
- Herstory: Fascinating 60 minute interview with Grace Hopper, developer of the first compiler and the programming language COBOL. (reddit, youtube.com) - A ship in a port is safe. But that is not what ships are built for.
- A 100-core processor, anyone? Tilera has one for you. It eliminates the dependence on a bus and instead puts a non-blocking, cut-through switch on each processor core, which connects it to a two-dimensional on-chip mesh network called iMesh™ (Intelligent Mesh). Sounds cool.
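
For the interval timing item above (measuring the duration between two events, as opposed to keeping wall time), here is a minimal Python sketch. It is not taken from Tom Distler's article, which covers the native Linux and Windows APIs; it only illustrates the same idea with Python's standard monotonic, high-resolution counter. The helper names `measure_interval_ns` and `busy_wait` are made up for this example.

```python
import time


def measure_interval_ns(work):
    """Return how long work() takes, in nanoseconds.

    Uses time.perf_counter_ns(), a monotonic high-resolution counter,
    rather than time.time(), which reports wall time and can jump when
    the system clock is adjusted.
    """
    start = time.perf_counter_ns()
    work()
    return time.perf_counter_ns() - start


def busy_wait(duration_s=0.01):
    """Hypothetical workload: spin for roughly duration_s seconds."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pass


elapsed_ns = measure_interval_ns(busy_wait)
print(f"elapsed: {elapsed_ns / 1e6:.3f} ms")
```

Only the difference between two readings is meaningful here; the absolute value of the counter has no defined reference point.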
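
For the replicated data types item above, a small sketch of a state-based (convergent) type may help make the idea concrete. This is a grow-only counter, a standard textbook example of the style, written here from scratch as an illustration; the class name `GCounter` and the replica ids are assumptions of this example. Each replica only increments its own slot, and merging takes the per-replica maximum, so merges are commutative, associative, and idempotent and replicas converge regardless of delivery order.

```python
class GCounter:
    """Grow-only counter: a minimal state-based replicated data type."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        """Record a local update; state only ever grows."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """The counter's value is the sum over all replicas."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold in another replica's state by taking per-replica maxima."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas update independently, then exchange state in either order.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing (idempotent)
print(a.value(), b.value())
```

After the exchange both replicas report the same value, which is the convergence property the state-based approach is after.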