Hot Scalability Links for July 17, 2010
And by hot I also mean temperature. Summer has arrived. It's sizzling here in Silicon Valley. Thank you air conditioning!
- Scale the web by appointing a Crawler Czar? Tom Foremski has the idea that Google should open up their index so sites wouldn't have to endure the constant pounding by ravenous crawler bots. Don MacAskill of SmugMug estimates that 50% of their web server CPU resources are spent serving crawlers. What a waste. How this would all work with real-time feeds, paid feeds (Twitter, movies, ...), etc. is unknown, but does it make sense for all that money to be spent on extracting the same data over and over again?
- Tweets of Gold:
  - jamesurquhart: Key to applications is architecture. Key for infrastructure supporting archs is configurability. Configurability==features.
  - tjake: People who choose their datastore based on hearsay and not their own evaluation are doomed.
  - b6n: No global lock ever goes unpunished.
  - MichaelSurtees: scalability, systems & process feed each other right?
  - jamesgolick: Statements like: "NoSQL database systems are designed for scalability." make me sad.
  - agastiya: Focus on stability and features first, scalability and manageability second, per-unit performance last of all. (Quoting Jeff Darcy.)
  - bizarroargv0: I find all this talk of distributed systems a little unsettling. When it comes to data, sometimes you just want a powerful master.
- "You can never think about scale too early," says Facebook VP of Technical Operations Jonathan Heiliger. Others say scaling is a problem that's good to have, don't worry, be happy, but Jonathan thinks planning ahead has some value: "We've thought really far ahead but we've also punted on really critical things that we needed to do. Now we're under the gun rather than being able to do them on our own time."
- Why does Quora use MySQL as the data store rather than NoSQLs such as Cassandra, MongoDB, CouchDB, etc? Possible reasons: MySQL works, NoSQL is a fad, NoSQL is unstable, Quora isn't Facebook so scale-up works.
- Map rendering on EC2 by Andy Allan. EC2 is, more or less, exactly not what you want from a tileserver. Expensive to run, slow disks. So why is it popular? First off is buzzwords – cloud, scalable and so on.
- Scalability. "If we wanted to let any of our eight million active users sort their comments history by score, we'd need $20,000 in dedicated servers." I love how Reddit is so honest about the cost of features in their system and how they decide what to do and what not to do based on ROI.
- In defence of SQL at Seldo.com. So go forth, use your OMADS, keep an RDBMS in your back pocket, and stop being so mean to poor old SQL.
- Google Pregel Graph Processing by Ricky Ho. Pregel can be thought of as a generalized parallel graph transformation framework. I think the Pregel model is general enough for a large portion of classical graph algorithms.
- Royans Tharakan summarizes a talk by Theo Schlossnagle: Thoughts on scalable web operations. Covers: optimization, tools, cookies, datastores, automation, revision control, networking, caching, people, systems, and moderation.
- Amazon opens up a high performance compute grid, but it will cost you, says Miha Ahronovitz. More perspective on the release by James Hamilton.
- Wordnik passes 9 billion record mark with MongoDB. More on Hacker News.
- Going to a silicon beach this summer? The folks at GigaOM have a summer reading list for you. "The Upside of Irrationality" is on my wish list.