Stuff The Internet Says On Scalability For September 30, 2011

You deserve a HighScalability today:

  • Tumblr > Wikipedia
  • Potent quotables:
    • @tokutek : Yelp generates close to 400 GB of compressed logs per day according to @petersirota of Amazon #Strataconf #BigData. More at From Under the Desk to the Cloud
    • @LHK_ITRG : Massive scalability: 80,000 users on a single AppSense server. I think that should do...
    • @solarce : OH: "Automation is a great way to distribute failure across the system" #surgecon
    • palominodb : #surgeconf - DataDog presenting on their "Data Mullet" All SQL in front, NoSQL party in the back. Classic.
    • Ryan Dahl : I hate almost all software
  • Software Design Glossary. Apparently Kent Beck didn't get the memo that only algorithms matter now and software engineering is dead. In case you don't feel that way, Kent wrote a short glossary of important software design concepts. Also, Screaming Architecture by Bob Martin.
  • Improving Percona Server performance with Flashcache on the Virident tachIOn Drive. If you are thinking of using Flashcache as storage for MySQL, Percona found: when the workload is read-only and the working set fits into the flash drive, Flashcache provides a significant improvement in performance over a RAID device; when the workload has a significant random write component, Flashcache provides a significant performance improvement over RAID when the working set fits into the buffer pool. The poor write performance may be because Facebook tuned it for their read-heavy load, and the dm layer in Linux may also have problems. Also, Intel 320 SSD write performance: write performance on Intel 320 SSD is decreasing in time, and actually it is quite unpredictable.
  • So why use flash at all? I asked Baron Schwartz this question and he said because RAM is too costly and uses too much power to use in large configurations. Disks may be slow, but they're cheap. Flash is a good bridge between disks and RAM for some workloads.
  • Will VMs and security move to the chip, thus radically flattening the layers in the application stack? Chris Hoff (no relation that I know of) thinks so. More at Flying Cars & Why Hypervisor Is A Ride-On Lawnmower. There's really no reason for an OS anymore. Apps can run directly on the VM.
  • Check out the slide decks for the Talk Cloudy to Me mini-conference from last Saturday. This event is the second one-day cloud event put on by Sebastian Stadil, founder of Scalr, as part of the Cloud Computing Meetup group, also created by Sebastian. Sebastian has become a master at running these mini-conference style events. Really a quality job by him and his dedicated crew. These events are free, sponsored by various vendors; they are short, 11-5; the food is good, Thai; the venue is nice, eBay; they are on topic, with cloud and other speakers giving 30-45 minute talks. More on the event when the video comes out.
  • Optimizing your CouchDB Calls by 99%. Tim Anglade found that the average response time for their entire Ruby userbase is 2–3x higher than in other languages. Part of the problem is using CouchDB as a drop-in MySQL replacement. To fix: 1) Use only the basic methods of CouchRest, or a more basic library, because CouchRest adds a lot of cruft; 2) Load an appropriate JSON library to prevent Ruby IO; 3) Keep your database local; 4) If you can, use HTTP instead of HTTPS; 5) Use MessagePack, which is a binary JSON-like format; 6) Object mappers are bad; 7) Move logic to the database; 8) Use a full-text search system to offload queries; 9) Stale = OK.
  • Lots of the presentations from the 27th IEEE (MSST2011) Symposium on Massive Storage Systems and Technologies. Very many uses of the words petabyte, exascale, cloud storage, lessons learned, and I/O challenges.
  • The ever gracious Stacey Higginbotham writes on Scaling lessons from Google’s CIO. It's hard to churn out articles and keep the writing quality high, but Stacey manages to do it. Lessons: 1) Talk - put all people on a project in the same room, 2) Hubris - lose it, base assumptions on evidence not arrogance, 3) Explore - with every project you are entering an unknown world, pack appropriately.
  • Bret Taylor gets cookin' with an open graph demo app built using Python, Tornado, S3, and of course Facebook's new open graph API. Looks tasty. I'll be curious to see how all this works out.
  • Chris Dixon pulls up a chair and shares some lessons learned: get rejected more; climb the right hill; create an amazing toy; grow that toy into something big that transforms an important industry. I have a hunch this just might work.
  • Dimensions to use to compare NoSQL data stores. Huan Liu with a nice breakdown of how to look at NoSQL by looking at Data Models, Consistency, Atomic test-and-set, Secondary index, Manageability, Latency vs. durability, Read vs. write performance, Dynamic scaling, Auto failover, Auto load balancing, Compression support, Range scan, and Failure scenarios.
  • GitHub scales their employees. No, that's not it, GitHub pronounces their method of managing people scales. That's better. At 40 people GitHub still works without: enforcing hours, hiring managers, telling people what to work on, and planning features.
  • How We Built Our Real-Time, Location-Based Urban Geofencing Game. Amber Case, co-founder of Geoloqi, with an awesome report on how they built an app that uses the real world as a gameboard using their API, Socket.io, Node.js, Redis, and Sinatra Synchrony. Definitely worth a look if you are thinking of making a geo + mobile app.
  • Reliable distributed storage. LinkedIn's Alex Feinberg shows how Project Voldemort has helped solve LinkedIn's big data problems.
  • Book Review: Scalability Rules. Good review of a good book.
  • Visualizing a week of check-ins at Foursquare. No matter how hard I try these things always look like how nuclear strikes are shown in movies. Repeated explosions dot the world until the future is so bright you have to wear shades. But that's just me. Foursquare's numbers and reach are quite impressive.
  • Update: Cachismo, a memory cache which can be used as a replacement for memcached, has been implemented in C and made open source.
  • DataStax is taking Cassandra 1.0 to the next level by adding compression, an off-heap row cache, self-tuning, and faster disk space reclamation.
  • The new mafia is your computer: The Inside Story of the Kelihos Botnet Takedown
  • Curt Monash asks: Are there any remaining reasons to put new OLTP applications on disk? If you’re planning an OLTP system with a many-year lifespan today, of course you should assume solid-state storage.