Hot Scalability Links for Aug 6, 2010

  • Twitter Sees Its 20 Billionth Tweet, writes Marshall Kirkpatrick of ReadWriteWeb.
  • Startups die for not having customers, so STOP thinking about how to scale. Alessandro Orsi says focusing on the architecture and scaling possibilities of your app for millions of users is just plain dumb...concentrate on marketing...concentrate on user experience. Alessandro is perfectly correct, but this isn't the year 2000, when the default easy architecture was also not scalable and sites were built from scratch one painful user at a time. Today neither is true. In the era of social networks, where Facebook has 500 million users, successful applications can and often do spike to millions of users seemingly overnight. And you have to have some architecture. With today's tool-chains you don't have to choose easy and non-scalable. There are other options. Of course, it's all pointless without customers and that is what you need to worry about, but it's a false choice in this era to think that's all you have to worry about.
  • Node.js: JavaScript on the Server. Ryan Dahl talks about how to handle thousands of connections with server-side JavaScript. It seems a little strange to still be talking about this same kind of stuff--event loops, async vs sync, thread pools, processes vs threads, etc--after 20 years, but Ryan does a really good job framing the issues. In the end applications are about state machines, so those nasty abstractions arise somewhere. You can't hide behind event callbacks; they're never enough.
  • Tech Talks presented at the North American Faculty Summit. Includes: Storage Architecture and Challenges, Cloud Computing and Software Security, Engineering Private Spaces Online, Defeating the Password Anti-Pattern with Open Standards, Security at Scale, Anatomy of a Large-Scale Social Search Engine.
  • A Retrospective on SEDA. Matt Welsh takes a look back on his very influential paper on large scale distributed system architectures and what he would do differently. Achieving good, robust performance across a wide range of loads is the real challenge.
  • Database Scalability Patterns by Robert Treat. Awesome coverage of Vertical Scaling; Horizontal Partitioning; Horizontal Scaling; Read Slaves; Multi-Master; Vertical Partitioning; Federated Data Storage; Database Life-cycle; OLAP vs OLTP; application type; Cloud; tools.
  • MongoDB Schema Design. Alex Popescu collects a list of NoSQL data modeling sources.
  • Google Wave and Network Effects. Most interesting discussion from Dare Obasanjo on how, in a social network world, using invite scarcity to grow a user base fails because users can't port their network over. Without your peeps who will you talk to? Strangers?
  • Beyond Locks and Messages: The Future of Concurrent Programming by Bartosz Milewski. Threads are out (demoted to latency controlling status), tasks (and semi-implicit parallelism) are in. Message passing is out (demoted to implementation detail), shared address space is in. Locks are out (demoted to low-level status), transactional memory is in.
  • Seattle Hadoop Day on August 14th. There's a killer line-up of speakers from Facebook, BackType, Amazon, and more. There are also several hours of intensive, hands-on training. And it's by and for the community.
  • The Pathologies of Big Data by Adam Jacobs. Scale up your datasets enough and all your apps will come undone. What are the typical problems and where do the bottlenecks generally surface?
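
For readers who want a feel for the event-loop idea in the Node.js item above, here is a minimal sketch of one thread juggling many waiting connections. It uses Python's asyncio rather than Node.js, and it is not taken from Ryan Dahl's talk. The names handle_connection and serve are hypothetical, and asyncio.sleep stands in for real network I/O.

```python
import asyncio


async def handle_connection(conn_id: int, delay: float = 0.01) -> int:
    # Stand-in for waiting on network I/O (an assumption for illustration).
    # While this coroutine waits, the event loop runs the other connections.
    await asyncio.sleep(delay)
    return conn_id


async def serve(n_connections: int) -> list[int]:
    # One thread, one event loop, many connections waiting concurrently.
    pending = [handle_connection(i) for i in range(n_connections)]
    # gather returns results in the same order the coroutines were passed in.
    return await asyncio.gather(*pending)


results = asyncio.run(serve(1000))
print(len(results))
```

Because every wait overlaps, total wall time should stay close to a single delay rather than growing with the number of connections, though the exact timing depends on the machine.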