Hot Scalability Links for April 30, 2010

  • I Want a New Data Store. Jeremy Zawodny of Craigslist wants a new database, one that can do what it should: perform ALTER TABLE operations quickly, query efficiently when most of the data is on disk rather than in RAM, and fit data that now looks more document oriented than relational. Plenty of people are willing to help.
  • Computer Science Unplugged. An extensive collection of free resources that teach principles of Computer Science, such as binary numbers, algorithms, and data compression, through engaging games and puzzles using cards, string, crayons, and lots of running around. Also a fascinating interview with Tim Bell, from O'Reilly Radar, on teaching complex computing concepts, creating makers rather than just users, and how to change schools.
  • Akamai’s Network Now Pushes Terabits of Data Every Second. Akamai handles 12 million requests per second, logs more than 500 billion requests for content per day, and sends 3.45 terabits per second of data.
  • Google’s MapReduce Programming Model — Revisited. From the abstract: "We reverse-engineer the seminal papers on MapReduce and Sawzall, and we capture our findings as an executable specification." A minimal Python illustration of the model appears after this list.
  • Facebook Flashcache. James Hamilton describes "a simple write back persistent block cache designed to accelerate reads and writes from slower rotational media by caching data in SSD." A toy sketch of the write-back idea appears after this list.
  • Pigz – parallel gzip OMG. John Allspaw plays with single-core and multicore versions of gzip. With one core, zipping a 418 MB log file took 12.4 seconds. On a 16-core machine with parallel gzip it took 1.6 seconds. A rough Python approximation of the trick follows this list.
  • Hadoop Meetup Videos. Videos on using Hadoop to fight spam at Yahoo! Mail, Hive/HBase integration, and the Public Terabyte Dataset Project (web crawling with Amazon's EMR).
  • How TokuDB Fractal Tree Indexes Work by Bradley C. Kuszmaul. Fractal Trees are functionally equivalent to B-trees but run significantly faster: they convert random I/O, which involves painfully slow disk seeks, into sequential I/O, which provides up to two orders of magnitude more performance. A simplified sketch of that batching idea follows this list.
  • Scaling writes in MySQL by Philip Tellis. After partitioning (12 partitions per day, 2 hours of data per partition) they were able to sustain an insert rate of around 8,500 rows per second. A sketch of such a partition scheme follows this list.
  • Attempts at Analyzing 19 million documents using MongoDB map/reduce by Steve Eichert. "We still feel that Mongo will be a great place to persist the output of all of our Map/Reduce steps; however, we don’t feel that it’s well suited to the type of analysis that we want to do." A map/reduce sketch follows this list.
  • TR10, and a Bloom/BOOM FAQ. BOOM is a research project based at Berkeley that seeks to enable programmers to build Orders Of Magnitude bigger systems in O.O.M. less code. The focus is on enabling developers to easily harness the power of many computers at once, e.g. in the setting of cloud computing.
  • Storing log messages in Hadoop by Peter Dikant. Using Hadoop to store 20 TB of log data.
  • Ceph: The Distributed File System Creature from the Object Lagoon by Jeffrey B. Layton. Ceph is a distributed parallel file system promising scalability and performance, something that NFS lacks.
  • Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources by Adam Wiggins. This is the era of horizontal scalability, achieved with resources that are transient, shardable, and share nothing with one another. He gives several applications and a language as examples: memcached, CouchDB, Hadoop, Redis, Varnish, RabbitMQ, and Erlang, detailing how each applies those principles. A minimal sharding sketch closes out the examples below.
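
On the MapReduce item: the model is easiest to see in miniature. Lämmel's paper captures it as an executable specification in Haskell; here is a minimal word-count sketch of the same map/shuffle/reduce phases in plain Python, with nothing distributed about it.

    from collections import defaultdict

    def map_phase(documents):
        # Map: emit a (word, 1) pair for every word in every document.
        for doc in documents:
            for word in doc.split():
                yield word, 1

    def shuffle(pairs):
        # Shuffle: group all intermediate values by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: fold each key's list of values down to one result.
        return {word: sum(counts) for word, counts in groups.items()}

    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(reduce_phase(shuffle(map_phase(docs))))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}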
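
On Flashcache: "write back" means a write is acknowledged once it lands in the fast tier and reaches the slow tier later. A toy sketch of just that idea, with dicts standing in for the SSD and the disk; this is not Flashcache's actual code or API.

    class WriteBackCache:
        """Toy write-back cache: reads and writes hit the fast tier;
        dirty blocks reach the slow tier only on flush."""

        def __init__(self, backing_store):
            self.backing = backing_store  # slow rotational media (simulated)
            self.cache = {}               # fast SSD tier (simulated)
            self.dirty = set()            # written but not yet persisted

        def read(self, block_id):
            if block_id not in self.cache:  # miss: one slow read, then cached
                self.cache[block_id] = self.backing[block_id]
            return self.cache[block_id]

        def write(self, block_id, data):
            self.cache[block_id] = data   # fast write, acked immediately
            self.dirty.add(block_id)      # persistence deferred

        def flush(self):
            for block_id in self.dirty:   # batch dirty blocks to slow media
                self.backing[block_id] = self.cache[block_id]
            self.dirty.clear()

    disk = {0: b"old"}
    cache = WriteBackCache(disk)
    cache.write(0, b"new")  # fast; disk still holds b"old"
    cache.flush()           # now disk holds b"new"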
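
On pigz: the speedup comes from compressing independent chunks on separate cores. A rough Python approximation of that idea; the real pigz is smarter (it shares compression state across blocks and emits a single stream), and the file names here are placeholders.

    import gzip
    from multiprocessing import Pool

    CHUNK = 16 * 1024 * 1024  # 16 MB chunks, compressed independently

    def compress_chunk(chunk):
        return gzip.compress(chunk)

    def parallel_gzip(src, dst, workers=16):
        # Compress fixed-size chunks on separate cores and concatenate
        # the results. Concatenated gzip members are still a valid gzip
        # file, so standard gunzip can decompress the output.
        with open(src, "rb") as f:
            chunks = iter(lambda: f.read(CHUNK), b"")
            with Pool(workers) as pool, open(dst, "wb") as out:
                for compressed in pool.imap(compress_chunk, chunks):
                    out.write(compressed)

    if __name__ == "__main__":
        parallel_gzip("access.log", "access.log.gz")  # placeholder names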
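
On fractal trees: what follows is a drastic caricature of the random-to-sequential trade, not TokuDB's actual data structure. The point it illustrates: buffer inserts in RAM and merge them into sorted storage in one sequential pass per batch, instead of paying one random seek per insert.

    import bisect

    class BufferedIndex:
        """Toy illustration only: inserts accumulate in an in-memory
        buffer and reach the sorted on-'disk' array in one sequential
        merge per flush, rather than one random seek each."""

        def __init__(self, flush_at=1024):
            self.buffer = []   # unsorted recent inserts (RAM)
            self.disk = []     # sorted array standing in for disk leaves
            self.flush_at = flush_at

        def insert(self, key):
            self.buffer.append(key)  # O(1), no seek
            if len(self.buffer) >= self.flush_at:
                self.flush()

        def flush(self):
            # One sequential merge replaces thousands of random seeks.
            merged, i, j = [], 0, 0
            pending = sorted(self.buffer)
            while i < len(self.disk) and j < len(pending):
                if self.disk[i] <= pending[j]:
                    merged.append(self.disk[i]); i += 1
                else:
                    merged.append(pending[j]); j += 1
            merged.extend(self.disk[i:])
            merged.extend(pending[j:])
            self.disk, self.buffer = merged, []

        def contains(self, key):
            self.flush()  # simplification: search only the merged array
            i = bisect.bisect_left(self.disk, key)
            return i < len(self.disk) and self.disk[i] == key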
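
On the MySQL write-scaling item: a sketch of what a twelve-partitions-per-day RANGE scheme can look like, generated from Python. The table and column names are invented; Tellis's actual schema may differ.

    from datetime import datetime, timedelta

    def partition_ddl(table, day):
        # Emit RANGE-partition DDL that splits one day into twelve
        # 2-hour partitions keyed on a unix-timestamp column.
        clauses = []
        for n in range(12):
            boundary = day + timedelta(hours=2 * (n + 1))
            clauses.append("  PARTITION p%02d VALUES LESS THAN (%d)"
                           % (n, int(boundary.timestamp())))
        return ("ALTER TABLE %s PARTITION BY RANGE (created_at) (\n%s\n);"
                % (table, ",\n".join(clauses)))

    print(partition_ddl("events", datetime(2010, 4, 30)))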
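
On the MongoDB analysis item: map/reduce runs as JavaScript inside the server, which is one commonly cited reason heavy analytical jobs disappoint. A sketch using PyMongo's classic map_reduce API (assumes a running server and a PyMongo version that still exposes it); the connection details, collection, and document shape are all invented, not Eichert's setup.

    from pymongo import MongoClient
    from bson.code import Code

    coll = MongoClient()["analysis"]["documents"]  # placeholder names

    # Count tag occurrences across documents shaped like {"tags": [...]}.
    mapper = Code("function () {"
                  "  this.tags.forEach(function (t) { emit(t, 1); });"
                  "}")
    reducer = Code("function (key, values) {"
                   "  return Array.sum(values);"
                   "}")

    result = coll.map_reduce(mapper, reducer, "tag_counts")
    for doc in result.find().sort("value", -1):
        print(doc["_id"], doc["value"])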
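
On the share-nothing item: "shardable" boils down to a pure function from key to node, so every client routes identically with no shared state or coordination. A minimal sketch with invented node names; real memcached clients prefer consistent hashing, so that membership changes remap only a fraction of the keys.

    import hashlib

    NODES = ["cache-1", "cache-2", "cache-3", "cache-4"]  # invented names

    def shard_for(key, nodes=NODES):
        # Pure hash routing: any client computes the same node for a
        # given key without consulting anyone else.
        digest = hashlib.md5(key.encode()).hexdigest()
        return nodes[int(digest, 16) % len(nodes)]

    print(shard_for("user:1234"))  # deterministic: always the same node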

If you would like to advertise a product, job, or event, please contact us for more information.