Wednesday
Jan192011

Sponsored Post: Percona, Appirio, Newrelic, Cloudkick, Membase, EA, Joyent, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

  • Percona Live to be held in San Francisco February 16th, 2011. A one day event run by the experts behind the MySQL Performance Blog.
  • A new round of Membase meetups has been planned for January 2011 for San Diego, Denver, Seattle, Vancouver and Chicago.
  • O'Reilly's Strata Making Data Work Conference on February 1-3, 2011 in Santa Clara, CA. Strata is a new conference from O'Reilly, focusing on the business and practice of data.

Cool Products and Services

Tuesday
Jan182011

Paper: Relational Cloud: A Database-as-a-Service for the Cloud

The Relational Cloud Project is an effort by a group of researchers at MIT to investigate technologies and challenges related to Database-as-a-Service within cloud computing. They are trying to figure out how the advantages of the DaaS (Database-as-a-Service) model, which we've seen arise in other areas like OLAP and NoSQL, can be applied to relational databases. The DaaS advantages as they see them are: 1) predictable costs, proportional to the quality of service and actual workloads, 2) lower technical complexity, thanks to a unified and simplified service access interface, and 3) virtually infinite resources ready at hand. Their approach is described in the paper Relational Cloud: A Database-as-a-Service for the Cloud. From the abstract:

Click to read more ...

Friday
Jan142011

Stuff The Internet Says On Scalability For January 14, 2011

 Submitted for your reading pleasure...

  • On the new year Twitter set a record with 6,939 Tweets Per Second (TPS). Cool video visualizing New Year's Eve Tweet data across the world. 
  • Marko Rodriguez in Memoirs of a Graph Addict: Despair to Redemption tells a stirring tale of how graph programming saved the world from certain destruction by realizing Aristotle's dream of a eudaimonia-driven society. Could a relational database do that?
  • The never-ending battle of good versus evil has nothing on programmers arguing about bracket policies or sync vs async programming models. In this node.js thread, I love async, but I can't code like this, the battle continues. In the end programmers desire async, but leave the bar with sync.
  • Quotable Quotes
    • @AmyDeLong: Walked into a starbucks and overheard 3 separate discussions all on scalability. #firstworldproblems #onlyinsf
    • @chvest: You may not need "high" scalability, but you should still consider your growth rates and prepare.

Click to read more ...

Tuesday
Jan112011

Google Megastore - 3 Billion Writes and 20 Billion Read Transactions Daily

A giant step into the fully distributed future has been taken by the Google App Engine team with the release of their High Replication Datastore. The HRD is targeted at mission critical applications that require data replicated to at least three datacenters, full ACID semantics for entity groups, and lower consistency guarantees across entity groups.

This is a major accomplishment. Few organizations can implement a true multi-datacenter datastore. Other than SimpleDB, how many other publicly accessible database services can operate out of multiple datacenters? Now that capability can be had by anyone. But there is a price, literally and otherwise. Because the HRD uses three times the resources of Google App Engine's Master/Slave datastore, it will cost three times as much. And because it is a distributed database, with all that implies in the CAP sense, developers will have to be very careful in how they architect their applications: cost has increased, reliability has increased, complexity has increased, and performance has decreased. This is why the HRD is targeted at mission critical applications. You gotta want it; otherwise the Master/Slave datastore makes a lot more sense.

The technical details behind the HRD are described in this paper, Megastore: Providing Scalable, Highly Available Storage for Interactive Services. This is a wonderfully written and accessible paper, chock-full of useful and interesting details. James Hamilton wrote an excellent summary of the paper in Google Megastore: The Data Engine Behind GAE. There are also a few useful threads in Google Groups that go into more detail about how it works, what it costs, and how it performs (the original announcement, performance comparison).

Some Megastore highlights:

Click to read more ...

Monday
Jan102011

Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data

How would you implement a key-value storage system if you were starting from scratch? The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination of using RAM to store a hash map of file pointers to values and a log-structured file system for efficient writes.  In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail.
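The combination described above, an in-RAM hash map pointing into an append-only log, can be sketched in a few lines. This is a toy illustration, not Basho's implementation: the class name `TinyBitcask`, the record layout, and the method names are all assumptions made for the example, and it leaves out everything a real store needs beyond the core idea (checksums, rebuilding the map on restart, compaction of stale records).

```python
import os
import struct
import tempfile


class TinyBitcask:
    """Toy store: an append-only log file plus an in-RAM key directory.

    Hypothetical sketch; the record layout below is invented for illustration.
    """

    HEADER = struct.Struct(">II")  # key length, value length (big-endian uint32)

    def __init__(self, path):
        self.keydir = {}  # key -> (offset of value in log, value size)
        self.log = open(path, "a+b")  # append mode: writes always go to the end

    def put(self, key: bytes, value: bytes) -> None:
        self.log.seek(0, os.SEEK_END)
        start = self.log.tell()
        self.log.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.log.flush()
        # Point the in-memory entry at the newest copy of the value.
        value_offset = start + self.HEADER.size + len(key)
        self.keydir[key] = (value_offset, len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, size = entry
        self.log.seek(offset)  # one seek, one read
        return self.log.read(size)

    def close(self) -> None:
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = TinyBitcask(os.path.join(d, "data.log"))
        db.put(b"user:1", b"alice")
        db.put(b"user:1", b"bob")  # newer record shadows the older one
        print(db.get(b"user:1"))   # b'bob'
        db.close()
```

Writes never modify existing bytes, so they stay sequential, and a read costs one dictionary lookup plus one seek. The trade-off in this sketch is that every key must fit in memory and old values linger in the log until something rewrites it.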

The essential Bitcask:

Click to read more ...

Thursday
Jan062011

BankSimple Mini-Architecture - Using a Next Generation Toolchain

I know people are always interested in what others are using to build their systems. Alex Payne, CTO of the new startup BankSimple, gives us a quick hit on their toolchain choices in this Quora thread. BankSimple positions itself as a customer-focused alternative to online banking. You may remember Alex from the early days of Twitter. Alex was always helpful to me on Twitter's programmer support list, so I really wish them well. Alex is also a bit of an outside-the-box thinker, which is reflected in some of their choices:

Click to read more ...

Tuesday
Jan042011

Map-Reduce With Ruby Using Hadoop

A demonstration, with repeatable steps, of how to quickly fire up a Hadoop cluster on Amazon EC2, load data onto the HDFS (Hadoop Distributed File System), write map-reduce scripts in Ruby and use them to run a map-reduce job on your Hadoop cluster. You will not need to ssh into the cluster, as all tasks are run from your local machine. Below I am using my MacBook Pro as my local machine, but the steps I have provided should be reproducible on other platforms running bash and Java.
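The excerpt does not show the scripts themselves, so here is a rough sketch of the shape such scripts usually take, assuming they follow the Hadoop Streaming convention of reading lines and emitting tab-separated key/value records. The post uses Ruby; this sketch is in Python purely for illustration. The word-count job and the names `mapper`, `reducer`, and `run_locally` are assumptions for the example, and `run_locally` only imitates the cluster's sort-by-key step on one machine.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand-in for the cluster: map, sort by key (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["Hadoop with Ruby", "ruby on hadoop"]):
        print(record)
```

Keeping the mapper and reducer as plain line-in, line-out functions is what lets the same logic be tested on a laptop before it is handed to a cluster.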

Tuesday
Jan042011

Sponsored Post: Newrelic, Cloudkick, Strata, EA, Joyent, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

  • A new round of Membase meetups has been planned for January 2011 for San Diego, Denver, Seattle, Vancouver and Chicago.
  • O'Reilly's Strata Making Data Work Conference on February 1-3, 2011 in Santa Clara, CA. Strata is a new conference from O'Reilly, focusing on the business and practice of data.

Cool Products and Services

  • Newrelic - What are you doing to ensure the performance of your apps?
  • Cloudkick - monitor & manage your servers better with a FREE Cloudkick developer account.
  • Join two game developers in a Joyent-sponsored webinar, You’ve Got Game: Planning for the Successful Scale and Performance of Your Cloud-based Game http://bit.ly/eAPt2s
  • CloudSigma. Instantly scalable European cloud servers.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

Click to read more ...

Monday
Jan032011

Stuff The Internet Says On Scalability For January 3, 2011

 Submitted for your reading pleasure...

  • Quotable Quotes
    • @hofmanndavid: Performance and scalability anxiety makes developers want to catch the flying butterflies
    • @tivrfoa: "Scalability solutions aren't magic. They involve partitioning, indexing and replication." Twitter engineer
    • Alan Perlis: Fools ignore complexity; pragmatists suffer it; experts avoid it; geniuses remove it.
  • CIO update: Post-mortem on the Skype outage. Interesting tale of a cascading collapse in complex, distributed, interactive systems. For more background see the highly illuminating Explaining Supernodes by Dan York.
  • RethinkDB and SSD Databases. SSD was not a revolution by Kevin Burton. What’s really shocking to me, is that while SSD and flash storage is very exciting, it wasn’t as revolutionary in 2010 as I would have liked to have seen.
  • The case for Datastore-Side-Scripting. Russell Sullivan predicts real-time web applications are going in the direction of being entirely event driven, from client (WebSockets) to web-server (Node.js) to datastore (Redisql). And what completes the event-driven chain is datastore-side-scripting.
  • Developments that could change everything...

    Click to read more ...

Friday
Dec312010

Facebook in 20 Minutes: 2.7M Photos, 10.2M Comments, 4.6M Messages

To celebrate the new year Facebook has shared the results of a little end of the year introspection. It has been a fecund year for Facebook:

  • 43,869,800 changed their status to single
  • 3,025,791 changed their status to "it's complicated"
  • 28,460,516 changed their status to in a relationship
  • 5,974,574 changed their status to engaged
  • 36,774,801 changed their status to married

If these numbers are simply too large to grasp, it doesn't get any better when you look at what happens in a mere 20 minutes:

Click to read more ...