- Life beyond Distributed Transactions: an Apostate’s Opinion by Pat Helland. In particular, we focus on the implications that fall out of assuming we cannot have large-scale distributed transactions.
- Tragedy of the Commons, and Cold Starts - Cold application starts on Google App Engine kill your application's responsiveness.
- Intel’s 1M IOPS desktop SSD setup by Kevin Burton. What do you get when you take 7 Intel SSDs and throw them in a desktop? 1M IOPS.
- Videos from NoSQL Berlin sessions. Nicely done talks on CAP, MongoDB, Redis, 4th generation object databases, CouchDB, and Riak.
- Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean of Google describing how they do their thing. Here are some glosses on the talk by Greg Linden and James Hamilton. You really can't do better than Greg and James.
- Advice from Google on Large Distributed Systems by Greg Linden. A nice summary of Jeff Dean's talk: a standard Google server appears to have about 16GB of RAM and 2TB of disk; things will crash, so deal with it; when designing for scale, design for the expected load, make sure it still works at 10x that load, but don't worry about scaling to 100x.
- Jeff Dean: Design Lessons and Advice from Building Large Scale Distributed Systems by James Hamilton. A datacenter-wide storage hierarchy; failure is inevitable; an excellent set of distributed systems rules of thumb (transcribed below); a typical first year for a new cluster; GFS usage at Google; work underway on a next-generation BigTable system called Spanner.
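For reference, here are Jeff Dean's "numbers everyone should know" from the talk, transcribed as Python constants. These are the approximate figures from the slides, so treat them as order-of-magnitude rules of thumb rather than measurements:

```python
# Approximate latency rules of thumb from Jeff Dean's talk, in nanoseconds.
# Order-of-magnitude guides, not benchmarks.
L1_CACHE_REFERENCE             = 0.5
BRANCH_MISPREDICT              = 5
L2_CACHE_REFERENCE             = 7
MUTEX_LOCK_UNLOCK              = 100
MAIN_MEMORY_REFERENCE          = 100
COMPRESS_1K_WITH_ZIPPY         = 10_000
SEND_2K_OVER_1GBPS_NETWORK     = 20_000
READ_1MB_SEQUENTIAL_FROM_RAM   = 250_000
ROUND_TRIP_WITHIN_DATACENTER   = 500_000
DISK_SEEK                      = 10_000_000
READ_1MB_SEQUENTIAL_FROM_DISK  = 30_000_000
PACKET_CA_TO_NETHERLANDS_TO_CA = 150_000_000

# The punchline: a single disk seek costs about 20 datacenter round trips.
print(DISK_SEEK / ROUND_TRIP_WITHIN_DATACENTER)  # 20.0
```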
This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.
The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?
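As a minimal illustration of one such transition strategy, here is what denormalizing a one-to-many relationship into a single nested record looks like. The sketch is mine, not the paper's, and the schema is hypothetical:

```python
# Hypothetical example: moving a normalized users/orders schema into a
# denormalized, document-style record suitable for a key/value store.

# Relational form: two tables, joined on user_id at query time.
users  = {1: {"name": "Ada"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.00},
    {"order_id": 11, "user_id": 1, "total": 40.00},
]

# Non-relational form: the join is precomputed and stored with the user,
# so a single key lookup answers "show this user's orders".
user_doc = {
    "name": "Ada",
    "orders": [
        {"order_id": 10, "total": 25.00},
        {"order_id": 11, "total": 40.00},
    ],
}
# The cost: data that used to live in one place is now duplicated, and
# updates must be applied everywhere it appears, which is exactly the
# consistency trade-off these designs have to confront.
```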
Digg has been researching ways to scale our database infrastructure for some time now. We’ve adopted a traditional vertically partitioned master-slave configuration with MySQL, and also investigated sharding MySQL with IDDB. Ultimately, these solutions left us wanting. In the case of the traditional architecture, the lack of redundancy on the write masters is painful, and both approaches have significant management overhead to keep running.
Since it was already necessary to abandon data normalization and consistency to make these approaches work, we felt comfortable looking at more exotic, non-relational data stores. After considering HBase, Hypertable, Cassandra, Tokyo Cabinet/Tyrant, Voldemort, and Dynomite, we settled on Cassandra.
Each system has its own strengths and weaknesses, but Cassandra has a good blend of everything. It offers column-oriented data storage, so you have a bit more structure than plain key/value stores. It operates in a distributed, highly available, peer-to-peer cluster. While it’s currently lacking some core features, it gets us closer to where we want to be than the other solutions.
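To make "a bit more structure than plain key/value stores" concrete, here is a rough sketch of the column-family data model using plain Python dictionaries; the keyspace, column family, and column names are made up for illustration:

```python
# A plain key/value store maps one key to one opaque value:
kv_store = {"user:bobby": "<serialized blob>"}

# Cassandra adds a level of structure: a row key maps to named columns
# inside a column family, and columns can be read or written individually
# without deserializing the whole row. Names below are hypothetical.
keyspace = {
    "Users": {                      # column family
        "bobby": {                  # row key
            "name": "Bobby",        # column -> value
            "city": "San Francisco",
            "last_login": "2009-09-01",
        },
    },
}

# Fetching a single column for a row, roughly what a get(key, column) does:
print(keyspace["Users"]["bobby"]["city"])  # San Francisco
```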
GemStone's website recently received a major facelift over at www.gemstone.com. I thought readers of this site might find our detailed description of how we solve the hardest problems in data management interesting. It can be viewed at: http://www.gemstone.com/hardest-problems (PDF available for download).
Matt, from the ever-excellent MySQL Performance Blog, decided to run a test using a simple scenario drawn from his client experience in the gaming space. The scenario, sketched below: read a row based on its primary key, update the row, write it to disk, and use the row to look up another row. Matt ran three different tests explained in a series of three articles: MySQL and MySQL + Memcached, Memcached Only, and Tokyo Tyrant.
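Here's a minimal sketch of that scenario. It uses sqlite3 as a stand-in so it runs anywhere, and the schema is invented; Matt's actual harness, schema, and MySQL/Memcached/Tokyo Tyrant configurations are in his articles:

```python
# Stand-in for MySQL using sqlite3 so the sketch is self-contained. The
# statements mirror the scenario: read a row by primary key, update it and
# write it to disk, then use a value from it to look up another row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id INTEGER PRIMARY KEY, score INTEGER, guild_id INTEGER);
    CREATE TABLE guilds  (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO players VALUES (1, 100, 7);
    INSERT INTO guilds  VALUES (7, 'Red Team');
""")

def play_turn(player_id):
    # 1. Read a row based on its primary key.
    score, guild_id = conn.execute(
        "SELECT score, guild_id FROM players WHERE id = ?", (player_id,)
    ).fetchone()
    # 2. Update the row and write it to disk (commit).
    conn.execute("UPDATE players SET score = ? WHERE id = ?",
                 (score + 1, player_id))
    conn.commit()
    # 3. Use the row just read to look up another row.
    return conn.execute(
        "SELECT name FROM guilds WHERE id = ?", (guild_id,)
    ).fetchone()[0]

print(play_turn(1))  # Red Team
```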
The lovingly compiled details along with many cool graphs are in the articles, but in general the lessons learned are:
Company earnings are outstripping forecasts, consumer confidence is returning, and City bonuses are back. What does this mean for business? Growth! After recent years of cost cutting in IT budgets, there is a sudden fear induced by increased demand. Pre-existing trouble points in IT infrastructures that have lain dormant will suddenly be exposed. Monthly reporting and real-time analytics will suffer as data grows. IT departments across the land will be crying out "The engine canna take no more, Captain!" What can be done?
When you are on the bleeding edge of scale like Facebook is, you run into some interesting problems. As of 2008 Facebook had over 800 memcached servers supplying over 28 terabytes of cache. With those staggering numbers it's a fair bet they've seen their share of Dr. House-worthy memcached problems.
Jeff Rothschild, Vice President of Technology at Facebook, describes one such problem they've dubbed the Multiget Hole.
You fall into the multiget hole when your memcached servers are CPU bound. Adding more memcached servers seems like the obvious way to add capacity so more requests can be served, but against all logic the added servers don't let you serve more requests. This puts you in a hole that adding more servers can't dig you out of. The simulation below shows why. What's the treatment?
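Here's a little simulation of the effect (my sketch, not Facebook's code): a multiget for many keys fans out into one request per distinct server that owns any of those keys, so as servers are added each client request touches more servers and the per-server request count barely drops:

```python
import random

# Simulate the multiget hole. Each multiget of K keys fans out to every
# memcached server owning at least one of those keys. Modulo placement
# stands in for consistent hashing; the fan-out effect is the same.
K = 100            # keys per multiget
REQUESTS = 10_000  # client-level multiget requests

for servers in (10, 20, 40):
    total_server_hits = 0
    for _ in range(REQUESTS):
        keys = random.sample(range(1_000_000), K)
        total_server_hits += len({key % servers for key in keys})
    per_server = total_server_hits / servers
    # Per-server request count barely drops as the cluster grows, because
    # each client request now touches nearly every server.
    print(f"{servers} servers -> {per_server:.0f} requests per server")
```

In this model, quadrupling the servers shaves only a few percent off each server's request load; serving capacity grows only if you add whole replicas of the data rather than spreading the same keys thinner.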
Caching/data-grids are going through a similar evolution to databases. As with databases, we started by using caching as an embedded service to the application. Now we are in the phase where we need to be able to share the data between multiple applications, or in cases where we don’t want to share the data, we need to be able to share the resources for managing the data, while keeping a high degree of isolation.
Demand for these sorts of requirements becomes much more common with SOA or SaaS-based applications. As we approach the next generation of middleware and data centers, it becomes clear that we cannot move to the next wave of virtualization and cloud computing without a strong security and isolation solution built into all layers of our application and middleware stack.
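One common building block for that kind of sharing-with-isolation is giving each tenant a logically private keyspace over one shared store. A minimal sketch, with class and method names invented for illustration:

```python
# Tenant isolation over a shared cache: each tenant gets a logically
# private keyspace on top of one shared store. Names are hypothetical,
# not from any particular data grid product. Real isolation also needs
# authentication and resource quotas at the middleware layer.
class SharedCache:
    def __init__(self):
        self._store = {}

    def for_tenant(self, tenant_id):
        return TenantView(self._store, tenant_id)

class TenantView:
    def __init__(self, store, tenant_id):
        self._store, self._prefix = store, f"{tenant_id}:"

    def put(self, key, value):
        self._store[self._prefix + key] = value

    def get(self, key):
        # A tenant can only address keys inside its own namespace.
        return self._store.get(self._prefix + key)

cache = SharedCache()
app_a, app_b = cache.for_tenant("app_a"), cache.for_tenant("app_b")
app_a.put("session:1", {"user": "ada"})
print(app_b.get("session:1"))  # None: app_b cannot see app_a's data
```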
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is kept entirely in DRAM and large-scale systems are created by aggregating the main memories of thousands of commodity servers. We believe that RAMClouds can provide durable and available storage with 100-1000x the throughput of disk-based systems and 100-1000x lower access latency. The combination of low latency and large scale will enable a new breed of data-intensive applications.
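To put rough numbers on the aggregation idea, here's some back-of-the-envelope arithmetic; the hardware figures are illustrative assumptions consistent with the paper's claims, not the paper's own numbers:

```python
# Back-of-the-envelope for a RAMCloud-style cluster; the hardware figures
# below are illustrative assumptions, not figures from the paper.
servers         = 2_000
dram_per_server = 32 * 2**30   # 32 GB of DRAM per commodity server
disk_latency_s  = 10e-3        # ~10 ms for a disk seek
dram_access_s   = 10e-6        # ~10 us for a networked DRAM read

total_dram_tb = servers * dram_per_server / 2**40
print(f"aggregate DRAM: {total_dram_tb:.1f} TB")                  # 62.5 TB
print(f"latency ratio: {disk_latency_s / dram_access_s:,.0f}x")   # 1,000x
```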
The essence of my work is coming into daily contact with innovative technologies. A recent example was a request from a partner company who wanted to answer the question: which of these tools will best solve my virtualized datacenter headache? After initial analysis, all the products could be classified as tools that troubleshoot VM sprawl, but there was no universally accepted term for them. The most descriptive term I found was Virtual Resource Manager (VRM), from DynamicOps. As I delved deeper into their workings, the distinction between VRMs and private clouds became blurred. What are the differences?