Paper: No Relation: The Mixed Blessings of Non-Relational Databases

This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.

The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?

There are a few things to like about this paper. A running example shows the different ways to model the same data depending on which type of solution you are targeting, covering in particular how many-to-many relationships are modeled, how data integrity is handled, and how optional attributes are supported. There's also a brief survey of some of the major systems.

The most interesting section of the report is where it tackles the problem of design for non-relational systems. The approach has two phases: design questions and design strategies. The questions you should ask yourself about your problem are:

  1. What degree of normalization is sensible?
  2. Which entities participate in transactions together?
  3. Where are areas of high contention?
  4. What are the history requirements of the application?
  5. Is Eventual Consistency an option?
  6. Does a Hash Table already model your problem?
  7. Is the Entity/Attribute/Value pattern inherent in the data?
  8. Are there hierarchical or recursive relationships in the data?
  9. Are there natural functional boundaries to partition along?
  10. Are there compounding factors that might influence your design?
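Questions 6 and 7, together with the paper's running example on many-to-many relationships and optional attributes, are easier to weigh with something concrete in hand. The following is a minimal Python sketch, not taken from the paper, that assumes a plain in-memory dictionary stands in for a key/value store. The `store`, `put_attr` and `enroll` names and the student/course data are hypothetical. It shows optional attributes living only on the entities that have them, and a many-to-many link denormalized onto both sides instead of going through a join table.

```python
from collections import defaultdict

# Hypothetical stand-in for a key/value store: each key maps to a dict of
# attributes, so an entity carries only the attributes it actually has
# (an Entity/Attribute/Value shape; missing optional attributes are simply absent).
store = defaultdict(dict)


def put_attr(entity_key: str, attr: str, value: object) -> None:
    """Set one attribute on one entity; no schema is enforced."""
    store[entity_key][attr] = value


def enroll(student: str, course: str) -> None:
    """Record a many-to-many link by denormalizing it onto both entities.

    There is no join table: each side keeps a set of the other side's keys.
    The two writes below are NOT atomic in this sketch.
    """
    store[f"student:{student}"].setdefault("courses", set()).add(course)
    store[f"course:{course}"].setdefault("students", set()).add(student)


put_attr("student:alice", "name", "Alice")
put_attr("student:alice", "nickname", "Al")  # optional attribute, only on this entity
put_attr("student:bob", "name", "Bob")
enroll("alice", "db101")
enroll("bob", "db101")
enroll("alice", "os201")

print(sorted(store["course:db101"]["students"]))  # ['alice', 'bob']
print("nickname" in store["student:bob"])          # False
```

Because the two writes in `enroll` are separate, a failure between them would leave the relationship half-recorded; that is the kind of trade-off question 2 (which entities participate in transactions together) is meant to surface.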

With a hefty amount of self-reflection behind you, now it's time to follow a few strategies:

  1. Logical Model First
  2. Consider Several Physical Approaches
  3. Keep It Simple
  4. Play It Safe
  5. Show Your True Consistency
  6. Stick To The Map (Reduce)
  7. Evolve Gracefully
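The strategy names are terse, so take this as one plausible reading of "Stick To The Map (Reduce)": express queries as independent map and reduce functions that a distributed store could run in parallel, rather than relying on ad hoc joins. Below is a toy, single-process Python sketch of that programming model. The `records`, `map_fn`, `reduce_fn` and `map_reduce` names are hypothetical, and this is not the API of any particular system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: schemaless records, as a key/value store might return them.
records = [
    {"student": "alice", "course": "db101"},
    {"student": "bob", "course": "db101"},
    {"student": "alice", "course": "os201"},
    {"student": "carol"},  # no course attribute; the mapper simply skips it
]


def map_fn(record: dict) -> Iterator[Tuple[str, int]]:
    """Emit (course, 1) for every record that has a course."""
    if "course" in record:
        yield record["course"], 1


def reduce_fn(key: str, values: List[int]) -> int:
    """Combine all values emitted for one key."""
    return sum(values)


def map_reduce(rows: Iterable[dict]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record is processed independently of the others.
    for row in rows:
        for key, value in map_fn(row):
            grouped[key].append(value)  # "shuffle": group emitted values by key
    # Reduce phase: each key is also reduced independently.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


print(map_reduce(records))  # {'db101': 2, 'os201': 1}
```

Since neither phase needs to see more than one record or one key at a time, the same functions could in principle be spread across a cluster, which is what makes the model attractive for non-relational stores.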

The summary ends on a good note, I think. Key/value systems may turn out to be just a feature of a larger database management system rather than a standalone product:

This author would advocate, therefore, that the developments exemplified by nonrelational databases should not remain an outside challenger to the legacy of relational databases, but should instead be researched, understood, and eventually, incorporated into a unified model. There's nothing to say that implementation as a key/value store shouldn't be part of the suite of implementation choices for a database whose data is structured relationally; likewise, there is room in the world of relational databases for the conceptual data design advantages offered by non-relational databases; the option to use optimistic concurrency control, to keep multiple versions of a cell per the columnar database model, to accept and support semi-structured (or run-time structured) data efficiently, to maintain multiple simultaneous values for a cell, and to scale across a cluster using some sort of ancestry or grouping relationship—these would all be conceptually coherent additions to the relational database world, provided the mathematical model for their incorporation is sound, and the configuration of the options is transparent and cohesive.
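Two of the features listed above, keeping multiple versions of a cell as in the columnar model and optimistic concurrency control, fit together naturally. Here is a minimal Python sketch under stated assumptions: the `VersionedStore` class and its methods are hypothetical, version numbers come from a simple counter rather than timestamps, and everything lives in memory. A write succeeds only if the caller still holds the latest version it read; otherwise it fails and the caller is expected to re-read and retry.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Toy store keeping several versions per (row, column) cell.

    Hypothetical illustration: versions are numbered by a counter, and only
    the newest `max_versions` versions of each cell are retained.
    """

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self._clock = itertools.count(1)
        self._cells: Dict[Tuple[str, str], List[Tuple[int, object]]] = {}

    def get(self, row: str, col: str) -> Optional[Tuple[int, object]]:
        """Return (version, value) of the newest version, or None."""
        versions = self._cells.get((row, col))
        return versions[-1] if versions else None

    def history(self, row: str, col: str) -> List[Tuple[int, object]]:
        """Return all retained versions, oldest first."""
        return list(self._cells.get((row, col), []))

    def compare_and_set(self, row: str, col: str,
                        expected_version: Optional[int], value: object) -> bool:
        """Optimistic write: succeed only if nobody has written since our read.

        `expected_version` is the version the caller read (None for a new cell).
        """
        current = self.get(row, col)
        current_version = current[0] if current else None
        if current_version != expected_version:
            return False  # conflict: caller should re-read and retry
        versions = self._cells.setdefault((row, col), [])
        versions.append((next(self._clock), value))
        del versions[:-self.max_versions]  # drop versions beyond the retention limit
        return True


cells = VersionedStore(max_versions=2)
cells.compare_and_set("acct:1", "balance", None, 100)  # True, version 1
v, _ = cells.get("acct:1", "balance")
cells.compare_and_set("acct:1", "balance", v, 150)     # True, version 2
cells.compare_and_set("acct:1", "balance", v, 999)     # False, stale version
cells.compare_and_set("acct:1", "balance", 2, 175)     # True, version 3
print(cells.history("acct:1", "balance"))  # [(2, 150), (3, 175)]
```

No locks are held between the read and the write; conflicts are detected at write time instead, which is the essence of the optimistic approach the quote mentions.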