This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.
The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?
- What degree of normalization is sensible?
- Which entities participate in transactions together?
- Where are areas of high contention?
- What are the history requirements of the application?
- Is Eventual Consistency an option?
- Does a Hash Table already model your problem?
- Is the Entity/Attribute/Value pattern inherent in the data?
- Are there hierarchical or recursive relationships in the data?
- Are there natural functional boundaries to partition along?
- Are there compounding factors that might influence your design?
With a hefty amount of self-reflection behind you, now it's time to follow a few strategies:
- Logical Model First
- Consider Several Physical Approaches
- Keep It Simple
- Play It Safe
- Show Your True Consistency
- Stick To The Map (Reduce)
- Evolve Gracefully
The summary ends on a good note, I think. Key-value systems may turn out to be just one feature of a larger database management system rather than a standalone product:
This author would advocate, therefore, that the developments exemplified by nonrelational databases should not remain an outside challenger to the legacy of relational databases, but should instead be researched, understood, and eventually, incorporated into a unified model. There's nothing to say that implementation as a key/value store shouldn't be part of the suite of implementation choices for a database whose data is structured relationally; likewise, there is room in the world of relational databases for the conceptual data design advantages offered by non-relational databases; the option to use optimistic concurrency control, to keep multiple versions of a cell per the columnar database model, to accept and support semi-structured (or run-time structured) data efficiently, to maintain multiple simultaneous values for a cell, and to scale across a cluster using some sort of ancestry or grouping relationship—these would all be conceptually coherent additions to the relational database world, provided the mathematical model for their incorporation is sound, and the configuration of the options is transparent and cohesive.
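To make two of the features named in that passage a little more concrete, here is a minimal sketch of a cell that keeps multiple versions and accepts writes under optimistic concurrency control. It is an illustration only, not code from the paper or from any particular database; the names `VersionedCell`, `VersionConflict`, and `expected_version` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a cell."""


@dataclass
class VersionedCell:
    # Hypothetical structure: each entry is (version_number, value), oldest first.
    versions: list = field(default_factory=list)

    def read(self):
        """Return (version, value) for the newest entry, or (0, None) if empty."""
        return self.versions[-1] if self.versions else (0, None)

    def write(self, value, expected_version):
        """Append a new version only if the caller has seen the latest one.

        No lock is taken: the writer states which version it read, and the
        write is rejected if another write has landed in the meantime.
        """
        current_version, _ = self.read()
        if expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self.versions.append((new_version, value))
        return new_version

    def history(self):
        """Return a copy of every retained version, oldest first."""
        return list(self.versions)


cell = VersionedCell()
v1 = cell.write("alice@example.com", expected_version=0)
v2 = cell.write("alice@example.org", expected_version=v1)

# A second writer that still believes v1 is current is rejected.
try:
    cell.write("stale@example.com", expected_version=v1)
    conflict = False
except VersionConflict:
    conflict = True
```

The old value is never overwritten, so the history question from the list above is answered by the data structure itself, and a conflicting writer learns about the conflict at write time instead of waiting on a lock.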