Antirez: You Need to Think in Terms of Organizing Your Data for Fetching

Salvatore Sanfilippo (antirez) wrote a response to Michel Martens' An Open Minded Reader. There's nothing controversial in either the post or the response. I was just struck by how clearly the conversation lays out all the effort that goes into optimizing read paths. We optimize reads through denormalisation, a crazy quilt of caching layers, key-value databases, clustering of related tables, SSD/RAM, DHTs, moving functions to storage, secondary indexes, separating OLAP from OLTP, and so on. We often focus so much on specific techniques that we forget the bigger picture of what's going on. This little exchange made me look again at the forest, not just the trees.

Michel Martens:

What does it mean to use Redis as a traditional database? If it means to save all your data and expect to retrieve it later in new and creative ways, then we have to agree that better tools are available. It is one of Redis tradeoffs: you have to think in advance how you will want to get your data back. Another tradeoff has to do with space: Redis is not a good fit for Big Data. It's not even a good fit for Medium Data. You are in charge of making good use of the available memory, and there's still no elegant way to work around that limitation.

Antirez responds with:

Not at all, the whole idea of its data model, and part of the fact that it will be so fast to retrieve your data, is that you need to think in terms of organising your data for fetching. You need to design with the query patterns in mind. In short most of the times your data inside Redis is stored in a way that is natural to query for your use case, that's why is so fast apart from being in memory, there is no query analysis, optimisation, data reordering. You are just operating on data structures via primitive capabilities offered by those data structures. End of the story.
The reality is that fancy queries are an awesome SQL capability (so incredible that it was hard for all us to escape this warm and comfortable paradigm), but not at scale. So anyway if your data needs to be composed to be served, you are not in good waters.
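
To make "organising your data for fetching" concrete, here is a minimal sketch in plain Python. It is not the Redis API and not code from either post: the names posts, timeline, add_post and latest_posts are hypothetical, and the dict and deque only stand in for the kind of per-key structures Redis offers. The assumed query is "the latest N posts by a user", so the write path maintains exactly the structure that read will walk.

```python
from collections import defaultdict, deque
from itertools import islice

# Hypothetical in-process stand-in for a Redis-like store (an assumption
# for illustration, not the Redis client API).
posts = {}                      # post_id -> post fields (think: one hash per post)
timeline = defaultdict(deque)   # user -> post ids, newest first (think: one list per user)

def add_post(post_id, user, text):
    """Write path: store the post and update the structure the read will use."""
    posts[post_id] = {"user": user, "text": text}
    timeline[user].appendleft(post_id)   # comparable in spirit to LPUSH

def latest_posts(user, n):
    """Read path: no query analysis, just walk the prebuilt per-user list."""
    ids = islice(timeline[user], n)      # comparable in spirit to LRANGE 0 n-1
    return [posts[i]["text"] for i in ids]

add_post(1, "alice", "first")
add_post(2, "bob", "hello")
add_post(3, "alice", "second")
print(latest_posts("alice", 2))
```

The cost moves to the write and to the design: a query nobody planned for, such as "all posts containing a given word", has no structure to lean on here and would mean scanning everything or adding another index at write time.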

Both posts are well worth reading.