The DynamoDB Book: An Interview With Alex DeBrie On His New Book

You know nothing about DynamoDB. At least that’s what I realized the first time I heard Rick Houlihan give his now infamous talk at AWS re:Invent 2018 on Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB.

In that talk Rick revealed for the first time the inner arcana of single-table design. Minds were blown. Weaknesses were revealed. Futures were changed.

As a mere novice in the ways of DynamoDB I realized there were many levels of understanding needed before one could become a true AWS Data Hero. For that we need a guide.

Our guide on the Hero’s Journey that is mastering DynamoDB is a wise young wizard named Alex DeBrie. Alex wrote what you might consider to be the Gnostic Gospels of DynamoDB: The DynamoDB Book.

You will know something after reading this book

But it's more than just a book. You can’t buy it on Amazon. Instead, Alex uses Gumroad to offer packages at three different price points along with a team option. Each level provides additional content:

Basic package ($79). A 450-page book and six cheatsheets.
Plus package ($129). 60-page Analytics supplement guide. 60-page Operations supplement guide. Five deployable code implementations.
Premium package ($249). Video walkthroughs of every chapter.

Use the code "HIGHSCALABILITY" at checkout and you’ll shave off $20 for Basic, $30 for Plus, and $50 for Premium. You’re welcome.

When Alex asked for advice about what to include in the book, I made a big deal about including complete working code examples. Alex delivered. So I thought it was only fitting that I buy the Premium package.

But it’s so much money, you might complain. As someone who has not made much money writing a book, I have some idea how much effort Alex put into creating these materials. It’s a lot of work. As a profession we must be willing to pay for our tools. So as a professional programmer wanting to create professional products, I’m very happy with the result. Here’s why.

What’s great about SQL is you can find an answer to damn near any question you have. Not so with DynamoDB. Working code examples are rare.

I remember Rick talking about using “between” when making queries. Never heard of such a thing and it took me forever to find a complete working code example of what he was talking about. Don’t even talk about the documentation. What you find are ridiculous command line examples like “aws dynamodb query…” What use is the command line for this sort of stuff?
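For anyone stuck at the same spot, here is a minimal sketch of what a "between" query might look like from Python. It is not taken from the book or from Rick's talk. The table name, the `PK`/`SK` key names, and the key values are hypothetical. The function only builds the request parameters, in the shape accepted by the boto3 low-level client's `query` call, so it runs without an AWS account.

```python
def build_between_query(table_name, partition_value, sort_start, sort_end):
    """Build Query parameters that use BETWEEN on the sort key.

    Key names (PK, SK) and the key value format are assumptions for
    illustration, not something defined by the article or the book.
    """
    return {
        "TableName": table_name,
        # The partition key must use equality; BETWEEN applies to the sort key
        # and is inclusive on both ends.
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": partition_value},
            ":start": {"S": sort_start},
            ":end": {"S": sort_end},
        },
    }


# Hypothetical example: one customer's orders from the first quarter of 2020.
params = build_between_query(
    "Orders", "CUSTOMER#123", "ORDER#2020-01-01", "ORDER#2020-03-31"
)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
```

Sort keys compare as strings here, which is why the hypothetical order keys embed an ISO-style date: lexical order then matches chronological order.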

Show the code! And Alex does.

I think you’ll find the five code examples (Big Time Deals, E-Commerce, GitHub Migration, GitHub Node, and Session Store Node) worth the price of admission. First, they tackle complex domains so they might already solve a problem you’re having. That’s gold. Second, you’ll learn a lot by just reading the code. As a not-so-great JavaScript developer I know I learned a lot.

What about the Holy Grail of single-table design? If you have that sneaking suspicion you’re using DynamoDB incorrectly and you’ve tried to learn single-table design before, but got frustrated, Alex has your back. The book is filled with clear examples backed by strategies for various use cases.

Against my expectations the videos turned out to be very useful. Alex’s tone was great. It was just like he was next to me explaining what he was doing. And that’s exactly what you want from a teaching product.

Here's my email interview with Alex DeBrie on The DynamoDB Book. Enjoy.

Please tell us who you are and what you've brought to show and tell today?

I’m Alex DeBrie. I recently wrote and self-published The DynamoDB Book, which is the most thorough guide to data modeling with NoSQL and DynamoDB (or so my mom tells me).

I see from your colorful costume that you are an AWS Data Hero. Can you tell us a little about the hero’s journey that it took to understand DynamoDB?

My hero’s journey is like many others -- it couldn’t have happened without a mentor (Rick Houlihan), a road of trials (many failed data models), and finally, a return with the elixir (The DynamoDB Book).

Joking aside, my journey was haphazard with a lot of false starts. I started using DynamoDB because it worked so well with serverless applications. After a year or two of thinking I was pretty good at this NoSQL thing, Rick Houlihan’s talk at AWS re:Invent disabused me of that idea. I spent a month diving deep into DynamoDB and wrote DynamoDBGuide.com, which is a friendlier look at the DynamoDB documentation. That took on legs of its own and people started reaching out with more and more DynamoDB questions. I wrote a few articles, gave some talks at AWS conferences, and here we are today.

After spending a good chunk of your life writing this book, please summarize it in just a few sentences so people will know why it should matter to them.

The DynamoDB Book is a comprehensive guide to data modeling with DynamoDB. A lot of folks think DynamoDB is just a key-value store, or that you can’t model relationships in DynamoDB. That’s complete nonsense, and the book demonstrates how mistaken those folks are. We cover common strategies and patterns like handling one-to-many relationships, many-to-many relationships, and aggregations. It also includes five full walkthrough examples to put it into practice.

DynamoDB the good parts. What’s in the list?

There are so many ways in which DynamoDB is a huge improvement on existing databases that I never want to go back. First off, the pricing model. You pay for read and write operations directly, rather than guessing at how your application usage will translate into CPU and RAM resources. You can even do completely pay-per-use billing with DynamoDB -- what other database can do that?

I also love how well it fits in modern architectures. It feels just like using any other AWS service. You provision it with infrastructure-as-code. You manage permissions with AWS IAM. You interact with it via the AWS SDK. All these things are highly desirable to me.

I got into DynamoDB because of how well it worked with serverless compute. Traditional databases weren’t really meant for thousands of tiny compute instances spinning up, making connections, and disappearing. DynamoDB handles it perfectly.

But my favorite part is probably how hands-off it is if you do it right. DynamoDB basically won’t let you write a query that won’t scale. This feels limiting at first, but it’s nice to know you won’t have to put out a fire due to a SQL query that is getting slower and slower as your data set grows. If your query is efficient today with 1GB of data, it will be efficient tomorrow with 10TB of data.

I’ve even gotten to the point where I really enjoy the data modeling. There’s a learning curve for sure. But once you know what you’re doing, it’s kind of fun to design a model.

Single-table design is what sets DynamoDB apart. Do you remember when you first heard Rick Houlihan turn the world upside down by explaining single-table design? How did you feel when you realized you had no idea how DynamoDB really worked?

It’s impossible to forget the first time you hear a Rick Houlihan talk. Mine was in December 2017. I would listen to sessions from AWS re:Invent on my commute to work. The first 20 minutes, I was nodding along. Then I thought there was some mistake. Put all of your items into a single table? Absurd!

When I got to a computer, I watched the talk with the video and slides. Sure enough, everything I knew about DynamoDB was wrong.

I probably watched that video 10 times during the week between Christmas and New Year’s. I ended up writing everything I learned into a little resource at DynamoDBGuide.com. This really kick-started my work with DynamoDB.

In your book there are five strategies to handle one-to-many relationships and four strategies to handle many-to-many relationships. Do you feel sorry for relational databases because they have only one (FK)?

I’ve been writing to my legislators to fix this gross imbalance between DynamoDB and relational databases. No luck so far.

You say modeling with an RDBMS is like a science. DynamoDB not so much. This is my biggest criticism of DynamoDB. It’s coding by convention. You have to encode all these strategies by hand. Why doesn’t DynamoDB make them first-class parts of the system?

I’m actually more bullish on the potential here than I was a few months ago. A big part of the problem is just getting more eyeballs looking at it and brains thinking about it. The DynamoDB community was pretty niche for a while, but it’s really starting to gain momentum. This is partly because Rick has done a great job evangelizing the possibilities and partly because a lot of developers dove into DynamoDB due to how well it works with serverless.

One of the reasons I wanted to write the book is to help build common language and standardize common patterns. A lot of NoSQL / DynamoDB design to this point has been ad-hoc work by people who *sort of* understand DynamoDB. But this can be standardized and applied more consistently. Once we have a shared understanding, it will be easier to build abstractions on top.

I don’t think we’ll ever get to the science of RDBMS, but it will be a lot easier than it is now.

Should you consider it a personal failure worthy of shame and banishment when you must resort to multiple queries?

Yes.

If this describes you, please email me so I can coordinate your exile.

Writing books is normally a losing proposition for the author, so I’m curious about how you’ve chosen to monetize your content. You’re not just selling a book on Amazon. Instead, you’re selling three differently priced packages on Gumroad Books, plus a team license option. Each package has value-added materials as an incentive. Can you talk about your thought process and how it’s working out for you?

Great question! I leaned on other people’s experience in this area to help me out. Adam Wathan in particular was very helpful, and I followed his article on his first book launch as closely as I could. I’ll share the major tips that helped me in hopes that it will encourage others to walk the same path.

If you want to self-publish, think about distribution. How will people hear about you and your book? And what will convince them to fork over their hard-earned money to buy what you’ve written? You have to build trust in your community.

For me, this was more than two years of writing about DynamoDB, engaging with the community, giving talks, etc. I didn’t set out initially with the intention to make money by writing a book -- that came much later. But it’s really hard to overstate how important it is to “be helpful on the internet.” (H/T to Patrick McKenzie / patio11). Teach people what you know, answer people’s questions, get involved in the community, and you can go a long way. This is true regardless of whether your ultimate goal is to sell a book or a course. I’ve had so many opportunities come up as a result of writing things on the internet.

The distribution aspect is going to be the biggest difference between self-publishing and using something like O’Reilly, and it will be the biggest difference between success and failure of your self-published book.

It might feel like it’s an impossible task to build an audience, or that it’s too late to get started. That’s nonsense. Five years ago, I was a corporate lawyer. I live in Nebraska. I have four kids, so I don’t have a ton of free time. You can do this if you want to. Don’t hesitate to email me or DM me on Twitter if you want help / advice / motivation.

It makes me cringe to share numbers about the book launch, especially given the financial pressures people are feeling from COVID-19 right now. However, it was very motivating to hear real numbers from other people such as Adam Wathan, Daniel Vassallo, etc. as I was working on this book, so I’ll share a bit. The book sold a little over $80k in the first four days with the launch discount. As I’m writing this, the book was released two weeks ago, and I’m just about to cross $110k.

The advice above is the most important thing to get right, but some other things will help. I’ll note two of them. First, make the best damn content you can. Hopefully you’ve got an initial audience that will buy the book, but then you’ll be relying on word-of-mouth, customer ratings, and testimonials. Better content will make this easier. Plus, you’ll feel better about it.

Second, give a few packages to allow people to segment themselves. I had three packages: a “Basic” package with the PDF and some printable cheatsheets; a “Plus” package that included some example code repositories and 120 pages of supplemental content; and a “Premium” package that included video walkthroughs for each of the chapters. Some people love learning with video, and they’ll pay a premium for it. Others won’t be able to justify the expense. But give people options!

If you had one database to take to a desert island which would it be and why?

Couchbase, so I’d have a comfortable place to sit.

Be honest, don’t you miss the simplicity of “select count(*) from table” to get the count of an item? How is implementing aggregations in a Lambda function a good solution?

I absolutely miss it! But it helps to think about why you’re giving that up. I mentioned earlier that DynamoDB won’t let you write a query that won’t scale. Unbounded aggregations are scary from a scale perspective. What if you have a billion records? A trillion records? That count(*) operation that worked so well in the test environment is going to get slower and slower in production until it’s untenable.

It’s a bit of a bummer that you need to implement this in a Lambda function, as you mention, or increment in your initial write. But it will scale forever without any performance degradation. That’s a tradeoff I’ll happily make.
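As a rough illustration of the "increment in your initial write" option, the sketch below builds parameters for an atomic counter update. The table name, key names, and the `ItemCount` attribute are hypothetical. The dict follows the shape accepted by the boto3 low-level client's `update_item` call, and the code only constructs it, so no AWS access is needed to run it.

```python
def build_increment_count(table_name, pk, sk, amount=1):
    """Build UpdateItem parameters that atomically bump a stored count.

    The counter lives on a single summary item, so reading the count is a
    single-item lookup no matter how many records exist. All names here are
    assumptions for illustration.
    """
    return {
        "TableName": table_name,
        "Key": {"PK": {"S": pk}, "SK": {"S": sk}},
        # ADD on a number attribute creates it if it is missing, then adds
        # the given amount in one atomic operation.
        "UpdateExpression": "ADD #count :inc",
        "ExpressionAttributeNames": {"#count": "ItemCount"},
        # The low-level API sends numbers as strings under the "N" type.
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


# Hypothetical example: count one more order for a customer.
increment_params = build_increment_count("Orders", "CUSTOMER#123", "SUMMARY")

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").update_item(**increment_params)
```

The same parameters could be issued from a Lambda function reacting to stream records instead of from the original write path; which one fits better depends on the application.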

What do you think databases will be like in 5 years? This can’t be all there is, can it?

I don’t expect physics to change drastically in the next five years, so there’s always going to be a tension between the flexibility of a relational database that allows for ad-hoc queries and the performance of a NoSQL database that requires you to model your data up front.

We’ll continue to see performance improvements from relational databases from better hardware, better design, etc. We’ll probably see something that actually works well with serverless compute as compared to the various spackle solutions today.

We’re still in the early days for NoSQL in terms of education. That’s the #1 blocker to NoSQL adoption. Everyone knows and understands the relational model. The most popular NoSQL database in 5 years will be the one that invests properly in education now. (Psst. If you’re reading this and are in charge of a NoSQL database, hit me up. I can help with this.)

A quote from your book: “While I empathize with your concerns, I don’t find this a sufficient excuse not to learn single-table design.” How did you become so heartless?

My children have taught me that empathy is a weakness that will be exploited.

I need help only a bona fide Data Hero can give. I’m using Aurora on a new project and I feel like I’m cheating on DynamoDB. What should I do?

How can you live with this guilt?

Nah, I think Aurora is an amazing piece of technology, and I’m not anti-RDBMS in any way. Personally, I’ve gotten to the point where I can’t give up all the great things about DynamoDB, and it doesn’t feel much harder to model in DynamoDB than it does with a relational database. But I won’t begrudge anyone that uses a relational database.

Change is the only constant. We need to migrate from one database schema to another as we iteratively grow and scale out. Poor migration capabilities have been a criticism of DynamoDB. And I’ve noticed Rick saying lately that migrations are no harder in DynamoDB than in an RDBMS. You go to great lengths about how to accomplish migrations in DynamoDB. But isn’t this all very error-prone and imprecise? When you’re using Lambda + streams or launching EMR jobs to restructure data, how do you know your data is correct after the transformations? There are no transactions, and there’s no way to know if events or data are ever dropped. This turns a database into a probabilistic system where you’re never sure the data is correct and you have no way of proving or even showing it’s correct. How do you defend DynamoDB’s honor against these criticisms?

I think you can feel pretty confident in migrations if you model your data properly. But you really need to buy into the system. Overload your keys and indexes. Use generic secondary index names and attributes. Assemble item collections based on your access patterns.

If you do that, then migrations are much easier. It’s basically the same pattern every time: run a Scan operation, identify the items to update, and decorate them with new attributes. Once you’ve done two or three, it’s old hat.
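The scan, identify, decorate pattern can be sketched in a few lines. This is an in-memory illustration, not the book's code: the items stand in for the results of a Scan, and the `Type`, `Status`, `GSI1PK`, and `GSI1SK` attribute names are hypothetical examples of generic, overloaded index attributes.

```python
def needs_migration(item):
    """Identify step: hypothetical rule picking orders that lack index keys."""
    return item.get("Type") == "Order" and "GSI1PK" not in item


def decorate(item):
    """Decorate step: return a copy of the item with new index attributes."""
    updated = dict(item)
    updated["GSI1PK"] = f"STATUS#{item['Status']}"
    updated["GSI1SK"] = item["SK"]
    return updated


def migrate(scanned_items):
    """Apply the identify and decorate steps to items from a scan."""
    return [decorate(item) for item in scanned_items if needs_migration(item)]


# Stand-in for scanned items; a real run would page through a Scan instead.
scanned = [
    {"PK": "CUSTOMER#1", "SK": "ORDER#001", "Type": "Order", "Status": "SHIPPED"},
    {"PK": "CUSTOMER#1", "SK": "PROFILE", "Type": "Customer"},
    {
        "PK": "CUSTOMER#2",
        "SK": "ORDER#002",
        "Type": "Order",
        "Status": "PLACED",
        "GSI1PK": "STATUS#PLACED",
        "GSI1SK": "ORDER#002",
    },
]

to_write = migrate(scanned)
```

Only the first order is selected: the profile item is a different type and the second order already carries the new attributes, so rerunning the migration would skip anything it had already updated. Writing the decorated items back to the table is left out of the sketch.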

Give me your Ripple Of Evil. What happens without DynamoDB?

Let’s go back further and imagine no Dynamo, which was an in-house database developed by current AWS CTO Werner Vogels, among others, at Amazon.com around 2005, as outlined by The Dynamo Paper. DynamoDB was eventually made into an AWS service based on the learnings from Dynamo.

Without Dynamo, Amazon.com isn’t able to scale past its size on Black Friday / Cyber Monday of 2004. By 2006, you have to take a ticket and wait in line to access Amazon.com. By 2010, Amazon shoppers are determined by a random lottery. In 2020, we still have to wait a whole week to receive something we ordered from the internet (the horror!)

Because the scaling challenges of Amazon.com are taking up the brain space of the best engineers, we don’t get Amazon Web Services. All of our favorite apps don’t exist -- no Netflix, no Instagram, no Chat Roulette. Google Cloud is the dominant cloud provider, which means developers are regularly scrambling to replace whatever service Google has decided to deprecate today.

Werner Vogels isn’t the CTO of AWS and is instead the drummer for a Dutch death metal band. Rick Houlihan is a MySQL DBA who can often be heard muttering “there has to be a better way” under his breath. I am living in my parents’ basement as The DynamoDB Book sells 2 copies -- one to my grandparents and one to a confused teenager looking for a book on explosives.

I found an article listing the 15 best books on DynamoDB. Scandalously, your book is not in the list (yet). Let’s set the record straight. Is your book the best book on DynamoDB or the greatest book on DynamoDB?

Yes.

Now that you have some free time again, what are you going to do with your life?

I’ve been trying to work a little less to catch up on time with family. I’m also doing some consulting here and there, so that keeps me busy.

As I was writing The DynamoDB Book, I fancied myself a Harper Lee: write one book and retire on top. I didn’t think I’d have the energy to do another.

Somehow I already have the itch to create more teaching material. I’ve started outlining another course and am beginning work on it. Now I fear I fancy myself a John Grisham (though I hope always to prioritize quality over quantity).

This is part one of a two-part interview. Please tune in next week for the rest of the story...