Customer: - Name - Country Product: - Code - Name - Description Purchases: - Reference to Product Entity - Reference to Customer Entity - Date of orderAnyone from a relational background would look at this schema and give it a big thumbs up. With a little effort we can imagine the original physical purchase order that has now been normalized into three different tables. To recreate the original purchase order a join on purchases, produce and customer is needed. Read speed is not optimized, safety is optimized. Here’s what the same schema looks like optimized for reading:
Purchase: - Customer Name - Customer Country - Product Code - Product Name - Purchase Order Number - Date Of OrderThe three original tables have been folded into one entity. Now a purchase order can be read in one get operation. No join necessary. Notice how the entity looks more like an original purchase order. It is also what would probably be cached and is what our model would probably look like. But what if you want to update a product name or a customer name? Those attributes are duplicated in all entities. Here’s where the protection offered by the relational model comes in. Only one entity needs updating in a normalized model. In BigTable you have to remember everywhere a customer name and product name and change every instance to new values. It’s not a simple, safe, or reliable approach. But it does optimize for read speed and scalability. For an application with a high proportion of updates to reads this approach wouldn’t make sense. But on the web reads usually dominate. How often do you really change a customer name or a product name? Seldom. How often do you read them? All the time. Designing to scale for reads and taking the pain on writes takes some getting used to. It’s a massive change to standard relational tactics. But this is what it takes to scale web applications, even if it feels a little strange at first.