advertise
« Database War Stories #3: Flickr | Main | Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web »
Tuesday
Mar182008

Database Design 101

I am working on the design for my database and can't seem to come up with a firm schema. I am torn between normalizing the data and dealing with the overhead of joins and denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to be normalized and have separate User, Artist, Album and Song databases with a table full of INTs to tie them together. This will be in a mostly read based environment and with about 80% being searches of data by artist album or song. By the time I begin the query for artist, album or song I will already have a list of UserID's to limit the search by. The problem is that the tables can get unmanageably large pretty quickly and my plan was to shard off users once it got too big. Given this simple data relationship what are the pros and cons of normalizing the data vs denormalizing it? Should I go with 4 separate, normalized tables or one 4 column table? Perhaps it might be best to write the data in both formats at first and see what query speed is like once the tables fill up... Another potential issue would be the fact that inserts will be coming in batches of about 500 - 2000+ per user at a time which will be pretty intensive to pull off for the normalized table as there will need to be quite a few selects for each insert due to the fact that the artist, album or song may already be in the database or it may not requiring an insert.

What do you all think?

Reader Comments (1)

I would start with a normalized database until your performance monitoring curves and your user adoption curves look like they collide in destruction. Then I would try adding object caches and let your database worry about writes. Then I would do something more complex. But don't paralyze yourself now with design fear when it's not needed. You can make changes later.

November 29, 1990 | Unregistered CommenterTodd Hoff

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>