Now, don’t get too excited and grab your pitchforks and torches. Clearly, it is extremely possible to overdo something in the design, but very often normalization takes the rap as being the culprit. In my “Database Design Fundamentals” presentation, one of my favorite things to do is ask “What is the most important normal form?” 9 out of 10 times, someone answers “Third”. When I ask what they have against fourth, the usually say that it makes the database work too slow. But when they find out that most Third Normal Form databases are already in Fifth Normal Form, well, this fact has never actually slowed down any database that I know of.
Under-normalizing is a very common problem, which is pretty much self explanatory. Most people don’t really understand it, or even really care about it. The basics of it are very straightforward, but the finer points (and why they are so important) are lost on many people who happen to be cobbling together some form of storage for the state of their objects (what we typically like to think of as a relational database.)
But I am not even sure that by definition you could over-normalize. I will submit to you that almost every so-called over normalized is more precisely “over designed” and is actually just an architect taking a set of requirements and expanding them to meet what they felt the requirements should be. As an architect, I have a very large bag of tricks that I like to pull from to create the “perfect” solution. The problem is, users rarely want or even need “perfect”. Yet we always get really excited to start adding to what the user asks for. Normalization can only be done in the context of the requirements. If you defined every relationship in your requirements as 1-1, you could implement a fully normalized database in one table.. Every customer makes one order for one product. Done. Clearly reality is rarely so clean, and if the requirements stated this, you would not be doing your job unless you said “Whoa, are you..sure?”
A great example of how databases get out of hand came during Audrey Hammond’s (@datachix2) presentation at SQL Saturday in Atlanta. Her example was of a personal movie review system that her (hopefully theoretical, slightly sadistic and certainly inappropriate time using) boss had given her to do. She came to a final example design that seemed to match the requirements that she had presented in one of her early slides. Her question, how would you change the model. Several of the people in the room had suggestions to improve the model, but most of the people were expanding the model to include stuff that they wanted to see in the model (I will admit, I wanted to do the same thing :). If we had kept going, we would have ended up with a data model that could be used to replace the guts of the Internet Movie Database (imdb.com, one of my favorite sites.)
The problem was, while the ideas that had been good ones if we were trying to brainstorm items for the requirements, they were beyond what had been agreed upon for the implementation. The goal to create a personal movie review system is not exactly the same as the needs of the boss who want to keep up with a list of movies he has watched and a list of the important people in the movie that he wants to keep up with.
Under the name of normalization, this sort of thing happens often. We add more and more tables to our model to take the design to larger and larger proportions implementing more and more things that seem like a great idea, but aren’t what the client wants. This is when the cries of over normalization begin to go up and cursing the name of Codd as if he was the person who over-engineered your database.
I clearly don’t want to make it sound like as an architect you shouldn’t push to provide a solution that gets it right when the requirements are, shall we say, weak. Requirements are very often not thought through, and it is your job to recognize this and be certain that the requirements are changed to to reflect this, so you can create systems that implement what the user needs. Taking the user’s requirements as the final word for what they want is another poor practice. Users know what they do, not how to create databases (presumably that would be your job.) My rule of thumb is that:
1. The requirements dictate the database design
2. The requirements and the relational engine dictate the implementation
So unless you understand the requirements, you can’t design the optimal database, and if you don’t understand SQL Server, you are not going to end up with an optimal implementation. Over-normalize? No. Over-engineer? Definitely.