THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

Chapter 5–Normalization

I won’t try to kid you. I love this subject. When you finally see the power and simplicity of it all, it is like a 200 watt halogen light bulb going off over your head in a cavern where you previously only had one of those little led lights on a key chain. In fact, it is the simplicity of normalization that makes it so much fun.  Of course, knowledge of normalization isn’t the only thing that makes you a good database designer and realistically, most people do some level of normalization without even knowing it. I dare you to find a database that has only one table with one column that expresses more than one concept. Even a newbie with an excel workbook will start to add worksheets as it becomes more difficult to manage their data in that workbook. However, understanding why you do that natural process is half of the battle.

The approach I will take in this chapter is to break down things into 4 major sections:

  • Entity and Attribute Shape – Basically this encompasses the definition of what makes a table a table and not a different data structure, and First Normal Form
  • Attribute Interdependency – Essentially the Boyce Codd Normal Form, which is basically ensuring tables have all keys defined and all non-key attributes reference the entire key set of all keys.  This encompasses Second and Third Normal Form). I had a person comment to me that the Second and Third Normal Forms were confusing, and after some reflection, I think it has to do with the way we implement databases today. Second has to do with non-key attributes referencing other non-key attributes, and Third is about non-key attributes referencing primary key attributes. But the trend is to have a surrogate primary key, so it just doesn’t click. In Boyce Code, it is about any key, rather than specifically the primary key. And a surrogate key isn’t really a key in and of itself, but a generated value that stands in for the key for ease of use and performance reasons.
  • Multivalued Dependencies – This references Fourth and Fifth Normal forms in a practical manner that isn’t confusing to the average programmer.
  • Denormalization – Selectively choosing to not implement a rule of normalization to achieve some purpose, often for a (hopefully real) performance issue. This is not to be confused with unnormalized, which is just going with what you have because it works on your machine for the size of data and the questions you are asking of it. Also known as normalcy.

One of my desires in each edition of the book is to tackle the higher normal forms in more and better manners that a person like me can understand. I am pretty thick at times, and a lot of concepts just go beyond me in their mathematical notation laden versions. Not that I am against that sort of thing, it is just that I don’t really get it until I get a simplistic example to follow.  And even worse, too many writers gloss over the details of complex topics by simply puking up the wording from the original writer and moving along, not even giving examples. I feel that if you are going to bring up a topic, you cover it, give examples, and explain it. It isn’t always practical and often I wish I didn’t feel that way (and I wish my editors didn’t feel that way too.)

In this edition, I am going to be changing up a bit of my material on First and Fourth, and Fifth Normal forms (moving some of the non-key multi-valued dependencies stuff to first normal form, since really what I describe there is technically first normal form’s requirement to have single valued attributes, though I have seen it described in multiple places as fourth…Again, kind of confusing). 

Along the way, I have been doing some research to see if I can find a better way to describe the Fourth and Fifth normal form. I wanted to list a lot of great sites about normalization, but I can’t really find anything out there other than William Kent’s paper here: http://www.bkent.net/Doc/simple5.htm. It was written back in 1982 (September to be exact), but it was the best overall thing I could find. Most other sites were too terse to glean more than an idea from, not enough to put out there for reading.

 

After I finish chapter 5, it will be time to make some major structure choices about the book, and decide if I want to dig into the new pattern/modeling chapters, or finish the rehash. The rehash is kind of mundane, though a decent amount of it will require touch-ups to cover the new Denali changes to… well, not sure what is or isn’t NDA, but suffice it to say… there may or may not be changes!

Published Monday, March 21, 2011 8:48 PM by drsql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Thomas LeBlanc said:

Looking foward to more info on 4th and 5th. I think if you do a good enough job with 1 thru 3, usually you do not see designs that are not 4th or 5th.

Thanks,

TheSmilingDBA

March 21, 2011 9:01 PM
 

drsql said:

Definitely true of Fifth Normal Form, but Fourth is slightly less true, particularly when a newbie designs a database that needs complex requirements.

Look at the example on wikipedia:

Restaurant, Pizza Variety, Delivery Area

This seems like a reasonable key if you don't take the time to consider what this means. For this to be a correct table, you need to be saying that a restaurant sells a given pizza variety in a given delivery area. So all three attributes of this key need to be dependent upon each other.  But usually this isn't the case.

I do admit that if you are good at modeling, fourth normal form is usually pretty natural to spot because you would have identified that a restaurant was the primary thing of importance and made multivalue attribute tables for the other items naturally.

Of course, the real thing to note here is that I don't regard normalization so much as a process as it is a sanity check for a good design. And when a three or more part key pops up in my model, I use 4th and 5th techniques to determine if they are wrong...

Thanks!

March 21, 2011 9:22 PM
 

RichB said:

So have you read Data and Reality?

Same author you link to.

March 22, 2011 6:33 AM
 

drsql said:

RichB, haven't yet, but just ordered it, thanks!  Have you?  Is is worth it?

March 22, 2011 11:15 AM
 

RichB said:

I enjoyed it.  Can't remember who recommended it - for some reason I think it may have been Mr Celko.

For a book written so long ago, it still frequently seems to cover a lot of persistently 'current' topics, as long as you can mentally remap the units of measurement!

March 22, 2011 12:29 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement