THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Louis Davidson

When is denormalization the best thing?

Oh no, you may be thinking, he has gone off the deep end.  Preparing for his talk on normalization at PASS this year he has finally cracked and said "to heck with it, just denormalize..." If you thought that, shame on you, and minus 10 points for you.  No, there is one very prominent place where most any database architect will tell you to denormalize.  If you have done any work on a data warehouse, you will know that the principals that the data warehouse has at it's foundation is simple. 

An example is a calendar table.  A proper calendar table is filled with denormalizations.  The natural key of the table is the date value.  From there you can easily calculate the month, the year, what day of the week it is, if this is Christmas, etc.  Each of these values is functionally dependent on the value of the date.  But no one would suggest that your date table have the one column and you calculate the other values on the fly.  That is exactly what the concept was created to beat: having to recalculate over and over these values. 

So what is the principal that allows this seemingly evil concept of denormalization as a rule, not an exception to the rule? When the data will NEVER EVER change.  The best case here is places where the definition of NEVER is not simply driving by a user.   I have this saying about users that mimics that of one of my favorite TV characters, a doctor named House, played by Hugh Laurie.  "Everybody lies."    Asking if a patient has ever cheated on his spouse is completely useless.  "Of course I did, sir.  Now please, give her half my stuff."

Ok, I am not quite that cynical, as I don't think they do it on purpose, I just don't think that people think.  Most users get about as involved with the process of creating a system as I do when my wife wants to clean house and she starts making all sorts of plans and I am trying to watch football.  I agree to most any sort of process she describes, just so we can get the conversation over with.  Most users feel the same way.  They want to do "real" work.  And who blames them.  All this to say that the definition of never changes should be mostly based on real world realities, not user hearsay.

In the example of a calendar table, there will never be a reason to change the fact that October 12, 2006 (should I say 20001012?) is a Thursday.  It will always be a Thursday, unless our country is invaded and taken over by some other culture that uses different names for days.  (And at that point it won't matter will it?)

A friend blogger wrote an article about how to return prime numbers from SQL here: http://sqlservercode.blogspot.com/2006/09/return-all-78498-prime-numbers-between.html.  Frankly, I was impressed by some of the responses I saw around about this topic.  For example, this post: http://blogs.technet.com/wardpond/archive/2006/09/23/458591.aspx and these too: http://robfarley.blogspot.com/2006/09/primes.html, and http://robfarley.blogspot.com/2006/09/more-on-primes.html.  But the answer I came up with is far faster than any of theirs for a production system (not nearly as cool, and takes far less brains, lucky for me.)

create table primeNumbers
(
    i   int primary key
)
insert into primeNumbers
<fill in whatever cooler faster prime number generation code you can find.  Personally I tried it and I pretty much failed, certainly in comparison to these attempts.>

The point of this blog is that once loaded you calculate them and the numbers are stored, you never have to recalculate, unless someone comes up with some new prime numbers, which is pretty much unlikely.  So if you need a prime number, the value is there and ready to use. 

Edit: made it more clear that the point was not that the insert statements were faster that the other possible ways to create primeNumbers, only that you should probably only use it once to load a table with prime numbers.

Crossposted to http://drsql.spaces.live.com

Published Tuesday, October 10, 2006 7:52 PM by drsql
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ward Pond said:

Hi Louis..

Many thanks for your kind words, and welcome to the prime number party! :)

In reference to your example above, while I don't doubt that the raw INSERT statements would perform faster than any of the other alternatives, how are you generating the 78,498 INSERT statements this approach requires?
October 10, 2006 10:11 PM
 

drsql said:

Sorry, I didn't realize that I had to approve comments... You were perfectly right.  My insert statements were actually meant to be a placeholder for all of the other algorithms.  (that was why I left all of the kind words about your (and the other) posts :)
October 18, 2006 5:59 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Links to my other sites

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement