THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Davide Mauri

A place for my thoughts and experiences the Microsoft Data Platform

Denormalize with consciousness (aka Constraint are not an option)

Today I had to help a customer to make their upgraded invoicing system to work correctly…and as quite often, the source of the problem was a bad denormalization.

From now on, each time I can do it without revelaing too much of the customer, I’ll try to write a post so to it will be clear to anyone (developers in particular) what are the dangers of denormalizing without taking care of consistency.

Denormalization is not bad “a-priori”, so there are situations in which it can help performances and so it make sense in these cases. Plese note that I’m talking of denormalization, which is the conscious decision to denormalize after having normalized the database schema because actual system cannot give use the needed performances. I’m absolutely not talking of un-normalized situations. Unnormalizated database are simply bad, dot. They suffer of a pletora of problems among which, of course, the consistency of data is one of the biggest. So this post is useful in both situations, but please keep in mind that they are two completely different things!  Smile

Now, if you decide to denormalize, you simply cannot avoid to implement a constratint that enforces the integrity of the information you’re going to duplicate. And it cannot be an application level constraint, but it must be a database-level constraint….otherwise you can find yourself in the following situation: you have a column “InvoiceType” that holds the information about the type of the entity; If it’s an invoce it holds “I” and if it’s a pre-invoce (a sort of notice that an invoice will be issued) it holds “P”. Also a column named “InvoiceNumber” exists, and it stores the – guess? – invoce number. But of course pre-invoices doesn’t have any number so for them this column will always be zero.

Now – you’re probably seeing where I’m going – what happen if you have a row with InvoiceType = “P” and InvoiceNumber = 1234? You’re in trouble. Well, actualy the company is in trouble, since no-one can tell if this is an Invoice or a Pre-Invoce. And of course this affects the Gross Margin and you can bet the Boss won’t be happy to know this!

This should have never happened….but despite everything it happened. A human error, a bug in the (ugly) application….it happened. But it would have never ever  happened if a simple CHECK CONSTRAINT would have been used (as you can see I’m not discussing that the table is not normalized).

The situation created by such omission results in a more complex ETL Phase for the BI solution we’ve built, plus some additional work by a person who job is simply to check and solve the identified anomalies.

The conclusion is that, if denormalization is the choosen way, constraint to preserve information integrity must be put in place. Put it in another way: if you decide to denormalize you’re taking the responsibility to keep information consistent. And this is not an option. Apply some basic logic, the result is that constrains are not an option! (You think they will slow you down during insert, update and delete? They may do…but that’s what you decided when you started to denormalize!)

Published Monday, August 2, 2010 7:08 PM by Davide Mauri

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Charlie Rubin said:

It's best to keep your transaction (OLTP) system normalized, then every so often run a stored proc or SSIS job to create a denormalized version for reporting purposes (OLAP).

August 4, 2010 10:19 PM

Davide Mauri said:

@Charlie: Exactly! :)

August 5, 2010 2:11 AM

Jason West said:

Great article!

We've been having similar conversation here. One item of interest is the consequences of normalizing to possibilities versus normalizing history. Lot's of brownfields here where the inputs are normalized to the entities without any versioning. Net effect is being able to rewrite history. So we've been replacomg the hard constraints to possibilities and adding normalization to what happened.

What do you think? Any experience there?

August 9, 2010 4:03 PM

Davide Mauri said:

@Jason, just to make sure I understand the problem you depict correctly, let me do an example.

Let's take an Invoce. After the Invoice has been generated, no-one should be able to change, say, customer data since from now on is "writte in stone".

But if the invoice has a relationshop with the Customer table, what happen if you change customer attributes values?

If this is the question...the aswer is a little bit long...and deserves it's own post :) I'll write it during this summer vacation. :)

August 16, 2010 11:36 AM

Davide Mauri said:

Normalization should be very popular and widespread. Unfortunately I can say that the 99% of developers

August 30, 2010 1:46 PM

Leave a Comment


About Davide Mauri

Director of Software Development & Cloud Infrastructure @ Sensoria, an innovative smart garments and wearable company. After more than 15 year playing with the Microsoft Data Platform, with a specific focus on High Performance databases, Business Intelligence, Data Science and Data Architectures, he's now applying all his skills to IoT, defining architectures to crunch numbers, create nice user experiences and provide meaningful insights, all leveraging Microsoft Azure cloud. MVP on Data Platform since 2006 he has a very strong background development and love both the ER model and OO principles. He is also a fan of Agile Methodology and Automation, which he tries to apply everywhere he can, to make sure that "people think, machines do".

This Blog


Privacy Statement