THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

NULL - The database's black hole

This blog has moved! You can find this content at the following new location:

https://SQLServerFast.com/blog/hugo/2007/07/null-the-databases-black-hole/

Published Friday, July 6, 2007 12:20 AM by Hugo Kornelis

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

James Luetkehoelter said:

I find this a very interesting discussion of NULL. Well done. I especially like pointing out the difference between "Birthday" and "Why Birthday is missing" as being separate attributes. That being said, I'm not sure I agree. I'm still chewing on this, but I love angle you've taken (catchy title too :) ).

July 6, 2007 9:33 AM
 

Denis Gobo said:

Talking about Birthdays, here is something interesting: Birthday paradox

In probability theory, the birthday paradox states that in a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will have the same birthday. For 57 or more people, the probability is more than 99%

http://en.wikipedia.org/wiki/Birthday_paradox

July 6, 2007 9:45 AM
 

Hugo Kornelis said:

Thanks for your comments, all.

James - if your chewing ends in you still disagreeing with the seperation of the two attributes, then I'd love to hear (read) your arguments for that position!

July 6, 2007 10:15 AM
 

James Luetkehoelter said:

Actually the chewing is more overall - I would agree with the separation of attributes. After chewing on it (resulting in very little sleep and dreams about playing poker with Codd, Date and Larry Ellison - we ended up throwing Ellison out of the game), I believe I have my finger on what gave me pause. It results in me babbling significantly to explain, so I'm going to respond fully in separate post.

I still really, really your discussion of things. You took theory and moved it to real world very nicely.

July 7, 2007 11:00 AM
 

Hugo Kornelis said:

Hi James,

Next time you find yourself in a poker game with Codd, Date, and Ellison, throw them all and keep their chips for yourself.

I'm really looking forward to your post. Maybe I ought to say that I'm sorry for making you lose sleep, but fact is, I am not - it's always a good thing to get people to think!

July 8, 2007 12:41 PM
 

James Luetkehoelter said:

Agreed! I think there's nothing better than discussing something controversial. Wonderful, I have a fellow argue-r to talk to! Post will be up this afternoon(for me).

July 9, 2007 11:28 AM
 

David Portas said:

Hugo, your Customers table certainly isn't in 2NF.

A functional dependecy x -> y over r must satisfy that x = y forall tuples in r  - not the case if the table contains nulls. (For example see "Theory of R. DB." (Maier) p42 for the formal definitions and some important consequences).

Normalization of optional attributes can be done by decomposition. Fairly obviously, the Birthday and ReasonNoBirthday attributes belong in separate tables since they refer to different entities - "Customer with birthday" and "Customer without birthday".

July 9, 2007 12:31 PM
 

David Portas said:

Oops! Obviously the second sentence I wrote was rubbish! The requirement is that an FD satisfies the predicate T1{x} = T1{y} implies T2{x} = T2{y} for every pair of tuples T1 and T2. Therefore, no such FD is satisfied if either x or y are null.

July 9, 2007 12:38 PM
 

David Portas said:

Grrr! I'll try one more time: T1{x} = T2{x} implies T1{y} = T2{y}. But don't take my word for it :)

July 9, 2007 12:55 PM
 

Hugo Kornelis said:

Hi David,

Thanks for your comments. However, I don't think I can agree. I had to do some googling in order to find formal definitions for 2NF, since I am used to working with a modeling method that skips normalisation and takes the model straight into 5NF. I found some contradicting ways to formulate the requirements for 2NF, but no real disagreements.

I'll go with the definition on Wikipedia: "a 1NF table is in 2NF if and only if, given any candidate key and any attribute that is not a constituent of a candidate key, the non-key attribute depends upon the whole of the candidate key rather than just a part of it".

I did not google for a definition of functional dependency, since I know that Y is functionally dependent on X if for any given value of X, there is at most one value of Y.

By this definition of functional dependency, both Birthday and ReasonNoBirthday are definitely functional dependent on CustomerID, since for each CustomerID, there will be either exacly one Birthday or none at all (NULL), and exactly one ReasonNoBirthday or none at all (NULL).

The Customers table has one candidate key: CustomerID. It has two non-key columns: Birthday and ReasonNoBirthday. Both are functionally dependent on the candidate key, ergo the table is in second normal form.

Unless I am really being led astray as to the 2NF definition or as to the definition of functional dependency, but I'm sure you'll tell me if I am!

July 9, 2007 1:29 PM
 

Alex Kuznetsov said:

Hey Hugo,

I think you are making excellent points, I loved your post! Just wanted to note that I think that Birthday is functionally dependent on both CustomerID and ReasonNoBirthday . If ReasonNoBirthday is not null, then Birthday must be NULL. In some cases ReasonNoBirthday is also functionally dependent on both CustomerID and Birthday - it is when you must specify ReasonNoBirthday when you omit Birthday. However, I think in some cases you can omit both Birthday and ReasonNoBirthday - it depends on your business rules. What do you think?

July 10, 2007 10:04 AM
 

Hugo Kornelis said:

Hi Alex,

Birthday would be functionally dependent on both CustomerID and ReasonNoBirthday if you have to specify both before you can tell for sure what (if any) the birthday is. That is clearly not the case - given a CustomerID, I can tell you without a trace of a doubt (assuming I trust the contents of my DB) whether a birthday is stored for this customer and what it is.

An example of functional dependency on more than one attribute would be the AmountOrdered being dependant on the combination of OrderID and OrderLineNo. If you only know the OrderID, you can't tell me the AmountOrdered (unless you're lucky and it's a one-line order), and neither can you tell me the AmountOrdered if I supply OrderLineNo but not OrderID. You really need both before you can tell me the AmountOrdered.

Sure, there is a relationship between Birthday and ReasonNoBirthday (as captured in the CHECK constraint), but it's not a relationship of functional dependancy.

July 10, 2007 4:57 PM
 

James Luetkehoelter said:

Strange title, yes. After reading Hugo Kornelis ' excellent post on NULLs , I found myself getting very

July 16, 2007 11:15 PM
 

James Luetkehoelter said:

Strange title, yes. After reading Hugo Kornelis ' excellent post on NULLs , I found myself getting very

July 16, 2007 11:36 PM
 

sql said:

The quote "any expression in SQL returns NULL if any of its input arguments is NULL (with the notable exception of a few functions that are specifically conceived for NULL handling)." appears a bit misleading? AFAIK sum, min, max, and other statistical functions were Not "specifically conceived for NULL handling": Consider SELECT

SUM(NUMERIC_SCALE) AS SumIsNullInsensitive

,STDEV(NUMERIC_SCALE) AS STDEVIsNullInsensitive

,MIN(NUMERIC_SCALE) AS MinIsNullInsensitive

,Max(NUMERIC_SCALE) AS MaxIsNullInsensitive

FROM INFORMATION_SCHEMA.columns

HTH

July 17, 2007 2:38 PM
 

Database in Depth said:

July 17, 2007 2:47 PM
 

Hugo Kornelis said:

Hi "sql",

You make a fair point. I admit that I forgot to include aggregate functions in my post.

Since aggregate functions are quite simple, I won't write a seperate post on them. Their treatment of NULLs is very simple - they just dsiregard them. So if you have a table with three rows, with values 10, 90, and NULL in an integer column, then the results of MIN(column), MAX(column), SUM(column), AVG(column), and COUNT(column) will be 10, 90, 100, 50, and 2 - just as if the row with the NULL value doesn't exist at all.

If the elimitanion of NULL values results in no rows being left, COUNT(..) will return 0 and all other aggregate functions will return NULL (just as when they are run on an empty table).

July 19, 2007 2:25 PM
 

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server' said:

Two months ago, I posted the first two parts of a series about NULL. After that, I went quiet. Much to

September 22, 2007 4:35 PM
 

Hugo Kornelis said:

The fourth (and final) part of this series about  NULL is now available at http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx.

October 1, 2007 2:17 AM
 

Denis Gobo said:

This is part two of a three part series. Part one was about the phone interview , this part is about

December 10, 2007 3:02 PM
 

Denis Gobo said:

A year in review, The 21 + 1 best blog posts on SQLBlog Best posts according to me, it might have been

December 27, 2007 4:11 PM
 

Phillip Hamlyn said:

Returning to the Dates of Birth - I've taken to seperating Day, Month and Year into seperate fields in SQL Server instead of treating Date Of Birth as a Date. Why ? My analysis has shown that in most systems containing public or customer data that a higher than expected proportion of people are being recorded as being born on the 1st of any specific month (5-10% more than statistically probable, depending on the type of system). Unless its absolutely required otherwise I've got into the habit of making this three nullable fields - this means that when someone simply gives their age, (or possibly guesses at their own date of birth - common for those that have no cerain birth record) we can record without forced inaccuracy, exactly the information given. A simple calculated field will give an implied date of birth if neccessary, but it prevents the operator being forced to enter inaccurate data just because the data type demands it.

January 4, 2008 11:19 AM
 

Gints Plivna said:

>I've taken to seperating Day, Month and Year into seperate fields in SQL Server

>instead of treating Date Of Birth as a Date.

I usually use in such cases one birth date and 2 flags:

UnknownDate (T, F)

UnknownMonth (T, F)

and of course UnknownMonth=T implies UnknownDate=T

March 26, 2008 8:37 AM
 

abubakar said:

plz tell me its true or false:

true expression or null expression will result in NULL

June 2, 2008 7:59 AM
 

Kim J said:

Did I just read an episode of Big Bang Theory? Geez you guys are smart.  I just wanted to know the difference between NULL and an empty field.  

May 30, 2013 3:10 PM
 

jigar said:

i understand it a litel bit, but i think tht i will understand it if you'll give me some examples

August 23, 2014 3:41 AM

Leave a Comment

(required) 
(required) 
Submit

About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.
Privacy Statement