THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

James Luetkehoelter

Nearly any SQL topic presented at times in a slightly eclectic manner.

NULL, Black Holes, and Birthdays: Part I

Strange title, yes. After reading Hugo Kornelis' excellent post on NULLs, I found myself getting very little sleep as I pondered the points he made. If you haven't already read it, do so, now prefereably. The two main points he brought out were (correct me if I'm wrong Hugo):

1) Unknown (among others) is not the best working academic definition of NULL - "Is Missing" fits how we would naturally deal with it conceptual, as does SQL Server. There are not different meanings or "flavors" of NULL - it just means data is missing - no reason given or necessary.

2) NULL actually implies two separate attributes - A true implementation of NULL (Unknown, Is Missing) implies another describing the reason for NULL. He used the idea of having a column "Birthday" that would be set to NULL at times. We really need to know why it was set to NULL - it was ignored, someone was offended at having to reveal their age, it wasn't known at the time of entry, etc. All are legitimate reasons why the column is left NULL, and it may be information that is important to capture. However if we did want to capture "reasons for no birthday", we'd have to create a separate column hold that information. If we do this for every NULLable field, it becomes very impractable (especially when performing queries that include a WHERE statement).

3) If we use "Is Missing" as our practical definition of NULL, it is eliminates the implication of why something "Is Missing" - we're making no claim other than that. And since we avoid the need for an explanation of why something is missing, it makes queries involving these "NULL" predicates much easier to deal with practically and intuitively.

I have a slightly different, odd take on NULL after reading this. Unfortunately this will be propeller-head academia discussion, but I will jump to the practical in part II, I promise :). Hugo and I are basically saying the same thing, I'm just putting a different spin on it that "tastes" better to me than using the definition of "Is Missing".

Hugo also made a comparison between a NULL in a database and black hole (not a real comparison, but more as a literary device within the article, but I think this brings out an important point).

A black hole, in theoretical terms, is not even similar to NULL. A black hole has theoretical attributes. It is the greatest source of gravity conceivable since nothing can escape once it gets too close to the black hole. The concept of an "event horizon", the point at which light can no longer escape the gravitational pull. At the center of the black hole is a "singularity", matter packed to maximum density. There are even real, measurable properties of a black hole. One is usually detected by astrophysicists by an unknown gravitational impact within a region of stars. There are even more properties of black holes, theoritical and practical, that are so complicated they make my brain hurt.

The problem why we have two implied attributes with a Birthday column when the column is NULL has nothing to do with the definition of NULL; the problem is that we insist on trying to associate a definition to NULL in the first place. As soon as we use the phrases "Unknown" or "IsMissing", our minds naturally impose the question: Why is this column "unknown" or "missing"? Once we do that, there is a laundry list of boundary conditions that could apply - it could have been ignored, simply missed, deliberately not entered, N/A. As Hugo discusses, if we need to capture those boundary conditions we have a real problem when it comes to database design.

For me a more accurate, though difficult to comprehend definition of NULL is - "no semantic meaning", or "no attribute". NULL doesn't even have a data type. It isn't even strictly associated with a column at a practical level (it is recorded in a NULL bitmap near the header of the row). The problem is the minute we try to describe NULL, we automatically imply other attributes due to our need to ask "why?"

Personally I like to think of NULL in terms of three-value logic (my background is in Philosophy and Mathematics, can you tell?). Your options are yes/no/unknown. That's it. When it comes to usage, unknowns combined with another value is still an unknown. And this is how the SQL engine works (with some tweaking exceptions).

 NULL - no attribute. Nothing else.

Which brings us to the crux of the problem. In everyday usage, the definition of and usage of NULL is so varied, having any level of consistency is extremely difficult. However, here's where I through in my conversially claim:

In practical usage, NULL can be used to mean whatever the business needs it to mean, as long as that meaning is consistent.

I have seen the strangest behaviours when it comes to NULL that it almost the database purist in me cry. But, after thinking about it for over a decade, I'm conceding the point to the practical usage. More on this in part II of this post.

Hugo definitely brings out a great point about the separation of attributes when it comes to NULL and "reasons why it is NULL". My take is that the latter is imposed after the fact when we try to interpret what NULL represents. It isn't another attribute of NULL - it is our attempt at understanding why something is NULL (and recording that information). In academic terms, I see NULL as nothing, no attribute. In practical terms, it's fair game as to what NULL represents.

More to come on the practical (but not necessarily proper) usage of NULL. 

Published Monday, July 16, 2007 11:14 PM by James Luetkehoelter

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Hugo Kornelis said:

Hi James,

Thanks for your interesting post. Since you asked me to correct you if you're wrong about my two main points, I'll start there :-)

The first point is indeed that NULL means "no data here", and nothing more. But I disagree with what you describe as my second point. NULL does not imply two attributes. NULL implies that an attribute value is missing, noting more. A second attribute can be necessary, but only if the business needs to store the reason why the first attribute has no value - so this second attribute is not implied by NULL, but by the combination of the two facts that (a) the first attrbiute allows NULL, and (b) the business needs to store the reason for missing data.

I also disagree that this becomes impractable. In most cases, the reason why the data is missing is completely irrelevant for the application and the business, so there's no need to store it. In many other cases, there can be only a single reason why a column is NULL, so again no need to create a new attribute for it. The number of cases where the reason for missing data actually has to be stored is fairly minimal. And in those cases, normalizing the reason to it's own column is not impractable at all - it's very practable, especially when querying the data.

You are, of course, absolutely right that the comparison between NULL and black holes falls flat on it's face when giving it more than half a minute thought. I just figured that it would make for a catchy title and was satisfied to find sufficent parallels to warrant the comparison. :-)

I'm a bit worried about your statement that NULL means "no attribute". To make sure we don't misunderstand each other, let me explain two different terms that are both sometimes shortened to "attribute". An "attribute type" is part of the data model and describes a set of similar values in generic terms - e.g. each birthday has to be a valid date, but we do allow birthdays to be missing for a customer. An "attrbiute value" is the specific value of an attribute for one occurance of an entity (or one row in a table) - e.g. the birthday of Ms. Jones is 7th february, 1953.

Now, if you intended to write that NULL means "no attribute value", I agree. If you think that NULL means "no attribute type", I disagree completely!! (And your statement that NULL has no data type makes me fear that you might indeed mean the latter). FWIW, NULL does have a data type. Run the following code in SSMS if you don't believe me; set the output to text rather than grid to see how various datatypes are all displayed differently. Then uncomment one of the last two lines to see that different data types of NULL can even cause errors!

DECLARE @int int, @datetime datetime, @uniqueidentifier uniqueidentifier;

SELECT @int, @datetime, @uniqueidentifier;

IF @int = @datetime PRINT 'Equal!' ELSE PRINT 'Different!';

--IF @int = @uniqueidentifier PRINT 'Equal!' ELSE PRINT 'Different!';

--IF CAST(NULL AS int) = CAST(NULL AS uniqueidentifier) PRINT 'Equal!' ELSE PRINT 'Different!';

Fortunately, I find myself in agreement with you once you come to the bottom line of your post: "In practical usage, NULL can be used to mean whatever the business needs it to mean, as long as that meaning is consistent". Indeed. In many cases, business accept (or even WANT) data to be missing in some rows. If there's only one reason, that's fine. If there might be more than one reason, it's still fine, as long as either the exact reason is unimportant, or the exact reason is stored - in a different column!

I'm looking forward to your post on practical usage of NULL, and in finding out how much it overlaps or contradicts my other posts on NULL.

July 18, 2007 4:44 AM
 

James Luetkehoelter said:

Thanks for the comments and corrections Hugo. Yes, I did mean "no attribute value". I should have been clearer on that. That's a distinction I really didn't make. I NULL tuple would have a data type - it just wouldn't be apparent until the tuple was populated. That better?

As far as NULL implying two attributes, I both agree and disagree with your disagreement :) Yes, NULL itself does not imply both NULL and "why is it NULL". I crossed over into the business world without due warning :) What I was trying to get across is that in our attempts to define NULL ("is missing", "nothing here", "unknown", etc), the *definitions* "nudge" us into the need for knowing more about the NULL column. Was it ignored? Was it forgotten? Did someone object to entering data? I think those are just natural questions brought on by trying to apply some sort of semantic meaning to NULL by means of a definition. It's kind of a linquistic faux pas as Ludwig Wittgenstien would have said (OK, another line of argumentation there for another day). That's what I was trying to get at.

As far as practicality, I think if followed as a general rule, yes, it becomes impractical (and after looking over your original post, you did not indeed discuss practicality - apologies). In exceptional cases where that sort of boundary information is required, by all means it should be included. Maybe instead of a NULL bitmap for a row header with have a boundary bitmap, with options like NULL, N/A,AngryUser,Forgotten,Skipped, etc :)

This is fun! I look forward to more of your posts as well :) I don't think we'll overlap too much as I'm a bit, well, strange in my thinking (started PhD program in Philosophy, 'nuff said).

July 18, 2007 5:21 AM

Leave a Comment

(required) 
(required) 
Submit

About James Luetkehoelter

I am passionate about what I do - which is DBA, development, IT and IT business consulting. If you don't know me, haven't met me or have never heard me speak, I'm a little on the eccentric side. One attendee recently described me as being "over the top". Yup, that about says it - because I only speak on topics that I'm passionate about.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement