THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul Nielsen

www.SQLServerBible.com

Will the Real Null Please Stand Up? Part 1

Thinking more about null, I seems to me that the primary problem with nulls is that the theoretical definition of null is very different than the pragmatic use of null.

 

We all learned in DB101 that Null means "unknown" and this is why Null = Null is false.

 

In practice however, there’s nothing unknown about Null at all.

 

Nulls primarily show up in two places: Nullable Columns, and the result of Left Outer Joins for rows without a match.

 

In nullable columns a null doesn’t mean “unknown”. We know there’s no value yet entered for the column. Within the “closed system” of the database it’s an empty value. The value of a null in this case is that we know nothing has been entered. In contrast, a surrogate null value of zero or an empty string can be confusing because real data is indicating nothing has been entered. Does it mean zero, or has the value not yet been entered?

 

A left outer join between customers and orders returns a null OrderID for any customers without an order. Does this mean we have no idea how many orders the customer has placed? No clue at all if the customer has 0, 1, or 500 orders? Not at all. We know exactly how many orders the customer has placed: Zero. It’s a known value. Null means empty.

 

The problem with null is the inconsistency between the theory-based tests of equality and the practical use of null. SQL is based on the theoretical definition of null - unknown, while database developers actually use null to indicate empty.

 

For this reason, we'll never resolve the null debate until the industry accepts null as empty, or it creates an empty flag. Until then the best workaround is to forget the theoretical definition of null and accept that in practice, null means the known value of empty.

 

Published Sunday, November 11, 2007 6:09 PM by Paul Nielsen
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Hugo Kornelis said:

Hi Paul,

>>We all learned in DB101 that Null means "unknown" and this is why Null = Null is false. <<

You must have had terrible bad luck with the teacher for your DB101 then, because there are no less that two fallacies in that single statement.

1) Null does NOT mean "unknown". Null is defined in the SQL standards as "a special value that is used to indicate the absence of any data value". So in the language of normal humans, Null does not mean "Unknown", but "Nothing to see here, move on folks".  More on that below.

2) Null = Null is not false. This is Unknown, a special value in three-valued logic that complements the more common values of True and False.

You might wish to read my series of blog posts about Null:

- What NULL is, what it isn't, and why even Codd and Date were wrong: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/06/null-ndash-the-database-rsquo-s-black-hole.aspx

- What is three-valued logic, and why is it required when NULL is allowed: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/07/17/the-logic-of-three-valued-logic.aspx

- Some apparently odd side effects of three-valued logic that are not so odd at all when you stop and think aboout it: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/22/dr-unknown-or-how-i-learned-to-stop-worrying-and-love-the-null.aspx

- Functions you can use to deal with NULL without being bitten by Unknown: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx

>>In practice however<<

In practice, we sometimes do andd sometimes do not know the reason why there is no data in a particular cell in the table.

For instance, in a table that holds the history of prices of our products, with a DateValidFrom and a DateValidTo, a Null in the latter column does not mean that the DateValidTo is unknown - it means that that row describes the current price and that the DateValidTo is therefore not applicable.

For instance, in a table that holds information about persons, a Null in the column for MiddleName *does* mean that the MiddleName is unknown. If it were known, it would have been in the table; if we knew that a person doesn't have a middle name, the table would have held an empty string.

For instance, in a table that holds information about customers, be they real persons or companies, where the designers have chosen not to implement the subtypes "real person" and "company" in seperate tables, a Null in the BirthDate column *could* mean that the birthdate is inapplicable because a company isn't born - or it could mean either that or that some person didn't tell us when he/she was born. Only the data modeler and the subject experts can answer that, since only they know if the company allows persons to not disclose their birthdate.

>>In nullable columns a null doesn’t mean “unknown”. We know there’s no value yet entered for the column.<<

Exactly. And that is exactly in accordance to the theoretical definition of Null: "no value here". There only is a discrepancy between theory and practice for those who get the theoretic definition wrong (like the sorry excuse for a teacher that you apparently had for DB101).

>>A left outer join between customers and orders returns a null OrderID for any customers without an order. Does this mean we have no idea how many orders the customer has placed?<<

You are confusing the OrderID column with the "NumberOfOrders" column in the result set. If you had written a query to get the number of orders per customer, you would have gotten the value 0 (zero), not Null. The number of orders is both known and present in the result set of the query (assuming you didn't err when wriiting the query, of course).

But the fact that the number of orders for a customer is 0 is also the very reason that the OrderID column can only be Null and nothing else. The customer has not placed any orders, therefor there *IS* no OrderNo.

Roel Oost posted a great explanation on a Dutch web forum just yesterday, that I am going to steal. I have a square box, exactly the size of a piece of paper. There currently is no paper in the box. You ask me what is writtten on the piece of paper. I can't answer that question, since the paper itself is missing.

>>The problem with null is the inconsistency between the theory-based tests of equality and the practical use of null. SQL is based on the theoretical definition of null - unknown, while database developers actually use null to indicate empty.<<

As I said, the theoretical definition is not "unknown", but "no data here". Much more on that in the blog posted I mentioned above.

The practical use is much broader than just "empty". I don't even think that "empty" is indeed the practical use at all - I think that in many cases, the practical use is "no data here", exactly as the theoretical definition.

The major inconsistency is between what too many people *think* the theoretic definition is, often inspired by book authors that either don't understand the real meaning of Null, or find that three-valued logic is easier to explain by defining Null as "Unknown".

And then, there are of course way too many developers and designer who don't get it and abuse Null in all sorts of ways. But that is a give, for any feature that any database (or aby other development platform) offers, unfortunately. <sigh>

Best regards,

Hugo Kornelis

November 12, 2007 3:32 AM
 

Jeff said:

Great comment, Hugo, I could not have said it better myself.

November 15, 2007 10:45 AM
 

Phillip Senn said:

I get uncomfortable if the only people that can answer a simple question are the data modeler and the subject expert.

Three-valued logic confuses many a developer

1. IF NULL =  NULL THEN PRINT 'Equal'

2. IF NULL <> NULL THEN PRINT 'Not equal'

Both return the same no result, while

3. IF NULL IS NULL THEN PRINT 'Null'

Will execute.

In my mind, a null in a field has a different meaning than a null returned by an outer join.

That might even lead to a four-valued logic.

Personally, I would rather deal with surrogate nulls when working with applications programmers.

And God forbid a null shows up in a report where I have to explain the concept to management.

Wiki describes null as 'nothing' or without value.

"What does it mean?" one might ask.

"Nothing!" is the reply.

"What does nothing mean?"

"It means 'without value'".

While I can't fault Hugo's logic, the less often I have to explain null to someone the better.  Perhaps if all the I/O were encapsulated so that the average programmer never sees null, I would take a second look.

Until then, I would rather deal with the problems of surrogates.

The cure is sometimes worse than the disease.

November 15, 2007 9:51 PM
 

Hugo Kornelis said:

Hi Philip,

The point you make is at the same time both very convincing, and completely ridiculous (please read on before being offended, this is not directed at you personally!).

If you find yourself working with developers who don't get the concept of NULL as it is used in databases, then I can very much understand that you get fed up of trying to explain it to them and prefer to hide the NULLs from the developers. Even if that comes at the price of having a less perfect design.

What I really don't get is how on earth people have come to think that databases are simple and that everyone should get it. IT management usually doesn't even blink before sending their developers to a course on Ajax, Silverlight, PHP, C#, and all the other fance shmancy new-fangled stuff. And with good reason, since they can't be expected to just magically understand that stuff - the need to have their education before they can be expected to work with it.

So why do we have such a hard time convincing the world that developers need educaation on working with an RDBMS as well? Why does everybody seem to expect that a developer that is good with reading and writing sequential files will just as easily adapt to working with a database? Every developer who has to work with SQL on a regular basis *should* get some proper education on SQL and the relational model first. Working with a relational database is *not* simple. Try it without education, and you *will* hurt yourself.

>>And God forbid a null shows up in a report where I have to explain the concept to management<<

Yet that is exactly what WILL happpen if reports are coded by developers wiithout the required understanding of relational databases. And what will NOT happen if they get educated before they start coding.

Best, Hugo

November 16, 2007 2:41 AM
 

Paul Nielsen said:

Hi Hugo,

Managers* don't educate on database design because it's easy to click on a diagramming tool and create a table, or click in SSMS to create columns, keys, and indexes. Managers* don't understand the principles of data architecture so they revert to the level of "anyone can click on a tool" mentality.

In this sense Access did great harm to the data modeler, "Yep, my brother-in-law built a database last night in the basement and he thinks a foriegn key is the one he stole from the hotel in Mexico - so how hard can it be?"

On the question of the definition of null, from the first sentence of the SQL Null article on Wikipedia, "Null is a special marker used to indicate that a data value is unknown in the Structured Query Language (SQL)." (and no I didn't just wiki it there ;-) I do believe the short answer tot he question of null is; Unknown. The complete answer is a few to a couple dozen reasons why that data is unknown.

My original point was that null is not really unknown, it's more complicated than that, and that when we code we treat missing data as empty within the closed environment of the database. And in that regard I think we agree.

and, seeing how you blasted my null comment, I'm indeed encouraged that you'll be a incredible tech editor.

-Paul

* there may very well be a wise old IT manager somewhere who understands the lifetime cost of adhoc SQL and values db design above cold fusion code, I just haven't had the pleasure of meeting her. But, how hard it is to find good IT management is another blog...

November 16, 2007 10:49 AM
 

Joe Celko said:

In addition to the errors Hugo caught, look at the GROUPING() function in the new OLAP operators -- GROUPING SET, ROLLUP and CUBE.  it tells you if the NULL is original or created data.  

November 18, 2007 2:55 PM

Leave a Comment

(required) 
(required) 
Submit

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. He lives in Colorado Springs.

This Blog

Syndication

News

news item test
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement