In my previous post on NULLs I discussed (rather replied to Hugo Kornelis) some more academic concepts and definitions when it comes to NULL. It is very sad for me to admit that, while interesting to me and many others, that sort of discussion is all but useless the instant you walk into the practical world.
Near the end of my last post, I made a potentially controversial statement:
It doesn't matter what the theoretical or academic definition of NULL is, nor how a given RDMBS uses it. All that matters is how the business interprets and uses NULL.
I know that statement will make a handful of those in the database world very angry (or dismiss me as ignorant of a true relational database), but I stand by it 100%. Once you step into the real world, NULL usage is, well, NULL. It either isn't used at all as a NULL, or it is used as a NULL but in appropriately. At least 80% of the functioning systems I see suffer from some sort of inconsistency when it comes to NULL. I'll just refer to Chaotic NULL from this point on.
While Chaotic NULL sounds negative, it doesn't have to be. Yes, there are very odd occurances of NULL or NULL-like behaviour in practical usage, and some of that usage sometimes has me utter outloud (often to my own detriment) "What the...who was the genious that came up with this so-called 'design'?" But in general, there was some real reason that generated the use of a Chaotic NULL. As long as the system retains some sense of data consistency, common usage and understanding of NULL, we as data professionals just need bite our tongues and accept the state of things.
While we can't sit in an ivory tower and still work in the real world with an actual off-the-shelf or hand-made RDBMS system that implements some version of ANSI SQL, we can at least come to some sort of consensus on what we mean by NULL with any given *instance* of a database. Unfortunately, I find that consensus is a foreign term to many a developer and dba. What results is what I tend to refer to as the Chaotic NULL.
When I first encountered situations of Chaotic NULL, I was first very frustrated and tried desparately to clarify the proper usage of NULL, both from the DBA level and the application (as well as the users of that application). Needless to say, this was an excercise in futility. It took a few years to sink in, but I began to realize that my goal shouldn't be to get my clients to understand NULL as I do, in an abstract sense, but to get them to work with NULL consistently. It doesn't matter what they want it to represent (as long as the understand the implications of how SQL Server), or whatever RDBMS interprets them). It can represent unknown or is missing or zero or an empty string or no data found, etc. Without consistent usage though, bad things tend to happen.
Here are some examples of Chaotic NULL that I've encountered in the past. The fact that they are violating the "essence" of NULL is, in my opinion, ultimately irrelevant. Eliminating the Chaotic NULL in each case simply requires consistent use. Often easier said than done. Here are some of the situations I've encountered (again and again and again...):
DBA Uses Null, No One Else Does
The DBA stomps around the office shouting in vain "doesn't anybody know what NULL means?" He's designed the database for an application with NULL in mind, being sure to use it to fully support any application need (perhaps 3-value logic) or reporting needs (NULLs not counting in an aggregation), yet still no one follows his lead, and the database is blamed for application shortcomings.
That was me at one point. It sounds like simply a communication shortcoming, but the problem was I was the only with a clear (although maybe not quite correct) understanding of NULL. Developers would work with NULLs differently within their own snippets of code. Those designing reports for management would sometimes include NULLs in calculations ("IsNull(attribute,0)"), some wouldn't. Management would make business decisions based on differing data. In some cases these had actual, negative fiscal effects on the business. The problems is partly one of communication and isolation in development and business (a rant for another day), but if there was at least a consistent "incorrect" view of NULL used, at least the data would be consistently "incorrect" from what the DBA (ok, I) intended. That at leasts makes business functionality possible, even if it is slightly skewed. Then it becomes a data quality issue, which can be addressed. What I experienced was literally unfixable given the environment.
Sound familiar to anyone?
No one realizes NULL is even used (or how to use it)
An application has a data backend that defaults NULLs as much as possible, or uses NULLs to allow for default values. Usually this is due to a very separate database design process that is separated from the the business rule and presentation layer development process. A table would typically be defined someone like this:
CREATE TABLE [dbo].[Address](
[AddressID] [int] IDENTITY(1,1) NOT NULL,
[AddressLine1] [varchar](30) NULL,
[AddressLine2] [varchar](30) NULL,
[City] [varchar](30) NULL,
[StateProvince] [varchar](30) NULL,
[Country] [varchar](30) NULL,
[MailCode] [varchar](30) NULL,
[AddressType] [varchar](1) DEFAULT 'P' NULL)
Anyone notice a problem with the AddressType definition? Will the "P" (Primary address) every be populated by default? Nope, the field will always be NULL. A DEFAULT won't fire unless unless the attribute is defined as NOT NULL.
So what happens with the rest of the table? Well, everyone knows (that no data is actually required) other than the identity column. Without any application-level constraints, someone could literally create an address entry with only an identity entry. The result, as the data slowly gets filled in, are placeholder rows, which are notorious for all sorts of leaf-level fragmentation. Simply not understanding how the database and NULLs work (and that they exist) can quickly result in poor performance as data is entered.
Everyone uses NULL differently
This one is my favorite. I commonly ask students when teaching what NULL actually means. As Hugo Kornelis pointed out, there are any number of variations, some no where near what the theoretically and practical (as far as how SQL Server treats NULL):
"An empty string"
"An empty set" (usually this is accompied by little or no understanding of Set Theory)
Etc., etc., etc., etc...
These definitions cause a problem when arguing it when at a database design layer. Imagine the disaster that ensues when the users of an application interpret form output, reports, etc., each interpreting the results differently. Does it happen? Unfortunately, yes, more often then that. This leads to incredible data quality issues, not to mention extremely poor business decisions given what interpretation "wins out".
Someone expected NULLs to be reflected on reports
We all know this, but not all database "consumers" do: with the exception of COUNT(*), all TSQL aggregate functions ignore NULL rows. This is especially important with AVG, SUM, etc, which are ubiquitous on reports given to management. This is fine if everyone understands that the NULLs are included as 0s in the calculation, or if they are avoided. However, if this isn't made clear at the time the report is analyzed and reviewed, we again run the risk of very, very strange business decisions.
NULL changes meaning throughout its lifetime
Some organizations practice job rotation - everyone plays a different role in the company for a certain period of time (usually 1.5-2 years). This is typically done at the management layer, but in large organizations the same can be done with application development. I find this to be the most frustrating situation in which to find myself.
Sometimes NULL means "not required". Sometimes it means "unknown". Perhaps it just means "is missing". The problem is that due to job rotation or turnover, I have to know exactly what my predecessors meant when they used NULL. Technically NULL should have no semantic meaning other than "missing value" - no implied attribute as to why it is missing, just that it is missing. In practice that is rarely the case.
I once had to refactor a database that included Null in many tables. In some instances I was able to infer that what was meant was simple "not required". In other instances, it wasn't very clear. This meant sitting down with the data entry people to see exactly how they used the application, and what *they* meant when they left a field blank (leaving it at a Null value). This took months, and luckily the result was a very distinct timeline as to the changing meaning of Null, tied almost exactly to the times that DBA/developers left the organizations and a neophyte was hired. It was almost like dealing with a slowly changing dimension in an Analysis Services database...
The word "NULL" = NULL
I'm not kidding, I've actually encountered it again and again. All database fields are defined as NOT NULL but VARCHAR data types. Anytime NULL is appropriate the application layer inserts the word (literally) 'NULL'. I'd like to say that I've only seen this occur when a database is actually created from the application design where no DBA has input, but I've also seen this happen where the DEFAULT for a column is set to the word "NULL".
Now when working with the application itself, there may not be any data integrity issues, or confusion as to what it means when the word "NULL" appears. This probably isn't the most optimal performance scenario, but as long as there is consistent use and understanding of this hard-coded "NULL", from an information accuracy perspective, all is well.
It wouldn't have been that great a problem, except there were legitimate NULLs within the database, along with these pseudo-Nulls. Again, I had to sit with the data entry people to see exactly what they intented when entering data and how that aligned with the "Null" and the legitmate NULLs. A lot of tedious work, but in the end a clear business rule emerged, and I was able to translate exactly what was meant by the text-based "NULL"s and the actual NULLs.
With as complex as NULL can be when used properly (see some of Hugo's posts on the matter), adding this additional layer of "custom" Null usage - the Chaotic Null, and you have a nightmare on your hands. But the bottom line is that it doesn't matter how we view Null academically - it is in the actual practice of database design that we must live and breath Null. Never forget that.