THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

The logic of three-valued logic

In my previous post, I explained what NULL does and does not mean, how and why the rule of NULL propagation forces any expression involving NULL to result in NULL. I also mentioned that comparisons involving NULL do not result in NULL but, rather, in a new “truth value” called “Unknown”. But there’s a lot more to be written about Unknown and it’s effect on logical expressions, and that’s exactly what this blog post (and the next one as well) will be about.

 

Most software developers will be intimately familiar with what is popularly known as Boolean logic but should actually be called a two-valued Boolean algebra – a Boolean algebra that is defined on the two truth values True and False, and the three operators AND, OR, and NOT. However, with the introduction of Unknown as a third truth value, some rules of Boolean algebra are violated, so the original two-valued Boolean algebra has to be replaced by a new, non-Boolean three-valued algebra for logical operations. I’ll assume that most readers already know the basics of this three-valued logic, but I’ll briefly cover them anyway (in this post) before moving on to the more interesting side effects (mostly in a future post).

 

Truth tables for AND, OR, and NOT

 

Since we have already established that a proposition such as WHERE Age > 35 can result in Unknown, the logical next step is to define how this affects the result of propositions such as WHERE Age > 35 AND Country = 'USA', WHERE Age > 35 OR Country = 'USA', and WHERE NOT (Age > 35). In other words, we have to determine how the new truth value of Unknown affects the truth tables for the AND, OR, and NOT operators. (Note that some sources, including Books Online, label these operators as “Boolean” operators. This is not correct, since the T-SQL versions of these operators operate on three-valued logic rather than the Boolean two-valued logic.)

 

I believe that simple, easy-to-follow examples are the best way to illustrate a complex subject, so that is exactly how I will explain the logic behind the truth tables for AND, OR, and NOT – by using a simple example. I’ll just stick with my previous example that revolves around my undisclosed age, since that saves me the hassle of not disclosing other personal information.

 

Let’s look at the AND operator first. In two-valued logic, True AND True evaluates to True, whereas True AND False, False AND True, and False AND False all evaluate to False. The additional truth value of Unknown means that we have to fill in five new cells, for True AND Unknown, False AND Unknown, Unknown AND True, Unknown AND False, and Unknown AND Unknown.

 

WHERE Age > 35 AND Country = 'USA'

Many readers will probably already know that I’m Dutch (and those that didn’t, do now), so my country of residence is definitely not the USA. If someone points a gun to your head and demands that you tell him or her whether or not it is true that Hugo Kornelis is over 35 years of age and living in the USA what will your answer be? You’ll probably answer that this proposition is false – since you now know that I’m not living in the USA, you don’t need to know my age anymore; regardless of my age, the proposition as a whole can only be false.

Congratulations – you have just filled in two of the missing cells in the truth table for AND: both Unknown AND False and False AND Unknown evaluate to False.

 

WHERE Age > 35 AND Country = 'Nederland'

Assuming you know that “Nederland” is Dutch for “the Netherlands”, you’ll find this proposition harder to assess. Okay, so the second part is true. But the proposition as a whole is only true if I’m also over 35, which you obviously don’t know. So if once again someone puts that proverbial gun against your head, you can only reply that you really don’t know, and hope that this is enough to convince the shooter not to pull the trigger.

For the truth table, we can now fill in that Unknown AND True (and True AND Unknown) evaluate to Unknown.

 

WHERE Age > 35 AND Age < 45

This leaves us with just one missing cell in the truth table for AND, for Unknown AND Unknown. The proposition above illustrates this one. Since you don’t know my age, you don’t know if I’m over 35, nor if I’m under 45 – so you have obviously no way of telling whether I’m both over 35 and under 45.

The only logical outcome of Unknown AND Unknown is Unknown.

 

The truth table of OR also starts with the four cells know from two-valued logic: True OR True, True OR False, and False or True evaluate to True, and False OR False evaluates to False. The same five new cells that were added to the table for AND are added here as well, and we’ll use similar examples to populate them.

 

WHERE Age > 35 OR Country = 'USA'

Since I still haven’t moved to the USA, the result of this proposition is True if I’m over 35, or False if I’m not. Since you don’t know my age, you don’t know the result of this proposition.

Unknown OR False and False OR Unknown both evaluate to Unknown.

 

WHERE Age > 35 OR Country = 'Nederland'

Here’s an example where you don’t have to fear the wrath of the shooter. No matter what my age is, you can safely answer that, since I’m living in the Netherlands, it is also true that I am over 35 years of age or living in the Netherlands.

Both Unknown OR True and True OR Unknown evaluate to True.

 

WHERE Age > 35 OR Age < 30

(Those who have ever seen me will have to stretch their imagination a bit. Just pretend that you really can’t tell for sure that I’m definitely not under 30, to make this a proper example of Unknown OR Unknown, ’kay?)

Not much to be said about this example - Unknown OR Unknown is still Unknown.

 

Now that the truth tables for AND and OR are complete, we only have the table for NOT left. This one grows from two cells (NOT True = False, and NOT False = True) to three, the third cell being for NOT Unknown. So, here’s the example:

WHERE NOT Age > 35

You don’t know my age. That’s why you don’t know whether I’m over 35. So, how can you possibly know whether I am not over 35? Duh!

NOT Unknown is, of course, Unknown.

 

So, to wrap things up, here are the completed truth tables for the operators AND, OR, and NOT in three-valued logic

 

The truth table for AND:

 

True

Unknown

False

True

True

Unknown

False

Unknown

Unknown

Unknown

False

False

False

False

false

 

The truth table for OR:

 

True

Unknown

False

True

True

True

True

Unknown

True

Unknown

Unknown

False

True

Unknown

false

 

The truth table for NOT:

True

False

Unknown

Unknown

False

True

 

Some oddness

 

If you ever forget the logic tables, you can use examples similar to these to easily reconstruct them on the spot. However, do take care that you use independent propositions. If you don’t, you can easily get at incorrect conclusions, as illustrated by these propositions:

 

WHERE Age > 35 AND Age < 30

WHERE Age > 35 OR Age < 45

This is a fine example of the difference between humans and computers. Humans think; computers only do as told. You do not need to know my age in order to tell everyone who asks you that Hugo Kornelis is definitely not both over 35 and under 30, or that he definitely is either over 35 or under 45. A computer lacks this logic insight – it just does as told. The instructions for the computer are clear: first evaluate the individual propositions (all resulting in Unknown, since each of the propositions by itself is indeed Unknown), then apply the AND or OR operator to the two truth values – so the computer will return Unknown for the two propositions above, even though every sane human knows that they should evaluate to False and True respectively.

 

CHECK, the odd one out

 

Once a proposition has been evaluated to a single truth value, the computer has to determine what to do with the result. In most all places where the SQL language rules allow a logical expression, the defined behavior is to treat True one way, and both False and Unknown in another way. For instance, a row is included in the result set of a query if the WHERE clause evaluates to True, but excluded if it evaluates to either False or Unknown. Same goes for the decision to include or exclude a group of rows based on the HAVING clause – it has to evaluate to True; both Unknown and False means that the group is out. The query processor prefers to be safe rather than sorry – if it can’t tell for sure that you want the row included, it won’t be.

 

For logical expressions in IF and WHILE statements and in CASE expressions, the same logic holds. A condition statement introduced with IF will be executed if the condition is True whereas the ELSE part (if any) is executed on both False and Unknown; a WHILE block will be repeated as long as the expression evaluates to True and execution stops on False and Unknown; and in a CASE expression, the WHEN that evaluates to True determines the result.

 

There is one notable exception to this – the CHECK constraint. If a row is inserted or updated that causes the CHECK constraint to evaluate to True, the modification is accepted; if it evaluates to False, it is rejected – but if it evaluates to Unknown, the modification is accepted! This is an unexpected change from the accustomed, but not one that has been made without reason – since Unknown usually results from a NULL value in the constrained column, rejecting rows if the CHECK constraint if the proposition evaluates to Unknown would merely duplicate the effect of a NOT NULL constraint. With the definition of a CHECK constraint as it is, illogical as it might seem, it’s very easy to define both an optional age column that will only accept ages over 35 or no age at all, and equally easy to define a mandatory age column that forces you to enter an age of over 35:

 

CREATE TABLE DemoPersons

            (PersonID     int     NOT NULL

            ,AgeOptional  tinyint NULL     CHECK (AgeOptional > 35)

            ,AgeMandatory tinyint NOT NULL CHECK (AgeMandatory > 35)

            ,CHECK (AgeOptional = AgeMandatory)

             );

 

Note the third check constraint, which ensures that both age columns are populated with the same age – but only if both age columns are populated at all, since this expression also results in Unknown if either of the age columns is NULL.

 

More oddness

 

The different treatment of Unknown in a CHECK constraint versus in a WHERE clause (or anywhere else, for that matter) can result in some interesting behavior. Let’s add some rows to illustrate this.

 

INSERT INTO DemoPersons (PersonID, AgeOptional, AgeMandatory)

SELECT 1, 36, 36

UNION ALL

SELECT 2, NULL, 37

UNION ALL

SELECT 3, 38, 38

UNION ALL

SELECT 4, 39, 39

UNION ALL

SELECT 5, NULL, 40;

 

Now, since the check constraint excludes row with an age of 35 or less, the query below should return 0, right? Well, no. It returns 2.

 

SELECT (SELECT COUNT(*)

        FROM   DemoPersons)

     - (SELECT COUNT(*)

        FROM   DemoPersons

        WHERE  AgeOptional > 35);

 

It appears odd, on first sight, that a table with a CHECK constraint that only allows ages of over 35 still allows for rows that do not have an age of over 35 – but it does, in fact, make perfect sense. Remember that rows for which AgeOptional > 35 is Unknown are admitted by the CHECK constraint, but rejected by the WHERE clause!

 

In the next post on NULL, I’ll show more seemingly odd behavior of NULL, and explain why this behavior is, in fact, not odd at all. I’ll also answer the question if null if null is null null null is null – I bet you can’t wait!

Published Tuesday, July 17, 2007 11:58 PM by Hugo Kornelis
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

James Luetkehoelter said:

Beautifully done. This is a great discussion of three-valued logic. How about tackling Belief Logic and SSAS Data Mining :)

July 18, 2007 5:28 AM
 

David Portas said:

Good information Hugo. The only danger is that by trying to relate three-value logic to everyday examples of what we know and don't know someone might infer that the truth value "unknown" is directly related to its usual meaning in English. You gave one example where that isn't the case: "WHERE Age > 35 AND Age < 30". An even simpler example would be "WHERE Age = Age". Again, we don't have to know your age to know that the value of Age is equal to itself. The value of this predicate is not unknown in mathematical terms or in everyday experience. So it would be safer and less counter-intuitive if SQL named the third truth value as something else altogether - the common English meaning of the word "unknown" clearly does not apply.

July 21, 2007 7:28 AM
 

Hugo Kornelis said:

Note to all - while rereading this post in preparation of the third part of this series (do I hear someone say "at last"?), I noted a disturbing error.

In the second paragraph after the heading "CHECK, the odd one out" I had mistakenly types WHERE instead of WHILE. Not once but twice. Aaarrgghh!!!

Since this is not just an amusing type, but actually an error that changes what I want to say, I just went in and corrected my mistake. If my error has confused you, then please know that I am sorry, and accept my apologies.

September 22, 2007 6:06 AM
 

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server' said:

Two months ago, I posted the first two parts of a series about NULL. After that, I went quiet. Much to

September 22, 2007 4:35 PM
 

Hugo Kornelis said:

The fourth (and final) part of this series about  NULL is now available at http://sqlblog.com/blogs/hugo_kornelis/archive/2007/09/30/what-if-null-if-null-is-null-null-null-is-null.aspx.

October 1, 2007 2:17 AM
 

Denis Gobo said:

This is part two of a three part series. Part one was about the phone interview , this part is about

December 10, 2007 3:02 PM

Leave a Comment

(required) 
(required) 
Submit

About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement