The behavior of NOT IN clause may be confusing and as such
it needs some explanations. Consider the following query:
SELECT LastName, FirstName FROM
Person.Contact WHERE
LastName NOT IN('Hedlund', 'Holloway', NULL)
Although there are more than a thousand distinct last names
in AdventureWorks.Person.Contact, the query returns nothing. This may look
counterintuitive to a beginner database programmer, but it actually makes
perfect sense. The explanation consist of several simple steps. First of all, consider
the following two queries, which are clearly equivalent:
SELECT LastName, FirstName FROM
Person.Contact
WHERE LastName IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM
Person.Contact
WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL
Note that both queries return expected results. Now, let us recall
DeMorgan's theorem, which states that:
not (P and Q) = (not P) or (not Q)
not (P or Q) = (not P) and (not Q)
I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality).
Applying DeMorgan's theorem to this queries, it follows that these two queries
are also equivalent:
SELECT LastName, FirstName FROM
Person.Contact WHERE
LastName NOT IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM
Person.Contact
WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND
LastName<>NULL
Note that when I run both queries against AdventureWorks database,
they both return empty result sets. The reason is simple - the condition LastName<>NULL can never be
true (unless you play with ANSI_NULLS setting, which you should not be doing
anyway). Now that you understand why
queries with NOT IN can return empty result sets, consider this query:
SELECT <some columns> FROM SomeTable s
WHERE s.LastName NOT IN(SELECT LastName FROM Person.Contact WHERE <some conditions>)
At the time of this writing Person.Contact.LastName is not nullable, and in the
short term it is perfectly safe to write such queries. In the long term,
however, database schemas can and do change. At some time later on the column Person.Contact.LastName might become
nullable. Come to think of it, Person.Contact
might become a view instead of a table. To be on the safe side, you can
either rewrite your query using NOT EXISTS clause instead of NOT IN, or you can
explicitly ensure that Person.Contact.LastName is not
nullable, as follows:
SELECT <some columns> FROM SomeTable s
WHERE s.LastName NOT IN(SELECT LastName FROM Person.Contact
WHERE
LastName IS NOT
NULL AND <some conditions>)
Either way, your modified query will not break when the
underlying database schema changes.
Next post in this series:
Defensive database programming: qualifying column names