THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

When you add an index and your query blows up...

You cannot assume that the conditions in your WHERE clause will evaluate in the left-to-write order - making such assumptions leads to unsafe queries. For example, the following query is not safe:

SELECT FROM dbo.Messages
  WHERE ISDATE(VarcharColumn)=
    
AND CAST(
VarcharColumn) AS DATETIME)='20090707'
 

It can blow up at any time, and the reason is simple: the conditions in your WHERE clause can evaluate in any order, and the order can change the next time your query executes. If an invalid value casts before the check if it is valid, the query will blow up. Of course, I am not the first to point this out. Right from the top of my head I recall that you can find similar recommendations in Itzik Ben Gan's book on T-SQl Fundamentals, and in a recent Plamen Ratchev's post "Predicates in SQL". However, I would like to demonstrate how little it may take to break such code, how brittle it may be. I will provide a repro script in which a query originally succeeds, but fails after I have added an index.

 

Do not make assumptions which you cannot guarantee

 

Whenever you write such code as the previous sample, you clearly assume that the first condition, the call to ISDATE, will evaluate before the CAST. However, this assumption cannot be guaranteed, and when this assumption is not true, the query blows up. Even if this query always succeeds in your test runs, you cannot assume that it will always succeed in the future. The safe way to ensure that invalid values are never cast to datetime is this:

 

SELECT FROM dbo.Messages
  
WHERE CASE WHEN ISDATE(
VarcharColumn)=1
    
THEN CAST(
VarcharColumn AS DATETIMEEND ='20090707'

 

It is safe becasue it is documented that CASE expression evaluates the WHEN clause before evaluating the THEN clause.

 

When order in which conditions are evaluated changes - a repro script.

 

I am going to provide a repro script demostrating that provide that the order in which conditions are evaluated can change. However, I have to begin with a big disclaimer: there is no guarantee that this repro script will work as expected on your server. My repro script does work for me, and most likely will work for you too, but I cannot promise that.

 

Setting up test data

 

The following script adds 1M messages, all of them except one have invalid dates:

 

 

-- helper table
CREATE TABLE dbo.Numbers(INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i 1;
INSERT INTO dbo.Numbers(nSELECT 1;
WHILE @i<1024000 BEGIN
  INSERT INTO 
dbo.Numbers(n)
    
SELECT @i FROM dbo.Numbers;
  
SET @i @i 2;
END;
GO
CREATE TABLE dbo.Messages(MessageID INT NOT NULL PRIMARY KEY,
  
SenderID INT NOT NULL,
  
ReceiverID INT NOT NULL,
  
MessageDateAs
VarcharColumn VARCHAR(30) NULL,
  
SomeMoreData CHAR(200) NULL);
GO
INSERT INTO dbo.Messages(MessageID,
  
SenderID,
  
ReceiverID,
 
MessageDateAsVarcharColumn,
  
SomeMoreData)
SELECT nn%1000n/1000'Wrong Date''SomeMoreData'
  
FROM dbo.Numbers;
GO
-- only one message has a valid date
UPDATE dbo.Messages SET
MessageDateAsVarcharColumn='20090707' 
  
WHERE SenderID 123 AND ReceiverID 456;
 


 

Originally an unsafe query against test data succeeds


The following query succeeds on my server, selecting the only one row which has a valid date:

 

SELECT FROM dbo.Messages
  
WHERE SenderID 123 
    
AND ReceiverID 456
    
AND CAST(
MessageDateAsVarcharColumn AS DATETIME)='20090707'
 

 

Because the query succeeds, clearly for all the rows the first two conditions are evaluated before MessageDate is cast to DATETIME, because only the row for which the first two conditions are turn has a valid datetime value. Let me repeat my disclaimer: there is no guarantee that this query will succeed on your server. Even if you reshuffle the conditions in your WHERE clause, the query still succeeds, so clearly the cast is attempted only after both other conditions are evaluated:

 

 

SELECT FROM dbo.Messages
  
WHERE
CAST(MessageDateAsVarcharColumn AS DATETIME)='20090707'
    
AND ReceiverID 456
    
AND
SenderID 123 
 

 

As you have seen, conditions in WHERE clause are not evaluated in left-to-right order.  You can also play with parenthesis trying to enforce one particular order, but that will not make any difference either.

 

After adding an index, the same query blows up

 

Here is the index:

 

CREATE INDEX Messages_SenderID_MessageDate
  
ON dbo.Messages(SenderID,
MessageDateAsVarcharColumn);

 

After it is created, the query blows up:

Msg 241, Level 16, State 1, Line 1
Conversion failed when converting datetime from character string.

My explanation is simple: the optimizer chooses to use the new non-clustered index, and it chooses to cast the MessageDate value and evaluate the third condition CAST(MessageDateAsVarcharColumn AS DATETIME)='20090707') before doing an expensive bookmark lookup. Let me repeat my disclaimer again: there is no guarantee that this query will blow upon your server.

Also I played a little bit with parenthesis, trying to use them to enforce a particular order of evaluation - of course that did not make any difference.

As you have seen, it is unsafe to make assumptions about the order in which conditions in your WHERE clause are evaluated. Whenever you see a query developed under such an assumption, rewrite it eliminating the unsafe assumption altogether.

 

If you are still reading this, can you do me a favor and post if this repro script worked on your server the way it did on mine. Please also post the output of SELECT @@VERSION.  I will be on vacation for a couple of weeks. This means that I will respond to questions, but not immediately.

 

This post continues my series on defensive database programming. Here are my previous posts  from the series: 

Defensive database programming: fun with triggers

Defensive database programming: fun with ROWCOUNT

Summarizing previous posts about defensive database programming

Defensive database programming: SET vs. SELECT.

Stress testing UPSERTs

Defensive database programming: fun with UPDATE.

Defensive database programming: eliminating IF statements.

Defensive database programming: fun with changing column widths.

Avoid mixing old and new styles of error handling.

Defensive database programming: adding ESCAPE clauses.

Defensive database programming: qualifying column names.

Defensive database programming: rewriting queries with NOT IN().

Published Thursday, July 16, 2009 5:39 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Marius said:

Hi Alexander,

I got the same results as your.

I am running:

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86)   Jul  9 2008 14:43:34   Copyright (c) 1988-2008 Microsoft Corporation  Express Edition with Advanced Services on Windows NT 5.1 <X86> (Build 2600: Service Pack 3)

July 24, 2009 3:26 AM
 

Alexander Kuznetsov said:

Sounds trivial? Right, but different flavors of this myth still persist. Yesterday I noticed a thread

October 9, 2009 5:23 PM
 

Dan said:

Thanks so much for the CASE trick to force data validation before sending the data to the CAST/CONVERT function.  

I found many places online that explained the issue at hand was that there was no guaranteed order to the WHERE execution, but you were the only one to provide a beautiful/simple/elegant solution.

October 21, 2009 2:53 PM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement