THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

What Happened Today? DATE and Date Ranges Over DATETIME

A few days ago Aaron posted yet another fantastic entry in his Bad Habits series, this one discussing mishandling of date ranges in queries. This is a topic near and dear to me, having had to clean up a lot of poorly thought out code in the past few years. Aaron's post includes many examples, all of which boil down to the proper way to do the job, a query like the following (which I've stolenborrowed from the post):

SELECT
    COUNT(*)
FROM dbo.SomeLogTable
WHERE
    DateColumn >= '20091011'
    AND DateColumn < '20091012';

This query asks a simple question: How many rows do we have for October 11, 2009? And assuming the presence of an index on DateColumn, the work will be done much more efficiently than the most common anti-pattern I see, converting both the input date and the date column to CHAR(8) using CONVERT format 112 (YYYYMMDD) and comparing the strings. Doing that will cause a table scan, which is definitely a bad habit worth kicking.

But what I want to show in this post is that contrary to what you might understand from working with previous versions of SQL Server, in SQL Server 2008 conversion is not always a bad thing. In conjunction with the addition of the DATE type, the query optimizer received a minor upgrade. Converting both sides of the predicate to DATE will cause the optimizer to do exactly what you probably wanted to begin with: Return all of the events that occurred today (or on whatever input date you specified), by seeking into the index rather than scanning the table.

To see this optimization, start with the following test data:

CREATE TABLE #dates
(
    the_date DATETIME NOT NULL PRIMARY KEY
);
GO

WITH
numbers AS
(
    SELECT number
    FROM master..spt_values
    WHERE
        type = 'P'
        AND number > 0
)
INSERT #dates
SELECT DATEADD(hh, -number, GETDATE())
FROM numbers
UNION ALL
SELECT DATEADD(hh, number, GETDATE())
FROM numbers;
GO

... And now ask a question. What happened today? First we'll ask the wrong way:

SET SHOWPLAN_TEXT ON;
GO

SELECT
    COUNT(*)
FROM #dates
WHERE
    CONVERT(CHAR(8), the_date, 112) = CONVERT(CHAR(8), GETDATE(), 112);
GO

SET SHOWPLAN_TEXT OFF;
GO

---

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
       |--Stream Aggregate(DEFINE:([Expr1006]=Count(*)))
            |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#dates]), WHERE:(CONVERT(char(8),[tempdb].[dbo].[#dates].[the_date],112)=CONVERT(char(8),getdate(),112)))

Oops! An index scan. That won't do. But if you like writing code like this, all is not lost--you just need to do a different conversion:

SET SHOWPLAN_TEXT ON;
GO

SELECT
    COUNT(*)
FROM #dates
WHERE
    CONVERT(DATE, the_date) = CONVERT(DATE, GETDATE());
GO

SET SHOWPLAN_TEXT OFF;
GO

---

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0)))
       |--Stream Aggregate(DEFINE:([Expr1009]=Count(*)))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1007], [Expr1008], [Expr1006]))
                 |--Compute Scalar(DEFINE:(([Expr1007],[Expr1008],[Expr1006])=GetRangeThroughConvert(CONVERT(date,getdate(),0),CONVERT(date,getdate(),0),(62))))
                 |    |--Constant Scan
                 |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#dates]), SEEK:([tempdb].[dbo].[#dates].[the_date] > [Expr1007] AND [tempdb].[dbo].[#dates].[the_date] < [Expr1008]),  WHERE:(CONVERT(date,[tempdb].[dbo].[#dates].[the_date],0)=CONVERT(date,getdate(),0)) ORDERED FORWARD)

This plan is quite a bit more complex, but the important things to note are:

  1. An index seek is used, rather than an index scan, meaning that this query will return the results in a much more efficient manner
  2. The new query is logically equivalent to the first query

Need more than one day? Use either IN or BETWEEN, both of which produce query plans similar to the above.

SELECT
    COUNT(*)
FROM #dates
WHERE
    CONVERT(DATE, the_date) IN (CONVERT(DATE, GETDATE()-1), CONVERT(DATE, GETDATE()));
GO

SELECT
    COUNT(*)
FROM #dates
WHERE
    CONVERT(DATE, the_date) BETWEEN CONVERT(DATE, GETDATE()-1) AND CONVERT(DATE, GETDATE());
GO

Please note that I'm not suggesting that this methodology is any better or worse than what Aaron suggested in his post. I simply want you to know your options and--more importantly--understand that the knee-jerk "we must never use functions in the WHERE clause" approach is not always advantageous. The query optimizer has come a long way in recent versions of SQL Server and I suspect we'll see a lot more in these areas in the coming releases. It will be interesting to see how many of today's truths cease to be an issue as the optimizer becomes smarter and more adept at fixing user error.

Enjoy!

Published Tuesday, October 20, 2009 3:21 PM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

AaronBertrand said:

Yeah there was some follow-up in the comments on the optimizations for DATE.  I'm not crazy about using the CONVERT() on the LHS just because it is easy to change the code and forget that the CONVERT() is only used because it is an exception to the rule (I'm having deja vu that we have discussed habits here before).  

I also suggest you stop using calculations like GETDATE()-1 as this will break if you do a massive search/replace for sysdatetime().  :-)

October 20, 2009 3:02 PM
 

Dan Rolfe said:

The rows from today.. a little extra writing but specific. Even the MS-Brain can figure it out logically

<code>

declare @date datetime

set @date = '10/20/2009 23:59:59.999'

SELECT COUNT(*) FROM [TABLE]

WHERE (MONTH(Date_Field) = MONTH(@date) AND YEAR(Date_Field) = YEAR(@date) AND DAY(Date_Field) = DAY(@date))

</code>

October 20, 2009 3:11 PM
 

Adam Machanic said:

Sorry, Dan, but that query will force a table scan to occur. Not a good idea.

October 20, 2009 3:20 PM
 

Adam Machanic said:

Aaron: Yes, that's why I put the -1 inside the CONVERT :-) -- in this case it was just much easier to read than DATEADD. I know firsthand the pain of the overloads not being supported; I recently did a conversion of a 2005 DB to 2008, and along the way we changed all of the date/time columns in the database from [SMALLDATETIME with a constraint forcing midnight] to DATE. Suddenly lots and lots of code broke all over the place. Oops.

October 20, 2009 3:23 PM
 

AaronBertrand said:

Placing the -1 inside the convert doesn't help.  This still yields a type clash error when you change GETDATE() to SYSDATETIME(), since it tries to subtract:

CONVERT(DATE, SYSDATETIME()-1)

I understand why you want to skip the DATEADD() syntax to make code less busy, but especially in the context of new DATE/TIME features I think it's dangerous to do so.

October 20, 2009 4:35 PM
 

AaronBertrand said:

I think Dan missed the link to my date/range post.  :-)

October 20, 2009 4:35 PM
 

Sankar Reddy said:

Just want to chime in and add this. In previous versions of SQL Server 2005, a lot of noise was made about datetime data type conversions and I know a lot of code was modified to use someting like below dateadd(dd,datediff(dd,0,getdate()),0). I stumbled across this post by CSS about a performance problem that the above code snippet has inherent problems with low estimates in SQL Server 2005/2008 leading to performance problems. Have any of you ran into this isse?

http://blogs.msdn.com/psssql/archive/2009/07/17/using-datediff-can-query-performance-problems-in-sql-2005-and-2008.aspx

October 20, 2009 7:43 PM
 

AaronBertrand said:

I can see where this could cause an issue if your data distribution is heavily skewed, but in a normal distribution the estimate shouldn't affect performance by much at all.  I haven't seen the issue yet (though I now know to watch for it).  In any case, almost unilaterally when I am performing date range queries, the DATEADD/DATEDIFF calculations aren't actually part of the query; I apply them to the incoming parameter and/or create new parameters, depending on the other functionality in the procedure.  The query itself then uses the variable as opposed to a calculation that happens to use the variable.  Yes this can lead to parameter sniffing issues but certainly nothing due to the use of DATEADD/DATEDIFF vs. more primitive methods like converting to a string.  The big issue is when you essentially perform one or two calculations for every row, and this is certainly going to lead to more frequent, severe and predictable/preventable performance problems than an estimation problem.

October 20, 2009 8:14 PM
 

Adam Machanic said:

Sankar: Agreed with Aaron; the potential issue seems like an edge case, not a normal situation. I have used the DATEADD/DATEDIFF techniques in dozens of projects--hundreds of queries--and have never seen any issues caused by estimates. It's definitely good to keep in mind but there are certainly bigger issues to dwell upon!

October 20, 2009 9:10 PM
 

Sankar Reddy said:

Neither did I (and/or probably didn't even noticed) and was curious coming from PSS. Thanks for the feedback.

October 21, 2009 2:30 AM
 

Joe Celko said:

Wouldn't it be faster to use the constant phi and the closed form?

Fibonacci(@n INTEGER)

AS

RETURN

EOUND (((POWER (1.6190339887, @n)- POWER (1.0 - 1.6190339887, @n))/ SQRT (5.0)), 0);

untested.  Probably ought to cast phi as DOUBLE PRECISION

October 25, 2009 9:49 AM
 

Adam Machanic said:

Joe: Wrong post. I think you meant to be here:

http://sqlblog.com/blogs/peter_larsson/archive/2009/10/18/simple-fibonacci-calculation.aspx

Not sure what EOUND is, either :-)

October 25, 2009 10:33 AM
 

matt said:

wow!  that's great!  thanks!

November 6, 2009 11:17 AM
 

Aaron Bertrand said:

Give us easier to read execution plans Michelle Ufford ( @SQLFool ) recently asked for help pinpointing

January 22, 2010 3:46 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement