THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

Optimizing a query, then documenting assumptions in a unit test.

To optimize a query, we frequently have to explicitly tell optimizer some information which it does not realize by itself. Short term, this works, but long term we run the risk that what we are telling to the optimizer may be no longer relevant. This is when unit test shine - they allow us to document the assumptions which we are telling to the optimizer, and to verify if these assumptions are still correct.

For example, let us go through a very typical case of query optimization. The following table stores events:

CREATE TABLE dbo.Events
    
(
      
EventID INT NOT NULL ,
      
StartedAt DATETIME NOT NULL ,
      
EndedAt DATETIME NOT NULL ,
      
CONSTRAINT PK_Events PRIMARY KEY EventID ) ,
      
CONSTRAINT CHK_Events_ValidEnds CHECK StartedAt EndedAt )
    ) ;
  

Let us add 512K events, which start at different times and last at most 10 minutes:

DECLARE @i INT ;
SET @i ;
WHILE @i 512000 
    
BEGIN ;
        
INSERT  dbo.Events
                
EventID ,
                  
StartedAt ,
                  
EndedAt
                
)
                
SELECT  EventID @i -- EventName - varchar(30)
                        
DATEADD(SECONDEventID @i'20100217') , -- StartedAt - datetime
                        
DATEADD(SECONDEventID @i 10'20100217')  -- EndedAt - datetime
                
FROM    dbo.Events
                
UNION ALL
                
SELECT  ,
                        
'20100217 00:01:00' ,
                        
'20100217 00:10:00'
                
WHERE   @i 1
                
UNION ALL
                
SELECT  ,
                        
'20100217 00:05:00' ,
                        
'20100217 00:15:05'
                
WHERE   @i 1     
        
SET @i @i ;
    
END ;
 

Let us also add an index:

 CREATE INDEX Events_StartedAt_EndedAt ON dbo.Events(StartedAtEndedAt) ;
 

Here comes the stored procedure which we need to optimize:

CREATE PROCEDURE dbo.SelectRunningEventsForTime @AsOfTime DATETIME
AS 
    BEGIN 
;
        
SELECT  EventID ,
                
StartedAt ,
                
EndedAt
        
FROM    dbo.Events
        
WHERE   @AsOfTime BETWEEN StartedAt AND EndedAt ;
    
END ;
 

 The problem with this stored procedure is this: to satisfy the query, a range of the index is scanned, from the earliest StartedAt all the way to @AsOfTime.

When @AsOfTime is close to the the earliest event, this is no big deal:

 SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime '20100217 00:09:00' ;

Table 'Events'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

But for times at the end this is becoming a problem, the whole index is scanned:

SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime '20100223 01:38:08' ;
Table 'Events'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What can we do?  Suppose that we have a business rule that events never last more than 10 minutes.If the optimizer knows this fact, can it provide a better plan?

Let us add a constraint to enforce this business rule

 ALTER TABLE dbo.Events ADD CONSTRAINT CHK_Events_ValidDuration CHECK(DATEDIFF(MINUTEStartedAtEndedAt) <=10) ;

Note: to be precise, this constraint may allow some events with duration of 10 minutes and 59 cents. Suppose that is OK.

In my environment this constraint does not speed up the query at all:

SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime '20100223 01:38:08' ;
Table 'Events'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Let us explicitly use the fact that events never last more than 10 minutes in our procedure:

ALTER PROCEDURE dbo.SelectRunningEventsForTime @AsOfTime DATETIME
AS 
    BEGIN 
;
        
DECLARE @EarliestPossibleStart DATETIME ;
        
SET @EarliestPossibleStart DATEADD(MINUTE, -11@AsOfTime) ;
        
SELECT  EventID ,
                
StartedAt ,
                
EndedAt
        
FROM    dbo.Events
        
WHERE   @AsOfTime BETWEEN StartedAt AND EndedAt
                
AND StartedAt >= @EarliestPossibleStart ;
    
END ;
  

Note: We used -11 and not -10 in DATEADD because events may last up to 10 minutes 59 seconds.

Of course, this optimizes the query:

 SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime '20100223 01:38:08' ;

Table 'Events'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Short term, we are all set.

Long term, we have a risk: what if the business rules change, what if we can allow events to last longer, let's say 20 minutes, and what if we fail to change the stored procedure? If that happens, we have a bug.

To be safer, we can create a unit test which verifies that we cannot insert a longer event. The test should attempt to insert a longer event, and should verify that this insert must fail:

-- this event is too long, must not insert
INSERT  dbo.Events
        
EventID ,
          
StartedAt ,
          
EndedAt
                
        
)
        
SELECT  -,
                
'20100217 00:00:00' ,
                
'20100217 00:11:00' ;
 

 If at some later time we change our business rule to allow 20 minute events:

ALTER TABLE dbo.Events DROP CONSTRAINT CHK_Events_ValidDuration;                
GO
ALTER TABLE dbo.Events ADD CONSTRAINT CHK_Events_ValidDuration CHECK(DATEDIFF(MINUTEStartedAtEndedAt) <=20) ;                
 

Our unit test will fail and alert us that we have to change the stored procedure.

Anybody know a better approach?

Update:

I came up with an alternative solution: Yet another example of defensive query optimization.

 

 

Published Wednesday, February 17, 2010 5:02 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Uri Dimant said:

Alex

Perhaps I am missing something but is that possuble that stored procedure will get one more parameter to specify in DATEADD function?

declare @t int

SET @T=-11

SELECT  DATEADD(MINUTE, -@t, '20100223 01:38:08' ) ;

February 18, 2010 2:34 AM
 

Uri Dimant said:

Alex

Perhaps I am missing something but is that possuble that stored procedure will get one more parameter to specify in DATEADD function?

declare @t int

SET @T=-11

SELECT  DATEADD(MINUTE, -@t, '20100223 01:38:08' ) ;

February 18, 2010 2:34 AM
 

Uri Dimant said:

Alex

Perhaps I am missing something but is that possuble that stored procedure will get one more parameter to specify in DATEADD function?

declare @t int

SET @T=-11

SELECT  DATEADD(MINUTE, -@t, '20100223 01:38:08' ) ;

February 18, 2010 2:34 AM
 

Alexander Kuznetsov said:

Uri,

We definitely can add one more parameter, @maximunDuration. Yet the problem remains the same: that additional parameter still has to be consistent with the CHECK constraint, right? How can we enforce that?

February 18, 2010 9:21 AM
 

Alexander Kuznetsov said:

My second take at the same problem I blogged about yesterday: To optimize a query, we frequently have

February 18, 2010 3:24 PM

Leave a Comment

(required) 
(required) 
Submit

About Alexander Kuznetsov

Alex Kuznetsov has been working with object oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works at DRW Trading Group in Chicago, where he leads a team of developers, practicing agile development, defensive programming, TDD, and database unit testing.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement