To optimize a query, we frequently have to explicitly tell optimizer some information which it does not realize by itself. Short term, this works, but long term we run the risk that what we are telling to the optimizer may be no longer relevant. This is when unit test shine - they allow us to document the assumptions which we are telling to the optimizer, and to verify if these assumptions are still correct.
For example, let us go through a very typical case of query optimization. The following table stores events:
CREATE TABLE dbo.Events
(
EventID INT NOT NULL ,
StartedAt DATETIME NOT NULL ,
EndedAt DATETIME NOT NULL ,
CONSTRAINT PK_Events PRIMARY KEY ( EventID ) ,
CONSTRAINT CHK_Events_ValidEnds CHECK ( StartedAt < EndedAt )
) ;
Let us add 512K events, which start at different times and last at most 10 minutes:
DECLARE @i INT ;
SET @i = 1 ;
WHILE @i < 512000
BEGIN ;
INSERT dbo.Events
( EventID ,
StartedAt ,
EndedAt
)
SELECT EventID + @i , -- EventName - varchar(30)
DATEADD(SECOND, EventID + @i, '20100217') , -- StartedAt - datetime
DATEADD(SECOND, EventID + @i + 10, '20100217') -- EndedAt - datetime
FROM dbo.Events
UNION ALL
SELECT 1 ,
'20100217 00:01:00' ,
'20100217 00:10:00'
WHERE @i = 1
UNION ALL
SELECT 2 ,
'20100217 00:05:00' ,
'20100217 00:15:05'
WHERE @i = 1
SET @i = @i * 2 ;
END ;
Let us also add an index:
CREATE INDEX Events_StartedAt_EndedAt ON dbo.Events(StartedAt, EndedAt) ;
Here comes the stored procedure which we need to optimize:
CREATE PROCEDURE dbo.SelectRunningEventsForTime @AsOfTime DATETIME
AS
BEGIN ;
SELECT EventID ,
StartedAt ,
EndedAt
FROM dbo.Events
WHERE @AsOfTime BETWEEN StartedAt AND EndedAt ;
END ;
The problem with this stored procedure is this: to satisfy the query, a range of the index is scanned, from the earliest StartedAt all the way to @AsOfTime.
When @AsOfTime is close to the the earliest event, this is no big deal:
SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime = '20100217 00:09:00' ;
Table 'Events'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
But for times at the end this is becoming a problem, the whole index is scanned:
SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime = '20100223 01:38:08' ;
Table 'Events'. Scan count 1, logical reads 1502, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
What can we do? Suppose that we have a business rule that events never last more than 10 minutes.If the optimizer knows this fact, can it provide a better plan?
Let us add a constraint to enforce this business rule
ALTER TABLE dbo.Events ADD CONSTRAINT CHK_Events_ValidDuration CHECK(DATEDIFF(MINUTE, StartedAt, EndedAt) <=10) ;
Note: to be precise, this constraint may allow some events with duration of 10 minutes and 59 cents. Suppose that is OK.
In my environment this constraint does not speed up the query at all:
SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime = '20100223 01:38:08' ;
Table
'Events'. Scan count 1, logical reads 1502, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Let us explicitly use the fact that events never last more than 10 minutes in our procedure:
ALTER PROCEDURE dbo.SelectRunningEventsForTime @AsOfTime DATETIME
AS
BEGIN ;
DECLARE @EarliestPossibleStart DATETIME ;
SET @EarliestPossibleStart = DATEADD(MINUTE, -11, @AsOfTime) ;
SELECT EventID ,
StartedAt ,
EndedAt
FROM dbo.Events
WHERE @AsOfTime BETWEEN StartedAt AND EndedAt
AND StartedAt >= @EarliestPossibleStart ;
END ;
Note: We used -11 and not -10 in DATEADD because events may last up to 10 minutes 59 seconds.
Of course, this optimizes the query:
SET STATISTICS IO ON ;
EXEC dbo.SelectRunningEventsForTime @AsOfTime = '20100223 01:38:08' ;
Table 'Events'. Scan count 1, logical reads 5, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Short term, we are all set.
Long term, we have a risk: what if the business rules change, what if we can allow events to last longer, let's say 20 minutes, and what if we fail to change the stored procedure? If that happens, we have a bug.
To be safer, we can create a unit test which verifies that we cannot insert a longer event. The test should attempt to insert a longer event, and should verify that this insert must fail:
-- this event is too long, must not insert
INSERT dbo.Events
( EventID ,
StartedAt ,
EndedAt
)
SELECT -1 ,
'20100217 00:00:00' ,
'20100217 00:11:00' ;
If at some later time we change our business rule to allow 20 minute events:
ALTER TABLE dbo.Events DROP CONSTRAINT CHK_Events_ValidDuration;
GO
ALTER TABLE dbo.Events ADD CONSTRAINT CHK_Events_ValidDuration CHECK(DATEDIFF(MINUTE, StartedAt, EndedAt) <=20) ;
Our unit test will fail and alert us that we have to change the stored procedure.
Anybody know a better approach?
Update:
I came up with an alternative solution: Yet another example of defensive query optimization.