
Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Dynamic Seeks and Hidden Implicit Conversions

Most people know that a LIKE predicate with only a trailing wildcard can usually use an index seek:

SELECT 
    p.Name 
FROM Production.Product AS p
WHERE 
    p.Name LIKE N'D%';

[Figure: Index Seek on LIKE]

As the execution plan shows, SQL Server determines a covering range (which depends on the collation), seeks the string index using the range as the start and end points of a partial scan, and applies the original LIKE condition as a residual predicate to just the rows that match the initial seek operation.  Specifically, the Storage Engine seeks the index to locate rows in the covering range, and the Query Processor applies the residual predicate (the LIKE) to the rows it receives.
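
Conceptually, the seek-plus-residual combination is close to this hand-written form (a sketch only; the endpoint values SQL Server actually computes depend on the collation, so N'D' and N'E' here are purely illustrative):

SELECT 
    p.Name 
FROM Production.Product AS p
WHERE 
    p.Name >= N'D'         -- covering range start (illustrative)
    AND p.Name < N'E'      -- covering range end (illustrative)
    AND p.Name LIKE N'D%'; -- original condition applied as a residual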

Dynamic Seeks

But what if the LIKE search term is in a variable?

DECLARE @Like NVARCHAR(50) = N'D%';
 
SELECT 
    p.Name 
FROM Production.Product AS p
WHERE 
    p.Name LIKE @Like;

SQL Server can still perform a seek here, but it needs to determine the covering seek range for the search term at execution time, not at compilation time:

[Figure: Dynamic Index Seek]

The plan now contains an extra Constant Scan, a Compute Scalar, and a Nested Loops Join.  These operators are interesting because they have zero cost estimates: no CPU, no I/O, nothing.  That’s because they are purely architectural: a workaround for the fact that SQL Server cannot currently perform a dynamic seek within the Index Seek operator itself.  To avoid affecting plan choices, this extra machinery is costed at zero.

The Constant Scan produces a single in-memory row with no columns.  The Compute Scalar defines expressions to describe the covering seek range (using the runtime value of the @Like variable).  Finally, the Nested Loops Join drives the seek using the computed range information as correlated values.

The upper tooltip shows that the Compute Scalar uses three internal functions: LikeRangeStart, LikeRangeEnd, and LikeRangeInfo.  The first two functions describe the range as an open interval.  The third returns a set of flags, encoded in an integer, that are used internally to define certain seek properties for the Storage Engine.  The lower tooltip shows the seek on the open interval described by the results of LikeRangeStart and LikeRangeEnd, and the application of the residual predicate ‘LIKE @Like’.
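
One way to see that this extra machinery exists purely to handle the unknown value: on SQL Server 2008 and later, adding OPTION (RECOMPILE) allows the optimizer to embed the runtime value of @Like as if it were a literal, and the plan reverts to the simple static seek shown earlier (a quick experiment rather than a general recommendation):

DECLARE @Like NVARCHAR(50) = N'D%';
 
SELECT 
    p.Name 
FROM Production.Product AS p
WHERE 
    p.Name LIKE @Like
OPTION (RECOMPILE); -- one-time plan, built for the current value of @Like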

More Dynamic Seeks

Something very similar occurs in plans that use IN or OR with variables:

DECLARE 
    @1 INTEGER = 320,
    @2 INTEGER = 325,
    @3 INTEGER = 330;
 
SELECT
    p.Name
FROM Production.Product AS p
WHERE 
    p.ProductID IN (@1,@2,@3);

[Figure: Dynamic Seek With IN and OR]

Now we have three ranges: one for each of the variables in the original query.  The Compute Scalar operators again define three columns containing the start and end of each range, together with the associated informational flags (previously seen as the result of the LikeRangeInfo function).  This time, we see the decimal representation of these flags, which happens to be 62 for an equality comparison.  The IN expands to (ProductID = @1 OR ProductID = @2 OR ProductID = @3), so each of the ‘ranges’ here is in fact a single value, and the start and end range values are therefore the same in each Compute Scalar.

The three dynamic ranges are concatenated and sorted (so any overlapping ranges appear next to each other in the stream), and the Merge Interval collapses these intervals into one or more disjoint (non-overlapping) ranges.  This is important because the three variables might, for example, all contain the same value, and it would be incorrect to return that value three times.  For each disjoint range produced, the Nested Loops Join drives a new seek of the Clustered Index.  The overall effect is that an arbitrary number of possibly-overlapping ranges are computed, merged, and then used to drive one or more seek operations.  The final result of the query is the combination of all the seek results, as you would expect.
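
The need for the Merge Interval step is easy to demonstrate: give all three variables the same value, and the three identical single-value ‘ranges’ collapse to one, so the matching row is returned once rather than three times (a small experiment based on the query above):

DECLARE 
    @1 INTEGER = 320,
    @2 INTEGER = 320,
    @3 INTEGER = 320;
 
SELECT
    p.Name
FROM Production.Product AS p
WHERE 
    p.ProductID IN (@1,@2,@3); -- one row returned, despite three 'ranges'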

Hidden Conversions

The following example contains a table with DATETIME2 values, and a query with an expression that at first sight seems unlikely to be able to seek on an index (the variable is typed as DATE, and there is a CONVERT function applied to the DATETIME2 column):

DECLARE @Example TABLE (date_time DATETIME2 PRIMARY KEY);
 
INSERT @Example (date_time) 
VALUES ('20110101 12:34:56');
 
DECLARE @date DATE = '2011-01-01';
 
SELECT * 
FROM @Example AS e 
WHERE 
    @date = CONVERT(DATE, e.date_time);

Nevertheless, a query plan that uses a seek can be produced:

[Figure: GetRangeThroughConvert]

In this case, neither SSMS nor Plan Explorer will show the contents of the Compute Scalar (this is probably just an oversight, rather than deliberate concealment!).  We have to open the XML form of the execution plan to see the three familiar expressions, wrapped in a Value Vector (just a fancy container for multiple expressions).

Another internal function, GetRangeThroughConvert, is responsible for determining the range of DATETIME2 values covered by the DATE variable @date, and the informational flags needed.  In the same way the engine works out covering ranges for some LIKE predicates, this function determines ranges where certain problematic type conversions are required.  Otherwise, the machinery is the same: a range description is defined by the Compute Scalar, and the Nested Loops Join drives a seek using those values.
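
Conceptually, the runtime range covers the whole day of DATETIME2 values that convert to the given DATE, much as in this hand-written sargable version (a sketch only; as Martin Smith notes in the comments below, the interval the internal function actually produces is open at both ends with a residual predicate, so the match is not exact):

DECLARE @date DATE = '2011-01-01';
DECLARE 
    @RangeStart DATETIME2 = @date, -- 2011-01-01 00:00:00
    @RangeEnd   DATETIME2 = DATEADD(DAY, 1, CAST(@date AS DATETIME2));
 
SELECT * 
FROM @Example AS e -- assumes the @Example table variable declared above
WHERE 
    e.date_time >= @RangeStart
    AND e.date_time < @RangeEnd;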

More Hidden Conversions

There is another related internal function used when the Query Processor needs to determine a range for a comparison between different data types.  This example returns rows based on a greater-than-or-equal comparison between DATE column values and the DATETIME return value of the GETDATE() intrinsic function:

DECLARE @Example TABLE (col1 DATE PRIMARY KEY);
 
SELECT * 
FROM @Example AS e 
WHERE 
    e.col1 >= DATEADD(DAY, -7, GETDATE());

[Figure: GetRangeWithMismatchedTypes]

Once again, the SSMS graphical plan and Plan Explorer cannot display the contents of the Value Vector, so we have to dig into the XML.  The function evaluates the DATEADD(DAY, -7, GETDATE()) expression, computes the open-interval start point of a DATE range accounting for the conversion from DATETIME to DATE, and specifies NULL as the end of the range (since this is a >= comparison, there is no end value).  The flags value in this case is 22 (the flags for a >= seek operation).
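
In other words, the runtime computation behaves roughly like this hand-written version (a sketch; the > comparison reflects the open-interval start point, assuming the DATETIME threshold does not fall exactly on midnight):

DECLARE @threshold DATETIME = DATEADD(DAY, -7, GETDATE());
DECLARE @RangeStart DATE = CONVERT(DATE, @threshold); -- open-interval start point
 
SELECT * 
FROM @Example AS e -- assumes the @Example table variable declared above
WHERE 
    e.col1 > @RangeStart; -- no range end: the original comparison is unbounded above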

Everything All At Once

This last example features all sorts of type sloppiness, resulting in an execution plan that uses GetRangeThroughConvert on the string expression and GetRangeThroughConvert on the result of GetRangeWithMismatchedTypes applied to the result of the GETDATE function.  The whole thing is then wrapped in a dynamic seek, with the Merge Interval enforcing the (annoying) BETWEEN requirement that the first parameter must be less than or equal to the second.  See if you can work out all the conversions necessary for this query, using the rules of data type precedence.  It is really quite impressive that this example of lazy T-SQL coding results in an index seek, don’t you think?

DECLARE @Example TABLE (col1 DATETIME PRIMARY KEY);
 
SELECT * 
FROM @Example AS e
WHERE 
    CONVERT(DATE, e.col1) BETWEEN '20000101' AND GETDATE();

[Figure: Merge Interval With Everything]
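
For comparison, writing the predicate against the column's own type avoids the nested range machinery entirely (a sketch; check that the boundary semantics, particularly times on the end date, match your intent):

SELECT * 
FROM @Example AS e
WHERE 
    e.col1 >= '20000101'
    AND e.col1 < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()));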

Conclusion

SQL Server works quite hard sometimes to produce index seeks where they might seem unlikely.  This is a good thing, and it would be great to see this capability extended further in future.  The downside is that this extra effort means you are less likely to see the telltale Index Scan that would normally warn you that you have done something daft with data types.

Why is this a bad thing if you get a seek anyway?  The problem is that these hidden implicit conversions can result in inaccurate cardinality and distribution estimates at any stage of the plan.  So, even if you get a seek, the plan might be way off overall.  If that isn’t persuasive enough, consider this: will having hidden nested range calculations improve your chances of getting a good query plan?  Probably not, no.  Be very aware of types, and in particular of the types returned by functions and expressions.  If in doubt, use SELECT INTO to materialize the result of an expression or query, and check the types of the columns produced.
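
As a quick illustration of that last tip (using a temporary table name, #TypeCheck, invented for this example):

-- Materialize the result of an expression, then inspect the inferred type
SELECT DATEADD(DAY, -7, GETDATE()) AS expr_result
INTO #TypeCheck;
 
SELECT 
    c.name AS column_name, 
    t.name AS type_name
FROM tempdb.sys.columns AS c
JOIN tempdb.sys.types AS t 
    ON t.user_type_id = c.user_type_id
WHERE c.[object_id] = OBJECT_ID(N'tempdb..#TypeCheck'); -- returns: expr_result, datetime
 
DROP TABLE #TypeCheck;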


Note: if you have any scripts that trawl the plan cache looking for implicit conversions (CONVERT_IMPLICIT), you might want to look into updating them to check for these conversions too.  Not all conversions are bad ones, of course :)

© 2012 Paul White
Twitter: @SQL_Kiwi
Email: SQLkiwi@gmail.com

Further Reading:

Implicit Conversions – Craig Freedman
More on Implicit Conversions – Craig Freedman
Join Performance, Implicit Conversions, and Residuals - Me

Published Wednesday, January 18, 2012 2:37 AM by Paul White


Comments

 

Gianluca Sartori said:

I guess I know where this one comes from :-)

Great job, Paul, as usual!

January 17, 2012 8:56 AM
 

Paul White said:

Yes that twitter #sqlhelp question was just the prompt I needed to finally write this stuff down.  Thanks, Gianluca.

January 17, 2012 9:16 AM
 

tobi said:

These optimizer rules are life savers when you want to go through an ORM.

January 17, 2012 11:54 AM
 

Paul White said:

Yes I can believe it tobi, being stuck with index scans in cases where there is an obvious relationship between mismatched types (e.g. DATE and DATETIME) would be unfortunate, to say the least.

January 17, 2012 12:34 PM
 

tobi said:

My belief is that one of the greatest strengths of the optimizer is to take away very laborious and stupid query optimization work in the small. In the large, the programmer is still responsible. But in the small, I just want these little obvious problems to go away. The optimizer does a great job with that.

January 17, 2012 3:02 PM
 

Edafe Onerhime said:

Thanks for a great article. I'm trying to wrap my head around why this produces an Index Seek on SQL 2005 and is on par with the trailing wildcard:

DECLARE @Like dbo.Name;
SET @Like = N'%Tube';
 
SELECT p.Name
FROM Production.Product AS p
WHERE p.Name LIKE @Like;

January 18, 2012 6:10 AM
 

Paul White said:

Hello Edafe,

The optimizer produces a 'dynamic seek' plan that is valid for any possible @Like string value (the plan might be cached and reused in future).  In your case, the leading wildcard means the LikeRangeStart and LikeRangeEnd values computed at runtime define the *whole range* of possible values, so this seek is logically equivalent to a table scan.  (Sadly there is no easy way to see the computed range values.)

On SQL Server 2008 and later, we can add OPTION (RECOMPILE) to construct a one-time plan for the specific value of @Like.  This plan is not cached for reuse, so SQL Server can base it on the value of @Like at the time.  Doing this produces the expected scan with a residual predicate of LIKE @Like.

Hope that helps, thanks for a great question.

Paul

January 18, 2012 6:32 AM
 

Edafe Onerhime said:

Ahh, thanks for the reply. I wondered why it shows up as performing better than the same query without the variable. I definitely need to delve into this more!

Cheers

January 18, 2012 8:07 AM
 

DevJef said:

Thank you for your help, Paul. You've done a great job helping me find the issue, and pointing me in the right direction!

Your blog post was definitely an interesting read!

January 19, 2012 9:27 AM
 

Paul White said:

Thanks 'DevJef' for the #sqlhelp question that finally motivated me :)

January 19, 2012 9:38 AM
 

@DataSic said:

Great article. My plan cache crawler already had GetRangeThroughConvert, but GetRangeWithMismatchedTypes was missing. Just out of curiosity - is it possible to compare conversion costs for different data type pairs? What's the worst combo?

January 24, 2012 7:14 PM
 

Paul White said:

Hi @DataSic,

It seems reasonable to expect that some conversions would be more costly than others, though it's not something I have ever explicitly tested.  If I had to guess, I'd say conversions to or from LOB (including XML) or CLR types would be the worst.

Paul

January 24, 2012 7:27 PM
 

Martin said:

Hello,

Any input as to why we get this plan shape with the compute scalars and merge interval when an `IS NULL` is involved as opposed to just getting a simpler looking plan with multiple seek predicates?

(As per question here http://dba.stackexchange.com/q/14789/3690)

March 12, 2012 8:18 AM
 

Paul White said:

Hi Martin,

The simple answer is that the storage engine can't currently perform multiple different types of seeks at once.  WHERE column IN (1, 5) can be implemented as two seeks of the same type (ScaOp_Comp x_cmpEq), same goes for the semantically identical WHERE column = 1 OR column = 5.

We can also seek for NULL (ScaOp_Comp x_cmpIs) or NOT NULL (ScaOp_Comp x_cmpIsNot), but combining the Is and Eq variants in the same operation is not supported (much as an index can be scanned forward in parallel, but not backward).  There's no great reason why not, as far as I know; it just hasn't been done yet.

So, the merge interval plan is the lowest-cost alternative available to the optimizer.  Writing the query slightly differently (e.g. with UNION ALL) can produce other viable alternatives.

There are other issues with NULL seeking (e.g. with filtered indexes) so my guess is that product support will continue to deepen in this broad area over time.

Paul

March 13, 2012 10:39 AM
 

Martin said:

Thanks for that explanation.

So that explains why the equality part of `c2 > 1048576  or c2 = 0` appears in the plan as `c2 >= 0 AND c2 <= 0` as well then.

Never noticed that before!

March 13, 2012 10:58 AM
 

Martin Smith said:

One other aspect to this that I just noticed is that the "GetRangeThroughConvert" doesn't necessarily get the range that you might expect.

The CAST-as-DATE version uses > and < and a residual predicate instead of >= and <, so it ends up reading more rows.

October 12, 2013 8:38 AM
