Rob Farley

Owner/Principal with LobsterPot Solutions (a MS Gold Partner consulting firm), Microsoft Certified Master, Microsoft MVP (SQL Server), APS/PDW trainer and leader of the SQL User Group in Adelaide, Australia. Rob is a former director of PASS, and provides consulting and training courses around the world in SQL Server and BI topics.

The SQL Query Optimizer – when Logical Order can get it wrong

Hi! - Great that you've found this page, but it's no longer here! You can find the content over at: http://blogs.lobsterpot.com.au/2012/12/30/the-sql-query-optimizer-when-logical-order-can-get-it-wrong/

Published Sunday, December 30, 2012 4:32 PM by Rob Farley


Comments

 

Martin Smith said:

In the seek version, performance is much more predictable: it always has to perform the same work.

With the scan version, the worst case scenario can be much more expensive than estimated, even with perfect statistics, due to the modelling assumption that the matching rows will be evenly distributed with respect to the other column. If in fact there is some correlation, such that the matching rows for one column value all happen to be at one end of the range of values for the other column, then a full scan is needed, which can be much worse.

Example of that here:

http://stackoverflow.com/questions/7481818/sql-why-is-select-count-mincol-maxcol-faster-then-select-mincol-max/7482342

December 30, 2012 5:28 AM
 

Rob Farley said:

Hi Martin,

There are also plenty of times when predicates with low selectivity cause a large number of rows to be returned by a Seek, so I would suggest your argument could work on both sides.

I would suggest it's an example that suits making index choices based on business knowledge of the data model, if you're worried that the statistics are insufficient for good choices to be made.

Rob

December 30, 2012 5:41 AM
 

Martin Smith said:

Hi Rob,

I'm not saying that the scan plan is definitely worse (at least assuming perfect statistics), just that it can have more variable performance when the rows are not in fact exactly evenly distributed.

Say there are 1,000,000 rows and 1,000 match the seek predicate. Under the even-distribution assumption, SQL Server will assume that 1,000 (1 million / 1 thousand) rows need to be scanned.

For the seek plan, the best, worst, and estimated cases are all 1,000 rows.

For the scan plan, the best, worst, and estimated cases are 1, 999,000, and 1,000 rows respectively; and if the statistics are not perfect and in fact no rows match at all, then the real worst case would be 1 million rows.

If the predicate is made less selective, so that 10,000 rows now match:

For the seek plan, the best, worst, and estimated cases are all 10,000 rows.

For the scan plan, the best, worst, and estimated cases are 1, 990,000, and 100 rows respectively.

December 30, 2012 8:53 AM
 

Martin Smith said:

Just pointing it out as a potential issue to be considered.

In the real world there may be all sorts of correlations that don't necessarily occur to one when writing queries.

SELECT MIN(OrderDate), MAX(OrderDate)
FROM Orders
WHERE ProductId = @ProductId

It is highly unlikely that ProductIds will be evenly distributed throughout the Orders table, as new products get launched and old ones get discontinued, for example.

December 30, 2012 9:46 AM
 

Rob Farley said:

Your points are all valid, and again I'll say that designing indexes around business knowledge is important. Many business scenarios would consistently be much closer to the best case, and it would be foolish to settle for the "least bad worst case" alternative.

Obviously the queries can be forced into either plan, once the developers have considered their options.

And your query example doesn't quite work: if you need both MIN and MAX you'd need to approach from both ends, and your equality predicate makes a simple composite index the most effective choice.
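
For example, something along these lines (the Orders table is Martin's example above; the index name and the @ProductId value are illustrative only):

CREATE INDEX ix_Orders_ProductId_OrderDate ON dbo.Orders(ProductId, OrderDate);

DECLARE @ProductId int = 123; --illustrative value

--With ProductId as the leading key, the qualifying rows are contiguous and already
--sorted by OrderDate, so MIN and MAX come from the two ends of a single seek range.
SELECT MIN(OrderDate), MAX(OrderDate)
FROM dbo.Orders
WHERE ProductId = @ProductId;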

December 30, 2012 3:45 PM
 

Martin Smith said:

Actually, the link in my first post does show that sometimes SQL Server will generate such a plan, calculating the MIN and MAX separately (in that case it also has added lookups, which make things worse).

December 30, 2012 4:37 PM
 

Ian Yates said:

Great post! I really liked the clear explanations 😊 I think I shall use them in real life when trying to explain some of this behaviour to others (or just point them to this blog)

December 30, 2012 7:09 PM
 

Rob Farley said:

Martin: Yes - if it thinks the range is wide enough and the selectivity low enough, it can decide to go from both sides. But in that scenario you can solve it easily with a composite index, because the predicate is an equality.

December 30, 2012 7:14 PM
 

Rob Farley said:

Ian: Thanks

December 30, 2012 7:16 PM
 

Simon Sabin said:

Having so few rows in the table makes this a little contrived. The indexes only have 4 leaf pages to read. All the data is found on one page, so in both cases it's a question of scanning a single page for a value.

It would be helpful to show this with more data where the impact is significant.

January 1, 2013 7:51 AM
 

Rob Farley said:

Ok Simon... how about in AdventureWorksDW, which has 60K records in dbo.FactInternetSales? I get it's still not huge, but it shows that it's easy to have a bad plan come out.

CREATE INDEX ix1 ON dbo.FactInternetSales(OrderDateKey) INCLUDE (UnitPrice);

CREATE INDEX ix2 ON dbo.FactInternetSales(UnitPrice) INCLUDE (OrderDateKey);

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales
WHERE UnitPrice between 0 and 100;
--Prefers ix1. 20 reads

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales
WHERE UnitPrice between 600 and 700;
--Prefers ix2. 2 reads.

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales WITH (INDEX(ix2))
WHERE UnitPrice between 0 and 100;
--Forced ix2. 150 reads.

Of course, with a correlation between the two fields, it could be possible to show an example of ix1 being particularly nasty as well.
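
For instance, one contrived way to engineer that correlation (the scratch table, index names, and date value below are illustrative only, not part of the script above):

--Copy the two columns, then push every cheap row to a date key beyond all the real ones,
--so the cheap rows all sit at the far end of the OrderDateKey order.
SELECT OrderDateKey, UnitPrice
INTO dbo.FactInternetSales_Skewed
FROM dbo.FactInternetSales;

UPDATE dbo.FactInternetSales_Skewed
SET OrderDateKey = 20089999
WHERE UnitPrice BETWEEN 0 AND 100;

CREATE INDEX ix1s ON dbo.FactInternetSales_Skewed(OrderDateKey) INCLUDE (UnitPrice);
CREATE INDEX ix2s ON dbo.FactInternetSales_Skewed(UnitPrice) INCLUDE (OrderDateKey);

SET STATISTICS IO ON;

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales_Skewed WITH (INDEX(ix1s))
WHERE UnitPrice BETWEEN 0 AND 100;
--The date-ordered scan has to pass every dearer row before it finds a cheap one,
--so the reads approach the size of the whole index.

SELECT MIN(OrderDateKey)
FROM dbo.FactInternetSales_Skewed WITH (INDEX(ix2s))
WHERE UnitPrice BETWEEN 0 AND 100;
--The seek on UnitPrice still touches only the matching rows.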

January 1, 2013 9:09 PM
 

Chris Adkin said:

Rob,

You may be alluding to what the optimizer team refer to as the ascending key problem, whereby data gets added to a table in ascending key order. For large tables (the opposite of the 'Small' table you mentioned - I know), this can be mitigated via trace flag 2371. You may also want to look at Connect item 676224: another popular database engine includes the ability for hints to be used on statements that specify:

1. That data in the relevant table(s) should be sampled in order to produce better plans.

2. How aggressively the sampling should be.
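
For reference, a minimal sketch of the trace flag and the nearest existing sampling control (the table name reuses Rob's FactInternetSales example; the commands are illustrative only):

--Enable trace flag 2371 globally, so statistics on large tables are auto-updated
--sooner than waiting for the usual 500 + 20% of rows to change.
DBCC TRACEON (2371, -1);

--There is no per-statement sampling hint of the kind the Connect item asks for;
--the closest existing lever is to refresh statistics with an explicit sample size:
UPDATE STATISTICS dbo.FactInternetSales WITH FULLSCAN;
--or, for a lighter touch:
UPDATE STATISTICS dbo.FactInternetSales WITH SAMPLE 50 PERCENT;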

January 3, 2013 8:48 AM
