THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Benjamin Nevarez

How the Query Optimizer Uses Statistics – Part II

I got a question from a reader of my post How the Query Optimizer Uses Statistics (http://sqlblog.com/blogs/ben_nevarez/archive/2009/09/03/how-the-query-optimizer-uses-statistics.aspx) and I thought that it would be a good idea to post my answer here. Basically the request was to expand the previous example for a predicate with two columns.

 

First, a reminder that the histogram only shows the values of the first column of the statistics object. So, how the Query Optimizer does this?

 

One way to find the answer to this is, well, just to run an example and see what the Query Optimizer does. Let us run a query, see which statistics are automatically created and inspect those statistics.

 

This is the code sent by the reader. Run this to create a table and populate it with some data.

 

create table MyStatsTest (

      id int identity(1, 1),

      ProductGroupID int,

      ProductID int

)

 

declare @i int

set nocount on

set @i = 1

while @i < 100000

begin

insert MyStatsTest (ProductGroupID, ProductID)

select datepart(millisecond, getdate()) % 2, datepart(millisecond, getdate())

      set @i = @i + 1

end

 

Make sure you start with no statistics objects in the table. You can run this to verify that there are no statistics

 

select * from sys.stats

where object_id = object_id('MyStatsTest')

 

If there are some statistics, perhaps after running a query, you can drop them using a command like this (the name of your statistics objects may be different)

 

drop statistics MyStatsTest._WA_Sys_00000002_7D78A4E7

 

Run the first query to see the statistics created automatically by the Query Optimizer.

 

select * from MyStatsTest

where ProductId < 17 and ProductGroupId = 1

 

Now you can run the previous query again and notice that two statistics objects were created. Use DBCC SHOW_STATISTICS to display both histograms like in the next example

 

dbcc show_statistics('MyStatsTest', _WA_Sys_00000003_014935CB)

 

You can use the first histogram to estimate the number of records for the ProductId < 17 predicate using the method I described in my previous post. I got 1,744 rows which is the sum of the values 392, 302, 313, 419 and 318 (Your table has different data so you will get a different value).

 

clip_image002

 

Using the second histogram you can notice that 60.0316003% of the records have ProductGroupid = 1 (This is 60,031 divided by the total number of records 99,999).

 

 

clip_image004

 

 

Since this is using the AND operator you need to obtain the 60.0316003% of 1,744. This is 1,046.95 which is the estimated number of rows shown on the execution plan of the query.

 

 

clip_image006

You can do the same calculation for ProductGroupId = 0 and you will get an estimated number of rows of 697.049 which are also shown on the execution plan. Notice that you may need to run DBCC FREEPROCCACHE between tests to clear the plan cache, otherwise it may show a cached execution plan.

 

Finally, let us change the query to use OR operator (instead of AND)

 

select * from MyStatsTest

where ProductId < 17 or ProductGroupId = 1

 

The plan shows 60,728 estimated rows. Since this is using an OR operator this value could be obtained as the union of the following two queries

 

select * from MyStatsTest

where ProductId < 17   

 

and

 

select * from MyStatsTest

where ProductId >= 17 and ProductGroupId = 1

 

The first query estimates 1,744 rows and the second one 58,984, for a total of 60,720 rows. You can use the previous method to estimate these values.

 

The same value could also be obtained as the union of these two queries

 

select * from MyStatsTest

where ProductId < 17 and ProductGroupId = 0

 

and

 

select * from MyStatsTest

where ProductGroupId = 1

 

The first one shows 697.049 estimated rows and the second one 60,031, for a total of 60,728.049 estimated rows.

Published Friday, September 11, 2009 11:32 PM by Ben Nevarez
New Comments to this post are disabled

About Ben Nevarez

Benjamin Nevarez is a SQL Server MVP and independent consultant based in Los Angeles, California who specializes in SQL Server query tuning and optimization. He is the author of "SQL Server 2014 Query Tuning & Optimization" and "Inside the SQL Server Query Optimizer" and co-author of "SQL Server 2012 Internals". With more than 20 years of experience in relational databases, Benjamin has also been a speaker at many SQL Server conferences, including the PASS Summit, SQL Server Connections and SQLBits. Benjamin’s blog can be found at http://www.benjaminnevarez.com and he can also be reached by e-mail at admin at benjaminnevarez dot com and on twitter at @BenjaminNevarez.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement