I got a question from a reader of my post How the Query Optimizer Uses Statistics (http://sqlblog.com/blogs/ben_nevarez/archive/2009/09/03/how-the-query-optimizer-uses-statistics.aspx) and I thought that it would be a good idea to post my answer here. Basically the request was to expand the previous example for a predicate with two columns.

First, a reminder that the histogram only shows the values of the first column of the statistics object. So, how the Query Optimizer does this?

One way to find the answer to this is, well, just to run an example and see what the Query Optimizer does. Let us run a query, see which statistics are automatically created and inspect those statistics.

This is the code sent by the reader. Run this to create a table and populate it with some data.

create table MyStatsTest (

id int identity(1, 1),

ProductGroupID int,

ProductID int

)

declare @i int

set nocount on

set @i = 1

while @i < 100000

begin

insert MyStatsTest (ProductGroupID, ProductID)

select datepart(millisecond, getdate()) % 2, datepart(millisecond, getdate())

set @i = @i + 1

end

Make sure you start with no statistics objects in the table. You can run this to verify that there are no statistics

select * from sys.stats

where object_id = object_id('MyStatsTest')

If there are some statistics, perhaps after running a query, you can drop them using a command like this (the name of your statistics objects may be different)

drop statistics MyStatsTest._WA_Sys_00000002_7D78A4E7

Run the first query to see the statistics created automatically by the Query Optimizer.

select * from MyStatsTest

where ProductId < 17 and ProductGroupId = 1

Now you can run the previous query again and notice that two statistics objects were created. Use DBCC SHOW_STATISTICS to display both histograms like in the next example

dbcc show_statistics('MyStatsTest', _WA_Sys_00000003_014935CB)

You can use the first histogram to estimate the number of records for the ProductId < 17 predicate using the method I described in my previous post. I got 1,744 rows which is the sum of the values 392, 302, 313, 419 and 318 (Your table has different data so you will get a different value).

Using the second histogram you can notice that 60.0316003% of the records have ProductGroupid = 1 (This is 60,031 divided by the total number of records 99,999).

Since this is using the AND operator you need to obtain the 60.0316003% of 1,744. This is 1,046.95 which is the estimated number of rows shown on the execution plan of the query.

You can do the same calculation for ProductGroupId = 0 and you will get an estimated number of rows of 697.049 which are also shown on the execution plan. Notice that you may need to run DBCC FREEPROCCACHE between tests to clear the plan cache, otherwise it may show a cached execution plan.

Finally, let us change the query to use OR operator (instead of AND)

select * from MyStatsTest

where ProductId < 17 or ProductGroupId = 1

The plan shows 60,728 estimated rows. Since this is using an OR operator this value could be obtained as the union of the following two queries

select * from MyStatsTest

where ProductId < 17

and

select * from MyStatsTest

where ProductId >= 17 and ProductGroupId = 1

The first query estimates 1,744 rows and the second one 58,984, for a total of 60,720 rows. You can use the previous method to estimate these values.

The same value could also be obtained as the union of these two queries

select * from MyStatsTest

where ProductId < 17 and ProductGroupId = 0

and

select * from MyStatsTest

where ProductGroupId = 1

The first one shows 697.049 estimated rows and the second one 60,031, for a total of 60,728.049 estimated rows.