THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Inside the Optimizer: Constructing a Plan - Part 1

For today’s entry, I thought we might take a look at how the optimiser builds an executable plan using rules.  To illustrate the process performed by the optimiser, we’ll configure it to produce incrementally better plans by progressively applying the necessary rules.

Here’s a simple query (using the AdventureWorks sample database) that shows the total number of items in the warehouse for each of a small number of products:

SELECT
    P.ProductNumber,    
    P.ProductID,
    total_qty = SUM(I.Quantity)
FROM Production.Product P 
INNER JOIN Production.ProductInventory AS I ON
    I.ProductID = P.ProductID
WHERE
    P.ProductNumber LIKE N'T%'
GROUP BY
    P.ProductID,
    P.ProductNumber;

Heaps of work is done by SQL Server to parse and bind the query before it hits the query optimiser, but when it does, it arrives as a tree of logical relational operators, like this:

LogicalTree

The optimiser needs to translate this into plan that can be executed by the Query Processor.  If the query optimiser did nothing more than translate the logical relational operators to the first valid form it found, we would get a plan like this:

Literal-plan

This executable plan features two full table scans, a Cartesian product, a Filter on the two predicates, and an Aggregate.  This is a long way from being an optimal query plan, but it does produce correct results.  (In case you are wondering, the Compute Scalar is there to ensure that the SUM aggregate returns NULL instead of zero when no rows are processed).

Matching and Applying Rules

The optimiser found this plan by replacing logical operations with one of the physical alternatives it knows about.  This type of transformation is performed by an implementation rule within the optimiser.  For example, the logical operation “Inner Join” was physically implemented by Nested Loops (other implementation rules exist for Merge and Hash join).

To get from the logical tree to the executable plan shown, a total of five implementation rules were successfully matched and run by the optimiser:

  1. GET to Scan
  2. JOIN to Nested Loops
  3. SELECT to Filter
  4. Group By Aggregate to Stream Aggregate
  5. Group By Aggregate to Hash Aggregate

The first rule replaces the logical GETs with table scans.  The second rule implements the logical JOIN using Nested Loops as mentioned before.  The third converts all the predicates (including the join predicate) into a Filter operator.  The fourth and fifth rules represent two alternative strategies to physically perform the aggregation.  In this case, the optimiser chose a Stream Aggregate over a Hash Aggregate, based on costing.

Ok, but no-one would buy SQL Server if it produced that sort of plan on a regular basis.  Luckily, there are many other rules that the optimiser can use; in fact there are nearly four hundred in total.  Rest assured that the query optimiser fought hard to resist producing such a dismal plan: more than one dozen separate rules had to be turned off to do so.

As well as more implementation rules, there are a number of exploration and substitution rules.  These transform some part of the logical request into an equivalent form, based on mathematical equivalences or heuristics.  A simple example of an exploration rule is 'Join Commute'.  This rule exploits the fact that A JOIN B is the same as B JOIN A (for inner joins).

The optimiser also has simplification rules and enforcer rules.  Application of all these rules generates alternative strategies, the best of which will be incorporated in the final plan.  As an aside, a single simplification rule was responsible for the transformation shown in my Segment Top post.  That amazing transformation goes by the understated name "Generate Group By Apply".

Improving the Plan

An obvious deficiency in the executable plan above is that it performs a Cartesian product of the two source tables followed by a Filter.  It would be much more efficient to evaluate the join predicate at the same time as performing the physical join.  The rule that performs this task is called SELonJN (SELECT on JOIN).  I should stress that this is not the T-SQL SELECT statement, it is the relational SELECT operation: a filter applied to rows.  By enabling the SELonJN optimisation rule, we get a better plan:

SELonJN_on

The Cartesian product has been replaced by a more normal-looking Nested Loops operator, which is a result of the join predicate being moved from the Filter into the join.  In fact, the Filter operator has disappeared completely - where has the predicate “ProductNumber LIKE 'T%'” gone?  It has also been pushed down the plan - all the way to the Product table scan.

That’s still not a great plan: we are scanning the whole Inventory table once for every row from the Product table that matches on the ProductNumber predicate.  We need rules that know about non-clustered indexes - the basic rule we have relied on so far simply translates a GET to a table scan.  Enabling a couple more rules produces:

NCImpl_1

That's a little better, we are now using the correct non-clustered indexes but the predicates are being applied to a scan - not the index seek we might have hoped for.  There are quite a few rules left to apply before we get a really efficient plan.  I'll cover those next time.


This post is part of a series: Part1 Part 2 Part3 Part4

© Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi

Published Thursday, July 29, 2010 7:17 AM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alexander Kuznetsov said:

Welcome aboard, Paul!

This is a very interesting series.

Would you mind sharing with us what exactly did you mean by "enabling the SELonJN optimisation rule" and by "Enabling a couple more rules"?

July 28, 2010 4:04 PM
 

Paul White said:

Hey, Alex!

Thanks for the welcome - I'm very excited to be on here, as I've been a big fan of blogs like yours and Adam's for many years.

The current series of posts is building up to answering the exact question you raise.  I promise that all will be revealed very soon :)

Thanks so much for the feedback.

Paul

July 28, 2010 9:22 PM
 

Uri Dimant said:

Paul

Running  your query on AW of SQL Server 2005 I got Index Seek on P.ProductNumber LIKE N'T%' and Clustered Index Seek on ProductInventory..

July 29, 2010 1:34 AM
 

Paul White said:

Hey Uri,

Yes you would do - that's the final, fully-optimised, form of the query plan.  What I'm trying to show in these series of posts, is how the optimiser incrementally finds improved plans by applying its internal rules.

I've tweaked the optimiser to restrict the rules available to it at each stage, to better illustrate the iterative nature of the process.  Hopefully, things will become clearer as the series progresses ;c)

Paul

July 29, 2010 1:40 AM
 

Page Free Space: Paul White said:

Summary: Investigating an optimiser transformation that exposes a bug in SQL Server’s MERGE implementation.

August 3, 2010 2:06 PM
 

Jay said:

hello:

Nice article. i ran un-documented DMV & found 377 implementation rules in SQL Sever 2008 Developer Edition SP-1.

I am curious to know if there is a possible way to see the implementation rules for every execution plan when query or Store Procs is executed in SSMS with Actual Execution Plan

Thanks

Jay

August 3, 2010 9:15 PM
 

Page Free Space: Paul White said:

Iterators SQL Server uses an extensible architecture for query optimisation and execution, using ‘iterators’

August 4, 2010 2:10 PM
 

Page Free Space: Paul White : Inside the Optimiser: Constructing a Plan ??? Part 4 said:

August 10, 2010 6:23 PM
 

Page Free Space: Paul White : Inside the Optimiser: Constructing a Plan ??? Part 3 said:

August 10, 2010 6:24 PM
 

Page Free Space: Paul White : Inside the Optimiser: Constructing a Plan - Part 2 said:

August 10, 2010 6:25 PM
 

Jeff Moden said:

Awesome series, Paul!  Thanks for taking the time to post it.

--Jeff Moden

August 23, 2010 8:35 PM
 

Paul White said:

Thank you Jeff - means a lot coming from you.

August 23, 2010 8:50 PM
 

ColdCoffee said:

Paul, awesome article.. still dint touch other parts, but will definitely spend the night with them.. thanks for sharing...

August 26, 2010 10:41 AM
 

Paul White said:

Hey Cold Coffee (from SSC I presume!)

Thanks - and do try to make it to the end, because most of the stuff that makes sense is that way...

Paul

August 26, 2010 11:01 AM
 

Chris Leonard said:

This is really interesting.  May I (with attribution to you, of course) use some of this information in my PASS Summit presentation?  The DMV is part 4 could be useful for showing the impact of certain hints.

Also, is there a way to get SQL Server to show the logical operation tree it comes up with to begin with?  I've heard that older versions of SQL Server would sometimes rewrite queries (replacing left joins with right joins, for example) but I can't find any evidence of this kind of "query normalization" still happening in SQL 2005/2008.

Thanks again - great series!

September 14, 2010 12:26 AM
 

Paul White said:

Hi Chris,

Apologies for the slow response.  Yes of course that would be fine - in fact I'd be honoured, thank you.

There's no way to expose the initial algebrized tree I'm afraid.  The optimizer sure does consider many plan alternatives, and transforming left and right joins is just one of many tricks it can perform.

Paul

September 22, 2010 6:10 PM
 

Paul White: Page Free Space said:

If you’re not already a regular reader of Brad Schultz’s blog, you’re missing out on some great material. 

December 9, 2010 12:17 PM
 

Paul White: Page Free Space : Inside the Optimizer: Constructing a Plan - Part 1 said:

February 26, 2011 8:26 PM
 

Paul White: Page Free Space said:

As usual, here’s a sample table: With some sample data: And an index that will be useful shortly: There’s

July 1, 2011 10:51 AM
 

Adriano Nascimento said:

This is a very Good Series,

When will you write a book about "Inside the Optimizer"?

Thanks,

Adriano Nascimento

August 7, 2012 11:11 AM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement