THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Myth: SQL Server Caches a Serial Plan with every Parallel Plan

Many people believe that whenever SQL Server creates an execution plan that uses parallelism, an alternative serial plan is also cached.  The idea seems to be that the execution engine then decides between the parallel and serial alternatives at runtime.

I’ve seen this on forums, in blogs, and even in books.  In fairness, a lot of the official documentation is not as clear as it might be on the subject.  In this post I will show that only a single (parallel) plan is cached.  I will also show that SQL Server can execute a parallel plan on a single thread…

Before diving into the demonstration, I want to quickly run through some background information about the SQL Server plan cache.

Compiled Plans

Queries are expensive to compile and optimize, so SQL Server uses caching to improve efficiency through plan reuse.  The server optimizes an entire batch all at once, and the result is known as a compiled plan (or sometimes ‘query plan’).

The dynamic management view sys.dm_exec_cached_plans contains one row for each compiled plan, with a plan_handle that uniquely identifies the compiled plan among those currently in cache (plan handles can be reused over time).  This plan handle can be passed to the dynamic management function sys.dm_exec_query_plan to show the compiled plan in XML format.  When displayed in Management Studio, we can click on this XML representation to view the familiar graphical plan.

A compiled plan is a compile-time object – no user or runtime context is stored.  You might find it helpful to think of the compiled plan as a template – or perhaps as being similar to the estimated execution plans seen in Management Studio.

Execution Contexts

An execution context (or ‘executable plan’) - internally known as an MXC – is generated when a compiled plan is prepared for execution.  Execution contexts contain specific runtime information, for a single execution, for a single user.

thinking of the compiled plan as a template, an execution context is the concrete executable object derived by filling in run-time details like parameter values, local variables, details of objects created at runtime, and state information (for example, which statement within the batch is currently executing).  The following image, reproduced from the Plan Caching in SQL Server 2008 White Paper, shows the relationship:

Recompile_Figure_01.gif

If five users concurrently run the same batch, there might be a single compiled plan, but there will always be five execution contexts containing runtime information for each of the separate executions.  Execution contexts never exist without a parent compiled plan.

Execution contexts can be cached, but they are much cheaper to recreate than a compiled plan, which can only be regenerated by compiling a SQL query.  Execution contexts can be regenerated directly from the compiled plan.  Execution contexts that are not in use can be reinitialized with contextual information for a new user and runtime conditions.

You can find more information in the Books Online entry Execution Plan Caching and Reuse.  For those of you that want to explore the very deep internals of plan cache structures, there is an excellent blog entry by the SQL Server Programmability and API Development Team.

The Demonstration Script

As usual, I’m going to use a query against the AdventureWorks sample database, using SQL Server 2008 SP1 CU10 (build 10.0.2799).  The test rig I use in this post is best run on 2008 – it will execute on 2005, but there are important issues with that, which I will cover later.  It will not run correctly on Express Edition (any version).

DECLARE @PID INTEGER,
        @Qty INTEGER;
 
SELECT  @PID = P.ProductID,
        @Qty = SUM(TH.Quantity)
FROM    Production.Product P
JOIN    Production.TransactionHistory TH
        ON  TH.ProductID = P.ProductID
WHERE   P.ProductID % 27 = 0
GROUP   BY
        P.ProductID
ORDER   BY
        SUM(TH.Quantity) ASC;

The query itself isn’t important – it doesn’t even produce any output thanks to the variable assignments.  The important thing is that it produces an interesting parallel plan, if we lower the cost threshold for parallelism a bit from the default (click to expand the image):

image

This plan runs every iterator except the final sort on multiple threads, and includes a number of parallel-plan-only features.  There are three ‘exchanges’ (two Repartition Streams and one Gather Streams), and a bitmap.  See Parallel Query Processing and Bitmap Showplan Operator in Books Online for more information on exchanges and bitmaps.

Those of you that are very familiar with parallel plans might find it interesting that the Bitmap appears below a Merge Join – these are much more frequently seen in plans featuring a Hash Join (but I digress).

Environment

The first part of the test script sets the server max degree of parallelism option to zero, affinity mask to zero, and cost threshold for parallelism to 1.  These changes are simply to ensure that the optimizer produces a parallel plan for our test query.

-- =================
-- Environment Setup
-- =================
 
-- Connection settings
SET     ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET     NUMERIC_ROUNDABORT OFF;
GO
USE     AdventureWorks;
GO
-- Ensure advanced configuration options are available
EXECUTE sys.sp_configure
            @configname = 'show advanced options',
            @configvalue = 1;
 
-- Allow SQL Server to use all cores initially
EXECUTE sys.sp_configure
            @configname = 'affinity mask',
            @configvalue = 0;
 
-- Ensure server MAXDOP allows parallelism
EXECUTE sys.sp_configure
            @configname = 'max degree of parallelism',
            @configvalue = 0;
 
-- Reduce cost threshold for parallelism to make it easier
-- to produce a parallel plan
EXECUTE sys.sp_configure
            @configname = 'cost threshold for parallelism',
            @configvalue = 1;
 
-- Commit the configuration changes
RECONFIGURE;
GO

Procedures

The second part creates two stored procedures: one to run the test query, and one to show relevant information about the plan cache:

-- =================
-- Test objects
-- =================
IF      OBJECT_ID(N'dbo.TestQuery', N'P')
        IS NOT NULL
        DROP PROCEDURE dbo.TestQuery;
 
IF      OBJECT_ID(N'dbo.ShowPlanInfo', N'P')
        IS NOT NULL
        DROP PROCEDURE dbo.ShowPlanInfo;
GO
CREATE  PROCEDURE dbo.TestQuery
AS
BEGIN
        SET     NOCOUNT ON;
 
        DECLARE @PID INTEGER,
                @Qty INTEGER;
 
        SELECT  @PID = P.ProductID,
                @Qty = SUM(TH.Quantity)
        FROM    Production.Product P
        JOIN    Production.TransactionHistory TH
                ON  TH.ProductID = P.ProductID
        WHERE   P.ProductID % 27 = 0
        GROUP   BY
                P.ProductID
        ORDER   BY
                SUM(TH.Quantity) ASC;
END;
GO
CREATE  PROCEDURE dbo.ShowPlanInfo
AS
BEGIN
        SET     NOCOUNT ON;
        
        SELECT  cached_object_type = CP.cacheobjtype,
                plan_use_count = CP.usecounts,
                P.query_plan,
                source_text = TXT.[text],
                context_type = EC.cacheobjtype,
                context_use_count = EC.usecounts,
                ATTR.roundabort,
                ATTR.set_options,
                CP.plan_handle
        FROM    sys.dm_exec_cached_plans CP
        CROSS
        APPLY   sys.dm_exec_query_plan(CP.plan_handle) P
        CROSS
        APPLY   sys.dm_exec_cached_plan_dependent_objects(CP.plan_handle) EC
        CROSS
        APPLY   sys.dm_exec_sql_text(CP.plan_handle) TXT
        CROSS
        APPLY   (
                SELECT  roundabort = (CONVERT(INTEGER, PVT.set_options) & 8192) / 8192,
                        PVT.set_options
                FROM    (
                        SELECT  attribute,
                                value
                        FROM    sys.dm_exec_plan_attributes(CP.plan_handle)
                        ) ATTR
                PIVOT   (
                        MAX(value) 
                        FOR attribute IN (set_options)
                        ) PVT
                ) ATTR
        WHERE   CP.cacheobjtype = N'Compiled Plan'
        AND     CP.objtype = N'Proc'
        AND     TXT.[text] LIKE N'%dbo.TestQuery%'
        AND     TXT.[text] NOT LIKE N'%sys.dm_exec_cached_plans%';
END;
GO

The TestQuery stored procedure is a simple wrapper for the test SQL statement shown earlier.

The ShowPlanInfo stored procedure uses dynamic management views and functions to show the following information about plan cache objects:

  • cached_object_type – Always ‘Compiled Plan’ for our test procedure
  • plan_use_count – The number of times the compiled plan has been used (or reused)
  • query_plan – The XML representation of the plan.  Click on this in SSMS to show it in graphical form
  • source_text – The SQL batch that produced the plan (the CREATE PROCEDURE statement in our case)
  • context_address – The memory address of each execution context derived from the compiled plan
  • context_type – Always ‘Executable Plan’ (aka execution context)
  • context_use_count – The number of times the execution context has been reinitialized and reused
  • roundabort – The setting of the NUMERIC_ROUNDABORT connection used when creating the compiled plan
  • plan_handle – The compiled plan identifier (guaranteed to be unique per compiled plan in cache)

The Tests

The third part of the script runs three tests:

  1. The test query with affinity mask set to zero (parallel execution on all available logical CPUs)
  2. The test query with affinity mask set to one (execution on single logical CPU zero)
  3. The test query after changing a connection SET option (to generate a new compiled plan)
-- =================
-- Tests
-- =================
 
-- Reset
CHECKPOINT;
DBCC    DROPCLEANBUFFERS;
DBCC    FREEPROCCACHE;
GO
 
-- Test 1: Parallel plan running on all available cores
EXECUTE dbo.TestQuery;
EXECUTE dbo.ShowPlanInfo;
 
GO
-- Set affinity mask to limit SQL Server to one core
EXECUTE sys.sp_configure
            @configname = 'affinity mask',
            @configvalue = 1;
 
RECONFIGURE;
GO
-- Test 2: Reusing the cached parallel plan, but running on a single scheduler
EXECUTE dbo.TestQuery;
EXECUTE dbo.ShowPlanInfo;
GO
-- Test 3:
-- Changing a SET option
SET     NUMERIC_ROUNDABORT ON;
EXECUTE TestQuery;
EXECUTE dbo.ShowPlanInfo;
SET     NUMERIC_ROUNDABORT OFF;
GO

Test 1 – Normal Parallel Execution

Running Test 1 with the SSMS option ‘Include Actual Execution Plan’ on, produces the parallel plan shown previously:

image

Selecting the connector running from the Clustered Index Scan to the Hash Match (Partial Aggregate) and looking in the SSMS Properties window, we see that this part of the plan executed on eight threads:

image

The XML version of the graphical show plan contains elements that confirm the parallel execution:

image

image

For reasons that will become apparent shortly, I also want to highlight the information shown for the line connecting the Bitmap iterator to the Sort:

image

The output from the plan information procedure is (click to enlarge):

image

This shows a single cached compiled plan, with a single associated execution context.  Click on the XML query plan in SSMS to see the graphical representation of the compiled plan (notice the lack of any runtime information in a compiled plan).

Test 2 – Limited to a Single Logical CPU

Before running Test 2, the script sets the affinity mask to limit SQL Server to using a single execution unit.  Prior to SQL Server 2005, changing this option required a service restart, but the change is now dynamically applied in all editions except Express Edition. 

Importantly, unlike the max degree of parallelism and cost threshold for parallelism configuration options, changing the affinity mask does not clear the procedure cache.  This allows the plan cached by Test 1 to be potentially reused by Test 2.

Running Test 2 (again with actual execution plan on) we again get a parallel plan:

image

The first surprise, perhaps, is that we see a parallel plan at all – after all, we have limited SQL Server to a single CPU.  The little yellow circles are still there, as are the Exchange and Bitmap iterators.  Closer inspection reveals some important differences.  Here’s the information shown in the SSMS Properties panel for the line connecting the Clustered Index Scan to the Hash Match (Partial Aggregate):

image

Instead of the per-thread row counts we saw in Test 1, we see a single count for ‘All threads’.  This is exactly what we see when a serial plan is executed.

The XML show plan information contains:

image

image

The degree of parallelism is shown as zero, again exactly as for a serial plan.  The individual iterator information still shows Parallel=”true”, but the run time information shows a single thread of execution.  We can go further and monitor DMVs like sys.dm_os_tasks, but suffice it to say that the parallel plan shown does indeed execute on a single thread.

Another important difference can be seen by examining the tool-tips shown when hovering over the other connecting lines in the actual execution plan.  All the connectors originating from a parallelism-specific iterator show only estimated information.  As a reminder, the parallelism-specific iterators are the two Repartition Streams, the Gather Streams, and Bitmap iterators.  For comparison with Test 1, here is the tool-tip shown for the connector following the Bitmap, as before.  Notice that the Actual Number of Rows information is missing.

image

The output from the plan information procedure is:

image

There is still just a single cached compiled plan and a single execution context, but the usage counts for both have increased from 1 to 2.  This shows that SQL Server was able to reuse the parallel compiled plan, but execute it on a single thread.

Running a Parallel Compiled Plan Serially

SQL Server can derive a serial execution context from a parallel compiled plan at runtime.  It does this by removing the parallel-plan-specific operators.  This results in a serial execution context with the same plan shape as the parallel compiled plan.  This explains why the actual execution plan shown in SSMS lacks runtime information for the parallel operators – they were not present in the execution context.

SQL Server can do this because it is always possible to convert a parallel compiled plan to serial form – though the resulting serial plan may not be the optimal serial plan.

The reverse is not true, however: a serial compiled plan cannot have parallel operators dynamically inserted to produce a parallel execution context.

Test 3 – After Changing a SET Option

This test changes the connection SET option NUMERIC_ROUNDABORT to ON, before running the test procedure.  The idea here is that SQL Server will be unable to reuse the cached plan (see Query Tuning Recommendations in Books Online).  In fact, we receive a serial plan, with a very different shape from the parallel version seen previously:

image

The output from the plan information procedure confirms that a new serial compiled plan was cached, and a new execution context was created:

image

The point here is to show that the optimal serial plan is very different from the optimal parallel plan.  If SQL Server really had cached two optimized plans (one serial, one parallel) for our query, we would expect Test 2 to use the optimal serial plan, rather than a derivation of the parallel one.

Clean-Up

-- =================
-- Clean up
-- =================
EXECUTE sys.sp_configure
            @configname = 'affinity mask',
            @configvalue = 0;
 
EXECUTE sys.sp_configure
            @configname = 'cost threshold for parallelism',
            @configvalue = 5;
 
RECONFIGURE;
GO
DROP PROCEDURE  
        dbo.TestQuery,
        dbo.ShowPlanInfo;
GO

You may need to modify the values shown if you had previously changed your server settings from the defaults.

Issues with SQL Server 2005

The ability to dynamically change the affinity mask setting was introduced in SQL Server 2005.  As is sometimes the case with new features, engine support was not quite perfect first time around.  If you run the test script on SQL Server 2005 you will probably not see the illustrated results.

The problem is that the engine does not quite handle things correctly for the first execution of a cached compiled plan after the affinity mask option is changed.  For example, the execution of Test 2 (with affinity mask changed to 1) will result in a multiple threads running on a single scheduler (logical CPU).  The second, and subsequent, executions after changing the affinity mask will run correctly and as shown above (a single thread using a parallel plan).

So, the workaround is to disregard the results from the first execution of the TestQuery stored procedure after any change to the affinity mask setting (or more precisely the required call to RECONFIGURE).  All rather unfortunate, but there we are.

The second limitation with SQL Server 2005 comes with Test 3.  Although a new compiled plan is generated and cached, you will not see the serial plan shown.  The optimizer produces a parallel compiled plan, despite the setting of the affinity mask.  The parallel plan is nevertheless executed serially if affinity mask remains set to limit execution to a single logical CPU.  The behaviour of the optimizer changed between 2005 and 2008 in this respect – opinions may vary on which is the better approach.

Summary

  • The result of a single query compilation is a single compiled plan, either serial or parallel
  • The execution engine can dynamically derive a serial execution context from a parallel cached compiled plan

If anyone is able to show serial and parallel versions of a plan being cached from a single compilation, please let me know via email or in the comments section below.

Acknowledgements

My thanks go to Adam Machanic (blog | twitter) and Gail Shaw (blog | twitter) for their assistance.  Any errors that remain are entirely my own, naturally.

Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_kiwi

Published Thursday, November 04, 2010 10:23 PM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Pradeep Singh said:

Hi Paul, This is really amazing stuff you have posted. Thanks for putting in so much effort.

November 4, 2010 6:27 AM
 

Pradeep Adiga said:

Thanks Paul for the very useful post!

November 4, 2010 6:35 AM
 

Paul White said:

Hi Pradeep (and Pradeep!)

Thank you both very much (especially Pradeep Singh for the 'retweet').

It was a lot of effort, but I enjoyed researching and writing this entry very much.  Glad you liked it.

Paul

November 4, 2010 7:19 AM
 

Matt Whitfield said:

Wow - that is an awesome post.

I am going to have to read that several times before it sinks in.

Top job sir.

November 4, 2010 10:53 AM
 

Robert L Davis said:

Excellent post, as always!! There was indeed a lot of confusion around this when Conor Cunningham visited our MCM rotation. According to Conor, SQL Server generates 2 plans (assuming it isn't a trivial plan), but only caches the plan that ends up being used based on whether or not the costing determines that parallelism can be used. And even if the parallel plan is generated, the engine isn't locked into performing it in parallel. Like you said, it may cache the parallel plan, and the engine may strip out the parallelism at run time and execute it serially.

November 4, 2010 10:56 AM
 

Paul White said:

Matt,

Thank you - it's always pleasing to find people I 'know' reading my blog.  Glad you enjoyed it first time through.

I agree there is a lot to think about.  I originally planned this as a relatively short entry showing a parallel plan executing on one thread, but it kinda ran away with me from there.

Thanks again for taking the time to leave a comment.  It's very encouraging.

Paul

November 4, 2010 11:11 AM
 

Paul White said:

Robert,

Many thanks to you also - kinda humbling getting such great feedback from a member of the exclusive MCM club!  The compilation/optimization process is a truly fascinating area, and there are no shortage of myths to be found there.  That gives me an idea... :c)

Cheers,

Paul

November 4, 2010 11:19 AM
 

Robert L Davis said:

No offense intended to my fellow MCMCs, but frankly, a lot of what you post is above the level of many MCMs.

November 4, 2010 11:29 AM
 

Alexander Kuznetsov said:

Hey Paul,

This is extremely interesting, keep it up! You should write a book.

November 4, 2010 11:38 AM
 

Paul White said:

Robert,

Then I shall try a second time to compliment you by calling you a famous author instead! :c)

Paul

November 4, 2010 12:20 PM
 

Sandeep Kalidindi said:

Paul - Thanks for this really informative and well explained post. good work!  

November 4, 2010 2:25 PM
 

ALZDBA said:

As usual ... Very, very .... nice, informing, illustrated and correct !

November 5, 2010 2:50 AM
 

Paul White said:

Alex,

Glad you enjoyed it.  A book you say?  What an interesting idea ;c)

Paul

November 5, 2010 11:48 AM
 

Paul White said:

Sandeep & Johan (ALZDBA):  Thanks guys!

November 5, 2010 11:52 AM
 

GrumpyOldDBA said:

I can't recollect this type of information ever being made available for SQL2008 or SQL 2005 and is what I might have expected to be found in various "internals" books. I agree that you should consider a book - I'd happily pay for this type of in depth stuff - it must take you some considerable time and I for one appreciate your time and effort.. Thanks.

November 8, 2010 9:43 AM
 

Paul White said:

Hi Colin (Grumpy Old DBA),

First - thank you *very* much for you comments.

Yes, as far as I am aware, the information in this post isn't available anywhere else.  By the way, I tried to leave an answer to your latest blog post (about memory) but keep getting an error page whenever I try to log in - same with Dave Ballentyne's blog.  I wonder if sqlblogcasts is blocking connections from New Zealand?

Anyway, I almost always recommend setting locked pages on x64, but it is important to set *max* (not min!) server memory in that case.  You'll find lots of good, and accurate, information here:

http://blogs.msdn.com/b/psssql/archive/2009/09/11/fun-with-locked-pages-awe-task-manager-and-the-working-set.aspx

http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/08/24/troubleshooting-the-sql-server-memory-leak-or-understanding-sql-server-memory-usage.aspx

Paul

November 8, 2010 10:52 AM
 

csm said:

Fantastic job, Paul. All of your posts must be readed carefully and added to the favorites links (in fact, this is what I always do)

And if you write this book (please, e-book), here is another reader

November 17, 2010 6:53 AM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement