THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

Parameterization and filtered indexes (part 1)

Parameterization is a process where SQL Server slightly modifies a query in the parse phase: it replaces constant values by parameters. So if you submit

-- Query 1

SELECT COUNT(*)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = 706;

the parser will replace this by (and fool the optimizer into thinking you submitted):

DECLARE @1 smallint = 706;

SELECT COUNT(*)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = @1;

You can verify this very easily. If you go to the actual execution plan and hover over the left-most icon (the SELECT), a popup window will open that shows (a.o.) the query that was fed into the optimizer – but beware, this can be misleading, as will be demonstrated later in this post. I will show a more reliable method shortly. Don’t forget to disable the “Include Actual Execution Plan” option now, as it results in some extra work being done that will influence the output of the queries below.

 

The benefit of this is that a second execution of the same query with a different value will reuse the same plan. This saves compilation time, and less procedure cache space is consumed for these queries. However, you do run the risk of getting bad plans because of parameter sniffing with a bad value. Because that risk increases with query complexity, SQL Server’s default behavior is to only parameterize very simple queries (“simple parameterization”), but you can opt to use “forced parameterization” instead, by setting the database option PARAMETERIZATION FORCED.

 

Parameterization in action

 

Before adding filtered indexes to the mix, let’s start with some experiments. First, verify that you are on your development server and not accidentally logged in to a production box. Then run the query below. The first four lines disable the “Optimize for ad hoc workloads” options, a server-wide option that should probably be enabled on every production box, but is not as relevant for development – and would complicate this blog post. (The same principles apply, it is just a bit harder to demonstrate). The fifth line instantly clears out the execution cache, a very bad thing to do in production, but great for experiments such as this.

-- Query 2

EXEC sp_configure 'show advanced options', 1;

RECONFIGURE;

EXEC sp_configure 'optimize for ad hoc workloads', 0;

RECONFIGURE;

DBCC FREEPROCCACHE;

 

Let’s take a look at the contents of the plan cache – it should be empty, so ideally this query will return an empty result set.

-- Query 3

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

SELECT      cp.objtype AS ObjectType,

            cp.usecounts AS UseCount,

            st.[text] AS QueryText,

            cp.plan_handle AS PlanHandle,

            stmt.value('(@ParameterizedPlanHandle)',

                       'varchar(64)') AS ParameterizedPlanHandle,

            qp.query_plan AS [Execution Plan]

FROM        sys.dm_exec_cached_plans AS cp

CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp

CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st

CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple')

                                             AS batch(stmt)

WHERE       qp.dbid = DB_ID()

AND         st.[text] NOT LIKE '%ExcludeMe%';

The filter on “ExcludeMe” is a trick to ensure that the entry for the query itself does not show up. Other unrelated queries can also pop up, even when you are on a development system that nobody else is using – internal management tasks and installed tools (such as intellisense or third party database management tools) may fire queries that will all show up. Most of these should be excluded because of the filter on DB_ID (but do note that as a consequence, you MUST run this query in the same database where you are running your experiments – which for this blog post will be AdventureWorks). If the remaining unrelated queries bother you, find a part of the query text that identifies those queries and that never occurs in your own queries and add an extra filter on st.[text] to exclude those queries as well.

 

Now rerun query 1 above, then inspect the plan cache again (using query 3), and you should see a result similar to this:

image

The “Adhoc” plan is not a real execution plan. It is a placeholder that links the original text of the query before parameterization to the actual plan, called a “Prepared” plan. Check the values in the PlanHandle and ParameterizedPlanHandle columns to see this. Or click the content of the Execution Plan column to see a graphical representation of each plan. If you submit Query 1 a few more times, the UseCount of the Adhoc plan goes up, because in case of an exact match between the text of a submitted query and the text of a query in the plan cache, the same plan will be reused. The UseCount of the Prepared plan does not go up, even though this is the plan that is actually executed – a weird way of counting, but documented (though not entirely correct).

 

To see parameterization in action, let’s count orders for a different product:

-- Query 4

SELECT COUNT(*)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = 710;

Go back to query 3 and run it to see what this has done to the plan cache. Here are the results I got:

image

The raw text of the query is different, so a new “Adhoc” entry has been made. But after parameterizing the query text, SQL Server found that it already had a plan, so it could skip the compilation process and start executing right away. This saves lots of CPU cycles for the (relatively expensive) compilation process. It also reduces plan cache bloat, because the fake plan for the unparameterized query takes up a lot less space than a full execution plan (add the column cp.size_in_bytes to the SELECT list in query 3 if you want to see how much it saves). As you can see, the UseCount of the parameterized plan does go up this time, which conflicts with the remark in Books Online that it is “not incremented when parameterized queries find a plan in the cache”. I guess it actually represents the number of distinct unparameterized plans that have been linked to this parameterized plan.

 

Assuming you use the default of simple parameterization, a more complicated query should not use this mechanism. We can verify this using the same method. After clearing the plan cache (using the last line of query 2 above), I added some needless complexity to the previous query:

-- Query 5

SELECT     COUNT(*)

FROM       Sales.SalesOrderDetail AS sd

INNER JOIN Sales.SalesOrderHeader AS sh

      ON   sh.SalesOrderID = sd.SalesOrderID

WHERE      sd.ProductID = 710;

The join to SalesOrderHeader does nothing, and will even be eliminated by the query optimizer; the actual execution plan is still the same as before. But if you now run query 3 to inspect the plan cache contents, you will get just a single row in the result:

image

In this case, there is just an “Adhoc” plan that contains the original query text, including the hard-coded product ID value. It does not reference a different, parameterized plan; and when you click the Execution Plan column, you will indeed see a real execution plan. Change the product ID in query 5 back to 706 and execute it, and a new row will be added to the plan cache – again containing a full plan. So in this case, the optimizer had to do all the work to produce a plan. Imagine hundreds of queries like this on a busy production system, and you can see how all those compilation could impact overall performance, and bloat the plan cache. Using forced parameterization changes this, but at the risk of potentially introducing parameter sniffing issues – so pick your poison.

 

As you can see, parameterization works as designed, and it can be quite useful. Depending on your workload, you might want to consider enabling forced parameterization. But that is not the subject of this blog post!

 

Filtered indexes

 

The feature “filtered indexes” was first introduced in SQL Server 2008. This feature can be very useful, despite its known limitations (hint: if you filter on any columns not included in the index, always INCLUDE them – that solves 95% of the problems). But it has an effect on parameterization and plan reuse that many people are not aware of. To illustrate this, let’s just create a filtered index that should not affect my demo queries at all – an index that includes only products with a much higher product number.

-- Query 6

CREATE INDEX ix_ProductID_Over_1000

ON Sales.SalesOrderDetail (ProductID)

WHERE ProductID > 1000;

Run this query to create the index, then clear the plan cache, activate the option to include the actual execution plan and then run query 1 again. Looking at the plan, you might think that nothing changed – just as before, the value 706 has been replaced by the placeholder @1 in the “Statement” property of the “SELECT” iterator. But as I said, this is not always a reliable source of information – let’s get a second opinion from the plan cache by using query 3 again. Here is the result:

image

The two STATISTICS XML queries are the result of including the actual execution plan (if you don’t believe me, clear the procedure cache, then rerun query 1 and you will see only that single line). Just a single Adhoc plan, no ParameterizedPlanHandle, no Prepared plan – the picture is clear, the plan was not parameterized. The information in the execution plan is misleading

 

So how can a filtered index that would never have been used for this query have this effect on parameterization? The explanation lies of course in the existence of a filtered index that has the ProductID column in its predicate – the same column that is also used in the predicate of the query. Because of the different values, the filtered index may look irrelevant to this query. But if the parameterization were successful, the same plan would ALSO be used for this query:

-- Query 7

SELECT COUNT(*)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = 1706;

After parameterization, this query looks the same as query 1. If you look at the actual execution plan of this query, you will see that this query not only CAN, but also WILL use the filtered index we created earlier. If SQL Server had allowed parameterization to take place, this option would not have been available. To understand the misleading information in the actual execution plan, we have to understand the actual order of events, and looking at the right performance counter can help here (thanks to Alejandro Mesa, who helped me understand ). Let’s clear out that procedure cache again, and now run the query below – it is the same as query 7, but with two copies of the same query before and after it.

-- Query 8

SELECT cntr_value

FROM   sys.dm_os_performance_counters

WHERE  [object_name] LIKE '%:SQL Statistics%'

AND    counter_name = 'Unsafe Auto-Params/sec';

go

SELECT COUNT(*)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = 706;

go

SELECT cntr_value

FROM   sys.dm_os_performance_counters

WHERE  [object_name] LIKE '%:SQL Statistics%'

AND    counter_name = 'Unsafe Auto-Params/sec';

go

If you follow along, you will see that the value of this performance counter goes up by one. This is an artifact of a chain of events that starts with one component (I assume the parser – I guess someone like Paul White would probably know, or immediately attach a debugger to find out) deciding to parameterize the query – this is the version of the plan you see in the graphical execution plan. The next component (probably the optimizer) notices that the plan is not stable – meaning that different values of the constant can result in different plans being optimal. That makes it unsafe for simple parameterization, so the parameterized version is rejected, and the original, non-parameterized version is compiled, stored in the plan cache, and executed instead.

 

Unfortunately, this mechanism of rejecting unsafe parameterizations may get a bit too enthusiastic. Let’s first create one more index – this time not indexed, but with an included column:

-- Query 9

CREATE INDEX ix_ProductID_Incl_OrderQty

ON Sales.SalesOrderDetail (ProductID)

INCLUDE (OrderQty);

If I now want to find the total ordered quantity instead of just the number of rows for a product, I would modify my original query 1 as follows:

-- Query 10

SELECT SUM(OrderQty)

FROM   Sales.SalesOrderDetail

WHERE  ProductID = 706;

This query will never use the filtered index. Change the value 706 to something above 1000 and check the execution plan if you don’t believe me. Then use a hint to force the filtered index and compare the new execution plan with the old one to see the reason. The new index includes the OrderQty column, so it covers this query. The filtered index does not include this column, so a lookup into the clustered index has to be added to the plan, making it much more expensive (about twice the cost, according to the optimizer’s estimates). For this particular query, the filtered index is totally irrelevant – so now the plan is stable again, and there is no need to reject the parameterization attempt, right? Well, you can use any of the methods outlined above (clearing the cache, running the query, then looking at the cache contents; or watching the “Unsafe Auto-Params/sec” when submitting the query) and you will see that the parameterization is still rejected. The decision to reject or accept a parameterized query is not made after a full optimization, but after just a quick glance at the start of the optimization process. By the time the optimizer has explored enough options to realize that there is a single plan that is optimal for all values, even those included in the filtered index, the decision has already been made and it cannot be undone.

 

Conclusion


If you have issues with a bloated plan cache or too much CPU time going to compiling queries, check for filtered indexes. Their habit of getting in the way of successfully parameterizing queries may be a root cause of your problems.

 

But that’s not the whole story! In the next part, I will show you how to convince SQL Server to change the default behavior of not parameterizing queries if a filtered index is present – but I will also demonstrate how this can result in even worse performance, or even runtime errors!
Published Sunday, January 19, 2014 11:29 PM by Hugo Kornelis

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

gotqn said:

So, that's why you said at the beggining: "hint: if you filter on any columns not included in the index, always INCLUDE them – that solves 95% of the problems".

Making alter filter indexes covered indexes should handle the situation above.

January 26, 2014 4:06 PM
 

Vicky Thakor said:

I think you didn't understand properly. Its not understandable for human, Its for alien only.

August 12, 2014 9:42 AM

Leave a Comment

(required) 
(required) 
Submit

About Hugo Kornelis

Hugo is co-founder and R&D lead of perFact BV, a Dutch company that strives to improve analysis methods and to develop computer-aided tools that will generate completely functional applications from the analysis deliverable. The chosen platform for this development is SQL Server. In his spare time, Hugo likes to visit the SQL Server newsgroups, in order to share and enhance his knowledge of SQL Server.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement