
Andrew Kelly

Query Store - Forced Doesn’t Always Mean Forced

 

The new Query Store feature in SQL Server 2016 is a great new addition that we have been anticipating for many years now. There are already a bunch of articles out there that explain what it is and even how to go about using it, so I won’t repeat that here. However, there is one aspect of the feature that has been poorly documented, and even a little bit misleading, that I want to go over. Please note that as I was writing this blog some of the documentation was updated by Microsoft, and I want to thank them for doing so. The part of this new feature that I want to discuss is the forcing of query plans. After all, the ability to force a particular query plan is certainly one of the main reasons this feature exists.

As you read along, keep in mind that the Query Store tracks at the query or statement level, not the overall batch or procedure. Within a particular Query Store (which is database specific) it assigns a unique query_id for each unique statement as defined by the query_hash. We also get a unique plan_id, based on the query_plan_hash, for a given query_id. If the query plan for a given query_id changes over time we get another entry in the Query Store for that combination of the query_id and the new plan_id. This allows us to see the history for a given query for each unique query plan used by that query. The run time statistics such as reads, writes, duration etc. are tied to each combination of the query_id and plan_id. It actually gets a bit more complex in that there are time intervals that come into play as well, but for simplicity’s sake let’s ignore them here, especially since all stats are still tied to the query_id and plan_id combination when aggregated. What this does is allow us to track and see how each plan performed for that query, which is very useful information.
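For reference, here is a minimal sketch of that relationship, listing every plan the Query Store has captured for a single query along with the hashes that define it (the query_id value of 1 is just a placeholder for one of your own queries):

-- Sketch: all plans captured for one query, with the defining hashes
SELECT q.query_id, q.query_hash, p.plan_id, p.query_plan_hash, p.is_forced_plan
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
WHERE q.query_id = 1   -- placeholder query_id
ORDER BY p.plan_id ;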

This is, after all, what allows us to decide which plan we want the query to use. Let’s say we had a situation in which, over time, we ended up with 2 different query plans due to recompilation and parameter sniffing. Each plan may have been perfect for the parameters passed in on the first execution after each recompile, and if called with that same parameter each time, things are great. However, let’s also assume that the 2nd time we got a plan it was based on a value passed into the procedure that was not typical for most executions of this procedure. In fact this plan is only valid for .1% of the calls, and thus gives us very bad performance for the other 99.9% of subsequent calls to this procedure with other parameter values passed in. That means most of the time the performance will not be what we want and may cause lots of performance issues. We can try to recompile the procedure and hope we get the right values passed in to give us the plan that is best 99.9% of the time, but that is somewhat of a gamble. Instead we can use this great new feature called the Query Store and force the previous plan that gave us the great performance the majority of the time. We could see from the history of each plan which plan worked best by looking at the statistics kept for each plan. This would be true even if there were a dozen plans. SSMS even has graphical tools to help us with that decision making and also allows us to force that plan for subsequent runs. We can do this via TSQL as well of course, as sketched below.
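As a rough sketch of the TSQL route, something like this compares the aggregated run time statistics per plan for one query and then forces the winner with sys.sp_query_store_force_plan. The query_id and plan_id values of 1 are placeholders, and averaging the avg_* columns across intervals is only a rough comparison, not an exact calculation:

-- Sketch: compare plans for one query by their aggregated run time statistics
SELECT p.plan_id,
       SUM(s.count_executions)     AS [Execs],
       AVG(s.avg_duration)         AS [Avg Duration],
       AVG(s.avg_logical_io_reads) AS [Avg LReads]
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS s ON s.plan_id = p.plan_id
WHERE p.query_id = 1   -- placeholder query_id
GROUP BY p.plan_id
ORDER BY [Avg LReads] ;
GO

-- Force the plan that performed best (query_id and plan_id are placeholders)
EXEC sys.sp_query_store_force_plan @query_id = 1, @plan_id = 1 ;
GO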

Let’s summarize the situation. We have 2 query plans in the Query Store, and the most recent one is also the current plan in the plan cache that is being used for all new executions of this procedure. But that plan is bad for all but .1% of the values we may pass in to the procedure. The previous plan in the Query Store is a much better plan overall, and that is the one we want to ensure is used regardless of the value passed in. As such we go ahead and force the plan using the provided tools or TSQL, which sets is_forced_plan to 1 for the 1st plan in sys.query_store_plan. As a simplified explanation, this action invokes a recompile and the current plan (which was bad) is replaced with a new plan that is based on the one we forced. That new plan now becomes the current one in the cache and is now the one in the Query Store that all new statistics are tied to as well.
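If you ever want to check which plans in a database are currently marked as forced, a quick query against sys.query_store_plan such as this one (a sketch, not part of the demo later in this post) will show them:

-- Sketch: list every plan in this database that is currently marked as forced
SELECT p.query_id, p.plan_id, p.is_forced_plan, p.last_force_failure_reason_desc
FROM sys.query_store_plan AS p
WHERE p.is_forced_plan = 1 ;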

Most people would think that if they forced a particular plan that was in the Query Store, and it was marked as forced, they would in fact be using that very same plan, identified by the plan_id and query_plan_hash tied to the plan they forced. Keep in mind that if there were problems with the recompile, such as a missing index that was there when the original plan was created, we would get an error which would be listed in the force failure columns and a different plan would obviously need to be used. Errors aside, most of the time when we force a plan it gets recompiled and we end up with the same plan as the one we forced. If that plan is the same as the original one we forced, it will have the same query_plan_hash and thus the same plan_id. All future executions will now use that plan and all statistics will be tied to it as well. This is exactly what we would expect once we forced a plan in the Query Store.

OK, so here is where things may not happen exactly as you would expect. When you force a plan, as stated earlier, it goes through a recompilation process and tries to generate the same or a very similar plan as the one you picked. Again, assuming nothing has changed, such as objects being created or dropped, most people would assume it would give us the exact plan we forced. However it is not guaranteed that you will get the same exact plan, and in fact it may have a different plan_id and query_plan_hash altogether. How can it do that, you say, after all didn’t I FORCE that plan? Yes you did, but during the recompile process it may have found a slightly different (maybe better, maybe not) way to accomplish the same task that is very similar yet somewhat different than the original plan. For instance, the original plan was a Clustered Index scan as is the new one, but the new one may have added, moved or removed an operator, which still gives it a valid plan, just not the exact one you forced. In this case, since it gets a new plan_id, all subsequent statistics will be tied to the new plan_id and not the one you actually forced. The is_forced_plan is still set to 1 for the original plan that you forced, whereas the one currently being used is set to 0. That part can be a bit confusing to some people unless they understand how this came to be.
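One way to spot this situation is to compare, for every query that has a forced plan, which plan_id is actually receiving the most recent executions. The query below is just a sketch of that idea:

-- Sketch: for every query with a forced plan, show all of its plans and when each was
-- last executed. New activity on a plan_id other than the forced one is the behavior
-- described above.
SELECT p.query_id, p.plan_id, p.query_plan_hash, p.is_forced_plan,
       MAX(s.last_execution_time) AS [Last Execution]
FROM sys.query_store_plan AS p
LEFT JOIN sys.query_store_runtime_stats AS s ON s.plan_id = p.plan_id
WHERE p.query_id IN (SELECT query_id FROM sys.query_store_plan WHERE is_forced_plan = 1)
GROUP BY p.query_id, p.plan_id, p.query_plan_hash, p.is_forced_plan
ORDER BY p.query_id, p.plan_id ;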

I realize this has been a very long-winded explanation, but in my opinion it is necessary to ensure everyone fully understands this potential anomaly, so that when or if they see it they will know it is by design and not a bug. It is important to keep in mind that the new plan, even if different, will be similar to the original plan (the one you forced). As such it will likely perform just fine and get the intended job done, which was to get the procedure using a plan that was more appropriate for the majority of the calls. The differences between this plan and the actual forced plan may or may not be noticeable from an overall performance standpoint and, as always, it depends. But if you are tracking the statistics in the Query Store it is important to know that even if there are no warnings of a failed forced plan, the currently active plan may have a different plan_id than the one you manually forced. One other thing to note is that if the current query plan is recompiled it will always go back to the plan that is marked as forced, and that will be the one used in the recompile process. So it may very well pick the original plan again and now you are back to operating as you expected to begin with.
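If you want to nudge things back yourself, manually recompiling the procedure is one way to trigger that behavior. This is just a sketch, using the demo procedure created later in this post:

-- Sketch: manually invalidate the cached plan so the next execution recompiles.
-- On that recompile the plan marked as forced in the Query Store is the one used
-- as the basis for the new plan.
EXEC sp_recompile N'Sales.GetInvoicesByDeliveryMethodID' ;
GO
EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 3 ;
GO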

Don’t in any way take this post to be a warning not to use the Query Store, because it is not. This is simply an attempt to make people aware of a potential behavior that is not well documented and can be confusing to many of us. I too was confused until I dug deep enough to figure out what was going on, and hopefully this will save you the time and effort of doing the same. Here is an example of how you can see this behavior and why it happens. I think it will go a long way towards making people aware of this behavior and understanding why it is normal, albeit a bit unexpected.

This example uses the new WideWorldImporters sample database for SQL Server 2016 with just a few slight modifications. First I update a single row in the Sales.Invoices table to set DeliveryMethodID = 1. All other rows have a DeliveryMethodID = 3. I then create a stored procedure as shown below which queries the Invoices table using the DeliveryMethodID tied to the input parameter of the procedure.

 

USE WideWorldImporters ;
GO

-- This allows for an NCI Seek since there is only 1 row with this value
UPDATE Sales.Invoices SET DeliveryMethodID = 1 WHERE InvoiceID = 425 ;
GO

IF OBJECT_ID('Sales.GetInvoicesByDeliveryMethodID', N'P') IS NOT NULL
    DROP PROCEDURE Sales.GetInvoicesByDeliveryMethodID ;
GO

CREATE PROCEDURE Sales.GetInvoicesByDeliveryMethodID
    @DeliveryMethodID INT
AS
BEGIN

    SET NOCOUNT ON ;

    SELECT TOP (10000) * FROM [WideWorldImporters].[Sales].[Invoices] AS i
        WHERE i.DeliveryMethodID = @DeliveryMethodID ;

END ;
GO

I then clear out the query store tables using this command:

ALTER DATABASE [WideWorldImporters] SET QUERY_STORE CLEAR ;

GO
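One assumption worth calling out: the demo expects the Query Store to already be enabled and in read/write mode for WideWorldImporters. If it is not, the standard ALTER DATABASE syntax will turn it on; a minimal sketch:

-- Sketch: enable the Query Store for the demo database and make sure it accepts new data
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = ON ;
GO
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (OPERATION_MODE = READ_WRITE) ;
GO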

 

Throughout the demo you can use this query to see the results in the Query Store tables.

SELECT --CAST(p.query_plan AS XML) AS [XML Plan],
       q.query_id, q.query_hash, p.plan_id, p.query_plan_hash
     , p.is_forced_plan, last_force_failure_reason_desc AS [Failures]
     , SUM(s.count_executions) AS [Execs], SUM(s.avg_logical_io_reads) AS [Avg LReads]
     , MAX(s.last_execution_time) AS [Last Execution], MAX(q.last_compile_start_time) AS [Last Compile]
     , t.query_sql_text
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS t ON q.query_text_id = t.query_text_id
JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats AS s ON p.plan_id = s.plan_id
WHERE t.query_sql_text LIKE '%(@DeliveryMethodID int)SELECT%' AND t.query_sql_text NOT LIKE 'SELECT%'
GROUP BY q.query_id, q.query_hash, p.plan_id, p.query_plan_hash, p.is_forced_plan, last_force_failure_reason_desc, t.query_sql_text
ORDER BY q.query_id, p.plan_id ;

I will then execute the series of commands shown directly below, which demonstrates the behavior outlined above. Just below this code is a step-by-step description of what happens and why, including the query plans that result at each stage.

EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 3 ;
GO
EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 1 ;
GO

EXEC sp_recompile N'Sales.GetInvoicesByDeliveryMethodID' ;
GO

EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 1 ;
GO
EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 3 ;
GO

-- Force the scan plan (query_id, plan_id)
EXEC sys.sp_query_store_force_plan 1, 1 ;
GO

EXEC Sales.GetInvoicesByDeliveryMethodID @DeliveryMethodID = 1 ;
GO

First we execute the stored procedure twice, making sure we pass in the value = 3 for the first execution. Since all but 1 row in the table will match that value, we get a plan that is a full scan of the Clustered Index on the Invoices table as shown below, which includes a Filter operator after the scan to apply the WHERE clause SARG. When you run the supplied query you will see that we have 2 executions for the same combination of query_id and plan_id in the Query Store tables.

[Plan 1 – Clustered Index Scan with a separate Filter operator]

Next we recompile the stored procedure to simulate a situation in which the current plan becomes invalid such as statistics being updated on the underlying table or index.  

Then we execute the procedure two more times, being careful to pass in the value = 1 for the parameter the first time after the recompile. This simulates a situation in which we now have a query plan that was based on an atypical parameter and, as mentioned above, will give us poor performance for any future executions with a value other than 1. The reason is that the new plan was based on a single row being returned instead of 10K rows and was able to utilize the nonclustered index on DeliveryMethodID, as shown below.

[Plan 2 – Nonclustered Index Seek on DeliveryMethodID]

While the index seek may be great for a single row, it is far more expensive than a scan when retrieving 10K rows. This can be seen by looking at the Avg LReads column, which shows the new plan requiring over twice the reads of the original plan.

Once we realize this we quickly decide to force the original query plan by using the system supplied stored procedure. Please note that the two parameters used in the example for the forcing may not match the ones in your environment; if not, please change them accordingly. The first parameter is the query_id and the 2nd is the plan_id that you wish future executions to use.
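If you need to look up the correct values for your environment, a variation of the monitoring query above, narrowed to the statement inside the procedure, will return the query_id and plan_id pairs to choose from. This is just a sketch:

-- Sketch: find the query_id and plan_id values for the statement inside the demo procedure
SELECT q.query_id, p.plan_id, p.query_plan_hash, p.is_forced_plan
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS t ON t.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
WHERE t.query_sql_text LIKE '%(@DeliveryMethodID int)SELECT%'
  AND t.query_sql_text NOT LIKE 'SELECT%'
ORDER BY q.query_id, p.plan_id ;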

Finally we execute the procedure once more using the parameter value = 1 and then take another look at the Query Store results. We now see that a 3rd plan was generated for that query_id, and it has a different plan_id and query_plan_hash than the one we forced. We see that the is_forced_plan column is set for the one we forced and not for the new one that was generated as a result. However, you can see that the statistics for any new executions are tied to the newly generated plan and not the one marked as forced. Again, this does not mean it is broken or that there is a bug; it is simply a different implementation than most people would have expected when first using the Query Store, especially if they read the documentation as it existed before a few days ago.

I can see that you have one last question though, and that is “why did it generate a new plan when it should have used the original one since the original plan is still valid?” That is a good question and luckily I have a good answer. If we look at the plan that was generated below we will see a slight difference in the overall plan. Notice that we no longer have the Filter operator that was highlighted in the 1st plan. Instead, if we look at the properties of the scan operator, we see that the optimizer chose to push the predicate (WHERE DeliveryMethodID = @DeliveryMethodID) into the scan operation itself. If you remember, the original plan sniffed the parameter passed in on the first execution, and it was a 3, which matched all rows but one. The optimizer knew that all rows but 1 matched this value and that we had a TOP 10K on a table with many more rows than that. So it made sense that it could simply pass ~10K rows along and filter later. However, when we forced the plan, which caused a recompile, the very next time we executed the procedure we passed in a 1, which has only 1 matching row. Normally that would use the nonclustered index and do a seek. But since the original plan that we forced was a clustered index scan, this plan also needed to be a clustered index scan. This time the optimizer saw that the value we were looking for only matched one row and decided that it could filter on that row during the scan rather than filtering later in the process. Or it simply found this plan was close enough to the original and stopped there. I honestly don’t know which, but I suspect the 1st explanation is more correct.

[Plan 3 – Clustered Index Scan with the predicate pushed into the scan, no Filter operator]
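If you want to compare the shape of the forced plan with the plan that was actually generated, the plan XML for both can be pulled straight from the Query Store. The plan_id values below are placeholders for the forced plan and the newly generated one:

-- Sketch: retrieve the plan XML for two plans so their shapes can be compared
SELECT p.plan_id, p.query_plan_hash, p.is_forced_plan,
       CAST(p.query_plan AS XML) AS [Query Plan]
FROM sys.query_store_plan AS p
WHERE p.plan_id IN (1, 3) ;   -- placeholder plan_ids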

In either case, the point is that we can indeed end up with different plans than the ones we explicitly forced using the Query Store. Most of the time this will likely never be a problem, but there are always exceptions and you may find some day that the plan it generated is not what you ideally want, and this is why. Also be aware that a recompile can happen for many reasons at any time, so exactly which plan it will use going forward may change over time. Chances are whichever plan it chooses will still be better than the one that caused you to force one in the first place, if you did your homework correctly. Another question I hear is “why not just use a query hint such as OPTIMIZE FOR in the procedure and be done with it?” Well, one good reason is that you may be using 3rd party software with procedures which you are not allowed to change. This, in my opinion, is a much better and easier solution than using plan guides. And one last tip: once you do force a plan you should check it regularly to ensure it is still doing what you expect. Your fellow DBA may decide to make changes to the objects this plan references and make your forced plan invalid without you knowing. The Query Store is just another tool in our SQL Server toolbox that certainly has the potential to give us great benefit, but if misused or neglected it can have the opposite effect.
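As a starting point for that regular check, a query along these lines (a sketch, using only columns from sys.query_store_plan) lists every forced plan along with its failure count and last failure reason:

-- Sketch: a periodic health check for forced plans. A non-NULL failure reason or a
-- growing failure count means the forced plan could not be applied as expected.
SELECT p.query_id, p.plan_id, p.force_failure_count, p.last_force_failure_reason_desc
FROM sys.query_store_plan AS p
WHERE p.is_forced_plan = 1 ;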

If you are going to the PASS Summit 2016, Query Store sounds interesting to you, and you want to know more, please attend my session for a better overall summary of what it is and how to use it.

If you made it this far thanks for hanging in there and good luck,

Andy

Published Sunday, September 18, 2016 8:30 PM by Andrew Kelly
