
Tibor Karaszi

Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the contents of this post are not very advanced. If you have read and understood the excellent paper "Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005", http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx, you already know what is below, and much more...

I was reading a thread in the open newsgroups today (.tools, posted by Mike) about advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave: to replace sp_executesql usage with declared TSQL variables used in the WHERE clause. Translated to the AdventureWorks database, it advises that instead of 1) below, we should use 2) below.

1)
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1'
, N'@P1 nvarchar(50)'
, 'Diaz'

2)
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2)). Perhaps I misunderstood the poster in the group; or it is because I'm running the SQL Server 2008 DTA and engine, I'm not looking in the right place for that advice, my data isn't representative, I'm running the DTA with some other settings, etc. But say that DTA does indeed give such advice: why would it do that? To be honest, I don't know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: say we have an index on the LastName column, and the name we look for can either be a very common name, like perhaps "Smith", or a not so common name, like "Karaszi". For the more common name, a table scan might be the best choice, whereas for the not-so-common name, an index seek is probably the best choice. OK, a very simple example, but it serves well for this discussion.
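
To see that kind of skew for yourself, you can look at the statistics histogram for the column. A minimal sketch, assuming the AdventureWorks index IX_Person_LastName_FirstName_MiddleName exists on Person.Person:

--Show the histogram for LastName; the RANGE_HI_KEY and EQ_ROWS
--columns reveal how many rows each name accounts for
DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName') WITH HISTOGRAM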

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that and decide whether it is a good idea to use an index or not. The plan is then cached and can be re-used. But what if the first execution passes in something very selective, while typical executions are not very selective? Or the other way around? I.e., the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005, like OPTIMIZE FOR and the RECOMPILE hints. I will not go into details about these here, since I'm focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.
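
Just to illustrate, here is a sketch of what 1) could look like with such a hint (the OPTIMIZE FOR value N'Smith' is a made-up example, not something DTA or the original poster suggested):

--Build and cache the plan as if @P1 contained N'Smith',
--regardless of what the caller actually passes in
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (OPTIMIZE FOR (@P1 = N''Smith''))'
, N'@P1 nvarchar(50)'
, 'Diaz'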

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics to determine selectivity. In the above case, it instead uses density (stored with the statistics, assuming such exists for the column). Density is basically 1 / (number of unique values in the column(s)). This might be a good representation for a typical last name, but perhaps not so good for a very common or a very uncommon last name. Now, in 2005 we have hints for these situations as well (RECOMPILE), but again, that is not the point here.
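
For illustration, a sketch of 2) with such a hint. With OPTION (RECOMPILE) the statement is compiled at execution time, so the optimizer can see the actual contents of the variable and use the statistics instead of density:

--Compiled at execution time, so the contents of @P1 can be sniffed
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (RECOMPILE)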

In order for DTA to give the best advice here, it would have to know the distribution of the data for that column, and also see many executions of that query to determine whether "all" executions use a typical value (sp_executesql might be better) or whether the stored density value is a good representation for "all" queries that are passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don't know why it advises this. There might be other aspects, like "avoid dynamic SQL" (whether that holds for this case could be argued in another post), but DTA is about performance, not best practices.

The bottom line is that these things are not simple and we should be very cautious with "rules of thumb".

Here's some TSQL code to demonstrate the differences between 1) and 2). As always, only execute it after you have read and understood the code!

--Create a copy of the person table
--We will have lots of "Diaz" and very few "Gimmi"
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p
GO
SELECT * INTO p
FROM Person.Person

CREATE INDEX x ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO p
SELECT BusinessEntityID + 30000, PersonType, NameStyle, Title, FirstName, MiddleName
, N'Diaz', Suffix, EmailPromotion, AdditionalContactInfo, Demographics, rowguid, ModifiedDate
FROM p
 
--Make sure we have up-to-date statistics
UPDATE STATISTICS p WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity 
--and index seek with high selectivity
SET STATISTICS IO ON

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Diaz'
--20183 rows, table scan, 7612 pages

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Gimmi'
--1 row, index seek, 3 pages



------------------------------------------------------------
--sp_executesql alternative
------------------------------------------------------------

--Table scan will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1'
, N'@P1 nvarchar(50)'
, 'Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1'
, N'@P1 nvarchar(50)'
, 'Gimmi'
--1 row, table scan, 7612 pages

--Other way around
--Index search will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1'
, N'@P1 nvarchar(50)'
, 'Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1'
, N'@P1 nvarchar(50)'
, 'Diaz'
--20183 rows, index seek, 20291 pages


------------------------------------------------------------------
--Alternative using variable
------------------------------------------------------------------
DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--20183 rows, index seek, 20291 pages

DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Gimmi'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--1 row, index seek, 1 page

--Same plan even though very different selectivity
--and emptying plan cache in between

--Estimated 33 rows for both above.
--See if that is drawn from statistics density?

--Formula for density: 1/#OfUniqueValuesInColumn
SELECT 1./COUNT(DISTINCT LastName) FROM p
--0.00082918739

--Does that match density in index statistics?
DBCC SHOW_STATISTICS('p', 'x')
--Yes

--How many rows in the table?
SELECT COUNT(*) FROM p
--39944

--So how many rows would we estimate based on density?
SELECT 0.00082918739 * 39944
--Yep, 33 rows.

--I.e., for the variable alternative, SQL Server has no
--knowledge of the contents of those variables, so it must use density instead.

--Clean up:
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p
Published Friday, August 29, 2008 9:34 AM by TiborKaraszi

Comments

 

Michael Zilberstein said:

That's a parameter sniffing problem. In my reporting application I use the OPTION(RECOMPILE) hint at the statement level for statements that run on unevenly distributed data. Those statements don't run hundreds of times a second, i.e. I don't pay too much for recompilation overhead. On the other hand, the cost of an inefficient execution plan (in case of re-use, without the RECOMPILE hint) would be much higher. Those are online reports, so a couple of milliseconds delay for all users is still cheaper than seconds or even tens of seconds delay for some of the users.

August 29, 2008 6:39 AM
 

TiborKaraszi said:

I wouldn't call parameter sniffing a "problem" per se. I would say that parameter sniffing is a behaviour that one should be aware of, and from that awareness one can decide whether one likes that behaviour and wants to exploit it, or wants to avoid it (using, for instance, OPTION (RECOMPILE)).

My point with this post, however, was more to question such a blanket statement as DTA was reported in that newsgroup post to have made. And, in general, to question the various rules of thumb that we all find here and there...

August 29, 2008 7:12 AM
 

Ranga said:

Very good analysis...

--Ranga

September 3, 2008 4:24 PM
 

r5d4 said:

Definitely adding this to my list of things to try.

Made excellent reading.

r

September 10, 2008 7:21 PM
 

ibrahim Özbek said:

OPTION RECOMPILE is not the best way, because as you can understand it recompiles the execution plan, so it takes time. I have a query like this: the normal query takes less than 1 second, the query with sp_executesql takes 37 seconds :) and with RECOMPILE 27 seconds :))

September 19, 2008 10:25 AM
 

TiborKaraszi said:

I would not call OPTION RECOMPILE either a "good" or a "bad" option. It is an option, and *used in the right situation* it can be very valuable.

If you have a query where compile time is very high and difference in execution time is low, then we would *not* want to use this option.

But what about the other way around? Say that compile time is only a few milliseconds, and difference in execution time is several seconds, or even minutes? Here we definitely have a candidate for OPTION RECOMPILE (although there are also other options for us).

September 22, 2008 1:18 AM
 

gmasselli said:

We had the exact problem some user here reported: the same query using sp_executesql ran in 21 seconds, ad-hoc from Management Studio in less than a second. Reading the white paper was extremely useful for understanding how, internally, SQL Server was working the queries; thanks for the link!

Our situation was resolved by increasing the sampling size of the statistics for two of our tables used in the query. Once this was done both queries used the same query plan and, more importantly, both executed in under a second. Problem solved!
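
Increasing the sampling looks something like this (a sketch with a made-up table name; the real tables were different):

--Rebuild the statistics with a larger sample; FULLSCAN reads every row
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN
--or a specific sampling percentage:
UPDATE STATISTICS dbo.SomeTable WITH SAMPLE 50 PERCENT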

December 18, 2008 11:29 AM
