
Paul White: Page Free Space

A technical SQL Server blog from New Zealand. See also my articles on SQLperformance.com

Temporary Tables in Stored Procedures

Ask anyone what the primary advantage of temporary tables over table variables is, and the chances are they will say that temporary tables support statistics and table variables do not.  This is true, of course; even the indexes that enforce PRIMARY KEY and UNIQUE constraints on table variables do not have populated statistics associated with them, and it is not possible to manually create statistics or non-constraint indexes on table variables.  Intuitively, then, any query that has alternative execution plans to choose from ought to benefit from using a temporary table rather than a table variable.  This is also true, up to a point.

The most common use of temporary tables is in stored procedures, where they can be very useful as a way of simplifying a large query into smaller parts, giving the optimizer a better chance of finding good execution plans, providing statistical information about an intermediate result set, and probably making future maintenance of the procedure easier as well.  In case it is not obvious, breaking a complex query into smaller steps using temporary tables makes life easier for the optimizer in several ways.  Smaller queries tend to have a smaller number of possible execution plans, reducing the chances that the optimizer will miss a good one.  Complex queries are also less likely to have good cardinality (row count) estimates and statistical information, since small errors tend to grow quickly as more and more operators appear in the plan.

This is a very important point that is not widely appreciated.  The SQL Server query optimizer is only as good as the information it has to work with.  If cardinality or statistical information is badly wrong at any point in the plan, the result will most likely be a poor execution plan selection from that point forward.  It is not just a matter of creating and maintaining appropriate statistics on the base tables, either.  The optimizer does use these as a starting point, but it also derives new statistics at every plan operator, and things can quickly conspire to make these (invisible) derived statistics hopelessly wrong.  The only real sign that something is wrong (aside from poor performance, naturally) is that actual row counts vary widely from the optimizer’s estimate.  Sadly, SQL Server does not make it easy today to routinely collect and analyse differences between cardinality estimates and runtime row counts, though some small (but welcome) steps forward have been made in SQL Server 2012 with new row count information in the sys.dm_exec_query_stats view.
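To make that more concrete, here is a rough sketch of the kind of query I mean (SQL Server 2012 or later, since it relies on the row count columns just mentioned).  It shows how widely the returned row counts vary between executions of each cached statement, which is a useful, if incomplete, proxy for comparing estimates with reality:

-- Requires the SQL Server 2012 row count columns in sys.dm_exec_query_stats
SELECT TOP (20)
    qs.execution_count,
    qs.last_rows,
    qs.min_rows,
    qs.max_rows,
    avg_rows = qs.total_rows / qs.execution_count,
    statement_text =
        SUBSTRING(st.[text], qs.statement_start_offset / 2 + 1,
            (CASE qs.statement_end_offset
                WHEN -1 THEN DATALENGTH(st.[text])
                ELSE qs.statement_end_offset
            END - qs.statement_start_offset) / 2 + 1)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.max_rows - qs.min_rows DESC;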

The benefit of using temporary tables to simplify queries where necessary is potentially better execution plans, now and in the future as data distribution changes and new execution plans are compiled.  On the cost side of the ledger we have the extra effort needed to populate the temporary table and maintain its statistics.  In addition, we expect a higher number of recompilations for optimality reasons due to changes in statistics.  In short, we have a trade-off between potential execution plan quality and maintenance/recompilation cost.

The problem, though, is that temporary tables do not work quite how (almost) everyone expects them to…

A World Without Temporary Objects

Imagine if temporary tables and table variables did not exist, and we had a complex query with cardinality-estimation difficulties that would be best addressed by breaking the query into parts.  The example presented below is necessarily simplified, and uses the AdventureWorks sample database to make it accessible.  It is not a query with insurmountable optimization problems by any means, but bear with me.

IF  OBJECT_ID(N'dbo.Demo', N'P') IS NOT NULL
    DROP PROCEDURE dbo.Demo;
GO
CREATE PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    SELECT
        p.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = p.ProductID
    WHERE
        p.Name LIKE @StartsWith + N'%'
    GROUP BY
        p.Name;
END;

This procedure returns a count of orders containing a part whose name starts with the specification passed in as a parameter.  For example:

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';

Query results for 'E'

A Short Digression about Parameter Sniffing

The only real optimization issue in this simplified example is parameter-sniffing.  The number of rows that match the LIKE predicate varies widely depending on the value of the parameter, which can impact execution plan selection.  When the SELECT query is compiled or recompiled, SQL Server uses the actual value of the parameter at the time to estimate the number of rows qualified from the Product table.  If the SELECT statement happens to compile or recompile with a parameter that matches very few rows, we get an execution plan like this (click to enlarge the graphic):

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'[0-9]';

Parameter Sniffing [0-9]

This plan is cached and reused next time the procedure is called, perhaps with a parameter value that matches more rows than before, but not enough to cause a recompilation:

EXECUTE dbo.Demo @StartsWith = N'M';

Parameter Sniffing M

This results in a large number of Key Lookups and a Sort that spills to tempdb physical disk (the warning triangle is new in SQL Server 2012).  The memory grant for the sort was sized for the expected 370 rows, not the 22,527 that actually turned up.  (If you are curious about the Constant Scan and Compute Scalar in these plans, see my previous post on dynamic seeks for details).
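As an aside, the memory grant for an in-flight request can be watched from a second session with a simple monitoring sketch like the one below (nothing here is specific to this procedure); when the estimate behind the grant is far too low, the sort runs out of workspace memory and spills:

-- Run from a separate session while dbo.Demo is executing
SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb,
    mg.used_memory_kb,
    mg.max_used_memory_kb
FROM sys.dm_exec_query_memory_grants AS mg
WHERE mg.session_id <> @@SPID;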

One way (certainly not the only way) to address this parameter-sniffing problem is to ask SQL Server to compile a fresh query plan for the SELECT statement on every execution, using the OPTION (RECOMPILE) query hint:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    SELECT
        p.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = p.ProductID
    WHERE
        p.Name LIKE @StartsWith + N'%'
    GROUP BY
        p.Name
    OPTION (RECOMPILE);
END;

Now we get plans optimized for the particular parameter value at runtime, and (in SQL Server 2008 SP1 CU5 or later) the added bonus that the runtime value of the parameter is embedded into the query so the dynamic seek is not required:

EXECUTE dbo.Demo @StartsWith = N'[0-9]';

Parameter Sniffing with RECOMPILE

EXECUTE dbo.Demo @StartsWith = N'M';

Parameter Sniffing with RECOMPILE 2

Back to the Main Example

However, we are imagining that the example query is much more complex than shown, and we decide to break it into two parts.  The first part will fetch qualified items from the Product table, and the second will join the results (with statistics) to the History table.  Remember that temporary objects do not exist for this part of the discussion, so we have to use a permanent table:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    -- Real table to hold intermediate result
    CREATE TABLE dbo.Temp
    (
        ProductID   integer NOT NULL, 
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    -- First part of the 'complex' query
    INSERT INTO dbo.Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    -- Second part referencing the 'temp' table
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM dbo.Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name;
 
    -- Show the statistics for the Name column
    DBCC SHOW_STATISTICS (Temp, Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE dbo.Temp;
END;

This procedure is not very practical as it stands, because the CREATE TABLE and DROP TABLE statements would make it impossible for more than one user to execute it at the same time.  Nevertheless, it does show how most people expect things to work regarding statistics, temporary table lifetimes, and compilation:

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'A';

There are three matching Product records, shown below together with the SHOW_STATISTICS output:

Permanent Table Output A

Nothing too surprising in the execution plans: the table INSERT is a trivial plan (no choices for the optimizer to make), and the SELECT query has cardinality estimates that are very close to the actual numbers encountered.  The warning triangle on the SELECT (again, new for SQL Server 2012) is just suggesting an index for the History table, which we won’t be adding on this occasion.

Permanent Execution Plan A

While the procedure was executing, a Profiler trace captured the following compilation and statistics activity:

Profiler trace output

The table is created, and since this is the first time the procedure has been run, the INSERT and SELECT statements both incur a deferred compilation.  As part of the SELECT compilation, new statistics are created on the Name and ProductID columns of the new table, the DBCC SHOW_STATISTICS command executes, and finally the table is dropped.  Now we execute the procedure again but for products that start with ‘E’ rather than ‘A’ as above:

EXECUTE dbo.Demo @StartsWith = N'E';

Permanent Output E

This time there are nine matching records, though the statistics-gathering algorithms compress the information into the five histogram steps shown.  The execution plan is below (trivial INSERT plan not repeated):

Permanent Execution Plan E

Again, the cardinality estimates are extremely good, and the optimizer selected a nested loops plus key-lookup plan based on costs computed for the smaller number of expected rows.  The trace captured the following:

Profiler trace output

The only change is that this time recompilation is triggered because the cached INSERT and SELECT plans reference a table that no longer exists, leading to a ‘schema changed’ recompilation.  The crucial observations here are that the table is recreated and new statistics are gathered on every execution.  This is the way most people expect tables, statistics, and compilations to behave inside stored procedures, so don’t worry if there’s nothing too surprising so far.
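If you prefer Extended Events to Profiler for this sort of tracing, a minimal session along the lines of the sketch below captures the same recompilation activity, including the reason reported for each recompile (this assumes the sqlserver.sql_statement_recompile event and ring_buffer target as they appear in SQL Server 2012):

-- A minimal Extended Events sketch; SQL Server 2012 syntax assumed
CREATE EVENT SESSION RecompileWatch ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
(
    ACTION (sqlserver.sql_text)
)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION RecompileWatch ON SERVER
    STATE = START;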

Using a Temporary Table

The previous example is not terribly practical.  It is possible to conjure up schemes where ordinary tables can serve the purpose effectively, but as it stands, the procedure would likely cause runtime errors if more than one user or process attempted to execute it concurrently.  In addition, ordinary tables do not benefit from engine optimizations that only apply to genuine temporary tables, so they tend to be less efficient, even if created in tempdb.  We now return to the real world (where temporary tables do exist) and modify the procedure to use a temporary table instead of a permanent table:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    CREATE TABLE #Temp
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    INSERT INTO #Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM #Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name;
 
    DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE #Temp;
END;

Executing this procedure with an ‘E’ parameter produces exactly the same output as when the permanent table was used.  Same execution plan, same statistics, same Profiler output, everything:

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';

Temporary Table Execution Plan E

Now we will execute the procedure with a parameter value of ‘T’, which has 65 matching records in the Product table.  This is the output (only the first nine of the 65 matching names are shown in the first result set):

Temporary Table Output T

The results are as expected (products that begin with ‘T’) but the SHOW_STATISTICS output displays ‘E’ statistics from the previous execution!

Temporary Table Execution Plan T

The execution plan has serious cardinality estimation errors.  It has the same nested loops and key-lookup shape and estimated row counts as the ‘E’ execution, instead of being optimized for the ‘T’ data.  This causes slow performance and a sort that spills data to physical tempdb disk.

Profiler trace output

The Profiler output explains why: no recompilations occurred, and no statistics were created.  In fact, with the AdventureWorks database, there is no way to persist a different execution plan or update the statistics for this procedure without changing it (more on that later).  Even executing with [A-Z] as the parameter reuses the plan optimized for the ‘E’ value:

EXECUTE dbo.Demo @StartsWith = N'[A-Z]';

Temporary Table Execution Plan A-Z

The key lookup is now executed 113,443 times, and the sort has to perform a multi-pass spill to tempdb (previously only a single pass spill was required).  Executing the procedure with the WITH RECOMPILE option will produce better plans, at the cost of recompiling the whole procedure, and the resulting plans will not be cached for reuse.  In general, recompiling the whole procedure every time could be bad since many procedures have many more statements to recompile than just a single INSERT and SELECT.  Although we have not yet identified a cause for the behaviour we are seeing, perhaps the answer lies in adding a RECOMPILE query hint to the SELECT statement as we did earlier for the parameter-sniffing example?
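For reference, the procedure-level option mentioned above is specified at the call site, for example:

-- Recompiles every statement in the procedure; the resulting plans are not cached
EXECUTE dbo.Demo @StartsWith = N'T' WITH RECOMPILE;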

OPTION (RECOMPILE) to the Rescue?

Thinking that the cause might be a new form of the parameter-sniffing problem, we now modify the procedure to add the RECOMPILE query hint to just the SELECT statement.  This avoids recompiling the whole procedure on each call as EXECUTE … WITH RECOMPILE would do, and will hopefully solve our optimization problems:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    CREATE TABLE #Temp
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    INSERT INTO #Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM #Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name
    OPTION (RECOMPILE);
 
    DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE #Temp;
END;

Unsurprisingly, clearing the plan cache and executing with a parameter value of ‘E’ produces the exact same nested loops and key lookup plan as previously shown (and the same statistics and Profiler events too).

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';

This is fine because it is the optimal plan for this value.  The test for OPTION (RECOMPILE) comes when we try a second execution with the 65-row ‘T’ value:

EXECUTE dbo.Demo @StartsWith = N'T';

Temporary Table Execution Plan RECOMPILE

The execution plan has a different shape now, with the optimizer correctly preferring a hash join over the nested loops plus key lookup.  The estimated cardinality is correct for the temporary table (65 rows versus 9 previously seen for ‘E’), but it is badly wrong for the hash join (811 rows expected, 12,273 actual) and the distinct sort, which has to perform a single-pass spill to tempdb disk.  The query output shows the reason for the inaccuracy:

Temporary Table RECOMPILE T

The results are correct of course (65 products starting with ‘T’) but the statistics still reflect the parameter value ‘E’.  The RECOMPILE query hint allows the optimizer to see the true cardinality of the temporary table (65 rows), but the 100% wrong statistics on the ProductID column, used to estimate the cardinality of the join to the History table, result in a mess of a plan from that point forward.  (Note that statistics on the Name column are also considered interesting by the optimizer for this query due to the GROUP BY clause.  The SHOW_STATISTICS output above shows the Name statistics because they demonstrate more immediately that ‘E’ values are stored; the ProductID statistics are integers, and much harder for the reader to correlate with the parameter.)

Profiler trace output

The Profiler output above confirms the recompilation due to the query hint, and that no statistics were created or updated.  An interesting thing occurs if we execute the procedure four more times with the same ‘T’ parameter (a total of five times with ‘T’):

EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';

The first three of those executions are exactly as before, but the fourth one suddenly produces this different execution plan:

Temporary Table RECOMPILE T5

The estimates are now well within the boundaries of expected accuracy, and the sort now receives enough workspace memory to avoid spilling rows to tempdb.

SHOW_STATISTICS output for 'T'

The statistics have been updated, and now show 65 rows that start with ‘T’ sampled (compressed into 49 histogram steps).  This is confirmed by the trace output:

Profiler trace output

Notice the StatMan and AutoStats entries showing the ProductID and Name statistics being updated.  Unfortunately, executing the procedure again with a new parameter value takes us back to inaccurate cardinality estimates again.  The execution plans for this procedure aren’t always a disaster, but this is pure luck – poor estimates are poor estimates, whether the query plan happens to perform acceptably or not.  We really need to get to the root cause here.  Perhaps OPTION (RECOMPILE) was not the way to go, and we should try a manual statistics update instead of relying on automatic statistics updates…?

Manual Statistics Update to the Rescue?

We know we want to add an UPDATE STATISTICS command to our stored procedure instead of the OPTION (RECOMPILE) … but where to put the new statement exactly?  The statistics we want to update are created during compilation or recompilation of the SELECT statement, suggesting the UPDATE STATISTICS should come after the SELECT; but then how could it possibly affect the preceding SELECT plan?  It seems the only sensible place to put it is before the SELECT, however counter-intuitive it might seem.

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    CREATE TABLE #Temp
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    INSERT INTO #Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    UPDATE STATISTICS #Temp;
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM #Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name;
 
    DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE #Temp;
END;

Following the same pattern as before, we clear the plan cache (though this isn’t strictly necessary since the ALTER PROCEDURE evicts plans from the cache anyway) then execute with ‘E’ and ‘T’ parameters as before.

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';
EXECUTE dbo.Demo @StartsWith = N'T';

The results for the ‘E’ execution are the same as always, so here are just the results for the ‘T’ execution:

UPDATE STATISTICS T

The above output looks very promising: correct results and the statistics have been updated to reflect the ‘T’ parameter.

Profiler trace output

The trace output shows the statistics updates too…but no recompilation.

UPDATE STATISTICS Execution Plan

The execution plan is a disaster.  The estimated number of rows in the temporary table is 9 (the number of rows the ‘E’ execution produced, not the 65 actually present), and everything is wrong from that point forward, including the sort spilling to tempdb again.  The UPDATE STATISTICS command did its job, but the cached query plan did not recompile – though we might expect that it should, right?

As with the first temporary table example (no RECOMPILE or UPDATE STATISTICS) this plan is stuck in cache, given the AdventureWorks database.  No parameter value will cause a recompilation, and every subsequent execution will use the ‘E’ plan.  This is very reminiscent of the parameter-sniffing problem we encountered earlier, though the SELECT query in question has no direct parameter value to be sniffed!  In a final effort to produce reliable plans, our last attempt will be to combine both RECOMPILE and UPDATE STATISTICS…

RECOMPILE and UPDATE STATISTICS to the Rescue?

Our stored procedure now includes an UPDATE STATISTICS before the SELECT, and an OPTION (RECOMPILE) query hint on the SELECT itself:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    CREATE TABLE #Temp
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    INSERT INTO #Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    UPDATE STATISTICS #Temp;
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM #Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name
    OPTION (RECOMPILE);
 
    DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE #Temp;
END;

Our usual test executions:

DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';
EXECUTE dbo.Demo @StartsWith = N'T';

The first ‘E’ execution performs as it always has: correct statistics and an optimal query plan.  The results for ‘T’ are:

STATS and RECOMPILE T

The output is correct, and so are the statistics: 65 rows sampled, stored in 49 histogram steps.

STATS and RECOMPILE Execution Plan T

Success!  All estimates are very good, the sort is performed in memory, and the query executes quickly.  Further experimentation with other parameter values shows that new statistics are always generated, and the SELECT query plan always recompiles to use those statistics.  If you are concerned by the small discrepancy after the hash join (estimate 10,449 rows; actual 12,273) don’t be.  Cardinality estimation is a tough problem – try using the statistics histograms yourself to predict how the ProductID values in the temporary table (65 rows in 49 steps, remember) will join to the histogram on the same column for the much larger History table.  You’ll find the data there even more compressed, so the task of matching values to values using RANGE_ROWS, EQ_ROWS and so on is really not an easy one at all.
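If you would like to try that exercise, the History table's ProductID histogram can be displayed with a statement like the one below.  The statistics name is assumed to be that of the IX_TransactionHistory_ProductID index; check sys.stats if your copy of AdventureWorks differs.

-- Index/statistics name assumed; adjust to match your AdventureWorks copy
DBCC SHOW_STATISTICS (N'Production.TransactionHistory', IX_TransactionHistory_ProductID)
WITH STAT_HEADER, HISTOGRAM;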

Enter the Table Variable…

The final option we are going to look at is using a table variable instead of a temporary table:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    DECLARE @Temp AS TABLE
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) NOT NULL
    );
 
    INSERT INTO @Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM @Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name
    OPTION (RECOMPILE);
END;

There are a couple of small changes to the procedure here.  The COLLATE clause is not necessary in the definition of the Name column, since the default for table variables is the collation of the host database anyway (the default for temporary tables is the collation of tempdb).  There is no need (or option) to drop the table at the end of the procedure; table variables simply go out of scope.  There is no SHOW_STATISTICS statement either, since table variables do not support statistics (even on constraint-supporting indexes).  The first execution with a parameter value of ‘E’ produces this execution plan and trace output:

Table Variable Execution Plan E

Profiler trace output

The OPTION (RECOMPILE) on the SELECT statement ensures that the cardinality estimate for the table variable is exactly right, but the lack of statistics means the optimizer resorts to guesses after that point.  The guessed estimates turn out to be higher than the actual number of rows, so no performance problems result, although the sort is allocated more memory than it needs.  The trace output is a bit smaller than before, since no statistics are created.

The second test is with a parameter value of ‘T’:

Table Variable Execution Plan T

Again, the number of rows in the table variable is exactly right due to the RECOMPILE, and the later guesswork also turns out to work quite well.  As you can see, the overall shape of the plan is quite different for this parameter value (hash joins versus nested loops for example).  There is one cardinality estimation problem at the middle hash operator, but whether it causes performance problems or not depends on your version of SQL Server.  To make the issue clearer, we will perform one final test, a parameter value of ‘[A-Z]’, qualifying all 504 products:

Table Variable Execution Plan A-Z

Once more, the table variable cardinality is exactly right (504 rows).  Both inputs to the hash join look exactly right, and the guessed output from that join is also very close to the actual.  The lack of statistical information makes itself felt at the middle hash operator though; the optimizer has no idea how many distinct values there might be of the combination (Name, OrderID), so it guesses at 2,825.  This is very much smaller than the 113,338 actual, and the hash table spills out to tempdb physical disk as a result.

The table variable approach happens to produce reasonable plans in some cases, based on the sniffed table cardinality provided by OPTION (RECOMPILE) but the lack of statistics leads to wild guesses and potential performance issues deeper in the tree.  It’s interesting that one of the main perceived benefits of table variables – fewer recompilations – has to be abandoned in order to produce semi-reasonable plans.  This follows hard on the heels of our seeing the main benefit of temporary tables – statistics – being 100% wrong earlier!

The last thing I want to mention about table variables is that for all their limitations, they do something else temporary tables cannot: a join from a table variable is estimated for the average case summed over all rows.  This is similar to hinting OPTIMIZE FOR UNKNOWN for a temporary table query with a parameter, but the average-case estimation occurs for all joined rows, not just one predicate, and of course there is no parameter variable in a join for OPTIMIZE FOR UNKNOWN to work on anyway.
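For comparison, this is the hint being referred to, applied to the original single-statement form of the query (a sketch only: it produces an average-case estimate for the LIKE predicate instead of sniffing the runtime parameter value, which is related to, but not the same as, the table variable's join behaviour):

SELECT
    p.Name,
    OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th ON
    th.ProductID = p.ProductID
WHERE
    p.Name LIKE @StartsWith + N'%'
GROUP BY
    p.Name
OPTION (OPTIMIZE FOR (@StartsWith UNKNOWN));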

Question Time

At this point, you should have some questions.  Why does the temporary table behave so differently from the permanent table?  Do you really have to UPDATE STATISTICS before they are created, and add OPTION (RECOMPILE) when you use temporary tables this way?  Why does the RECOMPILE query hint only produce the desired effect with temporary tables every so often?  Part of the answer to these questions lies with the caching of temporary tables first introduced to SQL Server in the 2005 version.

CREATE and DROP, Don’t

I will talk about this in much more detail in my next post, but the key point is that CREATE TABLE and DROP TABLE do not create and drop temporary tables in a stored procedure, if the temporary object can be cached.  The temporary object is renamed to an internal form when DROP TABLE is executed, and renamed back to the same user-visible name when CREATE TABLE is encountered on the next execution.  In addition, any statistics that were auto-created on the temporary table are also cached.  This means that statistics from a previous execution remain when the procedure is next called.  The following code demonstrates the object id of a temporary table remaining the same between executions (there will be a more comprehensive demonstration in my next post):

ALTER PROCEDURE dbo.Demo
AS
BEGIN
    CREATE TABLE #Temp (col1 int NULL);
 
    SELECT ObjectID = OBJECT_ID(N'tempdb..#Temp');
 
    DROP TABLE #Temp;
END;
GO
EXECUTE dbo.Demo;
EXECUTE dbo.Demo;
EXECUTE dbo.Demo;

ObjectID output (same value on each execution)

This is very different from the case where we used a permanent table.  With a permanent table, CREATE and DROP definitely do create and drop a table on each execution of the procedure.  The table is new each time, new statistics are created, and new plans are compiled on each execution.  The permanent table behaviour is probably more intuitive than the apparently strange behaviour of cached temporary tables, but it does allow temporary tables an opportunity to cache plans and temporary objects for reuse, which is not possible when using a permanent table in the way we did.
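The renamed objects can be glimpsed directly with a rough query like the one below.  Cached temporary tables show up in tempdb under short internal hexadecimal names, while a temporary table that is currently in scope appears under its user-visible name padded out with an object id suffix (the exact naming is an implementation detail and may vary between versions):

-- Both in-scope and cached temporary objects live in tempdb
SELECT
    o.[name],
    o.[object_id],
    o.create_date
FROM tempdb.sys.objects AS o
WHERE o.[name] LIKE N'#%';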

Recompilation Thresholds

Most of you will be familiar with the concept of recompilation thresholds, as described in the Plan Caching in SQL Server White Paper.  Briefly though, cached query plans include a Recompilation Threshold (RT) for statistics considered interesting by the optimizer while compiling the plan.  The query plan also records the cardinality of tables and indexed views referenced by the plan, at the time the plan is compiled.  (I demonstrated how to determine which statistics are considered ‘interesting’ by the optimizer during compilation in a previous post).

Generally, if the column modification counters (colmodctr) for interesting statistics change by RT or more, the query recompiles.  Similarly, if the table cardinality has changed by RT or more compared with the value stored at the time the cached plan was created, the query plan also recompiles.  The value of RT depends on the type of table (permanent or temporary) and the number of rows (n) in the table or indexed view at the time of compilation.  The following summary of RT values is taken from the White Paper:

Recompilation Thresholds

There is another condition not listed above.  If the table is empty at compilation time, RT is set to 1.
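As a quick sanity check of those rules, the sketch below computes RT for a temporary table containing n rows.  The two larger branches are the ones used in the worked examples later in this post; the n < 6 branch is my reading of the White Paper table, so verify it against the original if it matters to you.

DECLARE @n bigint = 10000;
 
SELECT RT =
    CASE
        WHEN @n < 6 THEN 6              -- very small temporary tables
        WHEN @n <= 500 THEN 500         -- e.g. the 9-row 'E' example: RT = 500
        ELSE 500 + (0.20 * @n)          -- e.g. 10,000 rows: RT = 2,500
    END;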

I say “generally” above because queries referencing permanent tables recompile when interesting statistics modification counters change by RT or more, but this is not the case for cached temporary tables in stored procedures.  Those only recompile when table cardinality changes by RT or more, unless OPTION (RECOMPILE) is used, in which case statistics counters come back into play.  If this sounds confusing, that’s because it is.  It may be a bug, but on the other hand, it does seem reasonable that a cached temporary table should not cause a recompilation simply because the procedure is called a number of times with the same parameter, with the number of modifications accumulating on each call to eventually pass RT.  In that specific case, we would probably prefer no recompilations.  I will probably file a Connect item to get the situation with statistics modification counters and temporary tables clarified.  Update: the Connect item can be found here.

In our experiments, this explains why the vanilla temporary table procedure (with no RECOMPILE or UPDATE STATISTICS) stuck with the same plan, whereas adding RECOMPILE meant the plan would occasionally recompile (when accumulated changes passed RT).  The vanilla procedure was initially called with a parameter value of ‘E’, which compiled a SELECT execution plan when 9 rows were in the temporary table.  Following the rules above for n=9, this means RT was set to 500, so the plan would recompile when n reached 9 + 500 = 509.  The AdventureWorks Product table only contains 504 rows, so that target could not be reached, and the plan never recompiled.

When OPTION (RECOMPILE) was added, the modification counters did accumulate to eventually pass RT.  The first execution was the same, with the ‘E’ parameter value resulting in 9 rows in the temporary table at compilation time, with RT = 509 as before.  We then ran the procedure a total of five times with a parameter value of ‘T’.  There happen to be 65 product names that start with ‘T’, so five executions resulted in 5 * 65 = 325 changes.  This does not reach the target of 509, but there is a second factor in play here.  Although the temporary table is not dropped at the end of the procedure (just renamed), it is implicitly truncated (whether an explicit DROP TABLE is included in the procedure or not).  This implicit truncation removes 65 rows on each execution, and this action adds to the modification counters.

On each ‘T’ execution then, 65 modifications are counted when the rows are added to the temporary table, and 65 more when the table is silently cleared at the end of the procedure.  In total, the first four ‘T’ executions result in 65 * 2 * 4 = 520 modifications, which is enough to pass the 509 target.  That in itself is not enough to cause a recompilation though, because the RT test is performed as each query is retrieved from the plan cache for execution, and the final 65-row truncation happens at the end of the procedure, after the SELECT has already completed.  On the next execution, whatever parameter is passed in, a recompilation will occur at the SELECT query.  The OPTION (RECOMPILE) version of the procedure is reproduced below:

ALTER PROCEDURE dbo.Demo
    @StartsWith nvarchar(50)
AS
BEGIN
    SET NOCOUNT ON;
 
    CREATE TABLE #Temp
    (
        ProductID   integer NOT NULL,
        Name        nvarchar(50) COLLATE DATABASE_DEFAULT NOT NULL
    );
 
    INSERT INTO #Temp
        (ProductID, Name)
    SELECT
        p.ProductID,
        p.Name
    FROM Production.Product AS p
    WHERE
        p.Name LIKE @StartsWith + N'%';
 
    SELECT
        t.Name,
        OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
    FROM #Temp AS t
    JOIN Production.TransactionHistory AS th ON
        th.ProductID = t.ProductID
    GROUP BY
        t.Name
    OPTION (RECOMPILE);
 
    DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;
 
    DROP TABLE #Temp;
END;

Now run the tests as described above:

-- 9 rows, RT = 509
EXECUTE dbo.Demo @StartsWith = N'E';
 
-- 65 rows, 4 times = 260
-- Plus truncation modifications = 520
EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';
EXECUTE dbo.Demo @StartsWith = N'T';
 
-- No rows, but plan still recompiled
EXECUTE dbo.Demo @StartsWith = N'5';

There are no products that start with ‘5’, but the 520 modifications already performed cause a recompilation anyway:

Output for '5' (NULL statistics)

No results, and NULL statistics created!  Luckily, the new RT value after this compilation is 1, so just one row is needed to produce a new plan.  This is demonstrated by calling the same procedure again with a ‘K’ parameter value (just one product name starts with ‘K’):

EXECUTE dbo.Demo @StartsWith = N'K';

Output for 'K'

After this compilation, the new RT is 1 + 6 = 7, which can be confirmed by executing the procedure with a parameter value of ‘P’ (there are exactly seven Product rows where the name starts with ‘P’).  And so on.
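For completeness, that final confirmation is simply:

-- Exactly seven product names start with 'P', enough to pass the new RT of 7
EXECUTE dbo.Demo @StartsWith = N'P';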

I should mention though, that it is just the carry-over of accumulated modifications that requires OPTION (RECOMPILE).  Without this query hint, modifications are still accumulated within a single execution of the procedure (and may result in recompilation local to that single execution) but any accumulation is not carried over to the next procedure execution.

Statistics Updates

So far we have explained how the vanilla and OPTION (RECOMPILE) procedures produce the observed behaviour.  The next configuration to examine is where OPTION (RECOMPILE) is not specified, but an UPDATE STATISTICS statement appears before the problematic SELECT query.  It should now be apparent that the placing of the UPDATE STATISTICS statement works because auto-created statistics from a previous execution are cached, and available for update by the next execution.  The one remaining question is why the updated statistics do not cause a recompilation in our example – remember we saw correct statistics, but the execution plan was not rebuilt to reflect the new information.  This is either a bug, or another example of different behaviour for cached temporary tables.  In any case, the observed behaviour does not match that described in the White Paper (the same diagram appears in the SQL Server Internals books from Microsoft Press):

Recompilation decision flowchart from the Plan Caching White Paper

Whether this is a bug or undocumented designed behaviour (and it is the same on all versions of SQL Server from 2005 to 2012 inclusive) the effect is clear: UPDATE STATISTICS on a cached temporary table in a stored procedure is not enough on its own to force recompilation of a query plan.  A recompilation may still occur for other reasons, for example if the cardinality of the table has changed by RT or more compared with the cardinality value stored with the cached plan.  This cannot happen with our UPDATE-STATISTICS-only test procedure as shown earlier, since the first execution with the ‘E’ parameter value adds 9 rows to the temporary table, setting RT to 509, and the AdventureWorks Product table only has 504 rows in total as mentioned earlier.

For the moment, at least, we may need to specify OPTION (RECOMPILE) as well as UPDATE STATISTICS if we want to be sure of a plan specific to the contents of a cached temporary table on each execution.

Summary and Final Thoughts

Temporary tables in stored procedures have a number of characteristics that may seem unexpected:

  • Temporary objects may be cached across executions, despite explicit CREATE and DROP statements
  • Statistics associated with a cached temporary object are also cached
  • Statistics may be 100% wrong compared with the current contents of a temporary table
  • An implicit TRUNCATE TABLE at the end of the procedure doubles the expected impact on column modification counters
  • It is possible to update cached statistics before the statement that caused them to be created
  • OPTION (RECOMPILE) changes the way column modifications are carried over between procedure calls
  • Manual UPDATE STATISTICS is not enough to force a recompilation

It is possible that one or more of these behaviours is unintentional, but they have been this way for a very long time, and so are very unlikely to be changed.  Temporary tables behave quite similarly (aside from the UPDATE STATISTICS behaviour) to an in-procedure reference to a permanent table that was created outside the procedure.  It may be that some of the features of temporary tables shown in this post have their roots in a design that sought to mimic permanent table behaviours using non-sharable local temporary tables.  On the other hand, it might be simply that the current arrangement provides the best chances for caching and reuse, and we are expected to take action where we suspect reusing the cached plan, temporary object, and/or statistics may not be optimal.  Who knows.

How likely you are to be affected by the issues illustrated in this post depends on usage.  A temporary table that contains wildly differing numbers of rows on every execution is unlikely to be affected.  If the recompilation threshold is passed, chances are you will get good statistics and any cached plan will recompile without intervention.  For a plan cached when the temporary table contained 10,000 rows the default threshold is 500 + 0.2 * 10,000 = 2,500 rows.  This means the next execution with more than 12,500 rows or fewer than 7,500 rows will pass the threshold test.  The formula to compute RT values when trace flag 2371 is enabled is unknown to me.

In principle, in later versions of SQL Server, we can look in the plan cache to check the cardinality of a temporary table and deduce the RT from there, but this is not convenient to do routinely.  In practice, RT moves around a fair bit, so it can be hard to know how often a procedure might use stale cached statistics or a sub-optimal cached plan.  Avoiding temporary table caching removes the risk of stale statistics, but a cached plan may still be used unless the RECOMPILE query hint is used.  Avoiding temporary table caching will also tend to increase contention on tempdb, and can be difficult to enforce in practice.

Procedures that routinely store relatively small result sets and use cached temporary tables are probably most at risk.  This is quite a common scenario: temporary tables are often used for smaller sets (writing away large temporary tables is more difficult to justify).  One scenario that springs to mind is where a delimited string is passed as a parameter, split to a cached temporary table, which is then used to drive a more complex query.  The plan for the complex query may be very sensitive to the cardinality and statistical distribution of values in the temporary table, so 100% wrong cached statistics and/or inappropriate plan reuse may be a problem.  I’m sure there are many other scenarios.

It is quite ironic that the poor temporary table execution plans shown in this post are caused by the thing that is perceived to be their greatest strength: the availability of distribution statistics.  Some care is needed when choosing between table variables, temporary tables or another technique; the right choice in any particular circumstance will depend on many factors, only a few of which have been discussed today.  Whichever option you choose, be sure to understand the choice you have made and the trade-offs that result.

There are also times where the best-performing solution will be to carefully craft one huge query, perhaps using hints and/or specific syntax tricks.  This approach is perfectly valid but requires careful thought as future maintenance may be harder, and data changes over time may mean the carefully-designed manual execution plan gradually (or suddenly) becomes sub-optimal.  My general preference is still for simpler relational queries and, where necessary, for temporary tables over table variables as a starting point, but the details do indeed depend on the details.  If you happen to be attending the PASS Summit 2012, I will be speaking more about these sorts of things in my half day and regular sessions.

I hope you enjoyed this post, the next one will cover details of temporary table caching that I did not have space for today.  I would like to thank Dave Ballantyne (blog | twitter) for his help with today’s post.

Further Reading

Working with tempdb in SQL Server 2005 – TechNet Library
Plan Caching in SQL Server 2008 – MSDN Library


© 2012 Paul White
email: SQLkiwi@gmail.com
twitter: @SQL_Kiwi

Published Wednesday, August 15, 2012 5:22 AM by Paul White


Comments

 

AlexK said:

Paul,

This is a very interesting post. I asked my team to read it. Thank you for taking time to write it up.

If you create temporary tables implicitly, via SELECT INTO, do you observe the same behavior?

AK

August 14, 2012 10:40 PM
 

SomewhereSomehow said:

Good post Paul!

Very interesting note about colmodctr and option(recompile); as I understood it, this is assumed to be empirical and not documented anywhere? If so, you should definitely file a Connect item, I'll vote.

Because it seems logical that option(recompile) doesn't affect stats (or we would have a lot of update stats when using option recompile for other purposes), but it is not logical that it starts accumulating mods and suddenly leads to updated stats. It may lead to a kind of "mystery behaviour" when debugging. So thank you for pointing that out!

Also I consider that an explicit update stats should cause a recompilation in any case (I even tried explicitly creating stats - no effect; the only way to deal with it without recompile and update stats is to put an "alter table #temp rebuild;" command, then it has an effect, but that is really nasty, especially for big tables, though hopefully not so harmful for small ones...)

August 15, 2012 11:18 AM
 

Paul White said:

Hi Alex,

Thanks for that, I appreciate it. Yes, the same behaviour applies to temporary tables created with SELECT...INTO, as the modified version of the test procedure below demonstrates.

ALTER PROCEDURE dbo.Demo
   @StartsWith nvarchar(50)
AS
BEGIN
   SET NOCOUNT ON;

   SELECT
       p.ProductID,
       p.Name
   INTO #Temp
   FROM Production.Product AS p
   WHERE
       p.Name LIKE @StartsWith + N'%';

   SELECT
       t.Name,
       OrderCount = COUNT_BIG(DISTINCT th.ReferenceOrderID)
   FROM #Temp AS t
   JOIN Production.TransactionHistory AS th ON
       th.ProductID = t.ProductID
   GROUP BY
       t.Name;

   DBCC SHOW_STATISTICS (N'tempdb..#Temp', Name) WITH STAT_HEADER, HISTOGRAM;

   DROP TABLE #Temp;

END;
GO
DBCC FREEPROCCACHE;
EXECUTE dbo.Demo @StartsWith = N'E';
EXECUTE dbo.Demo @StartsWith = N'T';

That said, tables created this way tend to have at least one index created explicitly after the SELECT...INTO, which will prevent temporary table caching (see the White Paper for details).  That action will prevent statistics being cached across executions, but does not solve the other issues shown in the main post.

Cheers,

Paul

August 15, 2012 12:36 PM
 

Paul White said:

Hi Dima (SomewhereSomehow),

Thanks.  I will try to put a Connect item together soon, and update the post above with the details.

An explicit CREATE STATISTICS counts as DDL on the temporary table, which prevents temporary table caching (and the caching of statistics that goes with it), but as I mentioned to Alex Kuznetsov above, that does not solve the other issues (and it means we don't get the benefit of temporary object caching either, of course).  Rebuilding the table works, naturally, but it is, as you say, not a happy solution!

I'm glad you enjoyed this post; there will be more details on the caching mechanism in the next one.

Paul

August 15, 2012 12:44 PM
 

 

Alejandro Mesa said:

Paul,

Great post, as always!

I knew about SQL Server caching temporary tables, but I would never have imagined that statistics created by AutoStats on those tables were cached too. More intriguing to me is the fact that there is no recompilation of the cached plan referencing the cached temp table, after manually updating the cached statistics that were found interesting during compilation. I am also surprised about the way OPTION (RECOMPILE) carries column modifications between procedure calls.

I already added my vote.

Thanks for sharing your findings with us.

--

AMB

August 16, 2012 11:50 AM
 

Paul White said:

Hi AMB,

Do you realize you just summarized in 30 words what it took me 6,600 words to say?  LOL!  Seriously, thanks for that - it's good to hear that the points I was trying to make were received as intended, and that you found it as surprising as I did.  Thanks also for the vote.

Paul

August 16, 2012 12:14 PM
 

WIDBA said:

Wow - I think this really explains some stuff that I used to fight with in "search" procedures using temp tables.

Thanks for another fantastic read - that I will need to read one more time to digest.

August 16, 2012 2:07 PM
 

Paul White: Page Free Space said:

SQL Server 2005 onward caches temporary tables and table variables referenced in stored procedures for

August 16, 2012 5:51 PM
 

Martin Smith said:

Regarding Recompile Thresholds I've encountered a case where increasing table cardinality from 10 to 1,000,000 still doesn't seem to cause an optimality based recompile.

https://connect.microsoft.com/SQLServer/feedback/details/725697/missing-optimality-based-recompile-on-cached-temporary-table-with-clustered-pk

August 17, 2012 8:54 AM
 

Sat Pal said:

Great Post Paul!

I usually get distracted when there is a big post and it takes me longer than usual to read through the whole thing, but with this one I was hooked, read it in one go, and couldn't believe what I used to believe about temp tables.

Would the behaviour of the temp table be the same if lots of executions happen simultaneously on the same stored procedure on a highly busy OLTP server (around 20,000 executions/min of a stored procedure)? Would it still use the same cached temp table or multiple copies of the temp table, and how would colmodctr behave if multiple temp tables are created?

Sat Pal

August 17, 2012 1:57 PM
 

Paul White said:

Hi Martin,

Yes, on the face of it, I would expect that to recompile due to the change in cardinality of #T from 10 when the plan was cached to 990000 for the second execution.

I notice that it does behave as I would expect so long as there are no leading-column statistics associated with an index on column 'n' of #T.  There are two things I tried: (1) commenting out the PRIMARY KEY, so #T becomes a heap and *auto-statistics* are created for column 'n'; (2) changing the PRIMARY KEY to (s, n) - a clustered table without leading-column index statistics for 'n'.  (The odd behaviour still reproduces if the PRIMARY KEY is NONCLUSTERED on 'n'.)

It seems statistics associated with a constraint on a cached temp table might have a different behaviour from auto-statistics.  I'm about to leave for a weekend away, so I won't have chance to dig into this further for a few days.

There is no shortage of odd behaviour in this area, and I am sure I didn't cover it all even in this very long post.  One thing to try: run your repro again, but after EXEC t2 10; display an estimated plan for EXEC T2 1000000; just before running EXEC T2 1000000; 'for real'.

The table cardinality of #T in the estimated plan is zero. The cached temporary table is not reused.  A *new* temporary table briefly exists, creates *new* statistics and is then dropped.  The 'real' execution of EXEC T2 1000000; that follows recompiles with a reported reason of 'Option (recompile) requested' - even though we didn't.

Honestly one can go gradually mad trying to make sense of all this.

Paul

August 17, 2012 4:39 PM
 

Paul White said:

Hi Sat Pal,

I was a bit worried about the length of this one, so thanks for the feedback on that.

I think I answer your questions about simultaneous executions in my follow-up post (at http://bit.ly/TempCaching).

Paul

August 17, 2012 4:50 PM
 

Martin Smith said:

Strange behaviour indeed that requesting an estimated plan should affect things like that!

I've also found that the modification counter doesn't always seem to get updated.

Starting with an empty procedure cache whether or not the rcmodified column in sys.sysrscols is updated after executing the procedure seems to depend upon the value passed to the stored procedure and hence temp table size.

For me anything between 1 to 277843 (temp table size = 275065 rows) leaves rcmodified at 0. Whereas from "EXEC T2 277843;" upwards  (temp table size = 275066 rows) updates this value.

August 18, 2012 9:20 AM
 

MarkD said:

Great post Paul, thank you very much for taking the time!

August 18, 2012 12:26 PM
 

Nakul Vachhrajani said:

A very, very interesting post! Thank-you for taking the time out and writing it, Paul! I will be reading it again tomorrow, ask the team to read it and then spend a couple of hours during week-ends trying to digest everything. Thank-you very much!

August 19, 2012 2:07 AM
 

Jose Varghese said:

Hi Paul,

Excellent post!

Some time back I had my encounter with parameter sniffing. All that I could find online was just a few posts - providing just high end theoretical information.

Compared to that, your post is entirely different in content.. good job.

Thanks to my friend Shyam Viking for bringing my attention to this post.

August 19, 2012 10:50 AM
 

John Shahan said:

I have a simple question given the complexity of the subject matter.

Would it make sense in some cases to dynamically (programmatically) change the name of the temporary tables every time the stored procedure is run, in order to be sure you have accurate statistics?

August 20, 2012 1:17 PM
 

Paul White said:

Martin,

Connecting via the DAC to see sys.sysrscols (or sys.sysrowsetcolumns in 2005) I have noticed that issuing a CHECKPOINT seems to cause the in-memory counters to be flushed to persistent storage, so you might want to try that.  With larger values, you might be seeing flushing based on number of changes (this is just speculation on my part) without needing CHECKPOINT.

That said, in my simplified tests I found it more convenient to query rowmodctr from sysindexes on a regular connection.  So long as one bears in mind the differences between colmodctr and rowmodctr, this can still be an effective technique.

Paul

August 20, 2012 9:39 PM
 

Paul White said:

Hi John,

Good question.  You certainly could use dynamic SQL to create temporary tables each time, and this would prevent caching as I describe in the follow-up post to this one at http://bit.ly/TempCaching

I can't think of any cases where this sort of scheme would be worth the effort though.  All it achieves is to prevent temporary table caching, so you lose the benefit of that, and statistics creation and deferred compiles will happen on each execution of the new stored procedure anyway.  It seems to me that there are better ways to achieve the same effect (at least ways that would be easier to maintain in future).

My own preference at this point in time, is to use UPDATE STATISTICS and OPTION (RECOMPILE) wherever I think temporary table and statistics caching might cause me a problem.

Paul

August 20, 2012 10:00 PM
 

KKline said:

Just wanted to say that this is a fantastic post, Paul. Well researched and well written!

Are those Plan Explorer screen shots, btw?  Just curious...

Thanks again,

-Kevin

August 21, 2012 12:02 PM
 

Paul White said:

Hi Kevin,

Thanks!  Yes, the screenshots are from SQL Sentry's Plan Explorer; I find its display style convenient for blog posts.

Paul

August 21, 2012 6:49 PM
 

Paul Randal said:

Nice post Paul - will be linking to it in my class deck on tempdb and temp table abuse.

August 22, 2012 4:55 PM
 

Paul White said:

Awesome!  Thanks Paul.

August 22, 2012 6:07 PM
 

Satyender M said:

Thanks a ton Paul...

Wonderful article..

October 31, 2012 3:41 AM
 

Girijesh said:

Thanks a billion Paul.

November 27, 2012 2:22 AM
 

Karna said:

wonderful and helpful article. thanks @Paul White

January 5, 2013 6:23 AM
 

Revanth said:

Wonderful article ! Thanks Paul

February 12, 2013 11:25 PM
 

Francois said:

Very nice post. I can think of several stored procedures I have to look at now and optimize. Thank you!

February 13, 2013 11:44 PM
 

Pedro Ponce said:

Very nice post.

April 5, 2013 6:32 PM
 

Neelam said:

Awesome!!!

April 28, 2013 5:06 AM
 

Gary said:

Excellent post, really useful.  But I just checked your Connect bug submission - Microsoft have just closed it in the past week as "Won't Fix".  Apparently it's too obscure to invest any effort in.

May 2, 2013 7:13 AM
 

Paul White said:

Hi Gary,

Thanks - and yes, I noticed *sigh* :)

Paul

May 2, 2013 11:06 AM
 

akshay said:

Excellent post, it's very useful.

I have one question:

I want to add a new row to a temporary table, just like we add a new row to a DataTable in C#. Is it possible?

Reply..

May 21, 2013 2:50 AM
 

JRStern said:

Paul, this is fantastic stuff, I read this a couple of months ago and have been applying it to some of our SPs, and I believe it is helping.

What we had are SPs that, a few times a day, would just seem to lose their minds and run 100x slower.  The same parameters rerun would run much faster.  I'm pretty sure this is "The Paul White Bug", that's how I'm labelling it in my code!  Applying these update stats to newly created #temps, seems to stabilize the behavior.  Sometimes I get ambitious and wait until I repopulate the SPs before doing the update stats, tradeoff about cost and accuracy.

As for Microsoft closing it in Connect - I've had very poor experience with Connect for a long, long time, not sure what else to say about it.

June 17, 2013 1:39 PM
 

Paul White said:

JRStern,

I think we interacted a couple of times on the MSDN forums, so hello again :)

I'm so glad to hear you have found this entry useful, thanks very much for letting me know of your experiences.

Paul

June 17, 2013 4:02 PM
 

JRStern said:

... I meant repopulate the #temps of course.

Sure, I often think out loud on the forums, just posting it solves the problem half the time, you know how it is! :)

Appreciate all your work, wherever it goes!

J.

June 17, 2013 5:59 PM
 

Neeraj prasad sharma said:

Hi Paul,

I am (your fan) a regular reader of your blog and a first-time commenter.

Your findings and dedication to making the community stronger are phenomenal.

Could you please give me some links or examples to back this statement:

"In addition, ordinary tables do not benefit from engine optimizations that only apply to genuine temporary tables, so they tend to be less efficient, even if created in tempdb."

It would be helpful.

Thanks

October 31, 2013 3:17 AM
 

Paul White said:

Many of the differences can be found in the tempdb White Paper at http://technet.microsoft.com/library/Cc966545. You might also like to read the Storage Engine blog entries on tempdb at http://blogs.msdn.com/b/sqlserverstorageengine/archive/tags/tempdb. Finally, there is the three-hour Bob Ward video from PASS Summit at http://www.youtube.com/watch?v=SvseGMobe2w

October 31, 2013 3:36 AM
 

Kenny said:

Thanks for the fantastic post!

Is the "DROP TABLE" necessary to produce this behavior? Or will the implicit drop (when the connection on which the procedure is executed closes) also result in caching of the temp table and its auto-created statistics?

January 8, 2014 10:49 AM
 

Paul White said:

Hi Kenny,

No, the DROP TABLE isn't required - the implicit drop has the same effect.

Paul

January 8, 2014 9:04 PM
 

Alex Friedman said:

Great post, thank you!

How about using "WITH RECOMPILE" on the SP level? I noticed in my tests that it did force the creation of a correct plan and statistics every time.

January 14, 2014 3:59 AM
 

Paul White said:

Hi Alex,

Yes, though as I mentioned in the main text:

"Executing the procedure with the WITH RECOMPILE option will produce better plans, at the cost of recompiling the whole procedure, and the resulting plans will not be cached for reuse. In general, recompiling the whole procedure every time could be bad since many procedures have many more statements to recompile than just a single INSERT and SELECT."

Paul

January 14, 2014 9:31 AM
 

Alex Friedman said:

Whoops, missed that, thanks.

But what's the explanation for the SP-level RECOMPILE working differently than the statement-level RECOMPILE? I mean, even if we add OPTION(RECOMPILE) to every single statement in the SP, it would still behave differently.

My guess is that the SP-level RECOMPILE disables the temp table caching.

January 15, 2014 1:54 AM
 

Paul White said:

When RECOMPILE is specified on the procedure, there is no top-level plan cached for the statistics to be associated with.

January 15, 2014 7:17 AM
 

Alex Friedman said:

Well that's better. Interesting!

January 16, 2014 1:47 AM
 

Derek Slone-Zhen said:

Just wanted to say thanks for the write up, helped me tonnes.

Really disappointed to see Microsoft ignoring your connect item simply because the issue has been there for so long.  Sometimes it takes "us" (the public) this long to realise the crazy s**t that their software is doing!

Thanks again,

Derek

February 12, 2014 4:52 AM
 

Lara Rasner said:

Hey Paul:

Thanks so much for this. I've been racking my head against a wall trying to figure out why estimated and actual rows are so vastly different in my environment I inherited where the use of #temptables inside stored procedures is insanely abused.

Have you submitted this as a question of the day to SQL Server Central? If you haven't, would you give me permission to do so and I will give you credit, and link to your post?

Thanks.

March 21, 2014 6:49 PM
 

Paul White said:

Hi Lara,

Glad you found it helpful. I haven't written an SSC QotD for this - and you don't need my permission to do so yourself :) Have fun.

March 21, 2014 8:13 PM
 

Lara Rasner said:

Thanks for the permission Paul. The QotD is up today and getting lots of good feedback.

April 11, 2014 12:32 PM
 

Paul White said:

Happy to report I answered correctly and earned 2 points :)

April 11, 2014 1:01 PM
 

Bill Kline said:

An informative and incredible write up.  Great job!

April 18, 2014 11:41 AM
 

tina said:

Hi Paul,

Is there any impact if I create many #temp tables in one stored procedure?

August 19, 2014 3:38 AM
 

yaroslav said:

Thanks Paul, I still need to digest all the information in this blog post and the following one about temporary tables and caching, but so far it has been amazing. I'm pretty sure the perf problems we are having (search SPs with lots of temporary tables) have a lot to do with what you explain here. Thanks a lot, for real...

August 22, 2014 9:55 AM
 

Derek D said:

Great post Paul!  I couldn't agree more regarding breaking out complex queries into intermediate temp tables.  Some people talk about the tempdb usage, but I've yet to find a complex query that didn't create a worktable.  Better to do it explicitly I think. Also you have to know where to break the query up.

It's also nice knowing simple plans will survive multiple upgrades of the SQL Engine.

September 22, 2014 4:42 PM
 

Paul White said:

Hi Derek,

Thanks for the comment, you make some great points.

Indeed, the skill is in knowing *when* and *where* to simplify a query using a temporary table. I occasionally come across systems where someone has just gone nuts with the idea, leading to tempdb contention or other issues, but for the most part I find even experienced SQLers underuse this idea.

Paul

September 22, 2014 5:38 PM
 

Nida said:

I really enjoy this post. Thanks so much Paul.

October 15, 2014 1:14 AM
