What are the top core SQL Server engine performance issues today, after all the improvements that have gone into 2005 and 2008? (I am excluding matters beyond the power of Microsoft, like eliminating bad developers.)
SQL Server already has statement-level recompile for multi-statement stored procedures, but there is no option to change the plan for a single statement in the course of its execution. SQL Server also has the OPTIMIZE FOR hint and Plan Guides, which fix a single plan for a particular query. What if a single query needs multiple plans?
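For reference, a plan guide is how one pins a hint to a statement today without touching the application code. A minimal sketch, with hypothetical table and statement text:

```sql
-- Hypothetical example: attach an OPTIMIZE FOR hint to a parameterized
-- ad hoc statement via a plan guide. Note that @stmt must match the
-- submitted batch text exactly for the guide to be applied.
EXEC sp_create_plan_guide
    @name = N'Guide_OrdersByCustomer',
    @stmt = N'SELECT * FROM dbo.Orders WHERE CustomerID = @CustID;',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = N'@CustID int',
    @hints = N'OPTION (OPTIMIZE FOR (@CustID = 42))';
```

The limitation discussed below is exactly this: one guide, one hint, one plan.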
Runtime Plan Change
Consider a single SQL query with multiple table access steps in the execution plan. It is not always possible to estimate the row count accurately at each step; that is why it is called an estimate. (I know some complain when the estimate is off by 10%, and Lubor once said he is interested in estimates off by 2X, but I consider the catastrophic errors to be the ones off by orders of magnitude, 10-1000X or more.)

Suppose the SQL Server engine starts the first steps of the execution plan. The estimated row count was low, so the subsequent steps employ index seeks with nested loops joins to complete the query. In actuality, the row count is very large, so table scans with hash or merge joins would be much better. The opposite scenario is a very high estimated row count that turns out to be low: the remaining steps of the execution plan involve table scans with hash or merge joins, when index seeks with nested loops joins would be far more efficient.

There are two options. One is to retain the current partial results but change the subsequent steps to account for the new row count. In some cases, it will be much better to discard the current progress and generate a completely new plan based on the new row count. If this were implemented, my thought is that it should be enabled explicitly for known problem queries rather than with a system-wide on/off switch.
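Today the closest one can get is to pick the plan shape by hand, and the choice is frozen at compile time rather than made at run time. A sketch against a hypothetical Orders schema:

```sql
-- Hypothetical query: force the low-row-count plan shape...
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders o
JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
WHERE o.CustomerID = @CustID
OPTION (LOOP JOIN);   -- index seeks + nested loops joins

-- ...or the high-row-count plan shape.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders o
JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
WHERE o.CustomerID = @CustID
OPTION (HASH JOIN);   -- scans + hash joins
```

A runtime plan change would, in effect, let the engine move from one of these shapes to the other mid-execution.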
Multiple Execution Plans for a single SQL
Now consider a single SQL query involving multiple search arguments. Depending on the input parameters, the row count involved for each argument varies drastically, so the OPTIMIZE FOR and Plan Guide options can only optimize for one particular scenario. I have handled this by making two or more copies of the original stored procedure, then having the original procedure check the actual data distribution and call one of the copies. The sub-procedures are identical, but copy A is only called when parameter 1 is high and parameter 2 is low, and copy B when parameter 1 is low and parameter 2 is high. It is not necessary to have one copy for each possibility, only for the possibilities that require a significantly different execution plan.
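The workaround can be sketched roughly as follows. All names and thresholds are hypothetical; the point is that the two sub-procedures contain identical SQL but each caches its own plan, compiled for its own scenario:

```sql
-- Dispatcher: check the data distribution cheaply, then call the copy
-- whose cached plan was compiled for that scenario.
CREATE PROCEDURE dbo.GetOrders @P1 int, @P2 int
AS
BEGIN
    -- A pre-computed summary table is assumed here, to keep the check cheap
    DECLARE @p1Rows int;
    SELECT @p1Rows = RowEstimate
    FROM dbo.CustomerSummary
    WHERE CustomerID = @P1;

    IF @p1Rows > 10000              -- hypothetical threshold
        EXEC dbo.GetOrders_HighP1 @P1, @P2;  -- plan: scans + hash joins
    ELSE
        EXEC dbo.GetOrders_LowP1 @P1, @P2;   -- plan: seeks + loop joins
END;
```

This works, but it means maintaining identical SQL in multiple places, which is exactly the maintenance burden an extended hint would remove.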
My thought on handling this automatically is to extend the OPTIMIZE FOR hint from specifying a single parameter value to specifying multiple values.
Example: instead of
OPTION (OPTIMIZE FOR (@P1 = 1, @P2 = 'A', …))
allow
OPTION (OPTIMIZE FOR (@P1 = 1 or 2, @P2 = 'A' or 'B', …))
The presumption above is that @P1 values 1 and 2 have drastically different distributions that really need separate execution plans, so the SQL Server engine would generate a plan for each option. At run time, depending on whether the input parameter has a distribution closer to option 1 or option 2, the appropriate plan is selected. Of course, this is a rather sophisticated hint that should not be used by anyone without a detailed understanding of the execution plan. Also consider that as data distribution changes over time (or when statistics are recomputed), the OPTIMIZE FOR values might be rendered ineffective. The SQL Server optimizer could also check whether each OPTIMIZE FOR option actually leads to a significantly different plan, and discard the irrelevant options.
Microsoft might take the position that statement-level recompile is the right solution for this. My argument only stands if the cost of recompile is high relative to the cost of executing the desired plan. So maybe the alternative I want is to use OPTIMIZE FOR in the low row count scenario, but a RECOMPILE when the actual parameter value has a high row count.
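Something approximating that hybrid can be written by hand today (names and thresholds hypothetical): keep a cached plan optimized for the cheap case, and branch to a RECOMPILE statement only when a distribution check says the row count is high enough to justify the compile cost.

```sql
CREATE PROCEDURE dbo.GetOrdersHybrid @CustID int
AS
BEGIN
    DECLARE @rows int;
    SELECT @rows = RowEstimate
    FROM dbo.CustomerSummary       -- hypothetical summary table
    WHERE CustomerID = @CustID;

    IF @rows < 1000
        -- Reuse the cached plan, compiled for a known low-row-count value
        SELECT * FROM dbo.Orders WHERE CustomerID = @CustID
        OPTION (OPTIMIZE FOR (@CustID = 42));
    ELSE
        -- Pay the compile cost only when the row count justifies it
        SELECT * FROM dbo.Orders WHERE CustomerID = @CustID
        OPTION (RECOMPILE);
END;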
Correlated Column Statistics
I posted this as a comment on another blog, but am moving it here. I do not consider it a top issue, but it is desirable: SQL Server needs correlated column statistics at the histogram level. (Multi-column statistics exist today, but only at the density vector level, not the histogram.)
Consider the query:
SELECT * FROM Women
WHERE Attractiveness > 9 AND MonthlyShoppingExpenses < $3000
SELECT * FROM Men
WHERE FinancialStability > 9 AND Fidelity > 9 AND IsSingle > 50%
The individual column statistics may indicate 10% selectivity on each column condition, leading one to think that the combined AND conditions yield 1%, which would still be a respectable row count. But in fact, if correlation is taken into account, a drastically lower row count is the result. Yes, I know: get a life. And I have been called MCP since before Microsoft had the MCP program.
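What exists today is the multi-column (vector-level) statistic, which captures the average combined density of the pair but not a joint histogram, so skew in the correlation is invisible to the optimizer. Using the table from the example above:

```sql
-- Multi-column statistics: the density vector gets a row for the column
-- pair, but the histogram is built on the leading column only, so the
-- correlation between the two conditions is not captured bucket by bucket.
CREATE STATISTICS Stat_Attr_Shopping
ON dbo.Women (Attractiveness, MonthlyShoppingExpenses);

DBCC SHOW_STATISTICS ('dbo.Women', Stat_Attr_Shopping);
```

A histogram over the combination is what would let the optimizer see that the qualifying rows above are, shall we say, rare.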
Index Seek with Key Lookup versus Table/Clustered Index Scan
In previous blogs, I also complained about the static approximate 4:1 (1350:320) ratio in the cost model for a table scan versus an index seek followed by key lookups. That is, the cross-over point from an index seek with key lookups to a table scan occurs when the number of rows requiring key lookups reaches approximately one-fourth the number of pages in the table. SQL Server 2008 has the hint OPTION (TABLE HINT (table, FORCESEEK)), which forces the index seek with key lookups regardless of the number of rows.
I commented that this was rather like giving us a sledgehammer when a chisel was preferred. My preference is a hint that changes the cross-over ratio from 4:1 to 1:1.
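For reference, the sledgehammer looks like this (table name hypothetical; both forms are SQL Server 2008 syntax):

```sql
-- Force the seek + key lookup plan regardless of estimated row count
SELECT * FROM dbo.Orders WITH (FORCESEEK)
WHERE OrderDate >= '20090101';

-- Equivalent form usable from a plan guide, without editing the query text
SELECT * FROM dbo.Orders
WHERE OrderDate >= '20090101'
OPTION (TABLE HINT (Orders, FORCESEEK));
```

The chisel I want would instead shift the scan-versus-seek cross-over point, leaving the optimizer free to choose on either side of it.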
Hash and Sort spill to tempdb
Back when SQL Server 2005 was in late beta, I looked at SQL Server 2000 and 2005 execution plans with Hash (Join and Match) and Sort operations. I noted that the point at which hash and sort operations incurred an IO cost was a function of the size of the intermediate table and system memory. Naturally, AWE memory in a 32-bit system does not count; only directly addressable memory counts. I did not verify that this was the actual point at which a hash or sort spilled to tempdb, and I did not notice any performance issues on my systems with adequate storage performance. Of course, many people have really crappy storage performance. Should there be a tunable parameter for handling this, especially now that full 64-bit systems with huge (128GB) memory (and crappy storage performance) are common? Or does this fall under matters beyond the power of Microsoft?
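While there is no tunable knob, the memory grant side can at least be observed (SQL Server 2005 and later); a grant smaller than the hash or sort input is a candidate for a tempdb spill, and the Hash Warnings / Sort Warnings trace events confirm actual spills:

```sql
-- Queries currently holding or waiting on workspace memory grants
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       ideal_memory_kb      -- what the plan wanted given unlimited memory
FROM sys.dm_exec_query_memory_grants;
```

A tunable spill threshold would presumably trade off exactly the gap between granted_memory_kb and ideal_memory_kb.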
Default Settings for Cost Threshold for Parallelism and Max Degree of Parallelism
Oh yeah, probably a tier 2 item. It is high time Microsoft changed the default settings for these. The max degree of parallelism default of 0 (use all available processors) is potentially silly and severely adverse on systems with multi-core processors. The current cost threshold for parallelism default of 5 is also really silly; I recall seeing queries that run in 50ms non-parallel get a parallel plan. That is way too low considering the overhead of setting up parallel execution. At minimum, the cost threshold should be recalibrated so that parallelism is not initiated until true CPU time (on Core 2 or later architecture) is around 300-500 milliseconds. This is almost getting like how Oracle left the default buffer cache setting at 8-16MB when system memory typically reached GBs, and the "DBA" neglected to change the default.
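Until the defaults change, both settings can be adjusted per instance along these lines (the specific values are my suggestions for a starting point, not Microsoft guidance; calibrate for your own workload):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Raise the bar for parallel plans well above the default of 5
EXEC sp_configure 'cost threshold for parallelism', 25;
-- Cap DOP instead of the default 0 (all available processors)
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```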
Multi-level Cost Threshold and Max Degree of Parallelism
I am thinking this should really be a bi-level or even multi-level setting. That is, a plan cost of 50-1000 allows MAXDOP 2-4, a plan cost of 1000-10000 targets MAXDOP 8-16, and so on. (I am winging these numbers; if this is considered, I can do a more serious study on proper values.)
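SQL Server 2008's Resource Governor can approximate part of this today, since MAX_DOP can be set per workload group, though classification is by connection attributes (login, application name), not by plan cost as proposed above. Group names here are hypothetical:

```sql
-- Approximation: cap DOP differently for different workloads.
-- A classifier function (not shown) routes sessions into these groups.
CREATE WORKLOAD GROUP OLTP_Group   WITH (MAX_DOP = 2);
CREATE WORKLOAD GROUP Report_Group WITH (MAX_DOP = 8);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```

A true cost-tiered MAXDOP would remove the need to guess the workload from the connection.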
New Parallel Execution Plan Formulas
I have previously griped about the formulas used in parallel execution plans (OK, so you name it, and I have complained about it!). Anyway, the current execution plan cost model is preponderantly IO weighted, and IO costs are not reduced in parallel execution plans (except for large hash and sort operations). This has the effect of inhibiting parallel execution when it would in fact be beneficial, especially when almost all hot data is in memory, so the IO argument does not hold. One more complaint/request: the bitmap filter operation in large parallel hash operations really helps. In some cases parallel execution must be inhibited, but it would still be nice to employ the bitmap filter in the non-parallel plan, even though large hash operations should normally be parallel.