THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Joe Chang

Join Row Estimation in Query Optimization

This topic is titled to specifically consider only row estimation after joins, precluding discussion of row estimates at the source table, which has already been addressed in papers covering the new Cardinality Estimator in SQL Server 2014 and other statistics papers for previous versions of SQL Server.

There are certain situations in which the query compile time can be excessively long even for queries of moderate complexity. This could be true when the plan cost is very high, so that the query optimizer expends more effort to find a better plan before reaching the time out, that is, the time out setting appear to be a function of the plan cost. Even then, the query optimizer could still make row estimation errors in the operations after the source table (for which data distribution statistics are kept on columns and indexes) of sufficient consequence that renders the remainder of the plan un-executable in a practical time frame.

The new cardinality estimator in SQL Server 2014 is helpful in resolving known issues, but has little to improve row estimates after the initial access at the data source beyond fixed rules which may be more generally true than the rule used before. That said, the query optimizer only attempts to estimate rows (and other cost factors) using a combination of the information it has, and rules for situations where there are no grounds for making a knowledge based estimate.

So why be bound by this rule of estimating only? The other company (only sometimes in the news concerning databases) has (recently introduced?) adaptive query optimization that can make run-time adjustments to the execution plan. Sure that’s nice, but I am thinking something more sophisticated could done in forming the plan but less complex than the ability to change course in the midst of execution.

Obviously, if a query has an obviously correct execution plan that can be determined by existing techniques, and the plan cost is fairly low, then no change is necessary. Further special technique should only be considered for expensive queries, particularly in which the row estimates at the intermediate steps are difficult to assess.

Consider query below
SELECT bunch of columns
FROM Nation n
JOIN Customers c ON c.NationId = n.NationId
JOIN Orders o ON o.CustumerId = c.CustomerId
WHERE n.Country LIKE ‘Z%’

From statistics, we know approximately how many countries have a name beginning with Z. SQL Server also has a histogram for the NationId column in the Customers table. If we had specified the list of NationId values (with equality on to both tables), SQL Server could use the more detailed information from the histogram to make a row estimate. But because we specified the Country name column that is only in the Nation table, we must use the average number of customers per county (in the density vector) multiplied by the number of countries to estimate rows after the join to customers.

And next of course, all customers are not alike. There are customers who place a small, medium or large number of orders. So why expend a great deal of effort to find a plan from all the possible join orders and index combination based on only estimates of rows when it is known that data distribution in each succeeding table is heavily skewed. Why not pre-execute the tables for which a SARG was specified to get the column values used in the join to next table so that the more detailed histogram distribution information. This technique could be pushed to multiple levels depending on the initially assessed plan cost, and perhaps controlled by a query hint.

For example, in the above query, suppose NationId’s 19, 37, 42, and 59 meet the criteria Country beginning with Z. The optimizer would next look at the Customers table histogram on NationId for these values to estimate rows after the join. If the situation warrants, the next level could be examined as well.

It could be argued that the query optimizer should not execute the query to determine the plan, but why follow that principle if the cost of query optimization is excessively high (several seconds) in relation to relatively minor effort to make a more extensive reconnaissance (of tens or hundreds of milli-seconds)? Especially considering that the reliability of row estimates becomes progressively worse after each join or other operation beyond the original source?

This technique should probably be used when there are tables with search arguments joining to tables on columns with highly skewed distribution. The first implementation might be activated only be a query hint until some maturity is achieved, followed by greater use.

Presumably there might be a cost threshold as well. I would prefer not to tie it with parallelism. Of course, given the nature of modern systems, it really is time for the cost threshold for parallelism and max degree of parallelism to have graduated controls, instead of the single setting on-off.

Side note 1

OK, now forget what I said at the beginning and I will gripe about SQL Server default statistics. It has been discussed else where that SQL Server uses random page samples and not random row samples, as this is a much less expensive way to collect data. It does use an index for which the column is not a lead key if available, to improve randomness. Still, I have notice a severe sensitivity to sampling percentage in cases where the column value is correlated with (page) storage location.

So I suggest that as the page sampling is in progress, a count of new values found in each successive page sampled versus existing values be kept. If the number of new values found falls off sharply, then most distinct values have probably been found, and the interpretation of the existing sample is that its distribution should be scaled by ratio of total rows to the rows sampled. If almost of the values in the last few pages sampled are new (not previously found), then the interpretation should be that the number of distinct values should be scaled by the total to sample ratio. And some blend when there is an intermediate number of new values versus previously found values in each successive page.


The query optimization/plan compile process is single threaded. The modern microprocessor might be 3GHz, so a 10 sec compile is 30 Billion cpu-cycles. And I have seen compiles run more than 100 sec? One query even broke SQL Server, of course that was a set of deeply nested, repeating CTE's that should have been PIVOT/UNPIVOT. So why the principle of optimizing based on unreliable estimates when an index seek is a mere 10 micro-sec? Presumably the key column statistics have already been decoded.

It would be nice to have a more powerful query optimizer, but there is a method for writing SQL to get a specific execution plan, Bushy Joins. Of course the other element in this is known what the correct execution plan is. This involves not what the query optimizer uses for cost model, but what the true cost model is.

Published Monday, July 07, 2014 9:43 AM by jchang

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Chris Adkin said:


The other database company you allude to has a number of advanced options, the following connect items have been raised for similar functionality to be introduced into SQL Server:

- Connect item 681282

An offline optimisation mode which generates what are akin a cross between enhancement hints and statistics which scale iterator cardinalities, it does this via SQL Profiles. 'Larry' comes up with these by in part executing the query.

- Connect item 676224

The 'Other' database engine has the ability to sample the base data at run time which can be controlled at statement level.

- Connect item 377959

The other database furnishes "Adaptive cursor sharing" whereby plans which associated with skewed data are marked as being bind ( parameter value )  sensitive, plan baselines are maintained for such statements, if new parameter values come along the database engine determines ( based on the stored plan base line ) that the statement in conjunction with the new parameter values would benefit from a new plan ( by looking at the stored plan base line ), a new execution plan will be created.


July 7, 2014 10:51 AM

Joe said:

thanks Chris. I don't know what it takes to sway the SQL Server team these days. I suppose we have make some argument that it is critical for Azure functionality.

On the third item, my preference would be that 2 or perhaps 3-4 compiled plans be retained for a given procedure, then the correct plan being chosen depending in the current parameter value, rather than deciding that a certain value needs a new plan.

July 7, 2014 11:13 AM

pmbAustin said:

I had a related issue with a query that vexed me to no end.

Basically, the plan for small or reasonable size databases (thousands to tens of thousands of rows) was exactly right.  But for some reason, when run against a large database (millions of rows), the plan was completely wrong.  And at root of this 'wrongness' was a cardinality estimate, according to the plan, of a return of just a few rows, when the ACTUAL execution of the plan did a full table scan of millions of rows.

I couldn't figure out how to influence the optimizer to stop making this really obvious bone-headed mistake.  There was a join of a very small temp table with this million+ row table in the middle of the query... and then a WHERE clause that filtered.  Because of the cardinality estimate being wrong, the optimizer chose to "scan" the results of the large table query first (scanning millions of rows) rather than do a direct index filter via the join with the small table (and then only having to scan a dozen rows).

All the correct indexes were there, and as I sad before, the plan was exactly as expected when there were more 'normal' counts of rows.  Nothing I did (rebuilding indexes, statistics, or tweaking the SQL) made any difference.  

Having SOME sort of control to say "No dummy, stop doing that stupid thing you're doing" would be really nice.

July 7, 2014 2:45 PM

jchang said:

temp tables can be particularly troublesome. I have seen the optimized assumed that the join to a temp table column was many-to-many, example 1000 rows in the temp table, expected to match to 1:1 or 1:few,

but the optimizer assumed each row from temp mapped to 1000 in the permanent. Sometime just creating an index you don't need, but it fixed this.

The nuclear solution is to, gulp, implement a cursor, like we were told not to. then in the cursor loop, execute a stored proc or parameterized sql with a recompile on each to optimize for the parameter value.

Of course, all this goes away with the "take a sneak peak" (TK's words)

July 7, 2014 3:24 PM

SomewhereSomehow said:

Hello, Joe. It might be interesting and relevant to the topic, the new CE is able to create filtered statistics "on the fly" under some circumstances. That might be helpful sometimes in the join estimation, but of course it is not free and has overhead. Here, I've described some details:

July 8, 2014 4:06 AM

jchang said:

great investigation. it looks like CE is doing what I described above. presumably this would not be invoked unless the plan cost was high, and perhaps also if it is known that the column in the next table was heavily skewed, making the effort worth while. If there is an index on the SARG column of the first table, followed by the join column, then the filtered statistics could be generated quickly.

July 8, 2014 3:24 PM

Leave a Comment


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine

This Blog


Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement