One of the core assumptions in the SQL Server query optimiser’s model is that clients will consume all of the rows produced by a query. This assumption results in plans that favour the lowest overall execution cost, even if it takes longer to begin producing rows. Let’s look at an example:
The optimiser chooses to perform the logical join using a Hash Match physical iterator, resulting in a plan with a total estimated cost of around 1.4 units. By forcing alternative physical joins using a query hint, we see that a plan based on a merge join would have an estimated cost of just under 10 units, and a nested loops join would cost over 18 units. All these cost estimates are based on the assumption that all rows are required.
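To reproduce the comparison yourself, the physical join type can be forced with a standard query-level join hint. This is a sketch against the same sample tables used later in this post; the exact cost figures you see will vary with product version and statistics:

```sql
-- Force each physical join type in turn and compare estimated plan costs
SELECT *
FROM AdventureWorks.Production.TransactionHistory AS H
JOIN AdventureWorks.Production.Product AS P
    ON H.ProductID = P.ProductID
OPTION (MERGE JOIN);  -- also try OPTION (LOOP JOIN) and OPTION (HASH JOIN)
```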
As detailed in a previous post, the Hash Match iterator starts by consuming all rows produced by its build input (the Product table) in order to build a hash table. This makes Hash Match a semi-blocking iterator: it can only start producing output rows once the build phase is complete. If we need the first few rows from the query quickly, this join type may not be optimal.
How quickly a merge join starts producing rows depends on whether any of its inputs need to be explicitly sorted. If the merge can acquire ordered rows on both inputs from an index or a prior operation without sorting, it can stream rows to its parent without a start-up delay.
The Sort query plan iterator is always a blocking operation, since we can’t know which row sorts first without examining them all. In the example query, the rows from the history table do need to be explicitly sorted, so merge join would not be a good choice to get the first rows quickly.
Our third join option is nested loops, which is generally a pipelined iterator. This means no start-up delay, and the first matching rows are returned quickly. Unfortunately, it also has the highest estimated per-row cost of the three physical join options.
The nested loops iterator executes its inner (lower) input once for every row produced by its outer (top) input, making its overall cost proportional to the product of the number of rows received from its child inputs. By contrast, the hash and merge join strategies scan each of their inputs once, resulting in a cost that is proportional to the sum of the input rows.
The TOP Clause
Returning the first few rows produced by a query is a problem easily solved from the query-writer’s perspective: just add a TOP clause. Imagine that we are just interested in seeing one row of output from our query:
SELECT TOP (1) *
FROM AdventureWorks.Production.TransactionHistory H
JOIN AdventureWorks.Production.Product P
ON H.ProductID = P.ProductID;
The original query returns 113,443 rows. This query returns just one, and does so very quickly. So far so very obvious, you might be thinking – but all is not what it seems.
A Brief Digression Concerning TOP and Deterministic Results
Before I go into that, I should mention that the TOP (1) query above contains a ‘bug’: there is no ORDER BY clause to determine which row qualifies as ‘top’. In principle, the row produced by that query could be selected at random. In practice, the results depend on the query plan and internal implementation details in SQL Server. Since either of those things could change at any moment, you should almost always specify a deterministic ORDER BY clause when using TOP.
If you’re wondering what can be non-deterministic about an ORDER BY clause, consider what might happen if we added “ORDER BY P.ProductID” to the query. A difficulty arises since ProductID is not unique for each row of the result set (the transaction history table may contain many rows for each product). As a concrete example, assume that product #1 is the lowest ProductID, and there are 75 history records for that product. These 75 records all sort equally on ProductID, so which record will be returned by a TOP (1) operation?
There are two ways to make the operation deterministic. First, we could rewrite the ORDER BY to include a combination of columns that forms a unique key over the output rows; adding TransactionID from the history table would achieve that here. Second, we could add WITH TIES to the TOP specification, though this returns every tied row, so it may produce more than one row when duplicates exist.
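Both fixes can be sketched as follows, using TransactionID from the history table as the unique tie-breaker:

```sql
-- Option 1: extend the ORDER BY with a unique key, making the result deterministic
SELECT TOP (1) *
FROM AdventureWorks.Production.TransactionHistory AS H
JOIN AdventureWorks.Production.Product AS P
    ON H.ProductID = P.ProductID
ORDER BY P.ProductID, H.TransactionID;

-- Option 2: WITH TIES returns every row that ties on the ORDER BY columns,
-- so all history rows for the lowest ProductID come back
SELECT TOP (1) WITH TIES *
FROM AdventureWorks.Production.TransactionHistory AS H
JOIN AdventureWorks.Production.Product AS P
    ON H.ProductID = P.ProductID
ORDER BY P.ProductID;
```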
I have omitted the ORDER BY on our sample query because I truly do not mind which row we get, and I need to avoid changing the semantics of the original query to demonstrate a point.
The TOP (1) Plan
The TOP (1) query plan looks like this:
That looks very different from the original query plan, which featured a hash join. Clearly, when we add a TOP clause to a query, the optimiser does much more than just add a TOP iterator to the root of the plan it would have produced normally. For comparison purposes, here is a query plan produced for the TOP (1) query when a hash join is forced:
The important thing to notice is that the hash join must consume all 504 rows from the Product table (its build input) before it can start probing the hash table with rows from the TransactionHistory table. Like all hashing operations, the hash join also requires a memory grant before the plan can start executing. The estimated cost of the hash join plan is 0.1 units, whereas the loops join plan selected by the optimiser has an estimated cost of 0.01 units.
The challenges involved in producing an optimised query plan for row-limited queries, while retaining good general optimisation performance for full-result queries, are more complex than simply replacing hash or merge join iterators with nested loops. It would be reasonably straightforward to cater for queries with a TOP at the root of a plan, using dedicated code designed to recognise that scenario. However, that approach would miss wider opportunities for plan optimisation in more general cases.
The TOP clause can be specified multiple times, in multiple places in a query: in the outermost scope (as in the example); in a sub-query; or in a common table expression – all of which may be arbitrarily complex. The FAST ‘n’ query hint can also be used to ask the optimiser to prefer a plan that produces the first ‘n’ rows quickly, while not restricting the total number of rows returned, as TOP does. As a final example, consider that a logical semi-join (such as a sub-query introduced with EXISTS) shares the same theme: it should be optimised to find the first matching row quickly.
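For illustration, here are sketches of the other two row goal sources just mentioned, written against the same sample tables:

```sql
-- FAST 1 asks for a plan that returns the first row quickly,
-- without limiting the total number of rows the query returns
SELECT *
FROM AdventureWorks.Production.TransactionHistory AS H
JOIN AdventureWorks.Production.Product AS P
    ON H.ProductID = P.ProductID
OPTION (FAST 1);

-- A semi-join: EXISTS only needs to find one matching history row
-- per product, so the sub-query benefits from a row goal of one
SELECT P.ProductID, P.Name
FROM AdventureWorks.Production.Product AS P
WHERE EXISTS
(
    SELECT 1
    FROM AdventureWorks.Production.TransactionHistory AS H
    WHERE H.ProductID = P.ProductID
);
```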
The SQL Server query optimiser provides a way to meet all these requirements by introducing the concept of a ‘row goal’, which simply establishes a number of rows to ‘aim for’ at a particular point in the plan.
The optimiser changes required to enable the row goal feature are largely self-contained, sitting in a thin layer on top of the normal optimisation activities. This functional separation from common components (such as the costing engine and cardinality estimation) reduces the risk of breaking changes, minimises development effort, and simplifies testing. The design also allows the optimiser to re-use existing facilities to explore the space of possible plans, without special handling for regions constrained by a row goal. The plan choices for the sub-tree below the row goal are generated using the same exploration rules used for full-result queries. For example, optimisation of our simple query will consider all three physical join implementations, different access paths to the physical tables (indexes), and alternative orders for the join inputs.
Costing and Cardinality
An important difference arises when the optimiser comes to decide which candidate plan is ‘best’. The normal routines produce iterator costs and cardinality estimates based on the query running to completion. In our example query, this results in a plan using a hash join, with the Product table chosen as the build input.
Normal cardinality and selectivity estimation starts at the plan leaves (an index scan, for example) and works up the plan, modifying the statistics (frequencies and histograms) and cardinality estimates as it encounters iterators. Some iterators (like a Filter) have a simple effect on statistics and cardinalities, whereas others (such as joins and unions) may require more complex calculations and histogram adjustments. When a row goal is present, the optimiser subsequently works down the plan (from the row goal toward the plan leaves), modifying iterator costs and cardinality estimates to reflect that goal. To explain, let’s run our test query again, but this time specifying TOP (50):
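The modified test query is simply the earlier example with the new row limit:

```sql
-- Same join as before, but with a row goal of 50 at the plan root
SELECT TOP (50) *
FROM AdventureWorks.Production.TransactionHistory AS H
JOIN AdventureWorks.Production.Product AS P
    ON H.ProductID = P.ProductID;
```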
Working down the plan from the Top iterator, notice that the cardinality estimate is 50, as required by the row goal. The estimate for the number of rows from each execution of the Index Seek is 1 (since the ProductID column is a key). The estimated cardinality of the other join input is more interesting and requires a little more explanation…
Reverse Cardinality Estimation
One more query, to show the same query plan but without a row goal:
Regular cardinality estimation (without a row goal) determines that the 113,443 TransactionHistory rows, when joined to the Product table, should produce 92,284.3 rows. Leaving aside the fact that this estimate is wrong (it ignores the PK-FK relationship between the two tables), we can see that 113,443 / 92,284.3 = 1.22928 (the calculation uses floating-point arithmetic for speed). That factor is used in the row goal calculation when working back past the nested loops join iterator. The output cardinality is 50 (the row goal), so the outer input to the loops join is estimated to produce 50 * 1.22928 = 61.46400 rows. That matches the estimated number of rows in the row goal plan, and illustrates the working-backwards nature of row goal cardinality estimation. It is possible for cardinality estimation errors to produce a new estimate that is greater than the total number of rows in the source. The routine detects this condition, and caps the estimate at the known maximum row count in that case.
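The arithmetic is easy to check directly (the column aliases here are just labels for illustration):

```sql
-- Reverse cardinality estimation arithmetic for the TOP (50) plan
SELECT
    113443e0 / 92284.3e0          AS join_scaling_factor,  -- about 1.22928
    50e0 * (113443e0 / 92284.3e0) AS outer_input_estimate;  -- about 61.464
```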
A similar calculation applies to the inner side of the loops join, but it is reported slightly differently. In the plan with a row-goal, the inner input cardinality is estimated to be 1, but the estimated number of executions is a row-goal modified value:
Notice also that the estimated I/O and CPU costs for the operator are not scaled; they show the per-execution cost as if a row goal were not in effect. The estimated operator cost is scaled to reflect the row goal; that value is used in the sub-tree cost calculations. The differences are easier to see on the outer input:
The Estimated I/O and CPU costs reflect the cost of returning all rows. The estimated ‘operator’ and ‘sub-tree’ costs are modified by the row goal. This feature of row goal plans often causes confusion, but next time someone asks you how the operator cost can be so much lower than its two components, you’ll have the answer! The operator and sub-tree costs are used to determine which of the candidate plans is ‘best’. By performing these simple modifications, the row goal feature ensures that a plan is produced which reflects the row goal requirement.
Faced with seemingly contradictory requirements to optimise for full-query execution and early termination based on a specified row goal, the SQL Server query optimisation team have produced an elegant solution. Queries which use a row goal (TOP, FAST N, EXISTS and so on) benefit from all the transformation and optimising tricks at SQL Server’s disposal, while still producing a cost-based plan.
To illustrate that last point, our test query chooses a loops join up to TOP (97). When TOP (98) is specified, the optimiser chooses a hash join on cost grounds. (The exact tipping point may differ between versions of the product, and depends on statistics.) Hopefully this post has provided interesting insights into the internal workings of this very useful feature. Thanks for reading.