The
optimizer is the part of SQL Server that takes your query and reorders and
rearranges your query to find the optimal execution plan. In theory.
In
practice, that doesn’t always work out well. Often, the optimizer manages to
come up with brilliant ways to execute a complex query very efficiently – but sometimes,
it misses an option that appears to be so simple that you can only stare in
utter amazement at the execution plan before going to the Connect site.
Here is an
example I recently ran into. I tested it on SQL Server 2012 and on SQL Server
2008 R2, and it reproduces on both. Execute the query below in the
AdventureWorks sample database, with the option to Include Actual Execution Plan
enabled (Ctrl+M), or request an Estimated Execution Plan (Ctrl-L).
SELECT TerritoryID,
Name,
SalesLastYear,
SalesYTD,
RANK() OVER (ORDER BY SalesLastYear) AS Rank1,
RANK() OVER (ORDER BY SalesYTD) AS Rank2
FROM Sales.SalesTerritory
ORDER BY SalesLastYear;
When
following the flow of data (by reading the execution plan from right to left),
we see that the data is first read (with an unordered clustered index scan,
since there is no better index available), then sorted by SalesLastYear, so
that Rank1 can be computed (using two Segment operators and a Sequence Project operator
– don’t ask). After that, the rows are sorted again, now by SalesYTD, and we
see another combination of two Segment and one Sequence Project, for the
calculation of Rank2. And then, finally, the rows are re-sorted by SalesLastYear
so that they can be returned in the requested order.
Now the big
question is: why does the plan include two sort operators that both sort the
data in the same (SalesLastYear) order? If the task of the optimizer is to find
smart ways to rearrange computation order for better performance, why doesn’t
it simply compute Rank2 first and Rank1 after that? In that case, the rows are
already in the SalesLastYear order after the last Sequence Project, so no extra
sort is needed. The execution plan of the query below confirms this suspicion:
SELECT TerritoryID,
Name,
SalesLastYear,
SalesYTD,
RANK() OVER (ORDER BY SalesYTD) AS Rank2,
RANK() OVER (ORDER BY SalesLastYear) AS Rank1
FROM Sales.SalesTerritory
ORDER BY SalesLastYear;
Indeed, the
execution plan of this query includes only two Sort operators instead of the
three we had in the first execution plan. If you include both queries in a single
batch, you’ll see an estimated cost of 59% for the first query, and 41% for the
second. (Some people think that the percentages shown in an Actual Execution
Plan are an indication of the actual cost; that is not the case – the percentages
are computed from the Estimated Subtree Cost property of
the left-most SELECT operator). The SalesTerritory table is too small to
measure any actual performance differences, but I tried queries with a similar
pattern on the SalesOrderDetail table (121,317 rows), and on
FactProductInventory in AdventureWorksDW (776,286 rows) and I did see an actual
difference in execution time. No surprise, but now I know for sure!
So, we have
seen that simply reordering the two columns that use an OVER clause reduces the
query cost by about 30%. How is it possible that such a simple, basic
reordering strategy is not considered by the optimizer? Surely, this can only be
a bug?
That’s what
Fabiano Neves Amorim thought when he filed this bug
on Connect. But, as you can see, the bug has been closed as “By design”. That
probably doesn’t mean that someone wrote a design document telling the
optimizer team to make sure that OVER clauses must always be evaluated in the
order in which they appear in the query, even if a different order would be
cheaper. I rather think of it as “missing an optimization opportunity is not a
bug; the results are still correct, just a bit slower – so we’re going to close
this “bug” report”. In this case, maybe the optimization opportunity was not
identified during the design phase, or it was just too hard to implement. The
latter statement may sound ridiculous at first (how can such a basic rewrite be
too hard?), but you have to be aware that the optimizer does not operate on the
original query text, but on an internal representation that is based on
mathematics and set theory. Rewrites that may be complex in the query text may
be obvious in this representation, but the reverse can also be true – so I’m
prepared to accept the comment that Andrew Richardson made on behalf of
Microsoft to the above Connect item: that it would be fairly complicated for
the Query Optimizer.
That does
not mean I agree with the rest of Andrew’s comment. He suggests that this is a
case where we should not rely on the optimizer, but rewrite our queries
ourselves, especially since it’s such an easy rewrite in this case. Well, I
would agree with that, except that: (a) this missed optimization opportunity is
not documented, so how are developers supposed to know that they may need to
reorder columns in a SELECT clause for optimal performance? (that is one of the
reasons for this blog post); and (b) the behavior of the optimizer in this
situation is not documented, so it can change at any time; I’d hate to rewrite
all my queries and then find that the sysadmin just installed a patch and now
the optimizer always starts with the last
instead of the first OVER clause (or, worse, I don’t find it and just get all
the bad performance right back).
However,
Andrew is right in so far that, at this time, rewriting queries does
consistently improve performance in all my tests. So at this time, rewriting
does seem to be the right thing to do. Just keep in mind that you have to test
all your queries, not only on your test server but also on your production
hardware, and that you’ll have to repeat these tests on a regular basis (at
least after each patch, CU, service pack, or other change).
The basic
rewrite pattern is simple – for each query that uses OVER clauses with
different ORDER BY subclauses as well as an ORDER BY clause on the query that
matches one of the ORDER BY subclauses, make sure that the OVER clause that
uses the matching ORDER BY comes last in the SELECT list. If you have a client
that expects the columns in a particular order, the rewrite becomes a bit more
complex – in that case, you have to use a CTE that includes the OVER clauses in
the optimized order, and then you can reorder the columns in the query that
references the CTE – as shown in this example:
WITH MyCTE
AS (SELECT TerritoryID,
Name,
SalesLastYear,
SalesYTD,
RANK() OVER (ORDER BY SalesYTD) AS Rank2,
RANK() OVER (ORDER BY SalesLastYear) AS Rank1
FROM Sales.SalesTerritory)
SELECT TerritoryID,
Name,
SalesLastYear,
SalesYTD,
Rank1,
Rank2
FROM MyCTE
ORDER BY SalesLastYear;
These
rewrites are indeed the best option for the time being – but I still think that
the optimizer should be improved to do these rewrites internally. So I decided
to file a
new item on Connect, this time labeling it as a suggestion instead of a
bug. If you agree with me that this would be a great investment of Microsoft’s
engineering time, then please add your vote to this item. (Or vote it down if
you think I’m just being ridiculous). But don’t stop there! Microsoft knows me;
they know I’m a geek who plays around with options like this and then runs into
this issue. No real production databases were hurt during the production of
this blog. And if I am the only one, then, frankly, I myself will say that they
have better things to do with their engineering time. However, if I know that this
affects real people, I can make a much stronger case to Microsoft for getting
this fixed.
So – go out
to your production servers and find if you use queries with this pattern (two OVER
clauses with different ORDER BY and an ORDER BY on the final query), then check
to see if you should rewrite them. And then report back – add a comment here or
on the Connect item; share if this affected you, and how much performance you
were able to gain as a result of the rewrite.
If
Microsoft knows that their customers would actually benefit, they’ll be much
more inclined to add this improvement to the optimizer then if it’s just about
me, a geek moaning about an edge case that no one in the real world would ever
run into.