SQLblog.com – The SQL Server blog spot on the web

Paul White: Page Free Space

A technical SQL Server blog from New Zealand.

Heaps of Trouble?

If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material.  In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all.  The problem is, predictably, that performance is not very good.  The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts.

In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found.  Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here.

The Example

We’ll use data from the AdventureWorks database, copied to temporary unindexed tables.  A script to create these structures is shown below:

CREATE  TABLE #Custs
        (
        CustomerID          INTEGER NOT NULL,
        TerritoryID         INTEGER NULL,
        CustomerType        NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL
        );
GO        
CREATE  TABLE #Prods
        (
        ProductMainID       INTEGER NOT NULL,
        ProductSubID        INTEGER NOT NULL,
        ProductSubSubID     INTEGER NOT NULL,
        Name                NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL
        );
GO        
CREATE  TABLE #OrdHeader
        (
        SalesOrderID        INTEGER NOT NULL,
        OrderDate           DATETIME NOT NULL,
        SalesOrderNumber    NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
        CustomerID          INTEGER NOT NULL
        );
GO
CREATE  TABLE #OrdDetail
        (
        SalesOrderID        INTEGER NOT NULL,
        OrderQty            SMALLINT NOT NULL,
        LineTotal           NUMERIC(38,6) NOT NULL,
        ProductMainID       INTEGER NOT NULL,
        ProductSubID        INTEGER NOT NULL,
        ProductSubSubID     INTEGER NOT NULL
        );
GO
INSERT  #Custs
        (
        CustomerID, 
        TerritoryID, 
        CustomerType
        )
SELECT  C.CustomerID,
        C.TerritoryID,
        C.CustomerType
FROM    AdventureWorks.Sales.Customer C 
        WITH (TABLOCK);
GO
INSERT  #Prods 
        (
        ProductMainID, 
        ProductSubID, 
        ProductSubSubID,
        Name
        )
SELECT  P.ProductID,
        P.ProductID,
        P.ProductID,
        P.Name
FROM    AdventureWorks.Production.Product P
        WITH (TABLOCK);
GO 
INSERT  #OrdHeader
        (
        SalesOrderID, 
        OrderDate, 
        SalesOrderNumber, 
        CustomerID
        )
SELECT  H.SalesOrderID,
        H.OrderDate,
        H.SalesOrderNumber,
        H.CustomerID
FROM    AdventureWorks.Sales.SalesOrderHeader H
        WITH (TABLOCK);
GO
INSERT  #OrdDetail
        (
        SalesOrderID, 
        OrderQty, 
        LineTotal, 
        ProductMainID, 
        ProductSubID, 
        ProductSubSubID
        )
SELECT  D.SalesOrderID,
        D.OrderQty,
        D.LineTotal,
        D.ProductID,
        D.ProductID,
        D.ProductID
FROM    AdventureWorks.Sales.SalesOrderDetail D
        WITH (TABLOCK);

The query itself is a simple join of the four tables:

SELECT  P.ProductMainID AS PID,
        P.Name,
        D.OrderQty,
        H.SalesOrderNumber,
        H.OrderDate,
        C.TerritoryID
FROM    #Prods P
JOIN    #OrdDetail D 
        ON  P.ProductMainID = D.ProductMainID
        AND P.ProductSubID = D.ProductSubID
        AND P.ProductSubSubID = D.ProductSubSubID
JOIN    #OrdHeader H 
        ON  D.SalesOrderID = H.SalesOrderID
JOIN    #Custs C 
        ON  H.CustomerID = C.CustomerID 
ORDER   BY
        P.ProductMainID ASC
OPTION  (RECOMPILE, MAXDOP 1);

Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings).  The estimated query plan produced for the test query looks like this (click to enlarge):

[Estimated execution plan for the original query]
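As a side note, you can check which statistics have been auto-created on the temporary tables by querying the catalog views in tempdb.  A quick sketch, using the documented sys.stats, sys.stats_columns, and sys.columns views:

```sql
-- List the statistics on #Custs, with the column each one covers
SELECT  s.name          AS stats_name,
        s.auto_created,
        c.name          AS column_name
FROM    tempdb.sys.stats AS s
JOIN    tempdb.sys.stats_columns AS sc
        ON  sc.[object_id] = s.[object_id]
        AND sc.stats_id = s.stats_id
JOIN    tempdb.sys.columns AS c
        ON  c.[object_id] = sc.[object_id]
        AND c.column_id = sc.column_id
WHERE   s.[object_id] = OBJECT_ID(N'tempdb..#Custs');
```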

The Problem

The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan.  The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate.  Every join in the plan shown above estimates that it will produce just a single row as output.  Brad covers the factors that lead to the low estimates in his post.

In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows.  It should not surprise you that this has rather dire consequences for the remainder of the query plan.  In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables. 

Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each.  The query takes somewhere in the region of twenty minutes to run to completion on my development machine.

A Solution

At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere.

We can force the exclusive use of hash joins in several ways, the two most common being join and query hints.  A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the end of the query.  The difference is that join hints also force the join order, whereas the query hint leaves the optimizer free to reorder the joins at its discretion.
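For completeness, the query-hint version is just the original query with HASH JOIN added to the OPTION clause:

```sql
SELECT  P.ProductMainID AS PID,
        P.Name,
        D.OrderQty,
        H.SalesOrderNumber,
        H.OrderDate,
        C.TerritoryID
FROM    #Prods P
JOIN    #OrdDetail D
        ON  P.ProductMainID = D.ProductMainID
        AND P.ProductSubID = D.ProductSubID
        AND P.ProductSubSubID = D.ProductSubSubID
JOIN    #OrdHeader H
        ON  D.SalesOrderID = H.SalesOrderID
JOIN    #Custs C
        ON  H.CustomerID = C.CustomerID
ORDER   BY
        P.ProductMainID ASC
OPTION  (RECOMPILE, HASH JOIN, MAXDOP 1);
```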

Adding the OPTION (HASH JOIN) hint results in this estimated plan:

[Estimated execution plan using hash joins]

That produces the correct output in around seven seconds, which is quite an improvement!  As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there.  (We can improve the hashing solution a bit – I’ll come back to that later on).

Faster Nested Loops

It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set.

The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B).  Armed with this tremendous new insight, we can rewrite the join predicates like so:

SELECT  P.ProductMainID AS PID,
        P.Name,
        D.OrderQty,
        H.SalesOrderNumber,
        H.OrderDate,
        C.TerritoryID
FROM    #OrdDetail D
JOIN    #OrdHeader H
        ON  D.SalesOrderID >= H.SalesOrderID
        AND D.SalesOrderID <= H.SalesOrderID
JOIN    #Custs C
        ON  H.CustomerID >= C.CustomerID
        AND H.CustomerID <= C.CustomerID
JOIN    #Prods P
        ON  P.ProductMainID >= D.ProductMainID
        AND P.ProductMainID <= D.ProductMainID
        AND P.ProductSubID = D.ProductSubID
        AND P.ProductSubSubID = D.ProductSubSubID
ORDER   BY 
        D.ProductMainID
OPTION  (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER);

I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear.  The new estimated execution plan is:

[Estimated execution plan with nested loops joins and index spools]

This new query runs in under two seconds.

Why Is It Faster?

The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools.  If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly.

An eager index spool consumes all rows from the table it sits above, and builds an index suitable for the join to seek on.  Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID.  The term ‘eager’ means that the spool consumes all of its input rows when it starts up.  The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing.

The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index.  From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table.

A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table.  The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself.  The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation.  Nested loops join preserves the order of rows on its outer input, so sorting early is safe.  (Hash joins do not preserve order in this way, of course).

The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input.  It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other.

The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation.  Its presence in a plan is often an indication that a useful index is missing.
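Relatedly, when the optimizer decides a useful index is missing, it records the details, which you can read back from the missing-index DMVs.  A quick sketch (whether suggestions are recorded for temporary tables can vary, so treat this as illustrative):

```sql
-- Missing-index suggestions recorded by the optimizer
SELECT  mid.[statement]     AS table_name,
        mid.equality_columns,
        mid.inequality_columns,
        mid.included_columns,
        migs.user_seeks
FROM    sys.dm_db_missing_index_details AS mid
JOIN    sys.dm_db_missing_index_groups AS mig
        ON  mig.index_handle = mid.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS migs
        ON  migs.group_handle = mig.index_group_handle
ORDER   BY migs.user_seeks DESC;
```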

I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system.

Improving the Hash Join

I promised I would return to the solution that uses hash joins.  You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins.  The answer, again, is down to the poor information available to the optimizer.  Let’s look at the hash join plan again:

[Estimated execution plan using hash joins]

Two of the hash joins have single-row estimates on their build inputs.  SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory.

This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk.  The join process then continues, and may again run out of memory.  This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow.  The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion.  You can see this for yourself by tracing the Hash Warning event using the Profiler tool.

The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this).  Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan.
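On SQL Server 2008 and later you can capture the same information with Extended Events instead of Profiler.  A minimal sketch, assuming the sqlserver.hash_warning and sqlserver.sort_warning event names:

```sql
-- Capture hash and sort spill warnings while the test query runs
CREATE EVENT SESSION SpillWarnings ON SERVER
ADD EVENT sqlserver.hash_warning,
ADD EVENT sqlserver.sort_warning
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION SpillWarnings ON SERVER STATE = START;
-- ...run the test query here, then inspect the ring_buffer target...
ALTER EVENT SESSION SpillWarnings ON SERVER STATE = STOP;
DROP EVENT SESSION SpillWarnings ON SERVER;
```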

OK, so now that we understand the problems, what can we do to fix them?  We can address the hash spilling by forcing a different order for the joins:

SELECT  P.ProductMainID AS PID,
        P.Name,
        D.OrderQty,
        H.SalesOrderNumber,
        H.OrderDate,
        C.TerritoryID
FROM    #Prods P
JOIN    #Custs C
JOIN    #OrdHeader H
        ON  H.CustomerID = C.CustomerID
JOIN    #OrdDetail D
        ON  D.SalesOrderID = H.SalesOrderID
        ON  P.ProductMainID = D.ProductMainID
        AND P.ProductSubID = D.ProductSubID
        AND P.ProductSubSubID = D.ProductSubSubID
ORDER   BY 
        D.ProductMainID
OPTION  (MAXDOP 1, HASH JOIN, FORCE ORDER);

[Estimated execution plan with the forced join order]

With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs.  The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk.  Even so, the query runs to completion in three or four seconds.  That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery.

Final Thoughts

SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information.  We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics.

I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world.  It’s there primarily for its educational and entertainment value.  I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index.

Be sure to read Brad’s original post for more details.  My grateful thanks to him for granting permission to reuse some of his material.

Paul White
Email: SQLkiwi@gmail.com
Twitter: @SQL_kiwi

Published Friday, December 10, 2010 6:17 AM by Paul White


Comments

 

Brad Schulz said:

Awesome!  Excellent!  What a terrific follow-up!

I especially like the explanation of the memory grants for the hash joins... that makes a lot of sense and explains why the OPTION (HASH JOIN) query takes a little bit longer than I expected.

I'm kind of kicking myself for not using FORCE ORDER in my original post... especially because I had blogged about it before.  Instead I used parentheses around the various JOINs to force the topology of the plan to be what you illustrated... however, many people don't know that can be done, so it worked out in the end.  But the FORCE ORDER is certainly more readable than the convoluted nested JOINs I put together.

Thanks again... I'm updating my blog post to point here so that everyone can benefit.

Best...

--Brad

December 9, 2010 12:51 PM
 

Alejandro Mesa said:

Paul,

Excellent posts, both of them.

--

AMB

December 9, 2010 2:11 PM
 

Geri Reshef said:

I *starred* this article in my Google Reader for repeated reading in the future: I learnt a lot from it!

December 10, 2010 11:03 PM
 

Vincent Salard said:

Very interesting post!

The A = B transformation is a brilliant (or crazy?) idea I would have never thought about. Which past experience brought you to this method?

I always appreciate your very valuable posts Paul, thx!

Vincent.

December 14, 2010 4:29 AM
 

Paul White said:

Hi Vincent,

Like all optimizer rules, the one that adds an index spool as an alternative strategy only matches certain sub-plan shapes, and whether it ends up in the final plan depends on estimated costs.

Rewriting the equality as the intersection of two inequalities, and forcing the loops join, is the only way I can think of to convince the optimizer to try the index spool.

Thanks for a great question!

Paul

December 22, 2010 5:12 PM
