THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Did You Know? Forcing a nonclustered index scan to avoid sorting

 

Hopefully, if you know enough about SQL Server to understand most of my posts here, you're aware of the fact that using a nonclustered index is not always an optimal choice for accessing data, particularly if there are thousands (or more) rows of data you need to retrieve.  However, we should never say 'never'.

In general, if you have a query that is sorting on a column that has a nonclustered index, and needs to retrieve columns that are not part of the index, the optimizer will choose to scan the table (or clustered index, which is the same thing) and sort the data. The optimizer usually estimates that the time needed to scan the table plus the time to sort is going to be less than the time to follow each bookmark individually from the nonclustered index. And in some cases it might be right.

In other cases, it might not be, and the only way to verify that is to force SQL Server to use the nonclustered index, with a Lookup into the table or clustered index for each row.  However, if you try to force this behavior with an INDEX hint, you might be surprised. Using a hint that tells the optimizer what index to use does not tell the optimizer HOW to use that index.  It might perform a scan with lookup for every single row, or it might scan the nonclustered index, sort the index rows in the order of the clustered index and then access the clustered index using the sorted data. I've seen other plans as well and in this post, I'm not going to what the different plans actually mean.

What I want to tell you now is a way to force SQL Server to perform a scan of the nonclustered index and then a bookmark lookup for every row. You can do that with a table hint called FASTFIRSTROW. This hint tells the optimizer to choose a plan that returns the first row as quickly as possible. The default plan has to sort the entire table before any rows are returned at all. However, the nonclustered index already has the nonclustered key values sorted, and if that is the column used in the ORDER BY, following the leaf level of the nonclustered index should give us the rows already in correct sorted order. Yes, in some cases it might take longer to get ALL the rows in the table, following the bookmark for each row, but the first row will always be available fast.

It turns out that there are some cases where the FASTFIRSTROW hint can actually return all the rows more quickly than the default SORT. If you have lots of rows in your table (hundreds of thousands, or millions) you might find the FASTFIRSTROW hint ends up being faster. The only way to know for sure is to test it with your data.

Itzik Ben-Gan did some testing with different ways of getting sorted data, and if you are a subscriber to SQL Server Magazine, you can read about his tests here:

http://www.sqlmag.com/Articles/Index.cfm?ArticleID=94775&DisplayTab=Article

If you want to run some tests of your own, here is some code to get you started.

-- First, make a copy of the Sales.SalesOrderDetail table in the AdventureWorks database

USE AdventureWorks
GO
IF OBJECT_ID('dbo.details', 'U') IS NOT NULL
    DROP TABLE dbo.details
GO
CREATE TABLE details (
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL CONSTRAINT [DF_SalesOrderDetail_UnitPriceDiscount] DEFAULT ((0.0)),
[LineTotal] AS (isnull(([UnitPrice]*((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0))),
[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL CONSTRAINT [DF_SalesOrderDetail_rowguid] DEFAULT (newid()),
[ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_SalesOrderDetail_ModifiedDate] DEFAULT (getdate()),
)
GO

-- Next, insert data into the new table

INSERT INTO details
(SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,
UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate)
        SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, ProductID, SpecialOfferID,
             UnitPrice, UnitPriceDiscount, rowguid, ModifiedDate
        FROM Sales.SalesOrderDetail
GO

-- Now create a clustered and nonclustered index on the new details table

CREATE CLUSTERED INDEX
idx_cl_col1 ON dbo.details(SalesOrderID);
CREATE NONCLUSTERED INDEX
idx_nc_col2 ON dbo.details(ProductID);
GO

-- First look at the plan with no hints

SELECT * FROM dbo.details  
   ORDER BY ProductID

-- Now look at the plans with the two different index hints.

SELECT * FROM dbo.details
WITH (FASTFIRSTROW)
ORDER BY ProductID

SELECT * FROM dbo.details
WITH  (INDEX (idx_nc_col2))
ORDER BY ProductID

-- As you add more rows to your table, the plan for the query using the INDEX hint might change.
-- You can add more rows by just rerunning the INSERT statement from above, as many times as you would like.

Have fun!

~Kalen

Published Sunday, July 22, 2007 11:13 AM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Just a friendly reminder to readers to make sure to test these solutions under load if you use this for a production system!  I discovered some interesting behavioral differences between the solutions running on a dead server vs. a more heavily loaded system.

Kalen: Any idea why the plan when you force the NC index sorts on the clustering key before doing the lookups?  Is the idea to potentially maximize rewinds?  Or just a hope that the same page will be sure to stick around in cache between repeated reads?  Or..?  It seems like a strange plan choice in light of the ORDER BY clause.

July 22, 2007 1:57 PM
 

Kalen Delaney said:

At first I thought I would just add another sentence to my previous post, but then I realized that this

July 22, 2007 6:42 PM
 

Kyle Gress said:

I ran your test with ~120,000 rows and ~600,000 rows. In both cases, the default plan had the lowest number of reads and lowest cpu and elapsed times. The FASTFIRSTROW query was second in both cases with reasonably close times, but the number of logical reads was 300 times greater!

July 23, 2007 9:41 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement