THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Using a Nonclustered Index to Avoid a Sort

Most of you are probably aware that having a clustered index on the column(s) in an ORDER BY clause means that SQL Server can avoid having to sort your data, because it is already logically stored in order of the clustered index, and SQL Server can just access the data in order to get the sorted data .

For example, consider the SalesOrderHeader table in the AdventureWorks database. The clustered index is on SalesOrderID, so this query doesn't need to do a sort, just a clustered index scan:

SELECT * FROM Sales.SalesOrderHeader
ORDER BY SalesOrderID

image

But what about a nonclustered index? Its leaf level stores the index key values in order, so it can help avoid a sort if it completely covers the query, i.e. all the data your query needs is in the nonclustered index. The following query is covered by the nonclustered index on CustomerID, because the nonclustered index always includes the clustered key, in this case, SalesOrderID.

SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
ORDER BY CustomerID

image

But what about if the query is not covered? What if we wanted every column returned:

SELECT * FROM Sales.SalesOrderHeader
ORDER BY CustomerID

The default plan for this query will be to perform a sort.

image

However, the leaf level of the nonclustered index on CustomerID does have all the CustomerID values already sorted, so why can't that index be used? The answer is, it CAN be used, but it just isn't the default. SQL Server's optimizer tries to find the plan that will run to completion in the least amount of time. That sounds good, right? But with the SORT operator in the plan, everything stops while the sorting is taking place, and no data can be returned until all the data is sorted.

However,  another alternative would be to scan the nonclustered index, where the CustomerID values are already in order. For each row, SQL Server would have to do a key lookup into the clustered index and the total time to do a key lookup for every row would probably be more than the time required to sort all the data. The first few rows can be returned very quickly. How can we get such a plan? SQL Server provides us with a hint called FASTFIRSTROW that tells the optimizer to come up with a plan that returns the first row in a minimum amount of time. It's a table hint, so it looks like this:

SELECT * FROM Sales.SalesOrderHeader WITH (FASTFIRSTROW)
ORDER BY CustomerID

The plan looks like this:

image

So you have to decide what is most important to you.  Do you want the total processing time to be minimized, or do you want the time to have the first row returned to be minimized? It's up to you. The default is to minimize total processing time, but you can  use the FASTFIRSTROW hint if you want to take advantage of the nonclustered index to avoid the sort, and have the first few rows returned quickly.

Be careful if you try to do a cost comparison of queries with and without the FASTFIRSTROW hint. Look at the two plans below:

image

Comparing the two plans makes it look like the query with the hint is infinitely faster the query without the hint. However, if you look at the details for the nonclustered index scan, as shown in the pop-up properties box, you can see that the way the optimizer comes up with this plan is by assuming only one row will be returned. It optimizes as if only one row will be accessed, which is why the nonclustered index is chosen, but during execution it will retrieve all the rows. So of course a plan for accessing one row will be considered MUCH faster than a plan to access the entire table (31465 rows).

The documentation for this hint can be a little misleading. You might think using the hint indicates that the plan should get the first row quickly, and then get all the rest of rows using perhaps a different access method, so they will ALL come back as quickly as possible (which would then mean a sort), it actually means that the optimizer should just come up with a plan for getting the first row as quickly as possible. Period. (Then whatever plan was chosen for the first row will also be used for all the rest of the rows.)

This FASTFIRSTROW hint is listed in the SQL Server Books Online as a deprecated feature, which means it may be removed in a future version. However, it is still available in SQL Server 2008.  Microsoft recommends using the FAST N query hint (in the OPTION clause):

SELECT * FROM Sales.SalesOrderHeader
ORDER BY CustomerID
OPTION (FAST 1)

With this hint, you can specify an value for N, and the optimizer just assumes there are N rows, and comes up with the best plan as if there were that number. As an exercise, you might want to try determining at what value for N the optimizer will switch from using an nonclustered index scan to using a clustered scan plus SORT.

Today, a reader asked me a question about the FASTFIRSTROW hint and wanted to know if we should "use such a hint for large data sets [e.g. OLTP queries]".

Of course, the answer is the usual: It depends.

On the one hand, using this hint to avoid a sort can be a good thing, because sorting a large data set can use a lot of system resources (time and tempdb). But on the other hand, if you're really running OLTP queries, there should only be a few rows you're dealing with in any query, and then the value of this hint might not be as noticeable. So you should run your own tests and see if you like the results. Note that FAST N or FASTFIRSTROW is not the default, and that is probably for a good reason. Try running your queries first with whatever plan the optimizer comes us with, and only if you're not satisfied with the performance, you can try using a hint.

Have fun!

~Kalen

Published Tuesday, December 02, 2008 8:20 PM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dan Forster said:

Thanks for that Kalen - especially the gotcha with the query plan... very useful stuff (as always)

December 3, 2008 4:34 AM
 

Alexander Kuznetsov said:

Hi Kalen,

The post is extremely interesting, thanks!

Can you increase the size of your images. I'm using Firefox, and all the text in your images is unrecognizably small, and View/Zoom increases only the size of the font, not the size of images.

December 3, 2008 6:00 PM
 

Kalen Delaney said:

If I try to enlarge the images in the Blog Writer they become too pixelated to read. Can you double-click the image and get an enlarged view?

If not, try the ZoomIt tool, free from Microsoft:

http://sqlblog.com/blogs/kalen_delaney/archive/2008/04/01/the-most-amazing-tool.aspx

Thanks

December 3, 2008 6:18 PM
 

jerryhung said:

A new thing learned = FAST N

I do wonder, when the N goes over say 50% or close to 100% of the # of rows in the index/table, does the optimizer just ignore it and pick the best plan it thinks?

or is there a magical threshold that tips it in either using FAST N or the default (Sort) way... magical as in it's in the Optimizer Black Box

December 4, 2008 1:50 PM
 

Kalen Delaney said:

Hi Jerry

I think I addressed your question in the post. Whatever you put for N will be the number of rows the optimizer will work with.

(The optimizer doesn't care how close that number is to what is actually there. It just uses the number you give it and picks the best plan for that number.)

~Kalen

December 4, 2008 2:07 PM
 

James Luetkehoelter said:

Great post Kalen! Any thoughts or testing to see if that hit plays merrry hobb with concurrent queries?

Sorting is a horrid thing to do. One of personal pet peeves is when someone sorts the output of a query, then has it ordered dynamically anyway in some grid control in the app! Sheesh.

I think I lot of people overlook the use of the Include clause as well. Gives you the power of a covering index without as much overhead. Do you have an opinion on those?

One last peeve - when someone has an SSIS package that includes a SORT transformation even though sorting is down in SSAS. "Gee, I wonder why this package is so slow".

If anyone hasn't tried it, it can be a great exercise to try to write a sorting algorithm. It seems straightforward at first glance, but you'll end up insane by the end.

And THANK YOU very much for pointing out tempdb and sorts. I feel so bad for tempdb now...it has to do just about everything under the sun...

December 4, 2008 8:08 PM
 

bfrasca said:

Great post and very useful to know in certain scenarios.  I did want to comment because I think I may have made an incorrect assumption about something you mentioned.  As you said, SQL Server doesn't need to do a sort if you want the data sorted by the clustered index.  I noticed that your query still included the ORDER BY clause.

Typically, if I want the data sorted by the clustered index, I don't bother with an ORDER BY clause.  I just ran a test on the AdventureWorks database and I got the same query plan both times.

select * from person.address

select * from person.address order by addressid

Is there some reason why I should specify the ORDER BY clause that I'm not aware of?  I ready a blog post by, I think it was Paul Randal, that mentioned that the data may not actually be physically sorted correctly on the page in certain obscure situations but I can't seem to find it so it may have been someone else or I'm feeling my age (again).

December 5, 2008 10:26 AM
 

AaronBertrand said:

bfrasca, while you may observe the same order most of the time, this behavior is not guaranteed.  The plan could change due to factors beyond your control or knowledge, either in your current system or due to optimizer changes when you install a service pack, hotfix/cumulative update, or upgrade to the next version.  Essentially, SQL Server is free to return the data in any order it deems "best," it just so happens that "best" is usually the same as clustered index order.

I cannot stress this loud enough (or often enough, it seems): if you need your data in a specific order, ALWAYS, ALWAYS, ALWAYS use an ORDER BY clause!

(Unless you can sort the data on your own later, e.g. presentation tier, Excel, etc.)

December 5, 2008 11:54 AM
 

bfrasca said:

AaronBertrand,

Thanks for your comments.  My point was that I thought that creating a clustered index effectively guaranteed that behavior, i.e. that was one of the benefits of a clustered index.  That being said, your point is well taken about mysterious factors beyond my control or knowledge.

I had always had some vague notion (don't ask me why) that an ORDER BY clause added some additional overhead but I'd never bothered to test that until I read Kalen's post today.  Since it doesn't appear to and because of the mysterious factors you mentioned, I will hop on the ALWAYS, ALWAYS, ALWAYS bandwagon with you.

December 5, 2008 12:13 PM
 

Linchi Shea said:

> Today, a reader asked me a question about the FASTFIRSTROW hint and wanted to know if we should "use such a hint for large data sets [e.g. OLTP queries]".

Shouldn't that be [e.g. reporting queries] instead? OLTP data sets are generally small.

December 9, 2008 8:22 PM
 

Nagaraj said:

Hi Kalen,

That was really interesting. Thanks.

Regards,

S.V.Nagaraj

December 13, 2008 9:38 AM
 

Eladio Rincón said:

Nice post Kalen; interesting... the limit is "option (fast 3578)" for using the NCI. How optimizer decide go/no-go to NCI? I guess that a % is the key.

using FASTFIRSTROW or OPTION FAST causes high number of logical reads because of the lookups between the NCI and CI; what would be worse logical reads vs sort... test, test, and test...

Regards, Eladio

December 19, 2008 9:53 AM
 

Mike Suarez said:

bfrasca,

I was recently at a seminar at the most recent PASS Summit where Maciej Pilecki demonstrated a case where having a query without an order by will not return the results ordered by the clustered index key.

He created a table, and updated it in such a way that it was heavily fragmented. Then he set the isolation level to read uncommitted. The optimizer scanned the data, just going from one page to the next, paying no mind to the pointers that would dicated the logical order of the data, and the order was about as random as you can imagine.

He did mention that when the isolation level is set to read committed, the default, the behavior is such that it should always return the data in the order of the clustering key w/o an order by. However it's not documented. And if it's not documented, its not garunteed. And if they want to change that for SQL 11 (or as aaron points out a hotfix, cu, or sp), they have every right to.

As a rule of thumb, when writing any declarative statement, be as explicit as possible. Assume nothing, and you can never get burnt. A couple extra words of code isn't a bad thing.

-Mike

December 22, 2008 3:03 PM
 

Eilonvi said:

Thanks for the article - really interesting stuff.

I have a question though - why is sql server performing a "key lookup" on the clustered key, if the non-clustered key already supposed to point to the clustered key?

Thanks!

October 23, 2011 1:06 AM
 

Kalen Delaney said:

Hi Eilonvi

Yes, the NC index points to the clustered key (it actually stores the clustered key value as the pointer), and The key lookup is what SQL Server does to follow the pointer and find the actual row of data being pointed to.

~Kalen

October 23, 2011 1:15 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement