THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City QUIZ: What happens when you change a column in an index from a key column to an included column?

This quiz is not really a generic question about changing an index key column to be an included column; it's about a behavior noticed by a reader in one particular query.  Dejan Nakarada-Kordic from New Zealand sent me a very interesting puzzle. He had a reproducible query for which he thought an existing nonclustered index should be used, and it only ended up being used if the column was defined as an INCLUDED column instead of a key column. Here is the script Dejan sent me:

-- First, create the table
IF  EXISTS (SELECT * FROM sys.tables
            WHERE schema_id = 1 and name = 'Table1')
     DROP TABLE dbo.Table1;
GO
CREATE TABLE dbo.Table1(
    Transaction_Serial_No [bigint IDENTITY(1,1) NOT NULL,
    Channel] char(4) NOT NULL,
    Retry_Counter int NOT NULL,
CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED
       (Transaction_Serial_No));
GO

--populate the table with 10000 rows
SET NOCOUNT ON;
declare @counter int;
set @counter = 0;
while (@counter < 10000)
begin
    set @counter = @counter + 1;
    insert into  dbo.Table1(channel, retry_counter)
    values('ch1', 0);
    insert into  dbo.Table1(channel,  retry_counter)
    values('ch2', 0);
end;

-- Now build a nonclustered index on
IF  EXISTS (SELECT * FROM sys.indexes
            WHERE object_id = OBJECT_ID('dbo.Table1') AND name = 'IX_Channel')
    DROP INDEX IX_Channel ON dbo.Table1;
GO
CREATE NONCLUSTERED INDEX IX_Channel ON dbo.Table1 (Channel, Retry_Counter);
GO

-- Examine the query plan for this query:
SELECT TOP(20) Transaction_Serial_No
        FROM dbo.Table1 
WHERE Channel = 'ch2'
         AND Retry_Counter <= 10 
ORDER BY Transaction_Serial_No;

The plan should show you a Clustered Index Scan being performed, even though the nonclustered index appears to be a covering index.

image

The query contains all 3 columns used in the query. The columns Channel and Retry_Counter are defined index keys, and because the table has a clustered index, the clustered key Transaction_Serial_No is also part of the index.

If you rebuild the index to use Retry_Counter as an INCLUDED column, the plan changes.

CREATE NONCLUSTERED INDEX IX_Channel ON dbo.Table1 (Channel)
    INCLUDE ( Retry_Counter)  WITH   (DROP_EXISTING = ON);

-- Examine the plan again:

SELECT TOP(20) Transaction_Serial_No
        FROM dbo.Table1 
WHERE Channel = 'ch2'
         AND Retry_Counter <= 10 
ORDER BY Transaction_Serial_No;

The plan now shows a nonclustered index seek:

image

WHY should changing the column from a key column to an INCLUDED column change the plan?

You can try to figure out the answer for yourself, or read on.

 

-------------------------------------------------------------------------------------------------------------------------

It turns out that there are two extra factors to consider.

1) Although covering indexes are a great thing, and the optimizer will choose them over other possible indexes most of the time, the cost of sorting can sometimes outweigh the benefit of a covering index.

2) When a column is an INCLUDED column in an index, it comes after all the key columns and after the clustered index key that is always part of a nonclustered index on a table that has a clustered index.

So, the original index on (Channel, Retry_Counter) is the same as an index on (Channel, Retry_Counter, Transaction_Serial_No), but moving Retry_Counter to be an INCLUDED column changes the index to be on the columns (Channel,  Transaction_Serial_No), with Retry_Counter in last place, and unsorted.

The query wants the data sorted on Transaction_Serial_No, and scanning the clustered index gives us the data in Transaction_Serial_No order.  The first  nonclustered index is sorted first by Channel, then by Retry_Counter, then by Transaction_Serial_No so that index is not useful for the sort. 

The second nonclustered index is sorted first by Channel, then by Transaction_Serial_No. BUT the query has limited the data to only data with the Channel value equal to 'ch2', so since the first column values we're retrieving are all the same, returning data in the nonclustered index order will return the data in Transaction_Serial_No order and no sorting needs to be done.

You might notice in this data set that all the values for Retry_Counter are also identical, so you might then think we should be able to use the first nonclustered index to avoid having to sort the data, but the optimizer cannot be sure that the Retry_Counter values are all identical. The statistics might indicate that all the values are the same, but the optimizer can't be sure the statistics are 100% up to date.

Here are some things to try:

1) If you remove the ORDER BY, you will see that the first index, with no INCLUDED column, is used.

2) If you change the second condition to searching for a constant (AND Retry_Counter = 0), the first index can be used. If both of the first two index columns are constants in the retrieved data, it will be sorted by the third column (Transaction_Serial_No).

3) If you force the query to use the nonclustered index when Retry_Counter is a key, you'll see that SQL Server has to do a sort:

DROP INDEX IX_Channel ON dbo.Table1;
GO
CREATE NONCLUSTERED INDEX IX_Channel ON dbo.Table1 (Channel, Retry_Counter);
GO
SELECT TOP(20) Transaction_Serial_No
        FROM dbo.Table1 WITH (index = ix_channel)
    WHERE Channel = 'ch2'
        AND Retry_Counter <=  10
ORDER BY Transaction_Serial_No;
GO


image

Notice that the SORT operation is estimated to be 95% of the total query cost, so SQL Server would definitely like to avoid that if possible. 

So the issue wasn't really because of INCLUDED columns, it was because of the order the columns appeared in the index, and the fact that SQL Server wants to try to avoid sorting whenever possible.

Have fun!

~Kalen

Published Monday, November 24, 2008 12:15 AM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Uri Dimant said:

Hi Kalen

I think that it does also matter how selective is NONCLUSTERED INDEX IX_Channel. If it isn't , SQL Server probably will ended up wit lots of logical reads and cpu , so then perhaps we can create a clustered index on those columns. But I still have doubts , we will have  SORT operation which is pretty high. What do you think?

November 24, 2008 3:28 AM
 

Kalen Delaney said:

Hi Uri

Yes of course, there are a lot of things that could make a difference. I was just looking at this particular query with this particular data set and these two different nc index definitions.

Understanding why we get the behavior in THIS case will help you make indexing decisions in other cases.

~Kalen

November 24, 2008 12:00 PM
 

Norm said:

Why can't you provide included columns on a nonclustered Primary Key index?

March 25, 2010 3:27 PM
 

Kalen Delaney said:

Norm

Declaring a PK is not the same as building an index. Declaring a PK is a logical operation, building an index is a physical operation.

Declaring a PK just says you want the column(s) to have certain properties, like uniqueness and non-nullability. Right now, SQL Server enforces a PK by building a unique index, but at some point in a future version there might be a different way.

Building an index is usually done for performance reasons and you are requesting that SQL Server create a specific b-tree structure. The included columns then enhance the physical structure at the leaf level.

If you specified included columns in a PK, you would mixing up logical and physical design. If you like, you can keep it all physical by just building the unique index you need for your PK, and don't declare it as a PK. You can then specify included columns.

March 25, 2010 11:36 PM
 

Norm said:

Thank you Kalen.  The problem I'm encountering is that I want to partition a legacy database, but the natural partitioning key is a non-primary Foreign Key. This FK is in all the 200 tables that I want to partition.  It would have been wonderful to add this as an included column in the PK index because then I would be able to partition that as well and use the sliding window scenario, without massive and disruptive schema changes.  

Referential integritty is a requirement for the system and so the PK's are all required.  The only options I can think of to partition would be to partition each of the transaction tables on their individual PK's and then employ the sliding window which will allow partition switching with metadata changes only.  I have to do a lot of analysis on all the corresponding FK's, so that the dependent records in subsidiary tables are switched also.

Thank you very much for your thoughtful reply.

March 26, 2010 12:54 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement