THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Nonclustered Index Keys

I recently received a question about the storage of nonclustered index keys. I am assuming you are aware of the fact that if your table has a clustered index, SQL Server uses the clustered index key (all of its columns, if it is a composite index) as a 'bookmark' in your nonclustered indexes,  to allow your nonclustered indexes to uniquely identify the row that the index is pointing to.

In particular, this question was about a couple of statements in Inside SQL Server 2005: Query Tuning and Optimization, Chapter 3. Now, I didn't write Chapter 3, but I thought I knew all about how index keys were stored.  But (you may want to be sitting down for this) I WAS WRONG.

Here are the statements:

“…when you create a nonunique nonclustered index on a table with a clustered index, we append the clustered index keys to the nonclustered index keys if they are not explicitly part of the nonclustered index keys”

And then a bit later on, there is another quote discussing this same effect:

“Note that the key columns for each of the nonclustered indexes on T_clu include the clustered index key column a with the exception of T_clu_f, which is a unique index.”

Although I had read this chapter when we were putting the book together, I must have read this last sentence too fast. The first quoted sentence is basically what I said in my lead-in paragraph. But the second sentence now seemed wrong. I thought it was saying that if your nonclustered index is UNIQUE, the clustered key columns are not stored in the index at all. I knew that wasn't true, so I contacted the author of Chapter 3. It turns out he was using a very specific definition of key.

KEY COLUMN: a column that is stored in the leaf pages and the interior (node, or non-leaf) pages of the B-tree and that you can use in an index seek. 

He also uses another term in some of the related sections of the chapter:

COVERED COLUMN: a column that can be returned by the index without performing a bookmark lookup; it may or may not be a key column but it is stored in the leaf pages of the B-tree.

The definition of an index key includes the the property that it is stored at all levels of an index, not just the leaf level. Covered columns are in the leaf, so we have the potential of a covering index, and the possibility of retrieving the data we need without accessing the table itself. But not all covered columns are key columns.

This was news to me. I assumed that the fact that the clustered index key was in the nonclustered index meant that it was part of the key. And all my previous testing seemed to bear that out; I must never have tested at this level of detail with a unique nonclustered index. What this means is that I have been wrong in the following claim: "Explicitly declaring your clustered key columns as part of your nonclustered indexes does not make any difference in the storage of the index." That statement is true for nonunique indexes, but not true for unique.

What does this really mean?

Consider a TabA with a clustered index on col1.  These two index definitions are identical:

CREATE INDEX nc_index ON TabA(col2)

CREATE INDEX nc_index ON TabA(col2, col1)

For a nonunique, nonclustered index (which the above index is), the clustered key col1 is automatically included at the leaf, and at all other levels, just as if we explicitly declared col1 to be part of the nonclustered index key.

However, these two index definitions are not exactly the same:

CREATE UNIQUE INDEX nc_index ON TabA(col2)

CREATE UNIQUE INDEX nc_index ON TabA(col2, col1)

The first index definition will create a nonclustered index on col2, and add col1 only at the leaf. The second index definition will  also create an nonclustered index on col2, but it will add col1 as a true key column and it will be stored at all index levels.

Here's a specific example

Let's look at a specific example. I will be demonstrating the contents of the index rows using DBCC PAGE, and determine the index pages I am interested in by using DBCC IND. For more information about these two undocumented commands, one of the best places to look is at a couple of blog posts by Paul Randal:

http://www.sqlskills.com/blogs/paul/2007/10/01/InsideTheStorageEngineUsingDBCCPAGEAndDBCCINDToFindOutIfPageSplitsEverRollBack.aspx

http://blogs.msdn.com/sqlserverstorageengine/archive/2006/12/13/More-undocumented-fun_3A00_-DBCC-IND_2C00_-DBCC-PAGE_2C00_-and-off_2D00_row-columns.aspx

DBCC IND returns one row for every page of a table or index. The rows contain values indicating what type of page is listed (2 is an index page), what index the page belongs to, what level of an index the page is at (0 is the leaf), and then the file ID and page number. These last two values are then used in DBCC PAGE to get the contents of the page.  The output also includes the file ID and page number for the next and previous pages in logical order. A page with no previous page is the first page in order.

When running DBCC IND on a large table, you can get lots of rows returned, and it can be tricky to find the exact rows you're interested in. For this reason, I usually take the output of DBCC IND and save it to a table, and then I can query the table. I can included the script to build a table called sp_index_info in the master database. Because of the sp_ prefix, this table can be accessed from any database.

I'm going to use a table that I copy from the AdventureWorks database into a test database. You can use any database you like to run this code on your own server.

USE testdb
GO
IF EXISTS (SELECT name FROM sys.tables WHERE name = 'Sales')
   DROP TABLE Sales;
GO
SELECT  * INTO Sales
FROM AdventureWorks.Sales.SalesOrderDetail;
GO

Now build a clustered index on SalesOrderID and three similar nonclustered indexes on SalesOrderDetailID, which is unique.


CREATE CLUSTERED INDEX Sales_ID_Index ON Sales(SalesOrderID);
GO
-- The first index is not declared as a unique index
CREATE INDEX Sales_DetailID_Index1 ON Sales(SalesOrderDetailID);
GO
-- The second index is declared as a unique index
CREATE UNIQUE INDEX Sales_DetailID_Index2 ON Sales(SalesOrderDetailID);
GO
-- The third index is declared as a unique index and explicitly includes the clustered index key
CREATE UNIQUE INDEX Sales_DetailID_Index3 ON Sales(SalesOrderDetailID, SalesOrderID);
GO

Now, populate the sp_index_info table with the results of DBCC IND

TRUNCATE TABLE sp_index_info
INSERT INTO sp_index_info
    EXEC ('DBCC IND ( testdb, Sales, -1)'  );
GO

Find the first leaf page from each index, by looking for pages with no previous page, and with an IndexLevel value of 0.


SELECT PageFID, PagePID, IndexID
FROM sp_index_info
WHERE PageType = 2 AND IndexLevel = 0
  AND PrevPagePID = 0
ORDER BY IndexID;
GO

Here's my output:

PageFID PagePID     IndexID
------- ----------- -------
1       3416        2
1       1248        3
1       1680        4

To look at the actual rows, we need to use DBCC PAGE. An option value of 3 is very useful for index pages, as it gives us tabular output showing each individual index row.  I'll take each of the file and page numbers and use them in DBCC PAGE, and of course you'll have to substitute whatever page numbers you get.


DBCC TRACEON (3604);
DBCC PAGE(testdb, 1, 3416, 3);
DBCC PAGE(testdb, 1, 1248, 3);
DBCC PAGE(testdb, 1, 1680, 3);

Here are my results:

image

The results shown are for the leaf level. Note that because my clustered index is not unique, it includes a uniqueifier for every row. The uniqueifier is never visible using SELECT statements, but is a hidden column stored within the row to make sure each row in the clustered index is unique. This uniqueifier column is always considered part of the clustered key internally, so every nonunique clustered index is really a composite index.

There is the exact same data in the leaf for all three nonclustered indexes (except for page numbers of course; all three indexes have their own pages). The same index keys are in each row of the first leaf level index page. However, you might note a couple of differences int the column headers. Just as the nonunique clustered index includes a uniqueifier, SQL Server has to have some way to make sure each nonclustered index row is unique. The column headers actually index which columns of my nonclustered indexes make up the key. For the nonunique index, SQL Server has to consider the entire clustered key as part of the nonclustered key. It knows the clustered key is unique, so adding it to the nonclustered guarantees uniqueness. You can see that all 3 columns are labeled as (key): SalesOrderDetailID, SalesOrderID (the clustered key) and the clustered index uniqueifier.

For the second index, which was declared as unique, but did not explicitly include the clustered key, the only key the nonclustered index needs is the key the index was built on. Because that single key was declared as unique, no further information is needed to be part of the key. However, the clustered key stored in the leaf level rows, because SQL Server needs that to be able to find the row in the table data.

The third index explicitly declared the clustered key to be part of the nonclustered index key. So the column labels indicate that both the column SalesOrderDetailID and SalesOrderID are key columns. Since the index was declared to be unique, only the explicitly declared key columns are part of the key. The uniqueifier is there, because as part of the clustered key is must be in the leaf level pages, but it is not marked as part of the nonclustered key.

So how are these indexes different?

The indexes differ in the upper levels. The table sp_index_info still contains the output of DBCC IND, so we can find an upper level page for each index. 

At the level above the leaf the IndexLevel is 1. I don't want to see the rows for the upper level pages of the clustered index, so I filter them out.


SELECT PageFID, PagePID, IndexID
FROM sp_index_info
WHERE PageType = 2 AND IndexLevel > 0 and IndexID > 1
  AND PrevPagePID = 0
ORDER BY IndexID;

Here are my results: 

PageFID PagePID     IndexID
------- ----------- -------
1       3520        2
1       1360        3
1       1744        4

And then I use those results to look at the pages:

DBCC TRACEON (3604);
DBCC PAGE(testdb, 1, 3520, 3);
DBCC PAGE(testdb, 1, 1360, 3);
DBCC PAGE(testdb, 1, 1744, 3);

Here are my results:

image

Now you should see something different. The first index, nonunique, still has all three columns in the upper level page: the nonclustered key SalesOrderDetailID, and the two columns of the clustered key: SalesOrderID and the uniqueifier.

The second index is the unique nonclustered on a single column, and only has SalesOrderDetailID in the upper level page.

The third index is unique and composite, so both of the declared keys are in the upper level.

What other possibilities are there?

One additional question you might have is what happens if the clustered index is also unique. How do the nonclustered index rows look different? I think with the tools and examples presented here, you should be able to figure that out for yourself!

Have fun!

~Kalen

http://DVD.KalenDelaney.com

www.SQLCommunity.com

 

 

------------------------------Script to create a table to hold DBCC IND output--------------------------------------------------------

-- Create a table to hold the output of DBCC IND
USE master
GO
IF EXISTS (SELECT name FROM sys.tables
            WHERE name = 'sp_index_info')
   DROP TABLE sp_index_info;

GO
CREATE TABLE sp_index_info
(PageFID  tinyint,
  PagePID int,  
  IAMFID   tinyint,
  IAMPID  int,
  ObjectID  int,
  IndexID  tinyint,
  PartitionNumber tinyint,
  PartitionID bigint,
  iam_chain_type  varchar(30),   
  PageType  tinyint,
  IndexLevel  tinyint,
  NextPageFID  tinyint,
  NextPagePID  int,
  PrevPageFID  tinyint,
  PrevPagePID int,
  Primary Key (PageFID, PagePID));

Published Sunday, March 16, 2008 7:53 PM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dave D. said:

If the clustered index is also unique in your example, this should not make much of a difference.  Just don't implicity include the cluster key.

Dave

http://www.mssqlonline.com/

March 20, 2008 12:37 AM
 

Louis Davidson said:

This kind of detail is so wonderful :) This is why in all of my design books, when I get past the "conceptual" stuff on indexes (and any other internal topics) there is always a note that says something like "If you really want to get deep into how things work, there is a person named Kalen whose book you need to get!"  Thanks!  

March 22, 2008 8:37 PM
 

Mattias Lind said:

I'm so happy I'm not the only one caring about them indexes... ;)

April 1, 2008 2:08 PM
 

Denis Gobo said:

Wow, it has been already a year since I wrote A year in review, The 21 + 1 best blog posts on SQLBlog

December 31, 2008 10:37 AM
 

Bhargav Patel said:

Hi Kalen,

very nicely presented.

It now really make sense to create Unique non-clustered index rather than non-unique non-clustered index if possible.

Great article !.

May 6, 2009 11:06 AM
 

Simon Sabin UK SQL Consultant's Blog said:

When ever I teach about SQL Server performance tuning I try can get across the message that there is no such think as a table. Does that sound odd, well it isn't, trust me. Rather than tables you need to consider structures. You have 1. Heaps 2. Indexes

March 5, 2010 11:50 AM
 

Kalen Delaney said:

I thought I had said almost all that could be said about nonclustered index keys in a post made almost

March 7, 2010 7:04 PM
 

Kalen Delaney said:

I thought I had said almost all that could be said about nonclustered index keys in a post made almost

March 7, 2010 7:06 PM
 

Rajesh said:

Thanks, Kalen..

got it..in nutshell every index entry must be unique, if not then a cluster key/RID is added at each level(in case od non clustere) or uniqifier in case of cluster,

else if non cluster is unique then no need to add cluster keys/RID at non leaf level, they r just added at leaf level so as to refer actual data row :)

very nicely presented..

October 1, 2012 3:07 AM
 

Kalen Delaney said:

Thank YOU Rajesh ...

October 1, 2012 1:11 PM
 

Geri Reshef said:

Just to be sure about the terminology-

1.

Create Index MyIdx On MyTbl(Col1,Col2);

Col2 is stored at all index levels.

2.

Create Index MyIdx On MyTbl(Col1) Include(Col2);

Col2 is stored only at the leaf.

Am I right?

Thanks!

December 7, 2013 3:02 PM
 

Kalen Delaney said:

Hi Geri

Yes, UNLESS Col2 is also part of the clustered index key.

~Kalen

December 7, 2013 5:14 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement