THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: More About Nonclustered Index Keys

I thought I had said almost all that could be said about nonclustered index keys in a post made almost exactly two years ago, on March 16, 2008.  But there's more.

To get all the benefit from today's post, you'll really have to read that one, but I'll synthesize the crucial details here.

Every index needs to be unique, in some way or another, so that every index row can be referenced uniquely. A clustered index can be declared unique, which it might be if you built an index on EmployeeID_number. If you build the index on a non-unique column, such as LastName, SQL Server will uniquify it for you, by adding an extra hidden column as a second index key.

Nonclustered indexes also need to be unique. In addition, each nonclustered index has to have a way of 'pointing to' the row in the table it references, and if the table has a clustered index, this pointer is a copy of the clustered index key(s). Since the clustered key(s) must be unique, as stated in the previous paragraph, nonclustered indexes are automatically made unique when you add the clustered key. My above referenced post refers to the fact that if you if you explicitly add the clustered key column(s) to the nonclustered index definition, those columns will only be part of the nonclustered key if the nonclustered index was not declared unique. Otherwise, the clustered key column(s) will be treated like INCLUDE columns.

The definition of a KEY column is one that is propagated up through all levels of the index tree. (This is how SQL Server builds an index, by taking the key column(s) from the first row on every page at one level, and building a set of node pages to contain all those 'first row' values, with pointers to the pages those rows came from.) So, if a column is shows up in the intermediate, or node, index levels, it is a KEY column.

So the other post showed you that when the nonclustered index was unique, clustered key(s) are stored as INCLUDE columns, which means they just show up at the leaf level. They are not needed as part of the nonclustered key.

However, if the clustered index is unique and you have a nonunique nonclustered index, we get the opposite behavior. Even if we declare the clustered key column as an INCLUDE column, it is actually stored as a KEY column.

Here is the first part of the script I used in the other post, that copies a table from the AdventureWorks database. However, in this case I am building a unique clustered index:

USE testdb
GO
IF EXISTS (SELECT name FROM sys.tables WHERE name = 'Sales')
   DROP TABLE Sales;
GO
SELECT  * INTO Sales
FROM AdventureWorks.Sales.SalesOrderDetail;
GO

Now build a clustered index on SalesOrderID and three similar nonclustered indexes on SalesOrderDetailID, which is unique.

CREATE UNIQUE CLUSTERED INDEX Sales_ID_Index ON Sales(SalesOrderDetailID);
GO

Next, build three similar nonclustered indexes:

-- The first index is on a single, nonunique column
CREATE INDEX SalesOrderID_Index1 ON Sales(SalesOrderID);
GO
-- The second index is composite and explicitly adds the clustered key column
CREATE  INDEX SalesOrderID_Index2 ON Sales(SalesOrderID, SalesOrderDetailID);
GO
-- The third index adds the clustered index key as an INCLUDE column
CREATE  INDEX Sales_DetailID_Index3 ON Sales(SalesOrderID);
      INCLUDE (SalesOrderDetailID);
GO

Using the table I showed you in the previous post, I can capture the output of DBCC IND and find an upper level page from each nonclustered  index.

TRUNCATE TABLE sp_index_info
INSERT INTO sp_index_info
    EXEC ('DBCC IND ( testdb, Sales, -1)'  );
GO

SELECT PageFID, PagePID, IndexID
FROM sp_index_info
WHERE IndexID > 1 AND IndexLevel > 0
  AND PrevPagePID = 0
ORDER BY IndexID;
GO

Here are my results:

PageFID PagePID     IndexID
------- ----------- -------
1       4056        2
1       904         3
1       1888        4

Now use DBCC PAGE to look at each of these pages

DBCC TRACEON (3604);
DBCC PAGE(testdb, 1, 4056, 3);
DBCC PAGE(testdb, 1, 904, 3);
DBCC PAGE(testdb, 1, 1888, 3);

image

You can see that all 3 indexes are identical, even at the non-leaf level. Of course, the Child Pages they point to are different, because they are different indexes on different pages. But the rows and columns on each page are the same.

Each nonclustered index has to store the clustered key. And since my clustered index is unique, and the nonclustered indexes aren't declared as unique, the clustered key MUST be part of the key of the nonclustered indexes. Even though the 3rd nonclustered tries to make the clustered key an INCLUDE column, since it is needed for the key, it is treated as a key. It will not be stored twice just to make is also an INCLUDE column.

 

So now I must have said all that could be said about nonclustered index keys, right?

Maybe not….

Have fun!

~Kalen

Published Sunday, March 07, 2010 5:04 PM by Kalen Delaney
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Fabricio said:

Very interesting post.

March 7, 2010 7:51 PM
 

James Luetkehoelter said:

Kalen, the day you have said all there is to say about indexes or pages or extents or hobts is the day I cease being a cynical and take everything said to me at face value :)

March 7, 2010 7:59 PM
 

Rob Farley said:

Kalen - hopefully you can clear something up for me which I keep wondering about, but never when I'm somewhere I can easily check. On a SQL 2000 box (prior to included columns being available), can I simulate included columns by making the keys hit a unique point. Such as (Col1, Col2, UniqueKey, Col3, Col4). In SQL 2000, do Col3 and Col4 get treated as included columns, since they offer no additional functionality to the key?

March 7, 2010 8:33 PM
 

Kalen Delaney said:

James -- is that a good thing or a bad thing?

Rob -- Sorry, you'll have to test this out for yourself; I have no SQL 2000 installations available. I am 99% sure of my answer, but no 100% and that is that once you actually list the column as part of the index definition, it is a KEY column.  

March 7, 2010 11:02 PM
 

Rob Farley said:

No worries. My theory is that the database engine should be able to see that a particular set of columns is unique, and then figure that everything after that in the key definition can be left to appear only at the leaf level. But I'll do some research on that and write a blog post myself.

March 7, 2010 11:48 PM
 

James Luetkehoelter said:

I dunno Kalen, it was meant as a compliment to you and sort of a bash on myself - more or less when you stop being you I'll stop being me :)

March 8, 2010 8:31 AM
 

Uri Dimant said:

Kalen,thank you for that great post

My best congratulations to you and all female.Happy 8 march, dear lady's.

March 8, 2010 8:40 AM
 

Martin said:

I understand that at the leaf level each index row needs to be able to be uniquely mapped to a row in the table but why does it matter to SQL Server that the key values in the higher level pages are unique?

If one leaf level page was full of index records for LastName "Smith" and the next leaf level page also contained records for "Smith" what problem would be caused by simply having the parent index page having two "Smith" values?

February 20, 2011 10:34 AM
 

Martin said:

... Actually regarding my comment above I assume that this is just for delete/update performance. If an Update/Delete is carried out on a table row it would need some way of efficiently locating the corresponding non clustered index row?

February 20, 2011 12:39 PM
 

Kalen Delaney said:

Hi Martin... I think you got it! I actually had to think about it for a bit after your first question, and finally remembered learning way back in SQL 4.x, when I asked a similar question about why upper levels needed RIDs in them when we never went right from the upper levels to the rows, that it was to help optimize index maintenance.

Thanks for reading!

~Kalen

February 21, 2011 1:11 PM
 

guest said:

thanx for good article

August 3, 2011 10:46 PM
 

Marty Solomon said:

For a nonclustered nonunique index in SQL Server 2008, is there any feature like the key compression feature of Oracle, which eliminates the redundancy of repeating a "secondary key" value n times in the case that there are n data records match that same secondary key value?

Based on the following from http://msdn.microsoft.com/en-us/library/aa174537(v=sql.80).aspx,

it looks like this was automatically accomplished in SQL Server 2000 by having more than one pointer in the leaf-level index rows:

Nonclustered Indexes

Nonclustered indexes have the same B-tree structure as clustered indexes, with two significant differences:

•The data rows are not sorted and stored in order based on their nonclustered keys.

•The leaf layer of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows. Each index row contains the nonclustered key value and one or more row locators that point to the data row (or rows if the index is not unique) having the key value.

P.S. I'm a computer science professor specializing in database for over 30 years, and it is clear to me that your articles and books are the best!

January 25, 2012 8:36 AM
 

Kalen Delaney said:

Hi Marty

Having never looked at Oracle internals, I can't say what is or is not 'like' an Oracle feature. But in general, I think the answer is that in general, there is no automatic key compression, but as of SQL Server 2008, we do have the ability to compress an index, and that would definitely take advantage of many keys referencing the same rows.

I'm not sure what you're seeing in the documentation, but I do not see the definition you quoted as giving us any kind of redundancy elimination,

Regards,

Kalen

January 25, 2012 10:17 PM
 

Marty Solomon said:

Kalen,

The phrase in the cited SQL Server 2000 documentation that caused me to view a leaf index row  as containing one (nonclustered) key value together with a pointer to each data row that matches that key value, is the parenthetical (or rows if the index is not unique) in:

Each index row contains the nonclustered key value and one or more row locators that point to the data row (or rows if the index is not unique) having the key value.

Due to my lack of knowledge concerning SQL Server, I don't know if that parenthetical was not really true (as understand it) for SQL Server 2000 or if the story changed with subsequent versions of SQL Server.  

For SQL Server 2008, would row or page compression eliminate the repeated key values?

My interest in this that I'm comparing the Oracle and SQL Server storage strucures in class, and I don't want to provide incorrect information.

Thanks for your time,

Marty

January 26, 2012 9:08 AM
 

Kalen Delaney said:

Marty

I think if you work through the examples in this blog post that actually show you how to view the indexes rows and pages, and the earlier post that began this topic, you would be able to answer your questions on your own.

In the nonclustered leaf level, there is one row for each data row. If there are duplicates, there is one whole index row for each occurrence of the duplicate values. There is never one index key followed by an arbitrary number of row locators. In fact, that would not even be in 1NF as there would be a repeating group.

Page compression in SQL 2008 will not elimate the repeated values per se, but it will store them in a more storage-efficient format, with a single dictionary entry per page, and then a single byte referencing the dictionary entry for the repeated values.

There are many blog posts that describe how SQL Server compression works, and I go into detail in my Internals book as well.

Regards,

Kalen

January 26, 2012 12:43 PM
 

Marty Solomon said:

Kalen,

In order for a DBMS to be relational, its physical structures, such as indexes, do not have to be flat (i.e., in 1NF), so the SQL Server 2000 parenthetical could conceivably have been correct in its day (not that it actually was).

But you nicely answered my second question, in that you described exactly how SQL 2008 page compression will lessen the duplicate key redundancy.

Thanks,

Marty

January 26, 2012 2:31 PM
 

Kalen Delaney said:

I understand that the physical structures don't have to be in any particular structure (I was a University Computer Science Lecturer back when dinosaurs roamed the earth) but in SQL Server they are.

Best regards,

Kalen

January 26, 2012 2:44 PM
 

Rajesh said:

Hi Kalen,

I still beleive that every page of non clustred index, no matter whether it is leaf or non leaf, always contains clustered index keys columns along with non cluster keys, whether clustred index is unique or not.

Is it so?

September 28, 2012 9:22 AM
 

Kalen Delaney said:

HI Rajesh

Did you read the other nonclustered index article that I pointed to at the top of this one? I think that answers your question.

~Kalen

September 28, 2012 4:07 PM
 

Ramana said:

Hi Kalen,

   When is it required to include columns for a non clustered index? Is it always a good idea to include all columns from select query excluding clustered index and non clustered index columns?

What would happen when clustered index on a table and unique vs non unique values?

     Thank you very much your help.

thanks

ramana

November 21, 2012 12:17 PM
 

Ramana said:

i meant to ask

What would happen when clustered index on a table is unique vs non unique?

November 21, 2012 12:35 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement