When an issue comes up more than once in a short period of time, I figure it is a message from the Universe. I might even assume it is a gift of blog fodder. So I won't turn down the gift, even if it means another juicy post before the end of year, much to Denis' disappointment.
The issue is SQL Server 2005 "Included Columns". There was a post on the public newsgroups about them a couple of days ago, and just yesterday, there was a similar post on the SQL Server MVP private newsgroup. Simply put, the questions were expressing concern that there were no built-in tools in SQL Server 2005 to list which columns were included columns. The information is of course available in the metadata views, but it can require a join of at least three of them to get a simple list of which columns are "included columns" in a given index. The poster on the MVP newsgroup wanted to know if someone had already written such a query and could share it, to save him some work.
There actually was a third post about included columns that I came across, that was asking about the reasons for using included columns. I will address that question first, and then provide some scripts to return information about included columns.
You need to be aware of two 'features' of indexes to fully appreciate included columns.
First: all indexes have a limit of no more than 16 key columns, with a combined total of no more than 900 bytes.
Second: a 'covering' index, which is a nonclustered index that contains all the columns referenced in a query from one table, can provide an incredible performance advantage. If all the information a query needs is contained in the index keys, SQL Server will never need to actually access the table data, and not having to do this table lookup can be a very good thing. (Disclaimer: covering indexes are not the solution to ALL query performance problems, and I don't have time today to provide a full discussion of covering indexes.)
SQL Server 2005 allows you to get around the 16 column and 900 byte limit and add additional column to a nonclustered index to provide greater opportunity for covering indexes. The syntax would look something like this:
CREATE INDEX bigindex on mybigrowtable(keycolumn) INCLUDE (bigcolumn1, bigcolumn2)
If all the columns you would like to have in your index fit within the limits, there is no technical reason for having included columns; all your columns could just be regular key columns. However, there are some other issues to consider.
1) Space requirements: Normal key columns, as part of the key, are propagated up through all levels of your indexes. So if you have some very large columns, even if they fit in the 900 byte limit, you can save space by defining them as included columns. Here is an example, using the AdventureWorks database:
USE AdventureWorks
-- set the database to bulk_logged recovery
-- prior to copying tables
ALTER DATABASE AdventureWorks
SET RECOVERY bulk_logged
GO
---------------------
IF ( OBJECT_ID('Person.Address1') is not null)
DROP TABLE Person.Address1
GO
IF ( OBJECT_ID('Person.Address2') is not null)
DROP TABLE Person.Address2
--------------------------------------
-- Create two copies of the Person.Address table and enlarge one of the columns
SELECT * INTO Person.Address1
FROM Person.Address
GO
ALTER TABLE Person.Address1
ALTER COLUMN AddressLine2 nchar(350)
GO
-- Create an index with two included columns
CREATE INDEX IX_Address_City
on [Person].[Address1] (City, StateProvinceID)
INCLUDE(AddressLine1, AddressLine2)
GO
-------------------
SELECT * INTO Person.Address2
FROM Person.Address
GO
ALTER TABLE Person.Address2
ALTER COLUMN AddressLine2 nchar(350)
GO
-- Create a 'regular' index with no included columns;
-- All four columns are keys, and included at all index levels
CREATE INDEX IX_Address_City
on [Person].[Address2]
(City, StateProvinceID, AddressLine1, AddressLine2)
GO
-- Note that the Address2 index uses about 1.5 MB
-- more than the index on Address1, because it contains
-- all 4 keys in all levels of the index.
EXEC sp_spaceused 'Person.Address1'
EXEC sp_spaceused 'Person.Address2'
GO
2) Maintenance costs: Normal key columns are maintained in sorted order. An index on (lastname, state, city) would have 3 sort columns and every new row would have to be put in the proper location based on all 3 columns. But an index on lastname, with (state, city) as included columns, would not have to maintain sort order on state and city and that could improve the performance of data modification operations. I haven't done any tests yet to determine how much savings you might realize, but it will be faster with included columns.
The downside of included columns not being sorted, is that those columns will not be useful for searching if those columns are in the WHERE clause, in particular if those columns are using in an inequality expression.
So now, how can you retrieve information about which columns are included and which are key columns? Sp_helpindex does not display this information, in fact, it makes no mention of included columns. The information is available in a catalog view called sys.index_columns, in a column called 'is_included_column'. The code below creates a view that you can use to return information about all the columns in your indexes.
CREATE VIEW index_column_info
AS
SELECT object_name = object_name(ic.object_id),
index_name = i.name,
'column' = c.name,
'column usage' = CASE ic.is_included_column
WHEN 0 then 'KEY'
ELSE 'INCLUDED'
END
FROM sys.index_columns ic JOIN sys.columns c
ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
JOIN sys.indexes i
ON i.object_id = ic.object_id
AND i.index_id = ic.index_id
You can select from this view and supply a table name and/or an index name or id in a WHERE clause:
SELECT * FROM index_column_info
WHERE object_name = 'Address1'
A student in one of my classes earlier this year wanted a way to get input back that looked like the sp_helpindex output. I am attaching a script to build a stored procedure called new_helpindex which will do that. However, the new procedure is used a bit differently than sp_helpindex. sp_helpindex requires a schema name and new_helpindex does not. So to get information about my new Address1 table created above, the two procedures would be called as shown:
EXEC new_helpindex 'address1'
EXEC sp_helpindex 'person.address1'
If you want the new_helpindex procedure to accept schema names, and to return exactly the same details as sp_helpindex (plus the included columns), it is certainly possible, and I leave that as an exercise to my readers.
Have fun!
~Kalen