THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Merrill Aldrich

Technical content about Microsoft data technologies. All opinions expressed are purely my own and do not reflect positions of my employer or associates.

Simple Redundant Index Query

I have been working with a customer that has an existing database for a home-grown application, and like many databases it’s had various tuning efforts over a period of time. Those were not necessarily coordinated, which resulted in indexes with some redundancy – and the database has tables that are into billions of rows, so the cost of that redundancy in RAM and storage was not optimal. Some of these indexes were added by the Database Engine Tuning Advisor to target specific queries, so they tended to be wide/covering and also similar to one another.

Several years ago Kimberly Tripp did a great blog series on this and also published some stored procedure code to locate duplicate indexes. I have a “lighter duty” query that does some of this, and I thought I’d share as long as I am doing this work. If you want all the details on what makes an index redundant, she does a wonderful explanation.

The query starts with a simple interrogation of sys.index_columns to list the columns participating in different components of the index (for example, the key columns, the included columns, the cluster key.) This example is for the key columns for every user table index:

SELECT 
    s.name SchName,
    o.name ObjName,
    i.name IndName,
    i.index_id,
    c.name  
        + case when is_descending_key > 0 then ' (D)' else '' end 
        + case when partition_ordinal > 0 then ' {P}' else '' end 
    as ColumnName,
    ic.key_ordinal
FROM sys.index_columns ic
    JOIN sys.columns c on ic.object_id = c.object_id and ic.column_id = c.column_id
    JOIN sys.objects o on ic.object_id = o.object_id
    JOIN sys.indexes i on i.object_id = ic.object_id and i.index_id = ic.index_id
    JOIN sys.schemas s on o.schema_id = s.schema_id 
WHERE
    o.type = 'U'
    and ic.is_included_column = 0;

Ideally the output I want is in a pivoted form – to paste into a spreadsheet – where the columns in each role in an index are listed from left to right instead of top to bottom. The indexes also should be listed in an order where the redundancy is apparent, i.e. the indexes that are similar are together in the list. To get to that output, we need to repeat a query like the one above for each “role” that is represented in an index (the key, the includes, the clustering key, partitioning) and we need to pivot the column names from a single column out into multiple ordered columns left to right. Having this type of output allows one to scan down the list of indexes in Excel, mark the indexes that are similar, and then quickly develop a design to merge them together or remove the unnecessary ones.

The resulting query is built on top of variations of the simple one above but has some complexity, as it ends up being a pivot for each subset of columns that make up an index, wrapped in a CTE to join and order the complete list:

with KeyColumns as (
    SELECT 
        SchName,
        ObjName,
        IndName,
        index_id,
        has_filter,
        STUFF (
            ISNULL ( ', ' + [1], '' ) 
            + ISNULL ( ', ' + [2], '' ) 
            + ISNULL ( ', ' + [3], '' ) 
            + ISNULL ( ', ' + [4], '' ) 
            + ISNULL ( ', ' + [5], '' ) 
            + ISNULL ( ', ' + [6], '' ) 
            + ISNULL ( ', ' + [7], '' ) 
            + ISNULL ( ', ' + [8], '' ) 
            + ISNULL ( ', ' + [9], '' ) 
            + ISNULL ( ', ' + [10], '' )
            + ISNULL ( ', ' + [11], '' )
            + ISNULL ( ', ' + [12], '' )
            + ISNULL ( ', ' + [13], '' )
            + ISNULL ( ', ' + [14], '' )
            + ISNULL ( ', ' + [15], '' )
            + ISNULL ( ', ' + [16], '' )
            , 1, 2, '' ) 
        as Keys,
        [1] k1, [2] k2, [3] k3, [4] k4, [5] k5, [6] k6, [7] k7, [8] k8, [9] k9, [10] k10,
        [11] k11, [12] k12, [13] k13, [14] k14, [15] k15, [16] k16
    FROM (
        SELECT 
            s.name SchName,
            o.name ObjName,
            i.name IndName,
            i.index_id,
            i.has_filter,
            c.name  
                + case when is_descending_key > 0 then ' (D)' else '' end 
                + case when partition_ordinal > 0 then ' {P}' else '' end 
            as ColumnName,
            ic.key_ordinal
        FROM sys.index_columns ic
            JOIN sys.columns c on ic.object_id = c.object_id and ic.column_id = c.column_id
            JOIN sys.objects o on ic.object_id = o.object_id
            JOIN sys.indexes i on i.object_id = ic.object_id and i.index_id = ic.index_id
            JOIN sys.schemas s on o.schema_id = s.schema_id 
        WHERE
            o.type = 'U'
            and ic.is_included_column = 0
    ) as IndexMeta
    PIVOT (
        max( ColumnName )
        FOR key_ordinal in ( [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
            [11], [12], [13], [14], [15], [16] )
    ) as pvt
), 
IncludedColumns as (
    SELECT 
        SchName,
        ObjName,
        IndName,
        index_id,
        STUFF (
            ISNULL ( ', ' + [1], '' ) 
            + ISNULL ( ', ' + [2], '' ) 
            + ISNULL ( ', ' + [3], '' ) 
            + ISNULL ( ', ' + [4], '' ) 
            + ISNULL ( ', ' + [5], '' ) 
            + ISNULL ( ', ' + [6], '' ) 
            + ISNULL ( ', ' + [7], '' ) 
            + ISNULL ( ', ' + [8], '' ) 
            + ISNULL ( ', ' + [9], '' ) 
            + ISNULL ( ', ' + [10], '' )
            + ISNULL ( ', ' + [11], '' )
            + ISNULL ( ', ' + [12], '' )
            + ISNULL ( ', ' + [13], '' )
            + ISNULL ( ', ' + [14], '' )
            + ISNULL ( ', ' + [15], '' )
            + ISNULL ( ', ' + [16], '' )
            , 1, 2, '' ) 
        as IncludedCols,
        [1] i1, [2] i2, [3] i3, [4] i4, [5] i5, [6] i6, [7] i7, [8] i8, [9] i9, [10] i10,
        [11] i11, [12] i12, [13] i13, [14] i14, [15] i15, [16] i16
    FROM (
        SELECT 
            s.name SchName,
            o.name ObjName,
            i.name IndName,
            i.index_id,
            case when partition_ordinal > 0 then c.name + ' {P}' else c.name end as ColumnName,
            ic.index_column_id --,
            --ic.key_ordinal
        FROM sys.index_columns ic
            JOIN sys.columns c on ic.object_id = c.object_id and ic.column_id = c.column_id
            JOIN sys.objects o on ic.object_id = o.object_id
            JOIN sys.indexes i on i.object_id = ic.object_id and i.index_id = ic.index_id
            JOIN sys.schemas s on o.schema_id = s.schema_id 
        WHERE
            o.type = 'U'
            and ic.is_included_column = 1
    ) as IndexMeta
    PIVOT (
        max( ColumnName )
        FOR index_column_id in ( [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
            [11], [12], [13], [14], [15], [16] )
    ) as pvt
),
ClusterKeyColumns as (
    SELECT 
        SchName,
        ObjName,
        IndName,
        index_id,
        STUFF (
            ISNULL ( ', ' + [1], '' ) 
            + ISNULL ( ', ' + [2], '' ) 
            + ISNULL ( ', ' + [3], '' ) 
            + ISNULL ( ', ' + [4], '' ) 
            + ISNULL ( ', ' + [5], '' ) 
            + ISNULL ( ', ' + [6], '' ) 
            + ISNULL ( ', ' + [7], '' ) 
            + ISNULL ( ', ' + [8], '' ) 
            + ISNULL ( ', ' + [9], '' ) 
            + ISNULL ( ', ' + [10], '' )
            + ISNULL ( ', ' + [11], '' )
            + ISNULL ( ', ' + [12], '' )
            + ISNULL ( ', ' + [13], '' )
            + ISNULL ( ', ' + [14], '' )
            + ISNULL ( ', ' + [15], '' )
            + ISNULL ( ', ' + [16], '' )
            , 1, 2, '' ) 
        as ClusterKeys,
        [1] ck1, [2] ck2, [3] ck3, [4] ck4, [5] ck5, [6] ck6, [7] ck7, [8] ck8, [9] ck9, [10] ck10,
        [11] ck11, [12] ck12, [13] ck13, [14] ck14, [15] ck15, [16] ck16
    FROM (
        SELECT 
            s.name SchName,
            o.name ObjName,
            i.name IndName,
            i.index_id,
            c.name ColumnName,
            ic.index_column_id --,
            --ic.key_ordinal
        FROM sys.index_columns ic
            JOIN sys.columns c on ic.object_id = c.object_id and ic.column_id = c.column_id
            JOIN sys.objects o on ic.object_id = o.object_id
            JOIN sys.indexes i on i.object_id = ic.object_id and i.index_id = ic.index_id
            JOIN sys.schemas s on o.schema_id = s.schema_id 
        WHERE
            o.type = 'U'
            and i.index_id = 1
    ) as IndexMeta
    PIVOT (
        max( ColumnName )
        FOR index_column_id in ( [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
            [11], [12], [13], [14], [15], [16] )
    ) as pvt
)
SELECT Kc.SchName, 
    kc.ObjName,
    kc.IndName,
    kc.Keys,
    kc.has_filter,
    --kc.k1,
    --kc.k2,
    --kc.k3,
    --kc.k4,
    --kc.k5,
    --kc.k6,
    --kc.k7,
    --kc.k8,
    --kc.k9,
    --kc.k10,
    --kc.k11,
    --kc.k12,
    --kc.k13,
    --kc.k14,
    --kc.k15,
    --kc.k16
    Incl.IncludedCols,
    --Incl.i1, 
    --Incl.i2,
    --Incl.i3,
    --Incl.i4,
    --Incl.i5,
    --Incl.i6,
    --Incl.i7,
    --Incl.i8,
    --Incl.i9,
    --Incl.i10,
    --Incl.i11,
    --Incl.i12,
    --Incl.i13,
    --Incl.i14,
    --Incl.i15,
    --Incl.i16
    Clk.ClusterKeys --,
    --Clk.ck1,
    --Clk.ck2,
    --Clk.ck3,
    --Clk.ck4,
    --Clk.ck5,
    --Clk.ck6,
    --Clk.ck7,
    --Clk.ck8,
    --Clk.ck9,
    --Clk.ck10,
    --Clk.ck11,
    --Clk.ck12,
    --Clk.ck13,
    --Clk.ck14,
    --Clk.ck15,
    --Clk.ck16
FROM KeyColumns Kc
 LEFT JOIN IncludedColumns Incl
   ON Kc.SchName = Incl.SchName
      AND Kc.ObjName = Incl.ObjName
      AND Kc.IndName = Incl.IndName
 LEFT JOIN ClusterKeyColumns Clk
   ON Kc.SchName = Clk.SchName
      AND Kc.ObjName = Clk.ObjName
      AND Kc.IndName != Clk.IndName
ORDER BY SchName, ObjName, K1, K2, K3, K4;

Notes:

  1. Pivot obviously has a limit to the number of values that can be rotated. I wrote this query out to 16 columns, feeling like that certainly met the requirements for this project, and probably for most databases. If you have more than 16 columns in indexes you may have more pressing issues :-).
  2. The query STUFF()s the list of column names, comma separated, into one column for the keys, one for the included columns, and one for the cluster keys, which are implicitly included in a nonclustered index over a clustered table. If you would rather have them output in separate columns, it’s possible to just uncomment the distinct columns in the select list above.
  3. Columns that are descending are labeled with a (D) suffix in the output – the order can matter a great deal when combining indexes that otherwise look similar.
  4. For partitioned tables the partitioning columns are labeled with a {P} suffix.

I hope this is useful or can be incorporated into your index optimization toolset.

Published Thursday, December 04, 2014 1:33 PM by merrillaldrich

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Mark said:

Useful query, however I'd recommend including `has_filter` from sys.indexes on the resultset, as you may have two similar indexes filtering on a different value.

This wouldn't be immediately obvious with the output from your query and could lead you to accidentally combine two indexes that should be kept separate

December 5, 2014 6:54 AM
 

merrillaldrich said:

Excellent idea, Mark. I've updated the query to add that.

December 6, 2014 11:15 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Privacy Statement