THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Paul Nielsen

www.SQLServerBible.com

Find Duplicate Indexes

These queries mix a few basic DMVs with a little relational division to indentify every set of duplicate indexes.
 
This is the result of a collaborative effort. I started with a group by version of this query about a week ago and posted it on the private MVP Newsgroups. Itzik Ben-Gan replied with a pretty cool variation, and then I made some minor edits to handle partial dups and include column logic.
 
The first query finds exact matches. The indexes must have the same key columns in the same order, and the same included columns but in any order. These indexes are sure targets for elimination. The only caution would be to check for index hints. 
 
-- exact duplicates
with indexcols as
(
select object_id as id, index_id as indid, name,
(select case keyno when 0 then NULL else colid end as [data()]
from sys.sysindexkeys as k
where k.id = i.object_id
and k.indid = i.index_id
order by keyno, colid
for xml path('')) as cols,
(select case keyno when 0 then colid else NULL end as [data()]
from sys.sysindexkeys as k
where k.id = i.object_id
and k.indid = i.index_id
order by colid
for xml path('')) as inc
from sys.indexes as i
)
select
object_schema_name(c1.id) + '.' + object_name(c1.id) as 'table',
c1.name as 'index',
c2.name as 'exactduplicate'
from indexcols as c1
join indexcols as c2
on c1.id = c2.id
and c1.indid < c2.indid
and c1.cols = c2.cols
and c1.inc = c2.inc;
 
The second variation of this query finds partial, or duplicate, indexes that share leading key columns, e.g. Ix1(col1, col2, col3) and Ix2(col1, col2) would be considered duplicate indexes. This query only examines key columns and does not consider included columns.
These types of indexes are probable dead indexes walking.

-- Overlapping indxes
with indexcols as
(
select object_id as id, index_id as indid, name,
(select case keyno when 0 then NULL else colid end as [data()]
from sys.sysindexkeys as k
where k.id = i.object_id
and k.indid = i.index_id
order by keyno, colid
for xml path('')) as cols
from sys.indexes as i
)
select
object_schema_name(c1.id) + '.' + object_name(c1.id) as 'table',
c1.name as 'index',
c2.name as 'partialduplicate'
from indexcols as c1
join indexcols as c2
on c1.id = c2.id
and c1.indid < c2.indid
and (c1.cols like c2.cols + '%'
or c2.cols like c1.cols + '%') ;

Be careful when dropping a partial duplicate index if the two indexes differ greatly in width. For example, if Ix1 is a very wide index with 12 columns, and Ix2 is a narrow two-column index that shares the first two columns, you may want to leave Ix2 as a faster, tighter, narrower index.
 
Subscribe to my free e-Newsletter.

Published Wednesday, June 25, 2008 12:47 AM by Paul Nielsen
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

albTotxo said:

Hi, Paul

Very useful scripts, but I've noticed there is a problem in the second one. If you have indexes with an index_id over 9 it can return false positives. For instance, if you have an index on column 1 and another one on columns 15 16 it will return them as overlapping, but they're not.

I think changing the last two lines for these solves the problem:

and ((c1.cols like c2.cols + '%' and SUBSTRING(c1.cols,LEN(c2.cols)+1,1) = ' ')

or (c2.cols like c1.cols + '%' and SUBSTRING(c2.cols,LEN(c1.cols)+1,1) = ' ')) ;

I added a check to verify if the char next to the last in the column set is an space, which means it is not a false positive.

June 25, 2008 4:28 AM
 

albTotxo said:

One more thing:

I think it woulk make sense to add one more condition to delete from the resultset the indexes that are exact duplicates, because they are covered by the first script, so the last part of the script would be something like this:

and c1.cols <> c2.cols -- remove exact duplicates

and ((c1.cols like c2.cols + '%' and SUBSTRING(c1.cols,LEN(c2.cols)+1,1) = ' ')

or (c2.cols like c1.cols + '%' and SUBSTRING(c2.cols,LEN(c1.cols)+1,1) = ' ')) ;

June 25, 2008 6:04 AM
 

csm said:

Very, very useful script, really. But it could be more if it includes the use of the indexes (user_seeks, user_scans, user_lookups and user_updates) from sys.dm_db_index_usage_stats's dmv.

The second script shows those indexes that could be dropped. But, in case you're not sure about drop it or not, the use of that index could answer any doubt.

June 25, 2008 9:12 AM
 

Paul Nielsen said:

albTotxo

- Thanks for the over 9 check, good call.

- I considered filtering out exact matches from the partial matches, but decided against becasue a shop might employ only the partial dups query and a DBA later on might miss eact matches.

csm

- I considered it, maybe in the next version. Even if the partial is being used sometimes, it's still probably a good candidate to drop.

Originally I set out to creaste a stored procedure that would automatically drop exact dups and close partial dups. Is this something that would be useful?

-Paul

June 25, 2008 11:54 AM
 

Phillip Senn said:

Happy birthday, Paul!

How many duplicates is this?

June 25, 2008 2:15 PM
 

Merrill Aldrich said:

June 25, 2008 6:53 PM
 

Merl said:

June 25, 2008 6:57 PM
 

Steve Dassin said:

If only developers had to be concerned with keys instead of indexes. Would you write a blog about duplicate keys? Those damn relational ideas :-)

June 26, 2008 1:04 AM
 

Log Buffer #103: a Carnival of the Vanities for DBAs said:

June 27, 2008 12:40 PM
 

Carl Federl said:

From some testing, noticed that there were some statistics that matched some indexes but not sure of the impact.

Tried to use the provided SQL and had a number of problems:

1.  The SQL does not work with SQL Server version 7 - as I am supporting 40 SQL Server 7 and only 7 with 2005, this lack of functionallity is critical to my environment.

2.  Performance of the SQL when there are a lot of indexes is not optimal.

Here is a different SQL Solution that uses a traditional divide solution of "the count of the index columns" is equal to "the count of the matching index columns"  As you can see, the traditional DIVIDE uses about half the resources of the XML solution:

Match    Solution CPU Logical IO Duration

Full XML 1362 125909 1533

Full DIVIDE 470 60965 712

Partial  XML 1161 93465 1953

Partial Divide 360 48237 524

-- Create test data:

-- For the TALLY table, see http://www.sqlservercentral.com/articles/TSQL/62867/

declare @CreateTableSQL varchar(max)

set @CreateTableSQL =

'CREATE TABLE dbo.Foo_!seq

(c1 int NOT NULL,c2 int NOT NULL,c3 int NOT NULL,c4 int NOT NULL

, CONSTRAINT foo_!seq_p PRIMARY KEY CLUSTERED (c1,c2)

, CONSTRAINT foo_!seq_u1 UNIQUE NONCLUSTERED (c1,c2,c3))

CREATE NONCLUSTERED INDEX foo_!seq_x1 ON dbo.foo_!seq (c1 ,c2 )

CREATE NONCLUSTERED INDEX foo_!seq_x2 ON dbo.foo_!seq (c1 ,c2 ) INCLUDE ( c3 ,c4)

CREATE STATISTICS foo_!seq_s1 ON dbo.foo_!seq (c1 ,c2 )'

--select @CreateTableSQL

IF OBJECT_ID('tempdb..#DDL') is not null drop table #DDL

select  REPLACE(@CreateTableSQL  ,'!seq',cast(tally.n as varchar(8))) as SQLStmt

into #DDL

from master.dbo.tally as tally

where tally.n between 1 and 1000

exec dbo.sp_execresultset @cmd = 'select SQLStmt from #DDL'

select count(*) from information_schema.tables

-- Match on all index columns

select sysusers.name as SchemaName

, sysobjects.name as TableName

, sysindexes.name as IndexName

, DupIndex.name As DuplicateIndexName

, sysindexes.indid as IndexInd

, DupIndex.indid as DuplicateIndid

FROM sysusers

join sysobjects

on sysobjects.uid = sysusers.uid

join sysindexes

on sysindexes.id = sysobjects.id

join (

select sysindexkeys.id

, sysindexkeys.indid

--, object_name(sysindexkeys.id) -- For Debugging

, count(*) as BaseKeyCnt

from dbo.sysindexkeys

where sysindexkeys.indid not in (0,255)

group by sysindexkeys.id

, sysindexkeys.indid

-- order by sysindexkeys.id , sysindexkeys.indid -- For Debugging

) as IndexKeys

on IndexKeys.id = sysindexes.id

and IndexKeys.indid = sysindexes.indid

join (

select sysindexkeys.id

, sysindexkeys.indid

, DupIndexkeys.indid as Dupindid

, count(*) as DupKeyCnt

--, object_name(sysindexkeys.id) -- For Debugging

from dbo.sysindexkeys

join dbo.sysindexkeys as DupIndexkeys

on  DupIndexkeys.id = sysindexkeys.id

and DupIndexkeys.indid > sysindexkeys.indid

and DupIndexkeys.colid = sysindexkeys.colid

and DupIndexkeys.keyno = sysindexkeys.keyno

where sysindexkeys.indid not in (0,255)

and DupIndexkeys.indid not in (0,255)

group by sysindexkeys.id

, sysindexkeys.indid

, DupIndexkeys.indid

-- order by sysindexkeys.id , sysindexkeys.indid , DupIndexkeys.indid -- For Debugging

) as DupKeys

on  DupKeys.id = IndexKeys.id

and DupKeys.indid = IndexKeys.indid

and DupKeys.DupKeyCnt = IndexKeys.BaseKeyCnt

join sysindexes as DupIndex

on DupIndex.id = DupKeys.id

and DupIndex.indid = DupKeys.Dupindid

-- Match on first two index columns:

select sysusers.name as SchemaName

, sysobjects.name as TableName

, sysindexes.name as IndexName

, DupIndex.name As DuplicateIndexName

, DupKeys.DupKeyCnt As DuplicateLeadingColumnCnt

, sysindexes.indid as IndexInd

, DupIndex.indid as DuplicateIndid

FROM sysusers

join sysobjects

on sysobjects.uid = sysusers.uid

join sysindexes

on sysindexes.id = sysobjects.id

join (

select sysindexkeys.id

, sysindexkeys.indid

, DupIndexkeys.indid as Dupindid

, count(*) as DupKeyCnt

--, object_name(sysindexkeys.id) -- For Debugging

from dbo.sysindexkeys

join dbo.sysindexkeys as DupIndexkeys

on  DupIndexkeys.id = sysindexkeys.id

and DupIndexkeys.indid > sysindexkeys.indid

and DupIndexkeys.colid = sysindexkeys.colid

and DupIndexkeys.keyno = sysindexkeys.keyno

where sysindexkeys.indid not in (0,255)

and DupIndexkeys.indid not in (0,255)

and sysindexkeys.keyno > 0

and DupIndexkeys.keyno > 0

group by sysindexkeys.id

, sysindexkeys.indid

, DupIndexkeys.indid

HAVING COUNT(*) >= 2

-- order by sysindexkeys.id , sysindexkeys.indid , DupIndexkeys.indid -- For Debugging

) as DupKeys

on  DupKeys.id = sysindexes.id

and DupKeys.indid = sysindexes.indid

join sysindexes as DupIndex

on DupIndex.id = DupKeys.id

and DupIndex.indid = DupKeys.Dupindid

June 30, 2008 9:19 AM
 

Paul Nielsen said:

Hi Carl, Glad to see you here - long time !

My first pass at this query was the relational division (is count = count) method. Itzik suggested the XML path() method and what I like about it is that it's very elegent in testing for partial matches in the order of the key columns. But I'll take a hard look at what you've written as well - always looking for a better way. But as you point out the most elegent solution sometimes doesn't perform.

June 30, 2008 11:56 PM
 

zillablog said:

Index pruning Last week I ran across two different blog posts discussing removing duplicate and useless indexes from a database. Coincidentally, I have a nagging TODO item to clean up some indexes in the schema on one of the applications we have been

July 4, 2008 11:20 AM
 

Maildir.net Mailling list archive » Blog Archives » Robert Treat: Index pruning techniques said:

July 13, 2008 4:20 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Paul Nielsen

Paul Nielsen believes SQL is the romance language of data. As such he’s a hands-on database developer, Microsoft SQL Server MVP, trainer, and author of SQL Server Bible series (Wiley). As a data architect, he developed the concepts of Smart Database Design and Nordic – an open source O/R dbms for SQL Server. You may have seen Paul speaking at a PASS Summit, SQL Open in Denmark, or SQL Teach in Canada. He lives in Colorado Springs.

This Blog

Syndication

News

news item test
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement