THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Alexander Kuznetsov

Performance of inserts and IGNORE_DUP_KEY


If your data has a small percentage of duplicates, then IGNORE_DUP_KEY may speed up your inserts. For larger amounts of duplicates, IGNORE_DUP_KEY may slow them down significantly. I set up two tables, stripping down all the irrelevant details, as follows:

 

CREATE TABLE t1(n INT NOT NULL PRIMARY KEY)

GO

CREATE TABLE [dbo].[t2](

      [n] [int] NOT NULL,

PRIMARY KEY CLUSTERED

(

      [n] ASC

)WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]

) ON [PRIMARY]

GO

 

If the incoming data had no duplicates, the performance of both inserts was consistently the same:

 

INSERT t1(n)

SELECT n FROM dbo.Numbers

 

INSERT t2(n)

SELECT n FROM dbo.Numbers

 

(Note that dbo. Numbers has 1 million rows.)  Of course, I always truncated both tables between my tests.

If the incoming data had 1% of duplicates, the insert with IGNORE_DUP_KEY consistently performed approximately 5% faster:

 

INSERT t1(n)

SELECT DISTINCT n FROM(

SELECT n FROM dbo.Numbers

UNION ALL

SELECT n FROM dbo.Numbers WHERE n <10000

) AS t

 

INSERT t2(n)

SELECT n FROM dbo.Numbers

UNION ALL

SELECT n FROM dbo.Numbers WHERE n <10000

 

On the other hand, if the incoming data had 100% of duplicates, the insert with IGNORE_DUP_KEY consistently performed at least 300% slower, both for a large set of 2 million rows:

 

 

INSERT t1(n)

SELECT DISTINCT n FROM(

SELECT n FROM dbo.Numbers

UNION ALL

SELECT n FROM dbo.Numbers

) AS t

 

INSERT t2(n)

SELECT n FROM dbo.Numbers

UNION ALL

SELECT n FROM dbo.Numbers

 

As well as for a smaller set of 200K rows:

 

INSERT t1(n)

SELECT DISTINCT n FROM(

SELECT n FROM dbo.Numbers WHERE n<100000

UNION ALL

SELECT n FROM dbo.Numbers WHERE n<100000

) AS t

 

INSERT t2(n)

SELECT n FROM dbo.Numbers WHERE n<100000

UNION ALL

SELECT n FROM dbo.Numbers WHERE n<100000

 

 

Overall, I decided not to use IGNORE_DUP_KEY in my particular case. I decided that small savings for a small amount of duplicates do not justify the risk of a huge performance drop for larger amounts of duplicate data.

 


Published Wednesday, July 30, 2008 10:39 PM by Alexander Kuznetsov

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement