
Alexander Kuznetsov

Performance of inserts and IGNORE_DUP_KEY


If your data has a small percentage of duplicates, then IGNORE_DUP_KEY may speed up your inserts. For larger amounts of duplicates, IGNORE_DUP_KEY may slow them down significantly. I set up two tables, stripped of all irrelevant details, as follows:

 

CREATE TABLE t1(n INT NOT NULL PRIMARY KEY)
GO

CREATE TABLE [dbo].[t2](
      [n] [int] NOT NULL,
      PRIMARY KEY CLUSTERED
      (
            [n] ASC
      ) WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

 

If the incoming data had no duplicates, the performance of both inserts was consistently the same:

 

INSERT t1(n)
SELECT n FROM dbo.Numbers

INSERT t2(n)
SELECT n FROM dbo.Numbers

 

(Note that dbo.Numbers has 1 million rows.) Of course, I always truncated both tables between my tests.
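For reference, dbo.Numbers is just a helper table of sequential integers; a minimal sketch of one way to build it, and of a simple way to time each insert between truncations, might look like this (the exact definition and the timing code are illustrative, not taken from my actual tests):

-- Build a helper table of 1,000,000 sequential integers
-- (an assumed definition; any Numbers table of that size will do).
CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
INSERT dbo.Numbers(n)
SELECT n FROM
(
      SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
      -- assumes the cross join yields at least 1,000,000 rows
) AS x
WHERE n <= 1000000
GO

-- Truncate both tables between tests and time each insert separately.
TRUNCATE TABLE t1
TRUNCATE TABLE t2
GO
DECLARE @start DATETIME
SET @start = GETDATE()
INSERT t1(n) SELECT n FROM dbo.Numbers
SELECT DATEDIFF(ms, @start, GETDATE()) AS t1_elapsed_ms

SET @start = GETDATE()
INSERT t2(n) SELECT n FROM dbo.Numbers
SELECT DATEDIFF(ms, @start, GETDATE()) AS t2_elapsed_ms
GO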

If the incoming data contained about 1% duplicates, the insert with IGNORE_DUP_KEY consistently performed approximately 5% faster:

 

INSERT t1(n)
SELECT DISTINCT n FROM
(
      SELECT n FROM dbo.Numbers
      UNION ALL
      SELECT n FROM dbo.Numbers WHERE n < 10000
) AS t

INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers WHERE n < 10000

 

On the other hand, if the incoming data was 100% duplicates (every value appearing twice), the insert with IGNORE_DUP_KEY consistently performed at least 300% slower, both for a large set of 2 million rows:

 

 

INSERT t1(n)
SELECT DISTINCT n FROM
(
      SELECT n FROM dbo.Numbers
      UNION ALL
      SELECT n FROM dbo.Numbers
) AS t

INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers

 

As well as for a smaller set of 200K rows:

 

INSERT t1(n)
SELECT DISTINCT n FROM
(
      SELECT n FROM dbo.Numbers WHERE n < 100000
      UNION ALL
      SELECT n FROM dbo.Numbers WHERE n < 100000
) AS t

INSERT t2(n)
SELECT n FROM dbo.Numbers WHERE n < 100000
UNION ALL
SELECT n FROM dbo.Numbers WHERE n < 100000

 

 

Overall, I decided not to use IGNORE_DUP_KEY in my particular case: the small savings for a small percentage of duplicates do not justify the risk of a huge performance drop when the amount of duplicate data is large.
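Without IGNORE_DUP_KEY, the duplicates have to be eliminated explicitly before the insert. One common way to do that, sketched here against the same tables (this particular filter is illustrative, not part of the tests above), is to combine DISTINCT with a NOT EXISTS check against the target:

-- Deduplicate the incoming data and skip values already in the target,
-- instead of relying on IGNORE_DUP_KEY (illustrative sketch).
INSERT t1(n)
SELECT DISTINCT src.n
FROM
(
      SELECT n FROM dbo.Numbers
      UNION ALL
      SELECT n FROM dbo.Numbers WHERE n < 10000
) AS src
WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.n = src.n)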

 

Published Wednesday, July 30, 2008 10:39 PM by Alexander Kuznetsov


Comments

 

kapil said:

I am back to the SQL expert with another cracking problem but it might be a small thing for you.

Recently we migrated our application from SQL Server 2000 to SQL Server 2005 64-bit Standard Edition.

For some of our clients we are facing an issue in a stored procedure that is causing the application's nightly process to break. Below are the details of the problem:

1. In our nightwork process, we have multiple stored procedures which are executed in sequence using the SQLCMD command.

2. One of the stored procedures completes, but the next statement of the nightwork process does not get executed.

3. As part of the stored procedure, multiple records are inserted into a table.

4. The table has an index with the IGNORE_DUP_KEY option. This index prevents duplicate rows from being entered into the table.

5. Whenever an attempt is made to insert a duplicate row, a warning is raised and the next row to be inserted is processed.

6. When the number of warnings for duplicate rows is very high, the stored procedure does not return control to the nightwork process and hangs. For a small number of warnings there is no problem and the nightwork process works fine.

In SQL 2000 the same process worked fine with the IGNORE_DUP_KEY index in place, but after the migration to SQL 2005 this behavior has caused the process to break.

July 14, 2009 1:48 PM
 

Alexander Kuznetsov said:

If there are lots of duplicates, I would use a staging table to eliminate them before they get inserted into my permanent table. The whole point of my post is that IGNORE_DUP_KEY is a feasible option only when the number of duplicates is small.
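A rough sketch of that staging-table approach (the staging table name and layout here are illustrative, not from the actual system):

-- Load the raw data, duplicates and all, into a staging table first...
CREATE TABLE dbo.Staging(n INT NOT NULL)

INSERT dbo.Staging(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers

-- ...then insert only distinct values that are not already in the target.
INSERT t1(n)
SELECT DISTINCT s.n
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.n = s.n)

DROP TABLE dbo.Staging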

HIH

July 14, 2009 4:46 PM


About Alexander Kuznetsov

Alex Kuznetsov has been working with object-oriented languages, mostly C# and C++, as well as with databases, for more than a decade. He has worked with Sybase, SQL Server, Oracle and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works as an agile developer.
