If your data has a small percentage of duplicates, then IGNORE_DUP_KEY
may speed up your inserts. For larger amounts of duplicates, IGNORE_DUP_KEY may
slow them down significantly. I set up two tables, stripping down all the
irrelevant details, as follows:
CREATE TABLE t1(n INT NOT NULL PRIMARY KEY)
GO
CREATE TABLE [dbo].[t2](
[n] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[n] ASC
)WITH (IGNORE_DUP_KEY = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
If the incoming data had no duplicates, the performance of both
inserts was consistently the same:
INSERT t1(n)
SELECT n FROM dbo.Numbers
INSERT t2(n)
SELECT n FROM dbo.Numbers
(Note that dbo. Numbers has 1 million rows.) Of course, I always truncated both tables
between my tests.
If the incoming data had 1% of duplicates, the insert with IGNORE_DUP_KEY
consistently performed approximately 5% faster:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers WHERE n <10000
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers WHERE n <10000
On the other hand, if the incoming data had 100% of
duplicates, the insert with IGNORE_DUP_KEY consistently performed at least 300%
slower, both for a large set of 2 million rows:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers
UNION ALL
SELECT n FROM dbo.Numbers
As well as for a smaller set of 200K rows:
INSERT t1(n)
SELECT DISTINCT n FROM(
SELECT n FROM dbo.Numbers WHERE n<100000
UNION ALL
SELECT n FROM dbo.Numbers WHERE n<100000
) AS t
INSERT t2(n)
SELECT n FROM dbo.Numbers WHERE n<100000
UNION ALL
SELECT n FROM dbo.Numbers WHERE n<100000
Overall, I decided not to use IGNORE_DUP_KEY in my
particular case. I decided that small savings for a small amount of duplicates
do not justify the risk of a huge performance drop for larger amounts of
duplicate data.