Suppose that you need to implement the following logic:
IF(row exists)
Update the row
ELSE
Insert a new row.
If you already are on 2008, you should use MERGE command,
and you don’t need to read this post. Prior to 2008 this logic has to be
implemented using UPDATE and INSERT commands. I will stress test several UPSERT
implementations in a high concurrency environment. I will demonstrate that
under most circumstances the best approach is to serialize your modifications.
Prerequisites
The following tables are used in this post:
CREATE TABLE [dbo].[TwoINTs](
[ID] [int] NOT NULL,
[i1] [int] NOT NULL,
[i2] [int] NOT NULL,
[i3] [int] NOT NULL
);
CREATE TABLE [dbo].[TwoINTsErrorLog](
[ID] [int] NOT NULL,
[i1] [int] NOT NULL,
[i2] [int] NOT NULL,
[error] [int] NOT NULL
);
Implementing optimistic approach
Optimistic approach means that the procedure does not
prevent any race conditions up front – it handles errors caused by concurrency
as they occur. The following procedure implements optimistic approach and logs
all the errors:
CREATE PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
SET NOCOUNT
ON;
SET XACT_ABORT
OFF;
SET TRANSACTION
ISOLATION LEVEL
READ COMMITTED;
DECLARE @ret INT;
SET @ret=0;
BEGIN TRAN;
INSERT INTO
dbo.TwoINTs(ID, i1, i2, i3)VALUES(@ID, @i1, @i2, @i1);
SET @ret=@@ERROR;
IF @ret=2601
BEGIN
INSERT INTO [dbo].[TwoINTsErrorLog](ID, i1, i2, error)VALUES(@ID, @i1, @i2, @ret);
UPDATE dbo.TwoINTs SET i1=i1+@i1, i2=i2+@i2 WHERE ID=@ID;
SET @ret = @@ERROR;
IF @ret<>0 BEGIN
INSERT INTO [dbo].[TwoINTsErrorLog](ID, i1, i2, error)VALUES(@ID, @i1, @i2, @ret);
END
END ELSE
BEGIN
IF @ret<>0 BEGIN
INSERT INTO [dbo].[TwoINTsErrorLog](ID, i1, i2, error)VALUES(@ID, @i1, @i2, @ret);
END
END;
COMMIT;
RETURN @ret;
END
GO
Stress testing optimistic approach
The following two procedures upsert 50K values each, the
first one into column i1, and the second one into column i2:
CREATE PROCEDURE Testers.UpsertLoop1
AS
BEGIN
DECLARE @ID INT, @i1 INT, @i2 INT, @count INT, @ret INT;
SET @count = 0;
WHILE @count<50000 BEGIN
SELECT @ID = COALESCE(MAX(ID),0) + 1 FROM dbo.TwoInts;
EXEC @ret=dbo.SaveTwoINTs @ID, 1, 0;
SET @count = @count + 1;
END;
END;
CREATE PROCEDURE Testers.UpsertLoop2
AS
BEGIN
DECLARE @ID INT, @i1 INT, @i2 INT, @count INT, @ret INT;
SET @count = 0;
WHILE @count<50000 BEGIN
SELECT @ID = COALESCE(MAX(ID),0) + 1 FROM dbo.TwoInts;
EXEC @ret=dbo.SaveTwoINTs @ID, 0, 1;
SET @count = @count + 1;
END;
END;
When I simultaneously ran these two test procedures from two
connections, all values were saved, and in more than 90% cases a row was
actually updated from both connections:
SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs];
SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs] WHERE i1+i2=2;
----------- -----------
50000 50000
----------- -----------
46030 46030
As you have seen, all values were saved. Was it because the
procedure actually works in high concurrency environments, or was it just because
the stress test harness was too weak to expose problems?
Does this test harness actually work?
Whenever you do stress testing, you need to demonstrate that
your harness actually works. You need to come up with a procedure which work
without concurrency, and have your harness expose problems in it. Here is a
faulty implementation:
ALTER PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
SET NOCOUNT
ON;
SET TRANSACTION
ISOLATION LEVEL
READ COMMITTED;
DECLARE @ret INT;
SET @ret=0;
BEGIN TRAN;
UPDATE dbo.TwoINTs SET i1=i1+@i1, i2=i2+@i2 WHERE ID=@ID;
IF @@ROWCOUNT=0 BEGIN
INSERT INTO dbo.TwoINTs(ID, i1, i2, i3)VALUES(@ID, @i1, @i2, @i1);
SET @ret = @@ERROR;
IF @ret<>0 BEGIN
INSERT INTO [dbo].[TwoINTsErrorLog](ID, i1, i2, error)VALUES(@ID, @i1, @i2, @ret)
END
END;
COMMIT;
RETURN @ret;
END
Set up a fresh start before you rerun the loop tests:
TRUNCATE TABLE [dbo].[TwoINTsErrorLog];
TRUNCATE TABLE [dbo].[TwoINTs];
Rerun the loop tests. Clearly the faulty procedure does not
work under high concurrency:
SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs];
----------- -----------
46433 39017
If it worked, I would get 50000 in both cases.
Implementing and stress testing pessimistic
approach.
Pessimistic approach means that you acquire some kind of
exclusive lock before you start modifications and hold it until you commit – once
the lock is acquired, this eliminates all concurrency. Let us reuse the technique
published by SQL Server MVP Tony Rogerson in his blog:
here
ALTER PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
SET NOCOUNT
ON;
SET TRANSACTION
ISOLATION LEVEL
READ COMMITTED;
DECLARE @ret INT;
SET @ret=0;
BEGIN TRAN;
DECLARE
@result INT,
@iteration INT,
@resourceName VARCHAR(50);
SELECT
@result = -1, @iteration = 0, @resourceName = 'dbo.TwoINTs '+CAST(@ID AS VARCHAR(10));
WHILE
@result NOT IN ( 0, 1 ) BEGIN -- Only successful
return codes
EXEC
@result = sp_getapplock
@Resource = @resourceName, @LockMode = 'Exclusive';
SET
@iteration = @iteration +1;
IF
@iteration>1000 BEGIN
RAISERROR
( 'Lock failed to
acquire.', 16,
1 );
ROLLBACK;
RETURN -1;
END;
END;
UPDATE dbo.TwoINTs SET i1=i1+@i1, i2=i2+@i2 WHERE ID=@ID;
IF @@ROWCOUNT=0 BEGIN
INSERT INTO dbo.TwoINTs(ID, i1, i2, i3)VALUES(@ID, @i1, @i2, @i1);
SET @ret = @@ERROR;
IF @ret<>0 BEGIN
INSERT INTO [dbo].[TwoINTsErrorLog](ID, i1, i2, error)VALUES(@ID, @i1, @i2, @ret)
END
END;
COMMIT;
RETURN @ret;
END
GO
If you stress test this procedure in the same way as before,
you will see that no saves are lost and no errors are raised.
Performance considerations.
Under high concurrency, the pessimistic approach repeatedly
runs noticeably faster. Typical costs on my laptop are as follows:
Pessimistic: CPU – 12600, duration - 35500
Optimistic: CPU – 16400, duration – 50800
Without any concurrency, the optimistic approach repeatedly
runs slightly faster. Typical costs on my laptop are as follows:
Pessimistic: CPU – 12200, duration - 29500
Optimistic: CPU – 10900, duration – 26800
Note that for this benchmarking I commented out logging
errors into a table, which made very little difference.
Isolation level considerations
As usual in such cases, if you up isolation level to
REPEATABLE READ or to SERILIZABLE, you will get a lot of deadlocks.
This post continues my series on defensive database programming.The next post is:
Defensive database programming: SET vs. SELECT.
Here are the previous posts from the series:
Defensive database programming: fun with UPDATE.
Defensive database programming: eliminating IF statements.
Defensive database programming: fun with changing column widths.
Avoid mixing old and new styles of error handling.
Defensive database programming: adding ESCAPE clauses.
Defensive database programming: qualifying column names.
Defensive database programming: rewriting queries with NOT IN().