SQLblog.com - The SQL Server blog spot on the web

Alexander Kuznetsov

Stress testing UPSERTs

Suppose that you need to implement the following logic:

IF (row exists)
    Update the row
ELSE
    Insert a new row

If you are already on SQL Server 2008, you should use the MERGE statement, and you don't need to read this post. Prior to 2008 this logic has to be implemented with separate UPDATE and INSERT statements. I will stress test several UPSERT implementations in a high concurrency environment, and I will demonstrate that under most circumstances the best approach is to serialize your modifications.
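For reference, here is a sketch of what the SQL Server 2008 equivalent might look like, using the dbo.TwoINTs table defined in the Prerequisites section below. The procedure name is mine, not from this post; note also that a bare MERGE still does a read followed by a write, so the HOLDLOCK hint is commonly added to make the row-exists check and the modification atomic under concurrency:

```sql
-- Sketch only: SQL Server 2008 MERGE-based upsert for dbo.TwoINTs.
-- The procedure name MergeTwoINTs is hypothetical.
CREATE PROCEDURE dbo.MergeTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
    SET NOCOUNT ON;
    MERGE dbo.TwoINTs WITH (HOLDLOCK) AS target
    USING (SELECT @ID, @i1, @i2) AS src(ID, i1, i2)
        ON target.ID = src.ID
    WHEN MATCHED THEN
        UPDATE SET i1 = target.i1 + src.i1, i2 = target.i2 + src.i2
    WHEN NOT MATCHED THEN
        INSERT (ID, i1, i2, i3) VALUES (src.ID, src.i1, src.i2, src.i1);
END;
```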

 

Prerequisites

 

The following tables are used in this post. Note that the optimistic implementation below relies on a duplicate key error (2601), which can only be raised if ID is covered by a unique index, so create one:

CREATE TABLE [dbo].[TwoINTs](
    [ID] [int] NOT NULL,
    [i1] [int] NOT NULL,
    [i2] [int] NOT NULL,
    [i3] [int] NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX TwoINTs_ID ON dbo.TwoINTs(ID);

CREATE TABLE [dbo].[TwoINTsErrorLog](
    [ID] [int] NOT NULL,
    [i1] [int] NOT NULL,
    [i2] [int] NOT NULL,
    [error] [int] NOT NULL
);

 

 

Implementing the optimistic approach

 

The optimistic approach means that the procedure does not prevent race conditions up front; instead, it handles the errors caused by concurrency as they occur. The following procedure implements the optimistic approach and logs all errors:

 

CREATE PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
    SET NOCOUNT ON;
    -- With XACT_ABORT OFF, a duplicate key error terminates only the
    -- statement, not the transaction, so we can recover from it.
    SET XACT_ABORT OFF;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    DECLARE @ret INT;
    SET @ret = 0;
    BEGIN TRAN;
    -- Attempt the INSERT first; error 2601 means the row already exists.
    INSERT INTO dbo.TwoINTs(ID, i1, i2, i3) VALUES(@ID, @i1, @i2, @i1);
    SET @ret = @@ERROR;
    IF @ret = 2601 BEGIN
        INSERT INTO dbo.TwoINTsErrorLog(ID, i1, i2, error) VALUES(@ID, @i1, @i2, @ret);
        UPDATE dbo.TwoINTs SET i1 = i1 + @i1, i2 = i2 + @i2 WHERE ID = @ID;
        SET @ret = @@ERROR;
        IF @ret <> 0 BEGIN
            INSERT INTO dbo.TwoINTsErrorLog(ID, i1, i2, error) VALUES(@ID, @i1, @i2, @ret);
        END;
    END ELSE IF @ret <> 0 BEGIN
        INSERT INTO dbo.TwoINTsErrorLog(ID, i1, i2, error) VALUES(@ID, @i1, @i2, @ret);
    END;
    COMMIT;
    RETURN @ret;
END;
GO

 

Stress testing the optimistic approach

 

The following two procedures each upsert 50,000 values, the first into column i1 and the second into column i2:

 

CREATE PROCEDURE Testers.UpsertLoop1
AS
BEGIN
    DECLARE @ID INT, @count INT, @ret INT;
    SET @count = 0;
    WHILE @count < 50000 BEGIN
        -- Deliberately unprotected: two concurrent connections can pick
        -- the same MAX(ID) + 1 and collide.
        SELECT @ID = COALESCE(MAX(ID), 0) + 1 FROM dbo.TwoINTs;
        EXEC @ret = dbo.SaveTwoINTs @ID, 1, 0;
        SET @count = @count + 1;
    END;
END;
GO

CREATE PROCEDURE Testers.UpsertLoop2
AS
BEGIN
    DECLARE @ID INT, @count INT, @ret INT;
    SET @count = 0;
    WHILE @count < 50000 BEGIN
        SELECT @ID = COALESCE(MAX(ID), 0) + 1 FROM dbo.TwoINTs;
        EXEC @ret = dbo.SaveTwoINTs @ID, 0, 1;
        SET @count = @count + 1;
    END;
END;
GO

 

When I simultaneously ran these two test procedures from two connections, all values were saved, and in more than 90% of cases a row was actually updated by both connections:

 

SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs];
----------- -----------
50000       50000

SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs] WHERE i1 + i2 = 2;
----------- -----------
46030       46030

 

As you have seen, all values were saved. Was that because the procedure actually works in a high concurrency environment, or just because the stress test harness was too weak to expose its problems?

 

Does this test harness actually work?

 

Whenever you do stress testing, you need to demonstrate that your harness actually works. Come up with a procedure that works correctly without concurrency but is faulty under concurrency, and verify that your harness exposes its problems. Here is such a faulty implementation:

 

ALTER PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    DECLARE @ret INT;
    SET @ret = 0;
    BEGIN TRAN;
    -- Race condition: between this UPDATE and the INSERT below, another
    -- connection can insert a row with the same ID.
    UPDATE dbo.TwoINTs SET i1 = i1 + @i1, i2 = i2 + @i2 WHERE ID = @ID;
    IF @@ROWCOUNT = 0 BEGIN
        INSERT INTO dbo.TwoINTs(ID, i1, i2, i3) VALUES(@ID, @i1, @i2, @i1);
        SET @ret = @@ERROR;
        IF @ret <> 0 BEGIN
            INSERT INTO dbo.TwoINTsErrorLog(ID, i1, i2, error) VALUES(@ID, @i1, @i2, @ret);
        END;
    END;
    COMMIT;
    RETURN @ret;
END;
GO

 

Set up a fresh start before you rerun the loop tests:

 

TRUNCATE TABLE [dbo].[TwoINTsErrorLog];
TRUNCATE TABLE [dbo].[TwoINTs];

 

Rerun the loop tests. Clearly the faulty procedure does not work under high concurrency:

SELECT SUM(i1), SUM(i2) FROM [dbo].[TwoINTs];
----------- -----------
46433       39017

 

If it worked, I would get 50000 in both columns. The race is simple: both connections can run the UPDATE for the same new ID and see @@ROWCOUNT = 0, and the loser's INSERT then fails with a duplicate key error, so that save is lost.

 

Implementing and stress testing the pessimistic approach

 

The pessimistic approach means that you acquire some kind of exclusive lock before you start modifications and hold it until you commit; once the lock is acquired, all concurrency on that row is eliminated. Let us reuse the application lock technique published by SQL Server MVP Tony Rogerson on his blog:

 

ALTER PROCEDURE dbo.SaveTwoINTs(@ID INT, @i1 INT, @i2 INT)
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    DECLARE @ret INT;
    SET @ret = 0;
    BEGIN TRAN;
    DECLARE @result INT, @iteration INT, @resourceName VARCHAR(50);
    SELECT @result = -1, @iteration = 0,
           @resourceName = 'dbo.TwoINTs ' + CAST(@ID AS VARCHAR(10));

    -- The application lock is owned by the transaction by default,
    -- so it is released at COMMIT or ROLLBACK.
    WHILE @result NOT IN (0, 1) BEGIN -- 0 and 1 are the successful return codes
        EXEC @result = sp_getapplock @Resource = @resourceName, @LockMode = 'Exclusive';
        SET @iteration = @iteration + 1;
        IF @iteration > 1000 BEGIN
            RAISERROR('Lock failed to acquire.', 16, 1);
            ROLLBACK;
            RETURN -1;
        END;
    END;

    UPDATE dbo.TwoINTs SET i1 = i1 + @i1, i2 = i2 + @i2 WHERE ID = @ID;
    IF @@ROWCOUNT = 0 BEGIN
        INSERT INTO dbo.TwoINTs(ID, i1, i2, i3) VALUES(@ID, @i1, @i2, @i1);
        SET @ret = @@ERROR;
        IF @ret <> 0 BEGIN
            INSERT INTO dbo.TwoINTsErrorLog(ID, i1, i2, error) VALUES(@ID, @i1, @i2, @ret);
        END;
    END;
    COMMIT;
    RETURN @ret;
END;
GO

 

If you stress test this procedure in the same way as before, you will see that no saves are lost and no errors are raised.

 

Performance considerations

 

Under high concurrency, the pessimistic approach consistently runs noticeably faster. Typical costs on my laptop are as follows:

Pessimistic: CPU 12600, duration 35500
Optimistic:  CPU 16400, duration 50800

Without any concurrency, the optimistic approach consistently runs slightly faster. Typical costs on my laptop are as follows:

Pessimistic: CPU 12200, duration 29500
Optimistic:  CPU 10900, duration 26800

Note that for this benchmarking I commented out the error logging into a table, which made very little difference.

 

 

Isolation level considerations

 

As usual in such cases, if you raise the isolation level to REPEATABLE READ or SERIALIZABLE, you will get a lot of deadlocks.
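If you want to keep the update-then-insert pattern at a higher effective isolation level without sp_getapplock, a common alternative, not benchmarked in this post, is to request the stronger locks up front with table hints. This is a sketch of that well-known pattern, not an implementation from this post:

```sql
-- Sketch of the hint-based variant. UPDLOCK takes an update lock immediately,
-- avoiding the shared-to-exclusive conversion that typically causes deadlocks
-- under REPEATABLE READ / SERIALIZABLE. HOLDLOCK keeps a key-range lock so no
-- other session can insert the missing row between the UPDATE and the INSERT.
BEGIN TRAN;
UPDATE dbo.TwoINTs WITH (UPDLOCK, HOLDLOCK)
    SET i1 = i1 + @i1, i2 = i2 + @i2
    WHERE ID = @ID;
IF @@ROWCOUNT = 0 BEGIN
    INSERT INTO dbo.TwoINTs(ID, i1, i2, i3) VALUES(@ID, @i1, @i2, @i1);
END;
COMMIT;
```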

 

 

This post continues my series on defensive database programming. The next post is:

Defensive database programming: SET vs. SELECT.

Here are the previous posts from the series:

Defensive database programming: fun with UPDATE.
Defensive database programming: eliminating IF statements.
Defensive database programming: fun with changing column widths.
Avoid mixing old and new styles of error handling.
Defensive database programming: adding ESCAPE clauses.
Defensive database programming: qualifying column names.
Defensive database programming: rewriting queries with NOT IN().
Published Sunday, December 14, 2008 8:30 PM by Alexander Kuznetsov


Comments

 

Linchi Shea said:

Intuitively, if you expect a lot of collisions, pessimistic locking is better, whereas if you don't expect a lot of collisions, optimistic locking may work better. This is consistent with what you described.

December 15, 2008 3:11 PM
 



About Alexander Kuznetsov

Alex Kuznetsov has been working with object-oriented languages, mostly C# and C++, as well as with databases for more than a decade. He has worked with Sybase, SQL Server, Oracle, and DB2. He regularly blogs on sqlblog.com, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles on simple-talk.com and devx.com. Currently he works as an agile developer.
