THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alexander Kuznetsov

  • Preventing Bob from sales from selecting other people's messages

    Suppose that Bob can retrieve all messages, as follows, and we are not happy with it:

    EXEC Exchange.ShowAllMessages;

    Suppose that we don't want him to read anything related to messages, so our knee-jerk reaction is this:

    REVOKE EXECUTE ON Exchange.ShowAllMessages FROM Bob ;

    This does not work - Bob still can execute Exchange.ShowAllMessages, just as he could before. The reason is simple: everyone were granted the privilege to execute that procedure:

    GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ;

    This privilege still exists after we have run our REVOKE command, and Bob still belongs to PUBLIC. The following brute force method prevents Bob from reading messages, but at a steep price: no one else can read all messages any more:

    REVOKE EXECUTE ON Exchange.ShowAllMessages FROM PUBLIC ;

    The following script does a better job: Bob cannot invoke Exchange.ShowAllMessages directly any more, and all others can execute the procedure exactly as before:

    GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ;
    DENY EXECUTE ON
    Exchange.ShowAllMessages TO Bob ;

    Note, however, the emphasis on the word directly: in some cases Bob still can execute the procedure indirectly, if it is called by another procedure he has access to.Suppose, for example. that we have two schemas, Exchange and Sales, both created with the same authorization:

    CREATE SCHEMA Sales AUTHORIZATION dbo;
    GO
    CREATE SCHEMA Exchange AUTHORIZATION dbo;

    Unfortunately, Bob can run the following procedure, and he will still see all the messages:

     CREATE PROCEDURE Sales.ShowMySalesWithBackDoor
    AS
        BEGIN
    ;
    -- do something here
            
    IF SUSER_NAME() = 'Bob'
                
    BEGIN ;
                    
    EXEC Exchange.ShowAllMessages ;
                
    END ;
        
    END ;

    So, we have spent some time trying out different approaches, and nothing works so far. Clearly we are out of our depth - otherwise we should have completed this seemingly trivial task long ago.

    It is time to do some learning - should we have deep understanding of SQL Server security, we would have solved this problem easily. To improve our knowledge, I would suggest the following two resources:
    Expert SQL Server 2005 Development by Adam Machanic,and Protecting SQL Server Data by John Magnabosco.

  • Reads involving UDFs under READ_COMMITTED_SNAPSHOT may seem inconsistent.

    When a select uses scalar or multi-statement UDFs under READ_COMMITTED_SNAPSHOT, we might not get consistent results as of the time our select began - I will provide simple repro scripts.

    At the time of this writing MSDN clearly states the following: "Read committed isolation using row versioning provides statement-level read consistency". However, if our select invokes scalar or multi-statement UDFs, this consistency is ensured per a statement in the body of the UDF, not per the whole statement that invokes the UDF(s).

    Prerequisites

    Before we run the repro scripts, we need the following setup:

     ALTER DATABASE MyTest SET READ_COMMITTED_SNAPSHOT ON ;
    GO

    USE MyTest ;
    GO

    CREATE TABLE dbo.Messages
        
    (
          
    MessageID INT NOT NULL
                        
    CONSTRAINT PK_Messages PRIMARY KEY ,
          
    MessageBody VARCHAR(50) NOT NULL
        ) ;
    GO

    INSERT  INTO dbo.Messages
            
    ( MessageID, MessageBody )
    VALUES  ( 1, 'Hello world' ) ;
    GO

    Getting inconsistent results from a multi-statement UDF

    First, we need to create this UDF:

    CREATE FUNCTION dbo.AllMessagesTwice ( )
    RETURNS @ret TABLE
        
    (
          
    MessageBody VARCHAR(50) NOT NULL ,
          
    SelectedAt DATETIME2(5) NOT NULL
        )
    AS
        BEGIN
    ;
            
    INSERT  @ret
                    
    ( MessageBody ,
                      
    SelectedAt
                    
    )
                    
    SELECT  MessageBody ,
                            
    GETDATE()
                    
    FROM    dbo.Messages ;
            
    DECLARE @d DATETIME2(5), @i INT ;
            
    SET @d = DATEADD(SECOND, 30, GETDATE()) ;
            
    WHILE GETDATE() < @d
                
    BEGIN ;
                  
    SET @i = 1;
                
    END ;
            
    INSERT  @ret
                    
    ( MessageBody ,
                      
    SelectedAt
                    
    )
                    
    SELECT  MessageBody ,
                            
    GETDATE()
                    
    FROM    dbo.Messages ;
            
    RETURN ;
        
    END ;
    GO
     

    Next, let us open a tab and cut and paste the following script, but not run it yet:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;
    SELECT * FROM dbo.AllMessagesTwice ( )

    DELETE FROM dbo.Messages WHERE MessageId = 2 ;
     

    In another tab, cut and paste the following script, but not run it either:

    INSERT  INTO dbo.Messages
            
    ( MessageID, MessageBody )
    VALUES  ( 2, 'Hi there' ) ;

    Run the select script in the first tab - it takes 30 seconds to complete. Immediately return to the second tab and insert the 'Hi there' message.

    When the select in the first tab completes, we shall clearly see the 'Hi there' message, which definitle was inserted after the select began, as well as the timestamp that shows when the row was selected, 30 seconds after the beginning of the statement:

    Hello world 2011-08-02 20:56:44.70000
    Hello world 2011
    -08-02 20:57:14.74000
    Hi there    2011
    -08-02 20:57:14.74000

    As we have seen, a multi-statement UDF can return data that was committed after the time when the select using the UDF began.

    Getting inconsistent results from a scalar UDF

    Here is the scalar UDF we shall be using:

     CREATE FUNCTION dbo.MessageWithLastId ( )
    RETURNS VARCHAR(50)
    AS
        BEGIN
    ;
            
    DECLARE @d DATETIME2(5) ,
                
    @i INT ;
            
    SET @d = DATEADD(SECOND, 30, GETDATE()) ;
            
    WHILE GETDATE() < @d
                
    BEGIN ;
                    
    SET @i = 1 ;
                
    END ;
            
    RETURN (SELECT TOP(1) MessageBody FROM dbo.Messages
            
    ORDER BY MessageID DESC) ;
        
    END ;
    GO

    When we have created the scalar UDF, let us cut, paste, and run the following script:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;

    DELETE FROM dbo.Messages WHERE MessageId = 2 ;
    PRINT 'Data as of the beginning of the dbo.MessageWithLastId ( ) below'
    SELECT MessageBody FROM dbo.Messages
            
    ORDER BY MessageID ;

    PRINT 'Data returned by dbo.MessageWithLastId ( )'
            
    SELECT dbo.MessageWithLastId ( ) ;

    DELETE FROM dbo.Messages WHERE MessageId = 2 ;

    In the second tab we need to immediately run the following, we have 30 seconds to do so:

    INSERT  INTO dbo.Messages
            
    ( MessageID, MessageBody )
    VALUES  ( 2, 'Hi there' ) ;

    When the select in the first tab completes, we shall see that the scalar UDF returned the value that was not in the table when the select has started:

    Data AS OF the beginning OF the dbo.MessageWithLastId ( ) below
    MessageBody
    --------------------------------------------------
    Hello world

    (1 row(s) affected)

    Data returned BY dbo.MessageWithLastId ( )

    --------------------------------------------------
    Hi there
     

    As we have seen, a scalar UDF can read data that was committed after the time when the select using the UDF began.

    Snapshot isolation guarantees consistency

    We can rerun both previous examples under snapshot isolation. All we need to do is add the following line immediately before the select that invokes our UDF:

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;

    In both cases we shall see that under snapshot isolation our selects return consistent data as of the time the select started.

    Conclusion

    As we have seen, if we are using scalar or multi-statement UDFs, and we need statement level consistency, we need to use snapshot isolation - READ_COMMITTED_SNAPSHOT may fail to provide the consistency we need.

    Naturally, inline UDFs are esentially macroes - they are flattened into the select they are invoked from. As such, READ_COMMITTED_SNAPSHOT does provide statement level consistency as long as we are only using inline UDFs, and do not use scalar and multi-statement ones.

    I would like to thank Erland Sommarskog, whose feedback inspired this post - I am posting this content with his permission.

  • Entering TRY blocks is not free, but still several times cheaper than catching exceptions.

    The overhead of entering a TRY block is relatively small, but the overhead of catching an exception is more substantial. If we manage to use CASE or IF to prevent an exception, we can noticeably improve performance. We shall run benchmarks and see for ourselves - in our example, catching an exception will be approximately nine times more expensive than exiting the TRY block without it.

    Entering TRY blocks is not free

    In this section we shall wrap the body of a stored procedure in a TRY CATCH block, and measure the overhead. The following procedure does not use a TRY CATCH block:

    CREATE ALTER PROCEDURE dbo.PerformDivision
        
    @dividend INT ,
        
    @divisor INT ,
        
    @Quotient INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @Quotient = CASE WHEN @divisor = 0 THEN NULL
                        
    ELSE @dividend / @divisor
                    
    END ;
            
    RETURN @@ERROR ;
        
    END ;
    GO

    Let us benchmark it. Whether the divisor is zero or not, the procedure runs at the same speed:

    CREATE PROCEDURE dbo.PerformDivision100kTimes
        
    @dividend INT ,
        
    @divisor INT
    AS
        BEGIN
    ;
            
    DECLARE @i INT ,
                
    @Quotient INT ;
            
    SET @i = 1 ;
            
    WHILE @i <= 100000
                
    BEGIN ;
                    
    EXEC dbo.PerformDivision @dividend = @dividend,
                        
    @divisor = @divisor, @Quotient = @Quotient OUT ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO

    EXEC dbo.PerformDivision100kTimes
        
    @dividend = 1 ,
        
    @divisor = 1
        
    -- cpu 1250 ms
    GO
    EXEC dbo.PerformDivision100kTimes
        
    @dividend = 1 ,
        
    @divisor = 0
        
    -- cpu 1250 ms
    GO
      

    Let us remove the CASE expression, so that the division can raise an exception, and wrap the division operator in a TRY CATCH block:

    ALTER PROCEDURE dbo.PerformDivision
        
    @dividend INT ,
        
    @divisor INT ,
        
    @Quotient INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    BEGIN TRY
                
    SELECT  @Quotient = @dividend / @divisor ;
            
    END TRY
            
    BEGIN CATCH
                
    SELECT  @Quotient = NULL ;
            
    END CATCH ;
            
    RETURN @@ERROR ;
        
    END ;
    GO
      

    Even when the division does not raise an exception, the procedure runs 36% slower, 1703 ms vs 1250 ms:

    EXEC dbo.PerformDivision100kTimes
        
    @dividend = 1 ,
        
    @divisor = 1
        
    -- cpu 1703 ms
    GO

    Catching exceptions is a noticeable overhead 

    When the divisor is zero, and division does raise an exception that is caught in the CATCH block, the procedure runs 4.36 times slower, 5453 ms vs 1250 ms:

    EXEC dbo.PerformDivision100kTimes
        
    @dividend = 1 ,
        
    @divisor = 0
        
    -- cpu 5453 ms
    GO
     

    As we have seen, catching exceptions in CATCH blocks is slow. In this particular scenario, it is approximately nine times more expensive than just entering the TRY block.

    If we manage to use CASE or IF to avoid an exception, we can noticeably improve performance.

  • Comparing the cost of INSERT ... EXEC and its alternatives

    It is well known that INSERT ... EXEC is not free, but how expensive is it as compared to its alternatives? We shall run benchmarks and see that:

    • Invoking a multi-statement UDF is approximately two times faster than capturing the output of a stored procedure in a table variable via INSERT ... EXEC;
    • INSERT ... EXEC into a table variable is approximately two times faster than INSERT ... EXEC into a temporary table;
    • Inline UDFs are much cheaper than multi-statement UDFs;
    • The cost of invoking a multi-statement UDF or running INSERT ... EXEC may be several times higher than the cost of running the DML inside the UDF or stored procedure.

    We shall only run benchmarks and estimate real execution costs. Whenever we choose whether to cut-and-paste code or to reuse it, whether via INSERT...EXEC, or in a multi-statement UDF, knowing these numbers may be very useful.

    For an in-depth analysis of what contributes to the overhead, make sure that you have read the following great post by Adam Machanic:
    The Hidden Costs of INSERT EXEC

    Also when we use INSERT...EXEC a lot, we increase the workload on tempdb, but this is beyond the scope of this post.

    Determining the cost of INSERT ... EXEC into a table variable.

    The following stored procedure is very fast, so that the cost of its execution should be quite low. Also it does not contain any DML and as such it does not perform any reads when it runs.

    CREATE PROCEDURE dbo.GetProduct @int1 INT, @int2 INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @int1 * @int2 AS Product ;
            
    RETURN @@ERROR ;
        
    END ;
    GO


    Suppose also that we need the product of two integers in an output parameter, not in a result set. To accomplish that, we certainly can capture its output via INSERT ... EXEC, as follows:

    CREATE PROCEDURE dbo.ReuseGetProduct
        
    @int1 INT ,
        
    @int2 INT ,
        
    @product INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @res TABLE ( product INT ) ;
            
    INSERT  @res
                    
    ( Product
                    
    )
                    
    EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ;
            
    SELECT  @product = Product
            
    FROM    @res ;
        
    END ;
    GO

    Alternatively, we can cut and paste some code and get the product of two integers directly into an output parameter:

    CREATE PROCEDURE dbo.GetProductDirectly
        
    @int1 INT ,
        
    @int2 INT ,
        
    @product INT OUT
    AS
        BEGIN
    ;
            
    SET @product = @int1 * @int2 ;
        
    END ;
    GO

    To benchmark these two approaches, let us create and execute the following two procedures:

    CREATE PROCEDURE dbo.LoopReuseGetProduct
    AS
        BEGIN
    ;
            
    DECLARE @i INT ,
                
    @product INT ;
            
    SET @i = 1 ;
            
    WHILE @i <= 100000
                
    BEGIN ;
                    
    EXEC dbo.ReuseGetProduct @int1 = 1, @int2 = 2,
                        
    @product = @product OUT ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO
    DROP PROCEDURE dbo.LoopGetProductDirectly
    CREATE PROCEDURE dbo.LoopGetProductDirectly
    AS
        BEGIN
    ;
            
    DECLARE @i INT ,
                
    @product INT ;
            
    SET @i = 1 ;
            
    WHILE @i <= 100000
                
    BEGIN ;
                    
    EXEC dbo.GetProductDirectly @int1 = 1, @int2 = 2,
                        
    @product = @product OUT ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO

    EXEC dbo.LoopReuseGetProduct ;
    GO
    EXEC dbo.LoopGetProductDirectly ;

    The difference in performance is dramatic: while LoopGetProductDirectly uses 1094 ms CPU and no reads, LoopReuseGetProduct uses 21000 ms CPU and performs 1.9M reads.

    So the overhead of every INSERT...EXEC into a table variable in this particular case is approximately 19 reads and 0.2 ms of CPU.

    Note: if you are wondering why our benchmarks are wrapped as stored procedures, read an excellent post by Linchi Shea entitled "Performance impact: stored procedures, SQL batches, and CPU usage"

     Comparing INSERT ... EXEC into a table variable vs. INSERT ... EXEC into a temporary table

    As we have seen, INSERT ... EXEC into a table variable is not cheap. Let us replace the table variable with a temporary table and see if it runs faster:

    ALTER PROCEDURE dbo.ReuseGetProduct
        
    @int1 INT ,
        
    @int2 INT ,
        
    @product INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    CREATE TABLE #res( product INT ) ;
            
    INSERT  #res
                    
    ( Product
                    
    )
                    
    EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ;
            
    SELECT  @product = Product
            
    FROM    #res ;
        
    END ;
    GO
      

    When we rerun LoopReuseGetProduct, the performance is even worse: it uses 41600 ms CPU and performs 4.6M reads.

    So the overhead of every INSERT...EXEC into a temporary table in this particular case is approximately 46 reads and 0.4 ms CPU.

     

    Comparing INSERT ... EXEC into a table variable vs. invoking a UDF

    To reuse our code, we also can wrap it in a multi-statement UDF, or in an inline one. As before, we shall wrap our benchmarks in stored procedures:

    CREATE FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT )
    RETURNS @ret TABLE ( Product INT )
    AS
        BEGIN
    ;
            
    INSERT  @ret
                    
    ( Product )
                    
    SELECT  @int1 * @int2 ;
            
    RETURN ;                
        
    END ;
    GO

    CREATE PROCEDURE dbo.LoopGetProduct_MultiStatement
    AS
        BEGIN
    ;
            
    DECLARE @i INT ,
                
    @product INT ;
            
    SET @i = 1 ;
            
    WHILE @i <= 100000
                
    BEGIN ;
                    
    SELECT  @product = product
                    
    FROM    dbo.GetProduct_MultiStatement(1, 1) ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO

    CREATE FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT )
    RETURNS TABLE
    AS
    RETURN
        
    ( SELECT    @int1 * @int2 AS Product
        
    ) ;
    GO

    CREATE PROCEDURE dbo.LoopGetProduct_Inline
    AS
        BEGIN
    ;
            
    DECLARE @i INT ,
                
    @product INT ;
            
    SET @i = 1 ;
            
    WHILE @i <= 100000
                
    BEGIN ;
                    
    SELECT  @product = product
                    
    FROM    dbo.GetProduct_Inline(1, 1) ;
                    
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO
      

    Here are the results of our benchmarking:

    • dbo.LoopGetProduct_Inline   uses 563 ms of CPU and no reads;
    • dbo.LoopGetProduct_MultiStatement uses 10266 ms of CPU and performs exactly 1M reads.

    Clearly the inline UDF performs much faster.

    In this scenario, the overhead of invoking a multi-statement UDF is 10 reads and 0.1 ms of CPU.

     

    Is this overhead high as compared to the cost of executing DML?

    If our DML is fast, it may take less resources to execute our DML than to invoke a multi-statement UDF or to run an INSERT...EXEC. To demonstrate that, let us create a small table with just 10K rows:

    CREATE TABLE dbo.Products
        
    (
          
    Multiplier1 INT NOT NULL ,
          
    Multiplier2 INT NOT NULL ,
          
    Product INT NOT NULL ,
          
    CONSTRAINT PK_Products PRIMARY KEY(Multiplier1, Multiplier2 )
        ) ;
    GO

    INSERT  INTO dbo.Products
            
    ( Multiplier1 ,
              
    Multiplier2 ,
              
    Product
            
    )
            
    SELECT  n1.n ,
                    
    n2.n ,
                    
    n1.n * n2.n
            
    FROM    dbo.Numbers AS n1
                    
    CROSS JOIN dbo.Numbers AS n2
            
    WHERE   n1.n <= 100
                    
    AND n2.n <= 100 ;
    GO
      

    Also let us alter our procedures and UDFs to read from this table:

    ALTER PROCEDURE dbo.GetProduct @int1 INT, @int2 INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  Product
            
    FROM    dbo.Products
            
    WHERE   Multiplier1 = @int1
                    
    AND Multiplier2 = @int2 ;
            
    RETURN @@ERROR ;
        
    END ;
    GO

    ALTER FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT )
    RETURNS @ret TABLE ( Product INT )
    AS
        BEGIN
    ;
            
    INSERT  @ret
                    
    ( Product
                    
    )
                    
    SELECT  Product
                    
    FROM    dbo.Products
                    
    WHERE   Multiplier1 = @int1
                            
    AND Multiplier2 = @int2 ;
            
    RETURN ;                
        
    END ;
    GO

    ALTER FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT )
    RETURNS TABLE
    AS
    RETURN
        
    ( SELECT    Product
          
    FROM      dbo.Products
          
    WHERE     Multiplier1 = @int1
                    
    AND Multiplier2 = @int2
        
    ) ;
    GO
      

    Also let us revert the INSERT...EXEC back to its original version, which uses a table variable and is two times faster than the one with a temporary table:

    ALTER PROCEDURE dbo.ReuseGetProduct
        
    @int1 INT ,
        
    @int2 INT ,
        
    @product INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @res TABLE ( product INT ) ;
            
    INSERT  @res
                    
    ( Product
                    
    )
                    
    EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ;
            
    SELECT  @product = Product
            
    FROM    @res ;
        
    END ;
    GO
      

    Let us rerun our benchmarks:

    EXEC dbo.LoopReuseGetProduct ;
    -- EXEC dbo.LoopReuseGetProduct ; cpu 21906 reads 2100115

    GO
    EXEC dbo.LoopGetProduct_MultiStatement ;
    -- EXEC dbo.LoopGetProduct_MultiStatement ; cpu 11109 reads 1200010

    GO
    EXEC dbo.LoopGetProduct_Inline ;
    --EXEC dbo.LoopGetProduct_Inline ; cpu 1079 reads 200000

    Our table Products is small; it takes just two reads two perform a simple select ad look up the value in it. Because we look up the value 100K times, we use 200K reads to do the actual useful work.

    In this scenario, less than 20% of CPU and reads are spent on executing the DML. The rest is the overhead of invoking a multi-statement UDF or of running an INSERT...EXEC.

    Of course, if we are running more complex and/or slower DML, the ratio of useful work vs the overhead will be higher. As such, we might be more willing to reuse our code via INSERT...EXEC or via a multi-statement UDF, even though the overhead is quite substantial.

    Conclusion

    As we have seen, neither INSERT...EXEC nor the multi-statement UDF are cheap. Whenever we are deciding whether to use them, we need to consider their impact on performance, preferably by running benchmarks.

    One more thing: I am running my benchmarks on SQL 2008 R2, on my laptop. The results I am getting on my servers are similar. However, you might get different results in other environments, on other versions of SQL Server. I encourage you to rerun my scripts, and post here whether you are getting similar results, and what is your environment.

     

    My other posts discussing T-SQL code reuse and performance are as follows:

     
    Will wrapping up code in a reusable stored procedure hurt performance?

     
    Reuse Your Code with Table-Valued UDFs

  • A free book entitled "Version Control by Example" by Eric Sink is available for download

    I have just downloaded and started reading a free PDF book about version control systems (VCS), entitled Version Control by Example, written by Eric Sink. So far I have read a really informative chapter about the advantages of third generation VCS, such as Git, over the second generation ones, such as Subversion and TFS. So far it is a very nice read. I hope you will like it too.
  • More specific queries may run faster, but we should not overdo it.

    It is well known that replacing a generic, one-size-fits-all query with several more specific ones may boost performance. However, in some cases these more specific queries do not perform any faster than the generic one. We shall consider several examples which demonstrate both scenarios.

    Prerequisites

    The following script creates a table and populates it with 4M rows, not much but enough for our test cases.

    CREATE TABLE dbo.Packages
        
    (
          
    ID INT NOT NULL
                
    IDENTITY ,
          
    [length] INT NOT NULL ,
          
    width INT NOT NULL ,
          
    [weight] INT NOT NULL ,
          
    SpaceFiller CHAR(30) NOT NULL,
          
    CONSTRAINT PK_Packages PRIMARY KEY ( ID )
        ) ;
    GO

    INSERT  INTO dbo.Packages
            
    ( [length] ,
              
    width ,
              
    [weight],
              
    SpaceFiller
            
    )
            
    SELECT  n1.n ,
                    
    n2.n ,
                    
    1 ,
                    
    '*'
            
    FROM    ( SELECT    n
                      
    FROM      dbo.Numbers
                      
    WHERE     n <= 2000
                    
    ) AS n1
                    
    CROSS JOIN ( SELECT n
                                
    FROM   dbo.Numbers
                                
    WHERE  n <= 2000
                              
    ) AS n2 ;

    GO

    CREATE INDEX Packages_length_width
      
    ON dbo.Packages([length], width) INCLUDE ( [weight], SpaceFiller ) ;
    GO

    CREATE INDEX Packages_width_length
      
    ON dbo.Packages(width, [length]) INCLUDE ( [weight], SpaceFiller ) ;
    GO

     

    When replacing a generic query with two specific ones boosts performance

    The following generic query performs very poorly - it does not utilize any indexes. Instead, it scans the whole table and sorts it:

    CREATE PROCEDURE dbo.GetTop1Weight
        
    @OrderBy VARCHAR(30) ,
        
    @topWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT TOP ( 1 )
                    
    @topWeight = [weight]
            
    FROM    dbo.Packages
            
    ORDER BY CASE WHEN @OrderBy = 'Length,Width' THEN [length]
                          
    ELSE [width]
                    
    END ,
                    
    CASE WHEN @OrderBy = 'Length,Width' THEN [width]
                        
    ELSE [length]
                    
    END ;
        
    END ;
    GO
      

    You can run it or at least see the execution plan, and see for yourself - the query needs about 25K logical reads to complete.

    Let us replace this generic query with two specific ones:

    ALTER PROCEDURE dbo.GetTop1Weight
        
    @OrderBy VARCHAR(30) ,
        
    @topWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    IF @OrderBy = 'Length,Width'
                
    BEGIN ;
                    
    SELECT TOP ( 1 )
                            
    @topWeight = [weight]
                    
    FROM    dbo.Packages
                    
    ORDER BY [length] ,
                            
    [width] ;
                
    END ;
            
    ELSE
                BEGIN
    ;
                    
    SELECT TOP ( 1 )
                            
    @topWeight = [weight]
                    
    FROM    dbo.Packages
                    
    ORDER BY [width] ,
                            
    [length] ;
                
    END ;
        
    END ;
    GO
      

    As a result the query speeds up dramatically - for both sort orders it executes as one index seek:

    DECLARE @topWeight INT ;
    EXEC dbo.GetTop1Weight @OrderBy = 'Length,Width',
        
    @topWeight = @topWeight OUT ;
    EXEC dbo.GetTop1Weight @OrderBy = 'Width,Length',
        
    @topWeight = @topWeight OUT ;

    Both queries need just 3 logical reads to complete, which is a huge improvement.

    As we have seen, sometimes replacing a generic, one-size-fits-all query with several more specific ones dramatically improves performance. However, in many other cases more specific queries do not perform any better.

    When replacing a generic query with two specific ones does not improve performance at all

    The following one-size-fits-all query performs very poorly:

    CREATE PROCEDURE dbo.GetTotalWeight
        
    @length INT = NULL ,
        
    @width INT = NULL ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    IF ( ( @length IS NULL
                   AND
    @width IS NULL
                 )
                 OR (
    @length IS NOT NULL
                      AND
    @width IS NOT NULL
                    )
               )
                
    BEGIN ;
                    
    RAISERROR('Must provide exactly one of the following parameters: @width and @length', 16, 1) ;
                    
    RETURN 1 ;
                
    END ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    dbo.Packages
            
    WHERE   [length] = COALESCE(@length, [length])
                    AND
    [width] = COALESCE(@width, [width]) ;
        
    END ;
    GO
     

    Note: for simplicity, let us assume that we must always provide exactly one parameter, either length or width, but never both. In all the subsequent versions we shall skip the verification of parameters, the first IF statement, just to keep our examples shorter.

    We can run the following script and see that every time our stored procedure scans the whole table and performs 25K logical reads:

    DECLARE @totalWeight INT ;
    EXEC dbo.GetTotalWeight @length = 1,
        
    @totalWeight = @totalWeight OUT ;
    EXEC dbo.GetTotalWeight @width = 1,
        
    @totalWeight = @totalWeight OUT ;

    Speeding up this query is very easy. The following query is index friendly. For brevity, we have omitted the first step, the verification that either length or width, but never both, is provided:

    ALTER PROCEDURE dbo.GetTotalWeight
        
    @length INT = NULL ,
        
    @width INT = NULL ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    dbo.Packages
            
    WHERE   [length] = @length
                    
    OR [width] = @width ;
        
    END ;
    GO

    Every time it uses the right index and needs just 16 logical reads:

    DECLARE @totalWeight INT ;
    EXEC dbo.GetTotalWeight @length = 1,
        
    @totalWeight = @totalWeight OUT ;
    EXEC dbo.GetTotalWeight @width = 1,
        
    @totalWeight = @totalWeight OUT ;
     

    In this case more specific queries do not run any faster. The following queries show several approaches which we can try out. We can keep all the code in one procedure, as shown in the following scripts:

    ALTER PROCEDURE dbo.GetTotalWeight
        
    @length INT = NULL ,
        
    @width INT = NULL ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    ( SELECT    [weight]
                      
    FROM      dbo.Packages
                      
    WHERE     [length] = @length
                      
    UNION ALL
                      
    SELECT    [weight]
                      
    FROM      dbo.Packages
                      
    WHERE     [width] = @width
                    
    ) AS T ;
        
    END ;
    GO

    ALTER PROCEDURE dbo.GetTotalWeight
        
    @length INT = NULL ,
        
    @width INT = NULL ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    IF @length IS NOT NULL
                
    BEGIN ;
                    
    SELECT  @totalWeight = SUM([weight])
                    
    FROM    dbo.Packages
                    
    WHERE   [length] = @length ;
                
    END
            ELSE
                BEGIN
                    SELECT  
    @totalWeight = SUM([weight])
                    
    FROM    dbo.Packages
                    
    WHERE   [width] = @width ;
                
    END
        END
    ;
    GO

    Alternatively, we can try out two separate procedures:

    CREATE PROCEDURE dbo.GetTotalWeight_ByWidth
        
    @width INT ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    dbo.Packages
            
    WHERE   [width] = @width ;
        
    END ;
    GO

    CREATE PROCEDURE dbo.GetTotalWeight_ByLength
        
    @length INT ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    dbo.Packages
            
    WHERE   [length] = @length ;
        
    END ;
    GO

    DECLARE @totalWeight INT ;
    EXEC dbo.GetTotalWeight_ByLength @length = 1,
        
    @totalWeight = @totalWeight OUT ;
    EXEC dbo.GetTotalWeight_ByWidth @width = 1,
        
    @totalWeight = @totalWeight OUT ;

    Anyway, none of the approaches we have tried out performs any faster than the optimized generic one - they all complete in 16 logical reads and use approximately the same CPU.

    So, in this case replacing a generic procedure with several specific ones does not make any sense at all - it increases complexity without improving performance.

    We are better off with the original generic query:

    ALTER PROCEDURE dbo.GetTotalWeight
        
    @length INT = NULL ,
        
    @width INT = NULL ,
        
    @totalWeight INT OUT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    SELECT  @totalWeight = SUM([weight])
            
    FROM    dbo.Packages
            
    WHERE   [length] = @length
                    
    OR [width] = @width ;
        
    END ;
    GO

    Conclusion

    Although in many cases replacing a generic query with several specific ones does boost performance, we should not use this approach in all cases as a rule of thumb. Instead, we need to prove that splitting a generic query does improve performance before actually doing it.

     

    My other posts discussing T-SQL code reuse and performance are as follows:

     
    Will wrapping up code in a reusable stored procedure hurt performance?

     
    Reuse Your Code with Table-Valued UDFs

    Comparing the cost of INSERT ... EXEC and its alternatives
  • Will wrapping up code in a reusable stored procedure hurt performance?

    In T-SQL programming, stored procedure calls are not free, and nested stored procedure calls cost even more. First, we shall prove it by running benchmarks. Next, we shall determine if the cost of calling stored procedures is high as compared to the rest of workload.

     

    Proving that stored procedure calls are not free

    Let us begin with a very simple benchmark, using a trivial stored procedure that increments a variable:

    CREATE PROCEDURE dbo.IncreaseByOne @i INT OUT
    AS
        BEGIN
    ;
          
    SET @i = @i + 1 ;
        
    END ;
    GO
      

    In order to get more or less meaningful comparison, we shall invoke the procedure 100K times, and compare its real execution costs versus the cost of incrementing the variable directly in the body of the loop, as shown in the following script:

    CREATE PROCEDURE dbo.DirectUpdateInLoop
        
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                  
    SET @i = @i + 1 ;
                
    END ;
            
    SELECT @i AS [@i] ;
        
    END ;
    GO

    CREATE PROCEDURE dbo.ProcedureCallInLoop
        
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                    
    EXEC dbo.IncreaseByOne @i = @i OUT ;
                
    END ;
            
    SELECT @i AS [@i] ;
        
    END ;
    GO
      

    EXEC dbo.DirectUpdateInLoop @NumIterations = 100000;
    GO
    EXEC dbo.ProcedureCallInLoop @NumIterations = 100000;
      

    To exclude parse and compile time from our comparison, we can just invoke our stored procedures more than once, and disregard their first executions.

    The second procedure, dbo.ProcedureCallInLoop , consistently runs much slower -  970 ms vs. 93 ms on my laptop, running SQL 2008 R2. Naturally, your mileage may vary.

     

    Proving that nested stored procedure calls are even more expensive

    Clearly, invoking a stored procedure is not free. To determine if nested stored procedure calls are more expensive, we shall create and run the following modules:

    CREATE PROCEDURE dbo.WrappedIncreaseByOne @i INT OUT
    AS
        BEGIN
    ;
          
    EXEC dbo.IncreaseByOne @i = @i OUT ;
        
    END ;
    GO

    CREATE PROCEDURE dbo.WrappedProcedureCallInLoop
        
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                    
    EXEC dbo.WrappedIncreaseByOne @i = @i OUT ;
                
    END ;
            
    SELECT @i AS [@i] ;
        
    END ;
    GO

    EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 10;
    GO

    EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 100000;

    Nested calls are consistently slower - dbo.WrappedProcedureCallInLoop runs in 1620 ms vs. 970 ms for the previous tests, dbo.ProcedureCallInLoop.

     

    Comparing the overhead of calling procedures to the overhead of other actions

    As we have seen, stored procedure calls are not free, especially nested ones. However, our benchmarks were not very realistic - our stored procedures were extremely trivial.

    If a stored procedure actually executes some DML, how big is the overhead of calling it as compared to the overhead of running the DML inside it?

    The following procedure executes just a little bit of DML - it updates one row:

    CREATE TABLE dbo.Counters
        
    (
          
    CounterId INT NOT NULL
                        
    CONSTRAINT PK_Counters PRIMARY KEY ,
          
    Amount INT NOT NULL
        ) ;
    GO
    INSERT INTO dbo.Counters
            
    ( CounterId, Amount )
    VALUES  ( 1,
              
    0  
              
    ) ;
    GO
    CREATE PROCEDURE dbo.IncreaseCounterByOne @CounterId INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    UPDATE  dbo.Counters
            
    SET     Amount = Amount + 1
            
    WHERE   CounterId = @CounterId ;
        
    END ;
    GO
      

    As before, we shall be invoking it in a loop, as follows:

    CREATE PROCEDURE dbo.IncreaseCounterViaProcedure
        
    @CounterId INT ,
        
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                    
    EXEC dbo.IncreaseCounterByOne @CounterId = @CounterId ;

                  
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO
      

    Let us cut-and-paste the body of dbo.IncreaseCounterByOne into the calling module and see if i it improves the performance:

    CREATE PROCEDURE dbo.IncreaseCounterViaDirectUpdate
        
    @CounterId INT ,
        
    @NumIterations INT
    AS
        BEGIN
    ;
            
    SET NOCOUNT ON ;
            
    DECLARE @i INT ;
            
    SET @i = 0 ;
            
    WHILE @i < @NumIterations
                
    BEGIN ;
                    
    UPDATE  dbo.Counters
                    
    SET     Amount = Amount + 1
                    
    WHERE   CounterId = @CounterId ;
                  
                  
    SET @i = @i + 1 ;
                
    END ;
        
    END ;
    GO
     

    Cut-and-pasting the body of dbo.IncreaseCounterByOne into the loop definitely improves the performance, but not very much: CPU goes down from 3400 ms (dbo.IncreaseCounterViaProcedure) to 2400 ms (dbo.IncreaseCounterViaDirectUpdate), and duration does not change very much either.

    For more complex stored procedures, the overhead of making a call should be even less pronounced as compared to the cost of executing DML.

    Conclusion

    As we have seen, sometimes we can eliminate stored procedure calls and save some CPU. However, we should not expect huge savings, unless we are flattening out deeply nested modules.

    Also we have seen that although wrapping up reusable code in a stored procedure does slow down the execution, usually the performance penalty is relatively small.

    My other posts discussing T-SQL code reuse and performance are as follows:

    More specific queries may run faster, but we should not overdo it.

    Reuse Your Code with Table-Valued UDFs

    Comparing the cost of INSERT ... EXEC and its alternatives
  • Demos for my today's presentation at SQL Rally

    Are attached to this short blog post.

    Also I have promised to post the following link:

    This is a benchmark that determines whether SET or SELECT is faster.
  • Demos and slides for my today's presentation in Chicago

    Today I will be speaking at the March 2011 meeting of Chicago user group. Attached are demos and slides.
  • I am not speaking at SQL Connections February 2011 meeting in Chicago suburbs

    Usually it is an honor when we get to present to a user group, but not this time, so let me explain.

    I have no idea how my presentation got briefly mentioned in the invitation which went out today, without my consent. I have never asked or agreed to speak at SQL Connections February 2011 meeting in Chicago suburbs. Yet I apologize for any inconvenience it might have caused.

    I was going to speak at the meeting of December 2010, which was agreed by email with the person in charge. I had spent some time preparing for the presentation, and was very surprised to learn that my presentation was canceled when I received the invitation for December 2010 meeting with a different agenda. Nobody sent me a note telling me that my presentation has been canceled.

    Apparently this way of canceling presentations is very unusual. I have discussed this with quite a few other speakers, and nobody has recalled anything similar.

    I was quite surprised when I had a closer look at the invitation for February 2011 meeting, which I just received:

    You are invited to attend ...

    SQL Connections February 2011

    !!! SQLConnections 1st Birthday Celebration !!!

     

    Please join us at the Microsoft Offices in Oak Brook for our 1st Birthday

     

    BRENT OZAR

     

    Tuning T-SQL Step by Step

    If execution plans look like subway maps of foreign cities, this session is for you. Brent will turn SQL Server inside out by making YOU role-play as the SQL Server engine. He’ll feed you one query at a time, make you write your own execution plans in plain English, and then show how SQL Server draws out execution plans. You’ll learn how sorts and joins rely on indexes, and get your first glimpse into why statistics are so important. No tuning experience necessary, but you do need to understand how to write SELECT statements with joins.

    Brent presented at the chapter’s first meeting a year ago, and we are honored to have him back as the chapter’s First Anniversary speaker!

     

    Afterwards Alex will present the session he did not get to present in Decemeber.

     

    Cant Wait to See You There!

     

    Note that it does not include neither my last name nor the abstract for the presentation. Of course, Brent is a much better speaker than I. Naturally, the invitation may reflect this well known fact. However, failing to mention the speaker's last name and failing to include the abstract seems to be just a little bit over the top.

    Even if I were planning to speak at this event, I would quite likely have canceled after such an invitation.

     

     

  • My Abstracts at SQL Rally Are Actually About Work-Life Balance

    Although technically the sessions I submitted are entitled "Avoid Common Mistakes in T-SQL Programming" and "Developing Robust T-SQL for High Concurrency", the main thing you will take with you is the following: the ability to develop T-SQL that does not break in a changing environment, under real life concurrency. As such, you will not have to wake up at 3AM to make an urgent fix, and you will have your weekends for yourself.

     When you have mastered the art of defensive programming, you will not need to spend 80% of your time making fixes after fixes after fixes - you will enjoy your job better, and you will surely have the time, energy, and motivation to learn all those cool new features.

      Vote!

  • T-SQL Tuesday #13: Clarifying Requirements

    When we transform initial ideas into clear requirements for databases, we typically have to make the following choices:

     

    Frequent maintenance vs doing it once.


    As we are clarifying the requirements, we need to determine whether we want to concinue spending considerable time maintaining the system, or if we want to finish it up and move on to other tasks. Race car maintenance vs installing electric wiring is my favorite analogy for this kind of choice.


    In some cases we need to sqeeze every last bit of performance out of a highly important query, and we are ready to re-optimize as needed. This is similar to maintaining a race car, which is very labor-intensive, and is done very frequently, pretty much all the time. Another similarity is that the goal is to get the best possible performance, and we are ready to pay a high price for it.. When we deal with race cars, long term considerations are totally beyond our horizon – we need to do our best right now, whatever it takes, and we may go for another cool new approach at any moment. If this is the case, then doing anything defensively for the long term is completely irrelevant.


    In other cases we really want to get the job done once, and not to return to it for as long as possible. This is similar to installing electric wiring in a house, because in order to avoid expensive re-wiring, we build in some redundancy into our solution; we have more outlets and more capacity than we actually need right now, and all outlets are grounded even if we only plug table lamps into them.

    Typically our wiring solution does not perfectly fit our current needs: some outlets are not positioned exactly where they would be most convenient right now, some others are useless right now. We are likely to have to use extension cords, and we can tolerate that. Yet we are more willing to cope with the minor inconvenience of using an extension cord than to reinstall outlets every time we move furniture.


    When we deal with electric wiring, safety and long term stability is highly important, and it makes a lot of sense to do everything very defensively. Also it makes sense to build in more capacity than we need right now.

    Of course, most real life situations are not as clearly cut as these two extreme examples – they are typically somewhere in between.
    Yet in many real life cases that we would rather do the query tuning just once, and we are ready for some overkill, for some redundancy – that is the price we are willing to pay for not having to return and re-optimize the query later.

     

     Best performance on average vs. acceptable performance all the time.


    Depending on the circumstances, the criteria for acceptable performance may be quite different in different situations. For example, as I am writing T-SQL code, book chapters, and articles, I can and do take breaks from time to time. As a result, it may take me an hour or more to finish just one sentence, or one method, or one query, and that’s OK. It is the overall, the average performance that matters, as long as I manage to meet my deadlines.

    On the other hand, if I am driving, all my driving decisions must be done quickly, without any exceptions.  


    Let us apply a similar approach to queries. For instance, let us consider the following two queries which always return identical results:

    • Query A runs in 50 ms 99.8% of the time, but takes 1500 ms in 0.2% cases
    • Query B always runs between 350ms and 450ms.

    Which one of these two queries should we choose? It depends. Of course query A is a better citizen; on average it uses up less resources on the server. As such, in many cases we would really want to choose query A over query B. However, if our query is required to always return in 1000ms or less, then, unfortunately, query A does not meet the requirements, and query B is our only choice.


    As we have seen, to choose which query better fits the needs of the customers, we need to know the requirements.

     

    Performance of reads vs. performance of writes.


    As we tune select queries we may need to make yet another choice: how much can we slow down our modifications to speed up our selects? As in two previous subsections, let us consider two extreme examples:

    • Security video must always record real time, and we don’t expect very high quality from it. Most likely this video recording will never be viewed. Only in the unlikely event that it will be analyzed, only then we shall work on enhancing its quality.
    • On the other hand, if we are making a movie, it can take several hours or even days to come up with a short episode. This is OK: because we expect a lot of views, we can afford to invest considerable time and effort in improving quality. Also modifying a DVD is not an option - if we need to change anything, we have to burn another one.

    As in the previous choices, most real life situations fall somewhere between these two extreme cases, but we need to determine where we are on this scale.
     

  • Math with Months Is Not Commutative

    In other words, if we add a month, then subtract a month, we might not get back to the date we started from. For example:

     SELECT  DATEADD(MONTH, 1, DATEADD(MONTH, -1, '20100330')) ,
            
    DATEADD(MONTH, -1, DATEADD(MONTH, 1, '20100330'))


    2010-03-28 00:00:00.000
    2010
    -03-30 00:00:00.000
     

     If our code assumes otherwise, it has a bug.

  • Benefit from Unit Testing T-SQL: Ensure quick turnaround

    The biggest advantage of unit testing is the ability to make changes quickly, and with confidence that we have not broken anything with our change. Whether we need to speed up a query real quick, or to fix a bug, automated testing saves us a lot of time, especially if test failure is exposed in user friendly way.

    This post continues the series on unit testing, the previous posts are

    How to Benefit from Unit Testing T-SQL: choosing what not to test

    How to Benefit from Unit Testing T-SQL: choosing what to test

    How to Benefit from Unit Testing T-SQL. Part One.

    How to Benefit from Unit Testing T-SQL: Reusing Manual Tests as Parts of Unit Tests

    How to Benefit from Unit Testing T-SQL: Exposing Failure in a User-Friendly Way

    Benefit from Unit Testing T-SQL: Reuse Unit Tests as Documentation

    Benefit from Unit Testing T-SQL: Speed up Your Test Harness

     

    To ensure quick turnaround, we typically use the following approaches:

    Utilize version control. Use branches.

    When all the code, including unit tests, is in version control, the failure of our workstation does not delay us too much: we can quickly download source code on any other workstation and continue working. Similarly, when we work on a feature and use a branch, we can quickly switch context to the trunk, make a fix against it, and switch back to the branch.

    Run all tests off local instance. By default, build test database from scratch.

    If we have to share test databases, too much time can be wasted on coordinating, especially if we need to change database schema or test data. So, all the scripts needed to build the test database and populate it with test data must be downloaded from version control. Optionally, we should be able to bypass creating and populating test database - that saves time on subsequent runs.

    Practice, practice, practice.

    To act quickly, we need to be familiar with unit testing. Learning something during an emergency is not very efficient. Unit testing should be practiced frequently, so that our skills do not get rusty.

    Also we need to be familiar with our tools. If we use open source ones, we need to be familiar with the source code. Anyway, if our tools have gotchas, we need to learn them proactively, not when we are in a hurry and those gotchas get in our way. Needless to say, in most cases simple and rock solid tools are preferable to complex ones with gotchas.

     

     

     

     

     

  • Benefit from Unit Testing T-SQL: Speed up Maintenance of Unit Tests Part Two

    In this post we shall see how adding test data breaks existing unit tests, and how to fix them.

    This post continues the series on unit testing, the previous posts are

    How to Benefit from Unit Testing T-SQL: choosing what not to test

    How to Benefit from Unit Testing T-SQL: choosing what to test

    How to Benefit from Unit Testing T-SQL. Part One.

    How to Benefit from Unit Testing T-SQL: Reusing Manual Tests as Parts of Unit Tests

    How to Benefit from Unit Testing T-SQL: Exposing Failure in a User-Friendly Way

    Benefit from Unit Testing T-SQL: Reuse Unit Tests as Documentation

    Benefit from Unit Testing T-SQL: Speed up Your Test Harness

     

    When changing test data breaks some tests

    Suppose that we need to test another stored procedure, which returns total amount of messages sent, grouped by user, month, and year. Typically we prefer to reuse existing test data as much as possible. However, to test this module we need messages sent on different months, so we need to add more messages sent by the same user on another month.

    When we do that, these additional rows may show up in the output of the existing stored procedure, which may break some unit tests . To accommodate for that, let us regenerate expected results, and visually confirm the changes:

     


     

    Typically it is much easier to reuse the same test data and adjust expected results as needed, than to come up with separate test data for each test.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement