THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Alexander Kuznetsov

  • When CHECK constraints using UDFs fail for multirow updates

    I have made up a simple example when a CHECK constraint that uses a UDF should succeed but always fails.

    Suppose you need to move some stuff in your car, and you don't want to load more than 100 pounds into your small car at a time. Here is the DDL:

    CREATE SCHEMA Data AUTHORIZATION dbo;
    GO
    CREATE SCHEMA Readers AUTHORIZATION dbo;
    GO
    CREATE TABLE Data.ItemsToMove(
      
    TripNumber INT NOT NULL,
      
    ItemName VARCHAR(50),
      
    ItemWeightInPounds FLOAT
    );
    GO
    CREATE FUNCTION Readers.TotalWeightPerTrip(@TripNumber INT)
    RETURNS FLOAT
    AS
    BEGIN
    RETURN 
    (SELECT SUM(ItemWeightInPounds
      
    FROM Data.ItemsToMove
      
    WHERE TripNumber @TripNumber);
    END
    GO
    ALTER TABLE Data.ItemsToMove
      
    ADD CONSTRAINT CHK_TotalWeightPerTrip_LessThan100pounds
      
    CHECK (Readers.TotalWeightPerTrip(TripNumber) <= 100);
    GO
      

    Loading 90 pounds succeeds, but when you go over the limit, the constraint works:

     

    INSERT INTO Data.ItemsToMove(TripNumber,
      
    ItemName,
      
    ItemWeightInPounds)
    SELECT 1'Big Box with Clothes'40
    UNION ALL
    SELECT 1'Lumbar Aeron Chair'25
    UNION ALL
    SELECT 1'Basic Aeron Chair'25;

    INSERT INTO Data.ItemsToMove(TripNumber,
      
    ItemName,
      
    ItemWeightInPounds)
    SELECT 1'Small Box with Books'40;
      

     Msg 547, Level 16, State 0, Line 1
    The INSERT statement conflicted with the CHECK constraint "CHK_TotalWeightPerTrip_LessThan100pounds". The conflict occurred in database "Test", table "data.ItemsToMove", column 'TripNumber'.
    The statement has been terminated.

    So, all other items should be moved on the second trip:

    INSERT INTO Data.ItemsToMove(TripNumber,
      
    ItemName,
      
    ItemWeightInPounds)
    SELECT 2'Small Box with Books'40
    UNION ALL
    SELECT 2'Humidifier'40;
      

    Now there is another problem: all bulky items are scheduled to move on the first trip, while small and hevy ones get on the second trip. Suppose you cannot fit the bulky box with clothes after you loaded your chairs, and want to move your small humidifier on your first trip instead:

    UPDATE Data.ItemsToMove 
    SET TripNumber CASE WHEN ItemName 'Humidifier' THEN 1
      
    WHEN ItemName 'Big Box with Clothes' THEN 2
      
    ELSE TripNumber END
    WHERE  
    ItemName IN ('Humidifier''Big Box with Clothes') AND TripNumber IN(1,2); 
     

     

    The update should succeed, because the box and the humidifier have the same weight. However, the update fails:

    Msg 547, Level 16, State 0, Line 1
    The UPDATE statement conflicted with the CHECK constraint "CHK_TotalWeightPerTrip_LessThan100pounds". The conflict occurred in database "Test", table "data.ItemsToMove", column 'TripNumber'.
    The statement has been terminated.

    The reason is simple: the database engine modifies just one row and immediately invokes the CHECK constraint for that row. At that moment the limit for total load is exceeded no matter which row is modified first - that is the way I made up this example. You can run the following scripts and see for yourself:

    ALTER TABLE Data.ItemsToMove
      
    DROP CONSTRAINT CHK_TotalWeightPerTrip_LessThan100pounds;

    BEGIN TRAN


    UPDATE 
    Data.ItemsToMove 
    SET TripNumber CASE WHEN ItemName 'Humidifier' THEN 1
      
    ELSE TripNumber END
    WHERE  
    ItemName 'Humidifier' AND TripNumber 2;


    SELECT TripNumberSUM(ItemWeightInPounds
      
    FROM Data.ItemsToMove
      
    GROUP BY TripNumber;


    ROLLBACK;

    BEGIN TRAN


    UPDATE 
    Data.ItemsToMove 
    SET TripNumber CASE 
      
    WHEN ItemName 'Big Box with Clothes' THEN 2
      
    ELSE TripNumber END
    WHERE  
    ItemName 'Big Box with Clothes' AND TripNumber 1;


    SELECT TripNumberSUM(ItemWeightInPounds
      
    FROM Data.ItemsToMove
      
    GROUP BY TripNumber;


    ROLLBACK;
      

     At this point the faulty constraint is dropped, so you can rerun your multi-row update:

    UPDATE Data.ItemsToMove 
    SET TripNumber CASE WHEN ItemName 'Humidifier' THEN 1
      
    WHEN ItemName 'Big Box with Clothes' THEN 2
      
    ELSE TripNumber END
    WHERE  
    ItemName IN ('Humidifier''Big Box with Clothes') AND TripNumber IN(1,2); 
      

     Create the constraint again, it will succeed. Verify that the constraint is trusted:

    SELECT nameis_not_trusted
    FROM   sys.check_constraints
    WHERE Name='CHK_TotalWeightPerTrip_LessThan100pounds';
     

    Make sure you understand why you need to know if your constraints are trusted.

    You can also verify that your data is clean, that the load for every trip does not exceed 100 pounds:

     SELECT TripNumberSUM(ItemWeightInPounds
      
    FROM Data.ItemsToMove
      
    GROUP BY TripNumber;


     


  • Scalar UDFs wrapped in CHECK constraints are very slow and may fail for multirow updates

    Surely this sounds like common knowledge, yet I decided to post a simple benchmark demonstrating how slow they are - in this case almost 100 times slower than a foreign key. Recently Pure Krome asked a question on stackoverflow about implementing subtypes in the database and the discussion lead to such benchmarking.

    The DDL for parent and child tables

    Here is the parent table:

     CREATE TABLE dbo.Animals
    (AnimalId INT NOT NULL IDENTITY PRIMARY KEY,
    AnimalType TINYINT NOT NULL, -- 1: Mammal, 2:Reptile, etc..
    Name VARCHAR(30))

     The child table and the check constraint that guarantees that all rows in the child table refer to Mammals:

    CREATE FUNCTION dbo.GetAnimalType(@AnimalId INT)
    RETURNS TINYINT
    AS
    BEGIN
    DECLARE 
    @ret TINYINT;
    SELECT @ret AnimalType FROM dbo.Animals
      
    WHERE AnimalId @AnimalId;
    RETURN @ret;
    END
    GO
    CREATE TABLE dbo.Mammals
    (AnimalId INT NOT NULL PRIMARY KEY,
    SomeOtherStuff VARCHAR(10),
    CONSTRAINT Chk_AnimalType_Mammal CHECK(dbo.GetAnimalType(AnimalId)=1)
    ); 

     Another approach, copying AnimalType column in the child table, and using a foreign key

     CREATE TABLE dbo.Mammals2
    (AnimalId INT NOT NULL PRIMARY KEY,
    AnimalType TINYINT NOT NULL,
    SomeOtherStuff VARCHAR(10),
    CONSTRAINT Chk_Mammals2_AnimalType_Mammal CHECK(AnimalType=1),
    CONSTRAINT FK_Mammals_Animals FOREIGN KEY(AnimalIdAnimalType)
      
    REFERENCES dbo.Animals(AnimalIdAnimalType)
    );

    And finally the child table without enforcing any logic with regards to AnimalType:

     

    CREATE TABLE dbo.Mammals3
    (AnimalId INT NOT NULL PRIMARY KEY,
    SomeOtherStuff VARCHAR(10)
    );
     

    Setting up the helper table:


    CREATE TABLE dbo.Numbers(INT NOT NULL PRIMARY KEY)
    GO
    DECLARE @i INT;
    SET @i 1;
    INSERT INTO dbo.Numbers(nSELECT 1;
    WHILE @i<128000 BEGIN
      INSERT INTO 
    dbo.Numbers(n)
        
    SELECT @i FROM dbo.Numbers;
      
    SET @i @i 2;
    END

     

    Benchmarking results

    The overhead of a FK was 78 ms

    The overhead of a UDF wrapped in CHECK constraint was 7453 ms

     Inserting 128K rows with a UDF:

    INSERT INTO dbo.Animals
      
    (AnimalTypeName)
    SELECT 1'some name' FROM dbo.Numbers;
    GO
    SET STATISTICS IO ON
    SET STATISTICS 
    TIME ON
    GO
    INSERT INTO dbo.Mammals
    (AnimalId,SomeOtherStuff)
    SELECT n'some info' FROM dbo.Numbers;
      

    SQL Server parse and compile time: 
    CPU time
    = 0 ms, elapsed time = 2 ms.
    Table 'Mammals'. Scan count 0, logical reads 272135,
        physical reads
    0, read-ahead reads 0, lob logical reads 0,
        lob physical reads
    0, lob read-ahead reads 0.
    Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0,
        read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
        lob read
    -ahead reads 0.

    SQL
    Server Execution Times:
        CPU time
    = 7750 ms,  elapsed time = 7830 ms.


    (131072 row(s) affected)
     

      Inserting 128K rows with a FK:

     INSERT INTO dbo.Mammals2
    (AnimalId,AnimalType,SomeOtherStuff)
    SELECT n1'some info' FROM dbo.Numbers;
     

    SQL Server parse and compile time: 
       CPU time
    = 93 ms, elapsed time = 100 ms.
    Table 'Animals'. Scan count 1, logical reads 132, physical reads 0,
        read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
        lob read
    -ahead reads 0.
    Table 'Mammals2'. Scan count 0, logical reads 275381, physical reads 0,
       read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
       lob read
    -ahead reads 0.
    Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0,
       read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
       lob read
    -ahead reads 0.

    SQL
    Server Execution Times:
       CPU time
    = 375 ms,  elapsed time = 383 ms.

     

    The baseline: inserting 128K rows into a table without CHECK or FK:

     INSERT INTO dbo.Mammals3
    (AnimalId,AnimalType,SomeOtherStuff)
    SELECT n1'some info' FROM dbo.Numbers;

     

    SQL Server Execution Times:
       CPU time
    = 0 ms,  elapsed time = 66 ms.
    Table 'Mammals3'. Scan count 0, logical reads 272135, physical reads 0,
        read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
        lob read
    -ahead reads 0.
    Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0,
        read
    -ahead reads 0, lob logical reads 0, lob physical reads 0,
        lob read
    -ahead reads 0.

    SQL
    Server Execution Times:
       CPU time
    = 297 ms,  elapsed time = 303 ms.


    (131072 row(s) affected)
     
    The results of benchmarking are very clear, but the story does not end here, there is more to it: 
    A UDF wrapped in a CHECK constraint may not work correctly for multi row updates:
    Read the following post by Tony Rogerson:
    Using a UDF in a CHECK constraint to check validity of History Windows
     

     


  • Controlling the number and size of unit tests.

    I have been practicing database unit testing for two years, and I am a big fan of it. My test harness is a great help to me - it lets me change my modules quickly and with confidence. However, having too many unit tests is counterproductive - they become a burden and slow you down rather than help you out.Joel Spolski recently wrote a great article about the overhead of maintaining your unit tests being "disproportional to the amount of benefit that you get out of them".

     That article is a must read!

     

    Unit tests are an overhead.

     

    It takes time to develop unit test, and more time to maintain them. They also slow down working with source control and searches. For example, every time your search on a stored procedure name in your solution, every unit test running in shows up in your search results. Yet we go for this overhead, because we expect the payoff from better code quality and easier maintenance to compensate for it. Clearly to break even or benefit from unit tests we need to keep the overhead within reason.

     

    Unit tests take time to run, sometimes too much time.

     

    To be really useful, your unit test harness needs to run quickly - otherwise it will slow you down. You will either spend too much time watching it run, or you will not run it as often as you should - if you break something, you will not notice it right away. Sometimes this becomes a serious problem with C# or C++ solutions. Because database tests usually run much much slower, this may become a serious problem much sooner. In many cases you simply cannot afford to keep all your database unit test because of the sheer amount of time they run. Alternatively, database unit testing can be completely abandoned because of this slowness. It can be dismissed as yet another approach that sounds good in theory but does not hold up in the real world. Database unit testing does hold up in real life projects, but only if you use common sense and do it properly.

     

    Unit tests may help you out if you need to change your module.

     

    They can be extremely useful - you can recall what the module is supposed to do, identify an issue, make a change, and retest in just a few minutes. This is why we need them most of all. So, the more a unit test is potentially useful, the more you want to keep it long term. And, using the same logic, if a unit test is very unlikely to ever help you out, there is no reason to keep it.

     

    Complex modules need unit tests most of all. Trivial tests and tests for simple modules can be eliminated.

     

    Complex queries need more maintenance, because it takes more time to understand and change them, and because they are more likely to need optimization, This is where unit tests come very handy. On the other hand, trivial queries such as the following are easier to change and less likely to need optimization later:

     

    CREATE PROCEDURE Readers.SelectCustomerByID
    @CustomerID INT
    AS
    SET 
    NOCOUNT ON;
    SELECT CustomerIDFirstNameLastName 
    FROM Data.Customers 
    WHERE CustomerID @CustomerID;

     

     

    Do you have problems understanding what this procedure does? Do you expect it will be difficult to change it if needed? Do you think this procedure needs an explicit unit test?

     

    Do you really need existence tests?

     

    When we write blog posts and articles, our examples must be short and simple. Running a unit test to determine if your database or your table exists is a clear and simple example and a great way to learn how unit tests work - (make sure you have read this article by SQL Server MVP Andy Leonard). However, there is no need to keep such tests in real production solutions - if a database or a table is dropped, you will get a clear error message, and you will know what is your problem right away. If a database or a table is dropped, you will not need an existence unit test to help you out with troubleshooting, will you?

     

     Make a distinct database call once, and verify everything.

     

    Database call are slow, so you really want to limit the number of database calls your test harness does. For example, suppose that you have a complex stored procedure Readers.SelectCustomersByName, and if you only provide last name, it must return all the customers with that last name. Because the procedure is complex, you need a unit test for each distinct case of its usage. Because you want to minimize the number of database calls, you want to execute the following command only once:

    EXEC  Readers.SelectCustomersByName @LastName 'Larsen';

    It is very important to verify everything: number of result sets, number of columns, their names and types, and all the columns and rows returned by the procedure. The reason is simple: any change can be breaking, and you need to determine if you broke something when you do your changes. For more explanations read this article: "Close those Loopholes - Testing Stored Procedures"

    I've used the approach described in that article for two years and it works great. Everything is verified automatically, so I never forget to test anything, and I only need one database call for all my checks, so it runs fast.

     

    Clean up after test driven development.

     

    I like and practice test driven development, however I have the following reservations regarding it: all-too-often when you think that are done you have a huge amount of trivial unit tests. That seems to be the whole idea behind test driven development: every step in your progress is a unit test. However, there is no need to keep the whole history of your progress, to keep all the tests that drove your development, for the whole life of your system (each of us humans has had gills and a tail at some early stage of our development; we don't keep them forever, we undergo a refactoring). When you are done with your development, you only need the tests that might help you out in troubleshooting. Most likely you'll need to consolidate your unit tests in the way described in the previous chapter. Remember that test driven development is supposed to consist of the following stages:

    1. Encounter a non-trivial problem.
    2. Write a unit test that fails.
    3. Develop some code so that the test passes.
    4.  Refactor - test driven development does not stop when your tests pass!

    I think that this refactoring should include not only your code but unit tests too. Refactoring is done when your remaining unit tests are lean, so that the overhead of keeping them is justified by the benefits of having them.

    Let me repeat myself: from where I sit, test-driven development can be a huge help if you are solving a complex problem. However, it can slow you down if the steps you are taking are too easy for you. I guess I should write up some example of test driven development, but it is getting late, so not this time.


     


  • Calculating third Wednesday of the month with inline UDFs

    Problems such as calculating third Wednesday of the month or the last day of the month are very common. Recently Peter Larsson posted a scalar UDF that solves problems such as "third Wednesday of the month", and Uri Dimant added a comment with a solution by Steve Kass. Unfortunately, all these solutions use scalar UDFs and as such are very slow, so I decided to post some very fast inline UDFs that I find very useful.

     

    Composing a date from year, month, and day without checking if the parameters are valid.

     

    The following inline UDF composes a date. I deliberately intend to use it with invalid parameters, such as month equal to zero, because it dramatically simplifies my logic.

     

    CREATE FUNCTION Readers.ComposeDate(@year INT@month INT@day INT)
    -- I deliberately do not check if the parameters are valid
    RETURNS TABLE AS RETURN(
      
    SELECT DATEADD(DAY@day 1DATEADD(MONTH@month 1,  DATEADD(YEAR@year 1901'19010101')))
        
    AS ComposedDate

    GO
      

     

    Calculating the last days of previous and this months.

     

    Because my ComposeDate function handles out of range parameters, I don't need to handle January and December differently. Here you go, create them and see how they work:

     

    CREATE FUNCTION Readers.LastDayOfPreviousMonth(@year INT@month INT)
    RETURNS TABLE AS RETURN(
      
    SELECT ComposedDate AS LastDayOfPreviousMonth
        
    FROM Readers.ComposeDate(@year@month0)

    GO
    SELECT LastDayOfPreviousMonth'20081231' FROM Readers.LastDayOfPreviousMonth(2009,1)
    UNION ALL
    SELECT LastDayOfPreviousMonth'20090131' FROM Readers.LastDayOfPreviousMonth(2009,2)
    GO
    CREATE FUNCTION Readers.LastDayOfMonth(@year INT@month INT)
    RETURNS TABLE AS RETURN(
      
    SELECT ComposedDate AS LastDayOfMonth
        
    FROM Readers.ComposeDate(@year@month+10)

    GO
    SELECT LastDayOfMonth FROM Readers.LastDayOfMonth(2009,11)
    UNION ALL
    SELECT LastDayOfMonth FROM Readers.LastDayOfMonth(2009,12)
    GO 

     

    Calculating third Wednesday and other similar problems

     

    For simplicity (this is a blog post, not production code), this function does not determine if the returned date is actually in the required month or in another one. If you need such validation, use the approach demonstrated in the next example.

    CREATE FUNCTION Readers.MthDayOfNthWeek(@year INT@month INT@week INT@day INT)
    -- @day 1 Sun, 2 Mon etc.

    -- this function works in US but might need adjustments in other countries
    RETURNS TABLE AS RETURN
      
    SELECT t.ComposedDate AS MthDayOfNthWeek
        
    FROM (SELECT ComposedDate
          
    FROM Readers.ComposeDate(@year@month1)) AS d
            
    CROSS APPLY Readers.ComposeDate(@year@month
            
    @day DATEPART(dw,ComposedDate) + 1 +
            
    7*(@week CASE WHEN DATEPART(dw,ComposedDate)>@day THEN ELSE -END) ) AS t

    GO
    -- third Wednesday of June 2009
    SELECT MthDayOfNthWeek'20090617' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200963, 4)
    UNION ALL
    SELECT MthDayOfNthWeek'20090715' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200973, 4)
    UNION ALL
    SELECT MthDayOfNthWeek'20090601' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200961, 2)
    UNION ALL
    SELECT MthDayOfNthWeek'20090605' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200961, 6)
    UNION ALL
    SELECT MthDayOfNthWeek'20090606' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200961, 7)
    UNION ALL
    SELECT MthDayOfNthWeek'20090608' AS ExpectedDate 
      
    FROM Readers.MthDayOfNthWeek(200962, 2)
     

    This function is not completely trivial, so I used test driven development to come up with it.

     

    Validating dates

     

    If you need to validate dates, use the following approach:

     

    CREATE FUNCTION Readers.ComposeValidDate(@year INT@month INT@day INT)
    RETURNS TABLE AS RETURN(
      
    SELECT CASE WHEN YEAR(ComposedDate@year AND MONTH(ComposedDate@month 
          
    THEN ComposedDate END AS ComposedDate
        
    FROM Readers.ComposeDate(@year@month@day)

    GO
    SELECT ComposedDate'20090228' AS ExpectedResult FROM Readers.ComposeValidDate(2009228)
    UNION ALL 
    SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(2009229)
    UNION ALL 
    SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(200920)
    UNION ALL 
    SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(2009012)
      

    Only the first result will validate.

     Read more about performance of inline UDFs in "Reuse Your Code with Table-Valued UDFs" and "Many nested inline UDFs are very fast"

     


  • Many nested inline UDFs are very fast

    It is well known that nested calls of scalar UDFs are a huge drag on performance, but in many cases inline UDFs can be nested without performance penalty. Here is a simple repro that you can run and see for yourself. Suppose that you already have two inline UDFs: one returning the previous calendar day and another returning the first day of the month, as follows:

     

    CREATE FUNCTION Readers.GetFirstDayOfMonth(@d DATETIME)
    RETURNS TABLE AS RETURN(
      
    SELECT DATEADD(MONTHDATEDIFF(MONTH'19900101'@d), '19900101'AS FirstDayOfMonth
    )
    GO
    DROP FUNCTION Readers.GetPreviousDay
    CREATE FUNCTION Readers.GetPreviousDay(@d DATETIME)
    RETURNS TABLE AS RETURN (
      
    SELECT DATEADD(DAYDATEDIFF(DAY'19900101'@d) - 1'19900101'AS PreviousDay
    )
    GO
      

     

    Suppose that you need to develop a UDF returning the last day of the previous month. You can develop it from scratch:

     

    CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Merged(@d DATETIME)
    RETURNS TABLE AS RETURN(
      
    SELECT DATEADD(DAYDATEDIFF(DAY'19900101'DATEADD(MONTHDATEDIFF(MONTH'19900101'@d), '19900101')) - 1'19900101')
        
    AS LastDayOfPreviousMonth

     

    Alternatively you can reuse your two existing UDFs in several ways, as follows:

     

    CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Nested1(@d DATETIME)
    RETURNS TABLE AS RETURN(
      
    SELECT p.PreviousDay AS LastDayOfPreviousMonth
        
    FROM Readers.GetFirstDayOfMonth(@dAS f
        
    CROSS APPLY Readers.GetPreviousDay(f.FirstDayOfMonthAS p   
    )
    GO
    CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Nested2(@d DATETIME)
    RETURNS TABLE AS RETURN(
      
    SELECT p.PreviousDay AS LastDayOfPreviousMonth
        
    FROM (SELECT FirstDayOfMonth FROM Readers.GetFirstDayOfMonth(@dAS fAS f
        
    CROSS APPLY Readers.GetPreviousDay(f.FirstDayOfMonthAS p   
    )
      

     

    To benchmark, I used my Data.Numbers table whihc has 1 million rows. You can use any large enough table.

     

    SET STATISTICS TIME ON
    SET STATISTICS 
    IO ON
    GO

    -- this table has 1 million rows
    SELECT COUNT(*) FROM Data.Numbers

    DECLARE @d DATETIME@d1 DATETIME;
    SET @d '20090512';
    SELECT @d1=LastDayOfPreviousMonth --INTO #t1
    FROM Data.Numbers
      
    CROSS APPLY Readers.GetLastDayOfPreviousMonth_Merged(@d);

    GO 

    DECLARE @d DATETIME@d1 DATETIME;
    SET @d '20090512';
    SELECT @d1=LastDayOfPreviousMonth --INTO #t1
    FROM Data.Numbers
      
    CROSS APPLY Readers.GetLastDayOfPreviousMonth_Nested1(@d);

    GO

    DECLARE @d DATETIME@d1 DATETIME;
    SET @d '20090512';
    SELECT @d1=LastDayOfPreviousMonth --INTO #t1
    FROM Data.Numbers
      
    CROSS APPLY Readers.GetLastDayOfPreviousMonth_Nested2(@d);

    GO

    -- this measures the overhead of benchmarking, 
    -- scanning the table and assigning a value to a variable
    DECLARE @d DATETIME@d1 DATETIME;
    SET @d '20090512';
    SELECT @d1=@d --INTO #t1
    FROM Data.Numbers

     

     

    I did not notice any difference in performance beyond usual noise. Of course, in some very complex cases performance of nested inline UDFs might suffer, but in general this is not the case and usually you can reuse your inline UDFs without performance penalty.

     My previous post about inline UDFs is this: Reuse Your Code with Table-Valued UDFs


  • Override the optimizer or give it an easy task?

    When your query must always return in a certain amount of time, trusting the optimizer might not be the best choice. Suppose your select must return in one second no matter what, and it usually returns in 50 ms, but sometimes, once in 100K calls, it takes full five seconds to complete, and clearly such query does not meet the requirement to return in one second. Sounds familiar? Sure, optimizers do behave like this from time to time.  It is very tempting to use hints or even plan freezing. However, neither hints nor plan freezing are long term maintenance-free solutions, and the reason is simple: what today looks like a great plan might at some later time become counterproductive.

    In cases like this

    index covering really shines.

    Scanning a range of a covering index is a very simple and efficient plan. Even better, it is one and the same plan for all possible combinations of parameters, so you may not expect unpleasant surprises from a confused optimizer. Of course, if the amount of data increases, the response time may increase too, but you may monitor it and react accordingly if needed. If you do not load huge amounts of data, your response time will not jump from 50 ms to five seconds any time soon, as long as the optimizer chooses the same range scan over and over again.

    Responding to the

    recent post by SQL server MVP Tibor Karaszi, I think that not all queries are born equal - some must respond in a certain time even if that slows down other activity on the server. This means that in some cases it is OK if index covering slows down your modifications, provided that the modifications have lower priority.

  • When correlated columns are in different tables

    In my previous post

    I described how correlation between columns may confuse the optimizer and cause it to choose an inefficient plan. I also recommended index covering as the most robust way to optimize queries that involve correlated columns.

    However, there are quite a few cases when correlated columns are in different tables. For example,  employees' home zipcodes and their cell phone area codes may be strongly correlated. Similarly, car models and kinds of repair work for those cars may be strongly correlated too.

    In such cases indexed views come very handy, because they can allow the optimizer to satisfy such queries with one and the same plan, the range scan, regardless of the expected cardinality. Even if the optimizer's estimated cardinality is wildly wrong, it does not matter if the execution plan is still the same range scan.

    Of course, indexed views come with a rather hefty price tag,

    and in some cases you may have to drop them altogether.

     


  • When correlation between columns fools the optimizer

    When two columns are correlated, it may fool the optimizer and cause it to choose a wrong plan. Here is a simple script that demonstrates it.
    The following script creates a table and populates some sample data:

     

    CREATE TABLE dbo.Vehicles(ID INT NOT NULL CONSTRAINT PK_Vehicles PRIMARY KEY,
      
    Make VARCHAR(10),
      
    Model VARCHAR(10),
      
    SomeOtherData CHAR(50)
    );
    GO
    DECLARE @i INT;
    SET NOCOUNT ON;
    SET @i=0;
    WHILE @i<100000 BEGIN
      INSERT 
    dbo.Vehicles(IDMakeModelSomeOtherData)
      
    SELECT @i
        
    CASE WHEN @i%100 THEN 'Toyota' ELSE 'M'+CAST(@i AS VARCHAR(6)) END,
        
    CASE WHEN @i%100 THEN 'Camry' 
           
    WHEN @i%100 THEN 'Corolla' 
           
    ELSE 'M'+CAST(@i AS VARCHAR(6)) END,
        
    'Some Data'
      
    SET @i @i 1;
    END
    GO
    CREATE INDEX Vehicles_Make ON dbo.Vehicles(Make);
    GO 

     

    The following query chooses to scan the whole clustered index, because it considers the predicate Make='Toyota' AND Model='Camry' not selective enough to justify bookmark lookups.

     

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    SELECT IDMakeModelSomeOtherData
      
    FROM dbo.Vehicles 
      
    WHERE Make='Toyota'
      
    AND Model='Camry';


     

    Note: while this is true on my server, nobody can guarantee that the optimizer will behave in the same way for all editions and versions.

    Clearly make and model of a car are strongly correlated - when you know car's model, you almost always know its make too. 2% of vehicles in the table are Toyotas, 1% are Camrys, and 1% are Toyota Camrys. If make and model were independent, there would be 0.02*0.01=0.0002=0.02% Toyota Camrys. Of course this is not the case, but let us see what happens when you create another index and rerun the same select:

     

    CREATE INDEX Vehicles_Model ON dbo.Vehicles(Model);
     

     

    The select now runs slower, and makes three times more reads that the clustered index scan.
    The reason is simple - creating another index comes with statistics on Model column.
    When I looked at the execution plan, I saw that the optimizer expected 20 rows, which is precisely 0.02%.In fact, there are 1000 Toyota Camry - the optimizer's estimate is off the mark. What happened?


    In this case, for this version and edition, the optimizer assumed that make and model are independent, decided that the search criteria (Make='Toyota' AND Model='Camry') is very selective, and chosen nested loops.
    Let me repeat myself - that might or might not be the case if you rerun my scripts on your system.

     

    Over time the optimizer can and does change. in my opinion the most robust way to optimize such queries is to create covering indexes, so that the plan stays the same regardless of selectivity estimates. Of course there are other approaches, which might be better in some other cases.


    Note: If you are still reading this, you might also want to read about DATE_CORRELATION_OPTIMIZATION setting in BOL.


  • Be ready to drop your indexed view.

    In all-too-many cases an indexed view may solve your short term performance goals but at some later time become counterproductive. So if you choose to use an indexed view, you may need an exit strategy. Let me describe a few common problems with indexed views.



    Indexed views may increase lock contention.

     

    It is very easy to demonstrate. Create the following table:

     

    CREATE TABLE dbo.ChildTable(ChildID INT NOT NULL 
      
    CONSTRAINT PK_ChildTable PRIMARY KEY,
      
    ParentID INT NOT NULL,
      
    Amount INT NOT NULL);
    GO
      

     

    From one tab in SSMS, run this script:

     

    BEGIN TRAN;
    INSERT INTO dbo.ChildTable(ChildIDParentIDAmount)
      
    VALUES(1,1,1); 

     

    From another tab, run a similar one:

     

    BEGIN TRAN;
    INSERT INTO dbo.ChildTable(ChildIDParentIDAmount)
      
    VALUES(2,1,1);
    ROLLBACK;
      

     

    Note that both inserts complete, they do not block each other. Rollback in both tabs, and create an indexed view:

     

    CREATE VIEW dbo.ChildTableTotals WITH SCHEMABINDING
    AS
    SELECT 
    ParentID
      
    COUNT_BIG(*) AS ChildRowsPerParent
      
    SUM(AmountAS SumAmount
    FROM dbo.ChildTable
    GROUP BY ParentID;
    GO
    CREATE UNIQUE CLUSTERED INDEX ChildTableTotals_CI 
      
    ON dbo.ChildTableTotals(ParentID);



    Rerun the two inserts. Note that the second one does not complete; it is blocked. The reason is very simple: the first insert modifies the corresponding entry in the indexed view, so the insert acquires and holds a lock on it.

     

    It is just as easy to demonstrate that when you create an indexed view, deadlocks may become more likely too.


    Note: this is not a problem with the way indexed views are implemented. If you roll out your own summary table, and develop triggers which directly modify it to keep it up-to-date, you will encounter the same problem. Only if you don't maintain your summary table all the time, you can get around this locking problem, but a more detailed discussion of this is beyond the scope of this post.

     

     

    Also make sure you have read an excellent post by Tony Rogerson:

     

    Indexed views on joins may become counterproductive.

     

    Create the following table and another indexed view:

    CREATE TABLE dbo.ParentTable(ParentID INT NOT NULL 
      
    CONSTRAINT PK_ParentTable PRIMARY KEY,
      
    WideData CHAR(1000) NOT NULL);
    GO

    CREATE VIEW dbo.ParentTableWithAmounts WITH SCHEMABINDING
    AS
    SELECT 
    p.ParentIDp.WideDatac.ChildIDc.Amount
    FROM dbo.ParentTable AS JOIN dbo.ChildTable AS c
      
    ON p.ParentID c.ParentID;
    GO

    CREATE UNIQUE CLUSTERED INDEX ParentTableWithAmounts_CI 
      
    ON dbo.ParentTableWithAmounts(ChildID);
     

    Suppose that originally you have an average one child row per parent on, and that selecting from this indexed view is faster than joining two tables - that's why you created it in the first place. However, if on average you have 10K child rows per parent one, your indexed view becomes counterproductive. Let's add 10K child rows:

     

    INSERT INTO dbo.ParentTable(ParentIDWideData)
      
    VALUES(1,'asdf');

    SET NOCOUNT ON;
    DECLARE @i INT;
    SET @i=10000;
    WHILE @i<20000 BEGIN
      INSERT INTO 
    dbo.ChildTable(ChildIDParentIDAmount)
        
    VALUES(@i,1,1);
      
    SET @i=@i+1;
    END
      

    Let us select from this indexed view:

     

    SELECT ParentIDWideData,  ChildID,  Amount
      
    FROM dbo.ParentTableWithAmounts

     

    The optimizer has chosen not to use the indexed view, and the execution costs are as follows:

     

    Table 'ChildTable'. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'ParentTable'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

     

    If you force the optimizer to use the indexed view,  the execution costs are dramatically higher:

     

    SELECT ParentIDWideData,  ChildID,  Amount
      
    FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);
      

     Table 'ParentTableWithAmounts'. Scan count 1, logical reads 1435, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    As you have seen, when the parent table is wide and there are many child rows per parent, the indexed view becomes counterproductive, and the optimizer is able to recognize the fact.

    Also note that this indexed view increases lock contention just as the previous one did. You can try to update a parent row and one of its child rows and see for yourself.

     

    Exit strategy for NOEXPAND hint

     

    If your indexed view becomes counterproductive, you may consider dropping it altogether. However, dropping the index view will break all those queries with NOEXPAND hint:

     

    DROP INDEX dbo.ParentTableWithAmounts.ParentTableWithAmounts_CI;
    GO
    SELECT ParentIDWideData,  ChildID,  Amount
      
    FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);
     

    Msg 8171, Level 16, State 2, Line 1
    Hint 'noexpand' on object 'dbo.ParentTableWithAmounts' is invalid.

    I would recommend to be prepared for such possibility, to wrap all your selects using NOEXPAND in stored procedures, for example:

     

    CREATE PROCEDURE dbo.SelectParentTableWithAmounts
    AS
    SELECT 
    ParentIDWideData,  ChildID,  Amount
      
    FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);


    and to provide a rollback script which alters these procedures, as follows:

     

    DROP INDEX dbo.ParentTableWithAmounts.ParentTableWithAmounts_CI;
    GO

    ALTER PROCEDURE dbo.SelectParentTableWithAmounts
    AS
    SELECT 
    ParentIDWideData,  ChildID,  Amount
      
    FROM dbo.ParentTableWithAmounts

    You can add error handling to this script, so that either both changes deploy or none does. Make sure to test this script. If you have unit tests, include this scenario in your test harness. Of course there are other approaches, but my main point is that you need to be aware that your indexed view can be dropped, and have a working exit strategy.

     

    Performance considerations for your exit strategy

     

    We have just discussed how to make sure that your application does not break, but what about the performance? Clearly the performance of your selects may plunge, what can be done about it? In many cases, index covering gives you acceptable performance without too much lock contention. Also sometimes you want to roll out your own summary tables, but that sounds like a topic for another post.

     


  • Without ORDER BY, there is no default sort order.

    Sounds trivial? Right, but different flavors of this myth still persist. Yesterday I noticed a thread on stackoverflow discussing SELECT TOP without a where clause, where it was suggested that "The order will be defined based on the clustered key in that table."

    http://stackoverflow.com/questions/882195/what-does-top-1-mean-in-an-sql-query/882690#882690

    Because apparently many visitors agreed with this myth, I decided to post a repro script which demonstrates that this is simply not true. Here you go:

    CREATE SCHEMA Data AUTHORIZATION dbo
    GO
    CREATE TABLE Data.Numbers(Number INT NOT NULL PRIMARY KEY)
    GO
    DECLARE @ID INT;
    SET NOCOUNT ON;
    SET @ID 1;
    WHILE @ID 100000 BEGIN
     INSERT INTO 
    Data.Numbers(Number)
     
    SELECT @ID;
     
    SET @ID @ID+1;
    END

    CREATE TABLE Data.WideTable(ID INT NOT NULL 
    CONSTRAINT PK_WideTable PRIMARY KEY,
    RandomInt INT NOT NULL,
    CHARFiller CHAR(1000))
    GO
    CREATE VIEW dbo.WrappedRand
    AS
    SELECT 
    RAND() AS random_value
    GO
    CREATE ALTER FUNCTION dbo.RandomInt()
    RETURNS INT
    AS
    BEGIN
    DECLARE 
    @ret INT;
    SET @ret (SELECT random_value*1000000 FROM dbo.WrappedRand);
    RETURN @ret;
    END
    GO

    INSERT INTO Data.WideTable(ID,RandomInt,CHARFiller)
    SELECT Numberdbo.RandomInt(), 'asdf'
    FROM Data.Numbers
    GO
    CREATE INDEX WideTable_RandomInt ON Data.WideTable(RandomInt)
    GO
    SELECT TOP 100 ID FROM Data.WideTable 

    1407
    253
    9175
    6568
    4506
    1623
    581

    As you have seen, the optimizer has chosen to use a non-clustered index to stisfy this SELECT TOP query.

     

     


  • Court in New Jersey ordered code review of software used for alcohol breath tests

    Court in New Jersey ordered code review of software used for alcohol breath tests.

    Sounds like an interesting precedent to me.

     


  • Are values generated by NewId() distributed evenly?

    I ran a quick check, and it looks like a reasonably even distribution to me. I genenerated 2M NewId values, as follows:

     

    SELECT NewID() AS RandomColumn INTO #t  FROM Data.TableWith2Mrows


     I looked at the distribution into 256 buckets, considering the first byte of the value only. All 256 buckets had values:

     

    SELECT COUNT(*) AS Cnt
    FROM (
    SELECT SUBSTRING(RandomColumn,1,1) AS FirstByte, COUNT(*) AS Frequency
    FROM (SELECT CAST(RandomColumn AS BINARY(16)) AS RandomColumn FROM #t) AS t
    GROUP BY SUBSTRING(RandomColumn,1,1)
    ) AS Distribution

    -----------

     

    256

     

    and the number of values in all the buckets was between 7200 and 7700:

     

     SELECT COUNT(*) AS Cnt, Frequency
    FROM (
    SELECT SUBSTRING(RandomColumn,1,1) AS FirstByte, COUNT(*) AS Frequency
    FROM (SELECT CAST(RandomColumn AS BINARY(16)) AS RandomColumn FROM #t) AS t
    GROUP BY SUBSTRING(RandomColumn,1,1)
    ) AS Distribution
    GROUP BY Frequency
    ORDER BY Frequency

     After I had posted this, Simon Sabin told me that he also posted a similar script. He did it four years ago. Somehow I failed to google it up:

     

    http://sqlblogcasts.com/blogs/simons/archive/2005/06/13/Random-number-on-a-per-row-basis.aspx

     


  • Your TRY block may fail, and your CATCH block may be bypassed.

    Some T-SQL code is written under the assumption that either a TRY block successfully completes or a CATCH block is invoked. Most likely, this is the case. However, there is a third, although rare, possibility – the TRY block may fail, and the CATCH one is bypassed. Let me provide some examples. I do not intend to provide a comprehensive list of all such cases, I only want to demonstrate that sometimes CATCH blocks are bypassed. Also I would like to emphasize that in almost all the cases CATCH blocks do catch errors, and the exceptions, if any, are quite rare.

     

    KILL command and timeout (aka attention) both stop execution without invoking CATCH blocks.

     

    As a result of a KILL or an attention, the execution stops immediately. You can run the following script, cancel the query or kill it from another tab, and see for yourself:

     

    SELECT @@SPID;

    GO

    BEGIN TRY

      PRINT 'Before WAITFOR';

      WAITFOR DELAY '00:35:00';

      PRINT 'After WAITFOR';

    END TRY

    BEGIN CATCH

      SELECT 'Beginning CATCH block';

      SELECT ERROR_NUMBER(), ERROR_MESSAGE();

    END CATCH

    PRINT 'At the end of the same batch';

    GO

    PRINT 'Next batch';

     

    You can also invoke it from ADO.Net, and you can also decrease CommandTimeout, so that the timeout occurs faster.

    As you have seen, you can only catch such errors on the client.

    Note: timeout and lock timeout are different. Lock timeout errors are caught by CATCH blocks.

     

    Some compile errors cancel batch execution.

     

    See for yourself:

     

    BEGIN TRY

      PRINT 'Beginning TRY';

      BEGIN TRAN

      INSERT data.SomeData(ID, AnotherID) VALUES(-1, -2);

    -- make sure #t does not exists for your connection

      DELETE FROM #t;

      COMMIT;

      PRINT 'Finishing TRY';

    END TRY

    BEGIN CATCH

      SELECT 'Beginning CATCH block';

      SELECT ERROR_NUMBER(), ERROR_MESSAGE();

      ROLLBACK;

    END CATCH

    PRINT 'At the end of the same batch';

    GO

    PRINT 'Next batch';

     

    Beginning TRY

     

    (1 row(s) affected)

    Msg 208, Level 16, State 0, Line 6

    Invalid object name '#t'.

    Next batch

     

    This problem is more likely to happen if you work with temporary tables.  However, because at the time of this writing we have deferred name resolution, you can get this problem with a permanent object too. Surprisingly enough, if you wrap this batch in a stored procedure, and invoke your procedure from another TRY block, the second TRY block does catch the error. You can create the procedure and see for yourself:

     

    CREATE PROCEDURE dbo.TestProc

    AS

    BEGIN TRY

      PRINT 'Beginning TRY';

      DELETE FROM #t;

      --INSERT data.SomeData(j) SELECT 1;

      PRINT 'Finishing TRY';

    END TRY

    BEGIN CATCH

      SELECT 'Beginning CATCH block';

      SELECT ERROR_NUMBER(), ERROR_MESSAGE();

    END CATCH

    PRINT 'At the end of the same batch';

    GO

    BEGIN TRY

      PRINT 'Beginning TRY';

      EXEC dbo.TestProc;

      PRINT 'Finishing TRY';

    END TRY

    BEGIN CATCH

      SELECT 'Beginning CATCH block';

      SELECT ERROR_NUMBER(), ERROR_MESSAGE();

    END CATCH

    PRINT 'At the end of the same batch';

     

    In my opinion it would be preferable not to start executing a batch if there are compilation errors, and in most cases exactly this is happening:

     

    BEGIN TRY

      PRINT 'Beginning TRY';

      DECLARE @i INT;

      INSERT data.SomeData(j) SELECT 1;

      PRINT 'Finishing TRY';

    END TRY

    BEGIN CATCH

      SELECT 'Beginning CATCH block';

      SELECT ERROR_NUMBER(), ERROR_MESSAGE();

    END CATCH

    PRINT 'At the end of the same batch';

    GO

    PRINT 'Next batch';

    Msg 207, Level 16, State 1, Line 4

    Invalid column name 'j'.

    Next batch

     

    If you are still reading this, also make sure you have read Erland Sommarskog’s articles on error handling:

    http://www.sommarskog.se/error-handling-I.html

    http://www.sommarskog.se/error-handling-II.html

     and STRICT_CHECKS:

    http://www.sommarskog.se/strict_checks.html

     

     


  • Defensive database programming: fun with triggers.

    There are three kinds of triggers: those which blow up and those which fail silently ;). Seriously, there are three all-too-common kinds of problems related to triggers:

    • ·         Triggers can fail to work as expected
    • ·         Triggers can be bypassed
    • ·         Triggers can expose problems in other code, such as T-SQL code written under the assumption that there are no triggers

    Fortunately there already are lots of good articles and posts written on this subject, so I don’t have to describe it all by myself. Still, there is a lot to be said, and there is no way I could squeeze it all into a single post. As usual in this series, I will concentrate only on correctness of the results, I will not consider the performance. Also I do not intend to list all possible kinds of problems, but only the ones about which I know and consider them worth mentioning. As usual, comments and additions are welcome.

     

    Disclaimer: don’t get me wrong – I do not object to using triggers, I use them myself in my system. However, if you do not use triggers properly, you can expose your system to several potential problems. So you need to be aware of these potential problems when you develop robust triggers.

     

    Triggers that fail to work as expected


    Triggers should not assume that statements can modify only one row at a time

     

    This may sound trivial, but this seems to be the most common mistake. You can skip this chapter if you are already familiar with the problem.

     

    Some triggers are coded under the assumption that only one row can be inserted/updated/deleted at a time. Such assumptions are usually unfounded. In fact, I have never seen a genuine business requirement prohibiting multi-row changes. However, apparently the most common problem with beginners’ triggers is their (triggers) inability to handle multi-row modifications. For example, consider the following table and a trigger which is supposed to populate it:

     

    CREATE TABLE ChangeLogs.SomeDataLog(ID INT, Change VARCHAR(30),

      ChangedBy VARCHAR(30), ChangeTime DATETIME);

    GO

    CREATE TRIGGER SomeData_Ins ON Data.SomeData

    FOR INSERT

    AS

    DECLARE @ID INT;

    SET @ID = (SELECT ID FROM INSERTED);

    INSERT INTO ChangeLogs.SomeDataLog(ID, Change,

      ChangedBy, ChangeTime)

    VALUES(@ID, 'Inserted', SUSER_NAME(), Getdate());

     

     

    Clearly this trigger cannot handle multi-row inserts:

     

    DELETE FROM data.SomeData;

    INSERT INTO data.SomeData(ID, AnotherID)

    SELECT 1,2 UNION ALL SELECT 3, 4;

     

    Msg 512, Level 16, State 1, Procedure SomeData_Ins, Line 5

    Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

    The statement has been terminated.

     

    Although the trigger blew up, at least you have received an error message indicating that you have a problem. The following trigger will silently fail, logging just one insert:

     

    CREATE TRIGGER SomeData_Ins ON Data.SomeData

    FOR INSERT

    AS

    DECLARE @ID INT;

    SELECT @ID = ID FROM INSERTED;

    INSERT INTO ChangeLogs.SomeDataLog(ID, Change,

      ChangedBy, ChangeTime)

    VALUES(@ID, 'Inserted', SUSER_NAME(), Getdate());

     

     

    Apparently this is a very common, although trivial, mistake. The correct solution is very simple:

     

    CREATE TRIGGER SomeData_Ins ON Data.SomeData

    FOR INSERT

    AS

    INSERT INTO ChangeLogs.SomeDataLog(ID, Change,

      ChangedBy, ChangeTime)

    SELECT ID, 'Inserted', SUSER_NAME(), Getdate() FROM INSERTED;

     

     

    Understanding RECURSIVE_TRIGGERS setting and using it properly

    If RECURSIVE_TRIGGERS setting is ON, then if your trigger modifies the same table, those modifications will also fire triggers. This can be used to modify data recursively. For example, consider the following table and sample data:

     

    CREATE TABLE Data.Nodes(NodeID INT NOT NULL,

      ParentID INT NULL,

      Status VARCHAR(20));

    GO

    INSERT INTO Data.Nodes(NodeID,

      ParentID,

      Status)

    SELECT 1, NULL, 'Sleeping' UNION ALL

    SELECT 2, 1, 'Sleeping'  UNION ALL

    SELECT 3, 2, 'Sleeping';

    GO

     

    Suppose that if a node becomes active, all its ancestors must also become active. The following trigger attempts to implement this business rule:

     

    CREATE TRIGGER Nodes_Upd ON Data.Nodes

    FOR UPDATE

    AS

    IF EXISTS(SELECT 1 FROM INSERTED) BEGIN

      UPDATE Data.Nodes SET Status = 'Active'

        WHERE NodeID IN(SELECT ParentID FROM INSERTED WHERE Status = 'Active');

    END

     

     

    For hierarchies without cycles, and less than 32 levels deep, this approach works, provided that RECURSIVE_TRIGGERS setting is ON:

     

    ALTER DATABASE Test SET RECURSIVE_TRIGGERS ON;

    UPDATE Data.Nodes SET Status = 'Sleeping';

    UPDATE Data.Nodes SET Status = 'Active'

    WHERE NodeID = 3;

    SELECT NodeID, Status FROM Data.Nodes

     

    NodeID      Status

    ----------- --------------------

    1           Active

    2           Active

    3           Active

     

     However, if you use this approach, you need to ensure that:

    ·         RECURSIVE_TRIGGERS setting is ON

    ·         Your data does not have cycles

    ·         Your hierarchies are less than 32 levels deep

     The first condition is obvious. The second one needs an example. Consider the following hierarchy with a cycle:

     

    INSERT INTO Data.Nodes(NodeID,

      ParentID,

      Status)

    SELECT 4, 5, 'Sleeping'  UNION ALL

    SELECT 5, 4, 'Sleeping';

    UPDATE Data.Nodes SET Status = 'Active'

    WHERE NodeID = 5;

     

     

    As one node becomes active, the trigger begins an infinite loop and hits the 32-level limit:

     

     

    INSERT INTO Data.Nodes(NodeID,

      ParentID,

      Status)

    SELECT 4, 5, 'Sleeping'  UNION ALL

    SELECT 5, 4, 'Sleeping';

    UPDATE Data.Nodes SET Status = 'Active'

    WHERE NodeID = 5;

     

    Msg 217, Level 16, State 1, Procedure Nodes_Upd, Line 5

    Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).

     

    You can easily fix the trigger so that it tolerates cycles, as follows:

     

    DROP TRIGGER Data.Nodes_Upd;

    GO

    CREATE TRIGGER Nodes_Upd ON Data.Nodes

    FOR UPDATE

    AS

    IF EXISTS(SELECT 1 FROM INSERTED) BEGIN

      UPDATE Data.Nodes SET Status = 'Active'

        WHERE NodeID IN(SELECT ParentID FROM INSERTED WHERE Status = 'Active')

        AND Status <> 'Active';

    END

    However, the best practice is to ensure the integrity of your data rather than to develop workarounds because your data has problems. This is why I am suggesting that the data should have no cycles. Describing how to accomplish that is beyond the scope of this post.

     

    Finally, let me provide a short example demonstrating how this trigger blows up if the hierarchy is more than 32 levels deep:

     

    TRUNCATE TABLE Data.Nodes;

    SET NOCOUNT ON;

    DECLARE @ID INT;

    SET @ID = 100;

    WHILE @ID<200 BEGIN

      INSERT INTO Data.Nodes(NodeID,

        ParentID,

        Status)

      SELECT @ID, CASE WHEN @ID=100 THEN NULL ELSE @ID-1 END, 'Sleeping';

      SET @ID = @ID + 1;

    END

    UPDATE Data.Nodes SET Status = 'Active'

    WHERE NodeID = 199;

     

    Msg 217, Level 16, State 1, Procedure Nodes_Upd, Line 4

    Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).

    As you have seen, you need to be very careful when you work with recursive triggers.

     

    In 2005 and later versions, triggers should take in account snapshot isolation

     

    SQL Server MVP Hugo Kornelis has demonstrated the problem:

    http://sqlblog.com/blogs/hugo_kornelis/archive/2006/07/26/Snapshot-and-integrity-part-2.aspx

     

     and posted a workaround:

    http://sqlblog.com/blogs/hugo_kornelis/archive/2006/08/25/snapshot-isolation-a-threat-for-integrity-part-3.aspx

     

    In 2008, triggers should not rely on @@ROWCOUNT the way they could in 2005 and before

     Recently SQL Server MVP Steve Kass described the problem here:

    http://sqlblog.com/blogs/steve_kass/archive/2009/04/29/It_2700_s-2008-and-rowcount-ain_2700_t-what-it-used-to-be.-Don_2700_t-use-it-in-triggers_2100_.aspx

     

    Triggers can render transactions doomed, making it impossible to rollback to a savepoint

    I have demonstrated the problem in this post:

    http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/11/15/avoid-mixing-old-and-new-styles-of-error-handling.aspx

     

    Another trigger can undo your trigger’s work

     

    If you have multiple triggers which are created on the same table and which fire on the same operation, one trigger may overwrite another trigger’s modifications. For instance, the following trigger will deliberately undo the changes done by the previous trigger (SomeData_Ins):

     

    CREATE TRIGGER Undo_Changes_By_SomeData_Ins ON Data.SomeData

    FOR INSERT

    AS

    DELETE FROM ChangeLogs.SomeDataLog WHERE ID IN(

      SELECT ID FROM INSERTED);

     

    Usually this problem does not manifest itself so clearly as in this obvious example.

    In my next post I will describe some situations when triggers are bypassed and when they expose problems in other T-SQL code.

     

    This post continues my series on defensive database programming. Here are my previous posts  from the series: 

    Defensive database programming: fun with ROWCOUNT

    Summarizing previous posts about defensive database programming

    Defensive database programming: SET vs. SELECT.

    Stress testing UPSERTs

    Defensive database programming: fun with UPDATE.

    Defensive database programming: eliminating IF statements.

    Defensive database programming: fun with changing column widths.

    Avoid mixing old and new styles of error handling.

    Defensive database programming: adding ESCAPE clauses.

    Defensive database programming: qualifying column names.

    Defensive database programming: rewriting queries with NOT IN().

     


  • Two suggestions to enhance constraints in SQL Server

    We can tuck any logical expressions in CHECK constraints, but all foreign keys are currently capable of right now is to verify that one or more column values are equal. In many cases it would be very useful to have foreign keys use more complex logical expressions. Also in some cases the ability to create constraints on indexed views would be very handy too. I will provide examples and hopefully we will come up with Connect items to vote for.

     

    When more complex logical expressions in foreign keys are useful

     

    Consider the following simple common sense rule: the maximum current of your device cannot exceed the maximum current of the circuit you plug it into. Suppose that the following tables store data about circuits and devices:

     

    CREATE TABLE Data.Curcuits(CurcuitID INT NOT NULL

      CONSTRAINT PK_Curcuits PRIMARY KEY,

      MaximumCurrent INT NOT NULL,

      Description VARCHAR(100) NOT NULL);

    GO

    INSERT INTO Data.Curcuits(CurcuitID,

      MaximumCurrent,

      Description)

    SELECT 1, 25, 'Deck and Garage';

    GO

    CREATE TABLE Data.Devices(DeviceID INT NOT NULL

      CONSTRAINT PK_Devices PRIMARY KEY,

      CurcuitID INT NULL,

      MaximumCurrent INT NOT NULL,

      Description VARCHAR(100) NOT NULL,

      CONSTRAINT FK_Devices_Curcuits FOREIGN KEY(CurcuitID)

        REFERENCES Data.Curcuits(CurcuitID)

      );

    GO

     

    It would be very convenient to issue a simple command and implement this business rule:

     

    ALTER TABLE Data.Devices ADD CONSTRAINT FK_Devices_Curcuits

      FOREIGN KEY(CurcuitID, MaximumCurrent)

      REFERENCES Data.Curcuits(CurcuitID, MaximumCurrent)

     MATCH ON((Data.Devices.CurcuitID = Data.Curcuits.CurcuitID) AND

     (Data.Devices.MaximumCurrent <= Data.Curcuits.MaximumCurrent));

     

    However, it is not supported, so I need to use a workaround, one more column and three constraints instead of one, as follows:

     

    ALTER TABLE Data.Curcuits

      ADD CONSTRAINT UNQ_Curcuits UNIQUE(CurcuitID, MaximumCurrent);

    GO

    ALTER TABLE Data.Devices ADD CurcuitMaximumCurrent INT NULL;

    GO

    ALTER TABLE Data.Devices DROP CONSTRAINT FK_Devices_Curcuits;

    GO

    ALTER TABLE Data.Devices ADD CONSTRAINT FK_Devices_Curcuits

      FOREIGN KEY(CurcuitID, CurcuitMaximumCurrent)

      REFERENCES Data.Curcuits(CurcuitID, MaximumCurrent)

      ON UPDATE CASCADE;

    GO

    ALTER TABLE Data.Devices

      ADD CONSTRAINT CHK_Devices_SufficientCurcuitMaximumCurrent

      CHECK(CurcuitMaximumCurrent >= MaximumCurrent);

    GO

    You can verify that the constraints work:

     

    INSERT INTO Data.Devices(DeviceID,

      CurcuitID,

      MaximumCurrent,

      CurcuitMaximumCurrent,

      Description)

    SELECT 1, 1, 50, 25, 'Electric car charger'

     

    Msg 547, Level 16, State 0, Line 1

    The INSERT statement conflicted with the CHECK constraint "CHK_Devices_SufficientCurcuitMaximumCurrent". The conflict occurred in database "Test", table "data.Devices".

    The statement has been terminated.

     

     

    As you have seen, the implementation of a very simple and very common business rule is quite involved, because such business rules are not directly supported by the database engine.

     

    When you want to create constraints on indexed views

     

    Even when your database guarantees that “the maximum current of your device cannot exceed the maximum current of the circuit you plug it into”, it is not good enough. Consider the following sample data:

     

    INSERT INTO Data.Devices(DeviceID,

      CurcuitID,

      MaximumCurrent,

      CurcuitMaximumCurrent,

      Description)

    SELECT 2, 1, 15, 25, 'ShopVac';

     

    INSERT INTO Data.Devices(DeviceID,

      CurcuitID,

      MaximumCurrent,

      CurcuitMaximumCurrent,

      Description)

    SELECT 3, 1, 15, 25, 'Miter Saw';

     

    The database structure allows to plug more than one device into a circuit, which is correct, but if you turn both devices on, their combined maximum current exceeds the circuit’s maximum current. To enforce this business rule, it would be natural to create an indexed view, so that the database guarantees that the totals are always correct:

     

    CREATE VIEW Data.TotalMaximumCurrentPerCircuit WITH SCHEMABINDING

    AS

    SELECT d.CurcuitID,

      c.MaximumCurrent AS CircuitMaximumCurrent,

      SUM(d.MaximumCurrent) AS TotalMaximumCurrent,

      COUNT_BIG(*) AS NumDevices

    FROM Data.Devices d JOIN Data.Curcuits c ON d.CurcuitID = c.CurcuitID

    GROUP BY d.CurcuitID, c.MaximumCurrent;

    GO

    CREATE UNIQUE CLUSTERED INDEX Data_TotalMaximumCurrentPerCircuit

    ON Data.TotalMaximumCurrentPerCircuit(CurcuitID);

    GO

     

    If I could create a check constraint on that indexed view, I would be all set:

     

    ALTER VIEW Data.TotalMaximumCurrentPerCircuit

      ADD CONSTRAINT CHK_TotalMaximumCurrentPerCircuit_ValidCurcuit

      CHECK(TotalMaximumCurrent <= CircuitMaximumCurrent)

     

     

    Instead, I need to use triggers or rather contrived kludges. A built in native support for such quite common business rules would increase the usefulness of SQL Server.

    I have not created anything on Connect yet, because the ideas are kind of raw yet. I am very interested in your feedback. What do you think?

     

    This continues the series on denormalizing, which started with this post about working hours and appointments 


    and continued with this post about running totals.

     

    and a post about Storing intervals of time with no overlaps

    Using ROWVERSION to enforce business rules


More Posts Next page »
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement