|
|
|
|
-
I have made up a simple example when a CHECK constraint that uses a UDF should succeed but always fails.
Suppose you need to move some stuff in your car, and you don't want
to load more than 100 pounds into your small car at a time. Here is the
DDL:
CREATE SCHEMA Data AUTHORIZATION dbo;
GO
CREATE SCHEMA Readers AUTHORIZATION dbo;
GO
CREATE TABLE Data.ItemsToMove(
TripNumber INT NOT NULL,
ItemName VARCHAR(50),
ItemWeightInPounds FLOAT
);
GO
CREATE FUNCTION Readers.TotalWeightPerTrip(@TripNumber INT)
RETURNS FLOAT
AS
BEGIN
RETURN (SELECT SUM(ItemWeightInPounds)
FROM Data.ItemsToMove
WHERE TripNumber = @TripNumber);
END
GO
ALTER TABLE Data.ItemsToMove
ADD CONSTRAINT CHK_TotalWeightPerTrip_LessThan100pounds
CHECK (Readers.TotalWeightPerTrip(TripNumber) <= 100);
GO
Loading 90 pounds succeeds, but when you go over the limit, the constraint works:
INSERT INTO Data.ItemsToMove(TripNumber,
ItemName,
ItemWeightInPounds)
SELECT 1, 'Big Box with Clothes', 40
UNION ALL
SELECT 1, 'Lumbar Aeron Chair', 25
UNION ALL
SELECT 1, 'Basic Aeron Chair', 25;
INSERT INTO Data.ItemsToMove(TripNumber,
ItemName,
ItemWeightInPounds)
SELECT 1, 'Small Box with Books', 40;
Msg 547, Level 16, State 0, Line 1 The INSERT statement
conflicted with the CHECK constraint
"CHK_TotalWeightPerTrip_LessThan100pounds". The conflict occurred in
database "Test", table "data.ItemsToMove", column 'TripNumber'. The statement has been terminated.
So, all other items should be moved on the second trip:
INSERT INTO Data.ItemsToMove(TripNumber,
ItemName,
ItemWeightInPounds)
SELECT 2, 'Small Box with Books', 40
UNION ALL
SELECT 2, 'Humidifier', 40;
Now there is another problem: all bulky items are scheduled to move
on the first trip, while small and hevy ones get on the second trip.
Suppose you cannot fit the bulky box with clothes after you loaded your
chairs, and want to move your small humidifier on your first trip
instead:
UPDATE Data.ItemsToMove
SET TripNumber = CASE WHEN ItemName = 'Humidifier' THEN 1
WHEN ItemName = 'Big Box with Clothes' THEN 2
ELSE TripNumber END
WHERE ItemName IN ('Humidifier', 'Big Box with Clothes') AND TripNumber IN(1,2);
The update should succeed, because the box and the humidifier have the same weight. However, the update fails:
Msg 547, Level 16, State 0, Line 1 The UPDATE statement
conflicted with the CHECK constraint
"CHK_TotalWeightPerTrip_LessThan100pounds". The conflict occurred in
database "Test", table "data.ItemsToMove", column 'TripNumber'. The statement has been terminated.
The reason is simple: the database engine modifies just one row and
immediately invokes the CHECK constraint for that row. At that moment
the limit for total load is exceeded no matter which row is modified first - that is the way I made up this
example. You can run the following scripts and see for yourself:
ALTER TABLE Data.ItemsToMove
DROP CONSTRAINT CHK_TotalWeightPerTrip_LessThan100pounds;
BEGIN TRAN
UPDATE Data.ItemsToMove
SET TripNumber = CASE WHEN ItemName = 'Humidifier' THEN 1
ELSE TripNumber END
WHERE ItemName = 'Humidifier' AND TripNumber = 2;
SELECT TripNumber, SUM(ItemWeightInPounds)
FROM Data.ItemsToMove
GROUP BY TripNumber;
ROLLBACK;
BEGIN TRAN
UPDATE Data.ItemsToMove
SET TripNumber = CASE
WHEN ItemName = 'Big Box with Clothes' THEN 2
ELSE TripNumber END
WHERE ItemName = 'Big Box with Clothes' AND TripNumber = 1;
SELECT TripNumber, SUM(ItemWeightInPounds)
FROM Data.ItemsToMove
GROUP BY TripNumber;
ROLLBACK;
At this point the faulty constraint is dropped, so you can rerun your multi-row update:
UPDATE Data.ItemsToMove
SET TripNumber = CASE WHEN ItemName = 'Humidifier' THEN 1
WHEN ItemName = 'Big Box with Clothes' THEN 2
ELSE TripNumber END
WHERE ItemName IN ('Humidifier', 'Big Box with Clothes') AND TripNumber IN(1,2);
Create the constraint again, it will succeed. Verify that the constraint is trusted:
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE Name='CHK_TotalWeightPerTrip_LessThan100pounds';
Make sure you understand why you need to know if your constraints are trusted.
You can also verify that your data is clean, that the load for every trip does not exceed 100 pounds:
SELECT TripNumber, SUM(ItemWeightInPounds)
FROM Data.ItemsToMove
GROUP BY TripNumber;
|
-
Surely this sounds like common knowledge, yet I decided to post a simple benchmark demonstrating how slow they are - in this case almost 100 times slower than a foreign key. Recently Pure Krome asked a question on stackoverflow about implementing subtypes in the database and the discussion lead to such benchmarking.
The DDL for parent and child tables
Here is the parent table:
CREATE TABLE dbo.Animals
(AnimalId INT NOT NULL IDENTITY PRIMARY KEY,
AnimalType TINYINT NOT NULL, -- 1: Mammal, 2:Reptile, etc..
Name VARCHAR(30))
The child table and the check constraint that guarantees that all rows in the child table refer to Mammals:
CREATE FUNCTION dbo.GetAnimalType(@AnimalId INT)
RETURNS TINYINT
AS
BEGIN
DECLARE @ret TINYINT;
SELECT @ret = AnimalType FROM dbo.Animals
WHERE AnimalId = @AnimalId;
RETURN @ret;
END
GO
CREATE TABLE dbo.Mammals
(AnimalId INT NOT NULL PRIMARY KEY,
SomeOtherStuff VARCHAR(10),
CONSTRAINT Chk_AnimalType_Mammal CHECK(dbo.GetAnimalType(AnimalId)=1)
);
Another approach, copying AnimalType column in the child table, and using a foreign key
CREATE TABLE dbo.Mammals2
(AnimalId INT NOT NULL PRIMARY KEY,
AnimalType TINYINT NOT NULL,
SomeOtherStuff VARCHAR(10),
CONSTRAINT Chk_Mammals2_AnimalType_Mammal CHECK(AnimalType=1),
CONSTRAINT FK_Mammals_Animals FOREIGN KEY(AnimalId, AnimalType)
REFERENCES dbo.Animals(AnimalId, AnimalType)
);
And finally the child table without enforcing any logic with regards to AnimalType:
CREATE TABLE dbo.Mammals3
(AnimalId INT NOT NULL PRIMARY KEY,
SomeOtherStuff VARCHAR(10)
);
Setting up the helper table:
CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i = 1;
INSERT INTO dbo.Numbers(n) SELECT 1;
WHILE @i<128000 BEGIN
INSERT INTO dbo.Numbers(n)
SELECT n + @i FROM dbo.Numbers;
SET @i = @i * 2;
END;
Benchmarking results
The overhead of a FK was 78 ms
The overhead of a UDF wrapped in CHECK constraint was 7453 ms
Inserting 128K rows with a UDF:
INSERT INTO dbo.Animals
(AnimalType, Name)
SELECT 1, 'some name' FROM dbo.Numbers;
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
INSERT INTO dbo.Mammals
(AnimalId,SomeOtherStuff)
SELECT n, 'some info' FROM dbo.Numbers;
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 2 ms. Table 'Mammals'. Scan count 0, logical reads 272135, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 7750 ms, elapsed time = 7830 ms.
(131072 row(s) affected)
Inserting 128K rows with a FK:
INSERT INTO dbo.Mammals2
(AnimalId,AnimalType,SomeOtherStuff)
SELECT n, 1, 'some info' FROM dbo.Numbers;
SQL Server parse and compile time: CPU time = 93 ms, elapsed time = 100 ms. Table 'Animals'. Scan count 1, logical reads 132, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Mammals2'. Scan count 0, logical reads 275381, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 375 ms, elapsed time = 383 ms.
The baseline: inserting 128K rows into a table without CHECK or FK:
INSERT INTO dbo.Mammals3
(AnimalId,AnimalType,SomeOtherStuff)
SELECT n, 1, 'some info' FROM dbo.Numbers;
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 66 ms. Table 'Mammals3'. Scan count 0, logical reads 272135, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Numbers'. Scan count 1, logical reads 441, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 297 ms, elapsed time = 303 ms.
(131072 row(s) affected)
The results of benchmarking are very clear, but the story does not end here, there is more to it:
A UDF wrapped in a CHECK constraint may not work correctly for multi row updates:
Read the following post by Tony Rogerson:
Using a UDF in a CHECK constraint to check validity of History Windows
|
-
I have been practicing database unit testing for two years, and I am
a big fan of it. My test harness is a great help to me - it lets me
change my modules quickly and with confidence. However, having too many
unit tests is counterproductive - they become a burden and slow you
down rather than help you out.Joel Spolski recently wrote a great
article about the overhead of maintaining your unit tests being "disproportional to the amount of benefit that you get out of them".
That article is a must read!
Unit tests are an overhead.
It takes time to develop unit test, and more time to maintain them.
They also slow down working with source control and searches. For example, every
time your search on a stored procedure name in your solution, every
unit test running in shows up in your search results. Yet we go for
this overhead, because we expect the payoff from better code quality
and easier maintenance to compensate for it. Clearly to break even or
benefit from unit tests we need to keep the overhead within reason.
Unit tests take time to run, sometimes too much time.
To be really useful, your unit test harness needs to run quickly -
otherwise it will slow you down. You will either spend too much time
watching it run, or you will not run it as often as you should - if you
break something, you will not notice it right away. Sometimes this
becomes a serious problem with C# or C++ solutions. Because database
tests usually run much much slower, this may become a serious problem
much sooner. In many cases you simply cannot afford to keep all your
database unit test because of the sheer amount of time they run. Alternatively, database unit testing can be completely abandoned because of this slowness. It can be dismissed as yet another approach that sounds good in theory but does not hold up in the real world. Database unit testing does hold up in real life projects, but only if you use common sense and do it properly.
Unit tests may help you out if you need to change your module.
They can be extremely useful - you can recall what the module is
supposed to do, identify an issue, make a change, and retest in just a
few minutes. This is why we need them most of all. So, the more a unit
test is potentially useful, the more you want to keep it long term. And, using the same logic, if a unit test is very unlikely to ever help you out, there is no reason to keep it.
Complex modules need unit tests most of all. Trivial tests and tests for simple modules can be eliminated.
Complex queries need more maintenance, because it takes more time to
understand and change them, and because they are more likely to need
optimization, This is where unit tests come very handy. On the other
hand, trivial queries such as the following are easier to change and
less likely to need optimization later:
CREATE PROCEDURE Readers.SelectCustomerByID
@CustomerID INT
AS
SET NOCOUNT ON;
SELECT CustomerID, FirstName, LastName
FROM Data.Customers
WHERE CustomerID = @CustomerID;
Do you have problems understanding what this procedure does? Do you
expect it will be difficult to change it if needed? Do you think this
procedure needs an explicit unit test?
Do you really need existence tests?
When we write blog posts and articles, our examples must be short
and simple. Running a unit test to determine if your database or your
table exists is a clear and simple example and a great way to learn how
unit tests work - (make sure you have read this article by SQL Server MVP Andy Leonard). However, there is no need to keep such tests in real
production solutions - if a database or a table is dropped, you will
get a clear error message, and you will know what is your problem right away. If a database or a table is dropped, you
will not need an existence unit test to help you out with
troubleshooting, will you?
Make a distinct database call once, and verify everything.
Database call are slow, so you really want to limit the number of
database calls your test harness does. For example, suppose that you
have a complex stored procedure Readers.SelectCustomersByName, and if
you only provide last name, it must return all the customers with that
last name. Because the procedure is complex, you need a unit test for
each distinct case of its usage. Because you want to minimize the
number of database calls, you want to execute the following command
only once:
EXEC Readers.SelectCustomersByName @LastName = 'Larsen';
It is very important to verify everything: number of
result sets, number of columns, their names and types, and all the
columns and rows returned by the procedure. The reason is simple: any
change can be breaking, and you need to determine if you broke
something when you do your changes. For more explanations read this
article: "Close those Loopholes - Testing Stored Procedures" I've used the approach described in that article for two years and it works great. Everything is verified automatically, so I never forget to test anything, and I only need one database call for all my checks, so it runs fast.
Clean up after test driven development.
I like and practice test driven development, however I have the
following reservations regarding it: all-too-often when you think that are done you
have a huge amount of trivial unit tests. That seems to be the whole
idea behind test driven development: every step in your progress is a
unit test. However, there is no need to keep the whole history of your
progress, to keep all the tests that drove your development, for the whole life of your system (each of us humans has had
gills and a tail at some early stage of our development; we don't keep
them forever, we undergo a refactoring). When you are done with your
development, you only need the tests that might help you out in
troubleshooting. Most likely you'll need to consolidate your unit tests
in the way described in the previous chapter. Remember that test driven
development is supposed to consist of the following stages: - Encounter a non-trivial problem.
- Write a unit test that fails.
- Develop some code so that the test passes.
- Refactor - test driven development does not stop when your tests pass!
I think that this refactoring should include not only your code but
unit tests too. Refactoring is done when your remaining unit tests are
lean, so that the overhead of keeping them is justified by the benefits
of having them. Let me repeat myself: from where I sit, test-driven development can be a huge help if you are solving a complex problem. However, it can slow you down if the steps you are taking are too easy for you. I guess I should write up some example of test driven development, but it is getting late, so not this time.
|
-
Problems such as calculating third Wednesday of the month or the last day of the month are very common. Recently Peter Larsson posted a scalar UDF that solves problems such as "third Wednesday of the month", and Uri Dimant added a comment with a solution by Steve Kass. Unfortunately, all these solutions use scalar UDFs and as such are very slow, so I decided to post some very fast inline UDFs that I find very useful.
Composing a date from year, month, and day without checking if the parameters are valid.
The following inline UDF composes a date. I deliberately intend to use it with invalid parameters, such as month equal to zero, because it dramatically simplifies my logic.
CREATE FUNCTION Readers.ComposeDate(@year INT, @month INT, @day INT)
-- I deliberately do not check if the parameters are valid
RETURNS TABLE AS RETURN(
SELECT DATEADD(DAY, @day - 1, DATEADD(MONTH, @month - 1, DATEADD(YEAR, @year - 1901, '19010101')))
AS ComposedDate
)
GO
Calculating the last days of previous and this months.
Because my ComposeDate function handles out of range parameters, I don't need to handle January and December differently. Here you go, create them and see how they work:
CREATE FUNCTION Readers.LastDayOfPreviousMonth(@year INT, @month INT)
RETURNS TABLE AS RETURN(
SELECT ComposedDate AS LastDayOfPreviousMonth
FROM Readers.ComposeDate(@year, @month, 0)
)
GO
SELECT LastDayOfPreviousMonth, '20081231' FROM Readers.LastDayOfPreviousMonth(2009,1)
UNION ALL
SELECT LastDayOfPreviousMonth, '20090131' FROM Readers.LastDayOfPreviousMonth(2009,2)
GO
CREATE FUNCTION Readers.LastDayOfMonth(@year INT, @month INT)
RETURNS TABLE AS RETURN(
SELECT ComposedDate AS LastDayOfMonth
FROM Readers.ComposeDate(@year, @month+1, 0)
)
GO
SELECT LastDayOfMonth FROM Readers.LastDayOfMonth(2009,11)
UNION ALL
SELECT LastDayOfMonth FROM Readers.LastDayOfMonth(2009,12)
GO
Calculating third Wednesday and other similar problems
For simplicity (this is a blog post, not production code), this function does not determine if the returned date is actually in the required month or in another one. If you need such validation, use the approach demonstrated in the next example.
CREATE FUNCTION Readers.MthDayOfNthWeek(@year INT, @month INT, @week INT, @day INT)
-- @day 1 Sun, 2 Mon etc.
--
this function works in US but might need adjustments in other countries RETURNS TABLE AS RETURN(
SELECT t.ComposedDate AS MthDayOfNthWeek
FROM (SELECT ComposedDate
FROM Readers.ComposeDate(@year, @month, 1)) AS d CROSS APPLY Readers.ComposeDate(@year, @month,
@day - DATEPART(dw,ComposedDate) + 1 + 7*(@week + CASE WHEN DATEPART(dw,ComposedDate)>@day THEN 0 ELSE -1 END) ) AS t
)
GO
-- third Wednesday of June 2009
SELECT MthDayOfNthWeek, '20090617' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 6, 3, 4)
UNION ALL
SELECT MthDayOfNthWeek, '20090715' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 7, 3, 4)
UNION ALL
SELECT MthDayOfNthWeek, '20090601' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 6, 1, 2)
UNION ALL
SELECT MthDayOfNthWeek, '20090605' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 6, 1, 6)
UNION ALL
SELECT MthDayOfNthWeek, '20090606' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 6, 1, 7)
UNION ALL
SELECT MthDayOfNthWeek, '20090608' AS ExpectedDate
FROM Readers.MthDayOfNthWeek(2009, 6, 2, 2)
This function is not completely trivial, so I used test driven development to come up with it.
Validating dates
If you need to validate dates, use the following approach:
CREATE FUNCTION Readers.ComposeValidDate(@year INT, @month INT, @day INT)
RETURNS TABLE AS RETURN(
SELECT CASE WHEN YEAR(ComposedDate) = @year AND MONTH(ComposedDate) = @month
THEN ComposedDate END AS ComposedDate
FROM Readers.ComposeDate(@year, @month, @day)
)
GO
SELECT ComposedDate, '20090228' AS ExpectedResult FROM Readers.ComposeValidDate(2009, 2, 28)
UNION ALL
SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(2009, 2, 29)
UNION ALL
SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(2009, 2, 0)
UNION ALL
SELECT ComposedDate, NULL FROM Readers.ComposeValidDate(2009, 0, 12)
Only the first result will validate.
Read more about performance of inline UDFs in "Reuse Your Code with Table-Valued UDFs" and "Many nested inline UDFs are very fast"
|
-
It is well known that nested calls of scalar UDFs are a huge drag on performance, but in many cases inline UDFs can be nested without performance penalty. Here is a simple repro that you can run and see for yourself. Suppose that you already have two inline UDFs: one returning the previous calendar day and another returning the first day of the month, as follows:
CREATE FUNCTION Readers.GetFirstDayOfMonth(@d DATETIME)
RETURNS TABLE AS RETURN(
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19900101', @d), '19900101') AS FirstDayOfMonth
)
GO
DROP FUNCTION Readers.GetPreviousDay
CREATE FUNCTION Readers.GetPreviousDay(@d DATETIME)
RETURNS TABLE AS RETURN (
SELECT DATEADD(DAY, DATEDIFF(DAY, '19900101', @d) - 1, '19900101') AS PreviousDay
)
GO
Suppose that you need to develop a UDF returning the last day of the previous month. You can develop it from scratch:
CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Merged(@d DATETIME)
RETURNS TABLE AS RETURN(
SELECT DATEADD(DAY, DATEDIFF(DAY, '19900101', DATEADD(MONTH, DATEDIFF(MONTH, '19900101', @d), '19900101')) - 1, '19900101')
AS LastDayOfPreviousMonth
)
Alternatively you can reuse your two existing UDFs in several ways, as follows:
CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Nested1(@d DATETIME)
RETURNS TABLE AS RETURN(
SELECT p.PreviousDay AS LastDayOfPreviousMonth
FROM Readers.GetFirstDayOfMonth(@d) AS f
CROSS APPLY Readers.GetPreviousDay(f.FirstDayOfMonth) AS p
)
GO
CREATE FUNCTION Readers.GetLastDayOfPreviousMonth_Nested2(@d DATETIME)
RETURNS TABLE AS RETURN(
SELECT p.PreviousDay AS LastDayOfPreviousMonth
FROM (SELECT FirstDayOfMonth FROM Readers.GetFirstDayOfMonth(@d) AS f) AS f
CROSS APPLY Readers.GetPreviousDay(f.FirstDayOfMonth) AS p
)
To benchmark, I used my Data.Numbers table whihc has 1 million rows. You can use any large enough table.
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
-- this table has 1 million rows SELECT COUNT(*) FROM Data.Numbers
DECLARE @d DATETIME, @d1 DATETIME;
SET @d = '20090512';
SELECT @d1=LastDayOfPreviousMonth --INTO #t1
FROM Data.Numbers
CROSS APPLY Readers.GetLastDayOfPreviousMonth_Merged(@d);
GO
DECLARE @d DATETIME, @d1 DATETIME;
SET @d = '20090512';
SELECT @d1=LastDayOfPreviousMonth --INTO #t1
FROM Data.Numbers
CROSS APPLY Readers.GetLastDayOfPreviousMonth_Nested1(@d);
GO
DECLARE @d DATETIME, @d1 DATETIME;
SET @d = '20090512';
SELECT @d1=LastDayOfPreviousMonth --INTO #t1
FROM Data.Numbers
CROSS APPLY Readers.GetLastDayOfPreviousMonth_Nested2(@d);
GO
-- this measures the overhead of benchmarking,
-- scanning the table and assigning a value to a variable
DECLARE @d DATETIME, @d1 DATETIME;
SET @d = '20090512';
SELECT @d1=@d --INTO #t1
FROM Data.Numbers
I did not notice any difference in performance beyond usual noise. Of course, in some very complex cases performance of nested inline UDFs might suffer, but in general this is not the case and usually you can reuse your inline UDFs without performance penalty.
My previous post about inline UDFs is this:
Reuse Your Code with Table-Valued UDFs
|
-
When your query must always return in a certain amount of time, trusting the optimizer might not be the best choice. Suppose your select must return in one second no matter what, and it usually returns in 50 ms, but sometimes, once in 100K calls, it takes full five seconds to complete, and clearly such query does not meet the requirement to return in one second. Sounds familiar? Sure, optimizers do behave like this from time to time. It is very tempting to use hints or even plan freezing. However, neither hints nor plan freezing are long term maintenance-free solutions, and the reason is simple: what today looks like a great plan might at some later time become counterproductive.
In cases like this
index covering really shines.
Scanning a range of a covering index is a very simple and efficient plan. Even better, it is one and the same plan for all possible combinations of parameters, so you may not expect unpleasant surprises from a confused optimizer. Of course, if the amount of data increases, the response time may increase too, but you may monitor it and react accordingly if needed. If you do not load huge amounts of data, your response time will not jump from 50 ms to five seconds any time soon, as long as the optimizer chooses the same range scan over and over again.
Responding to the
recent post by SQL server MVP Tibor Karaszi, I think that not all queries are born equal - some must respond in a certain time even if that slows down other activity on the server. This means that in some cases it is OK if index covering slows down your modifications, provided that the modifications have lower priority.
|
-
In my previous post I described how correlation between columns may confuse the optimizer and cause it to choose an inefficient plan. I also recommended index covering as the most robust way to optimize queries that involve correlated columns.
However, there are quite a few cases when correlated columns are in different tables. For example, employees' home zipcodes and their cell phone area codes may be strongly correlated. Similarly, car models and kinds of repair work for those cars may be strongly correlated too.
In such cases indexed views come very handy, because they can allow the optimizer to satisfy such queries with one and the same plan, the range scan, regardless of the expected cardinality. Even if the optimizer's estimated cardinality is wildly wrong, it does not matter if the execution plan is still the same range scan.
Of course, indexed views come with a rather hefty price tag,
and in some cases you may have to drop them altogether.
|
-
When two columns are correlated, it may fool the optimizer and cause it to choose a wrong plan. Here is a simple script that demonstrates it. The following script creates a table and populates some sample data: CREATE TABLE dbo.Vehicles(ID INT NOT NULL CONSTRAINT PK_Vehicles PRIMARY KEY,
Make VARCHAR(10),
Model VARCHAR(10),
SomeOtherData CHAR(50)
);
GO
DECLARE @i INT;
SET NOCOUNT ON;
SET @i=0;
WHILE @i<100000 BEGIN
INSERT dbo.Vehicles(ID, Make, Model, SomeOtherData)
SELECT @i,
CASE WHEN @i%100 < 2 THEN 'Toyota' ELSE 'M'+CAST(@i AS VARCHAR(6)) END,
CASE WHEN @i%100 = 0 THEN 'Camry'
WHEN @i%100 = 1 THEN 'Corolla'
ELSE 'M'+CAST(@i AS VARCHAR(6)) END,
'Some Data'
SET @i = @i + 1;
END
GO
CREATE INDEX Vehicles_Make ON dbo.Vehicles(Make);
GO
The following query chooses to scan the whole clustered index, because it considers the predicate Make='Toyota' AND Model='Camry' not selective enough to justify bookmark lookups.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT ID, Make, Model, SomeOtherData
FROM dbo.Vehicles
WHERE Make='Toyota'
AND Model='Camry';
Note: while this is true on my server, nobody can guarantee that the optimizer will behave in the same way for all editions and versions.
Clearly make and model of a car are strongly correlated - when you know car's model, you almost always know its make too. 2% of vehicles in the table are Toyotas, 1% are Camrys, and 1% are Toyota Camrys. If make and model were independent, there would be 0.02*0.01=0.0002=0.02% Toyota Camrys. Of course this is not the case, but let us see what happens when you create another index and rerun the same select: CREATE INDEX Vehicles_Model ON dbo.Vehicles(Model);
The select now runs slower, and makes three times more reads that the clustered index scan. The reason is simple - creating another index comes with statistics on Model column. When I looked at the execution plan, I saw that the optimizer expected 20 rows, which is precisely 0.02%.In fact, there are 1000 Toyota Camry - the optimizer's estimate is off the mark. What happened?
In this case, for this version and edition, the optimizer assumed that make and model are independent, decided that the search criteria (Make='Toyota' AND Model='Camry') is very selective, and chosen nested loops. Let me repeat myself - that might or might not be the case if you rerun my scripts on your system.
Over time the optimizer can and does change. in my opinion the most robust way to optimize such queries is to create covering indexes, so that the plan stays the same regardless of selectivity estimates. Of course there are other approaches, which might be better in some other cases.
Note: If you are still reading this, you might also want to read about DATE_CORRELATION_OPTIMIZATION setting in BOL.
|
-
In all-too-many cases an indexed view may solve your short term performance goals but at some later time become counterproductive. So if you choose to use an indexed view, you may need an exit strategy. Let me describe a few common problems with indexed views.
Indexed views may increase lock contention.
It is very easy to demonstrate. Create the following table:
CREATE TABLE dbo.ChildTable(ChildID INT NOT NULL
CONSTRAINT PK_ChildTable PRIMARY KEY,
ParentID INT NOT NULL,
Amount INT NOT NULL);
GO
From one tab in SSMS, run this script:
BEGIN TRAN;
INSERT INTO dbo.ChildTable(ChildID, ParentID, Amount)
VALUES(1,1,1);
From another tab, run a similar one:
BEGIN TRAN;
INSERT INTO dbo.ChildTable(ChildID, ParentID, Amount)
VALUES(2,1,1);
ROLLBACK;
Note that both inserts complete, they do not block each other. Rollback in both tabs, and create an indexed view:
CREATE VIEW dbo.ChildTableTotals WITH SCHEMABINDING
AS
SELECT ParentID,
COUNT_BIG(*) AS ChildRowsPerParent,
SUM(Amount) AS SumAmount
FROM dbo.ChildTable
GROUP BY ParentID;
GO
CREATE UNIQUE CLUSTERED INDEX ChildTableTotals_CI
ON dbo.ChildTableTotals(ParentID);
Rerun the two inserts. Note that the second one does not complete; it is blocked. The reason is very simple: the first insert modifies the corresponding entry in the indexed view, so the insert acquires and holds a lock on it.
It is just as easy to demonstrate that when you create an indexed view, deadlocks may become more likely too.
Note: this is not a problem with the way indexed views are implemented. If you roll out your own summary table, and develop triggers which directly modify it to keep it up-to-date, you will encounter the same problem. Only if you don't maintain your summary table all the time, you can get around this locking problem, but a more detailed discussion of this is beyond the scope of this post.
Also make sure you have read an excellent post by Tony Rogerson:
Indexed views on joins may become counterproductive.
Create the following table and another indexed view:
CREATE TABLE dbo.ParentTable(ParentID INT NOT NULL
CONSTRAINT PK_ParentTable PRIMARY KEY,
WideData CHAR(1000) NOT NULL);
GO
CREATE VIEW dbo.ParentTableWithAmounts WITH SCHEMABINDING
AS
SELECT p.ParentID, p.WideData, c.ChildID, c.Amount
FROM dbo.ParentTable AS p JOIN dbo.ChildTable AS c
ON p.ParentID = c.ParentID;
GO
CREATE UNIQUE CLUSTERED INDEX ParentTableWithAmounts_CI
ON dbo.ParentTableWithAmounts(ChildID);
Suppose that originally you have an average one child row per parent on, and that selecting from this indexed view is faster than joining two tables - that's why you created it in the first place. However, if on average you have 10K child rows per parent one, your indexed view becomes counterproductive. Let's add 10K child rows:
INSERT INTO dbo.ParentTable(ParentID, WideData)
VALUES(1,'asdf');
SET NOCOUNT ON;
DECLARE @i INT;
SET @i=10000;
WHILE @i<20000 BEGIN
INSERT INTO dbo.ChildTable(ChildID, ParentID, Amount)
VALUES(@i,1,1);
SET @i=@i+1;
END
Let us select from this indexed view:
SELECT ParentID, WideData, ChildID, Amount
FROM dbo.ParentTableWithAmounts;
The optimizer has chosen not to use the indexed view, and the execution costs are as follows:
Table 'ChildTable'. Scan count 1, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'ParentTable'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If you force the optimizer to use the indexed view, the execution costs are dramatically higher:
SELECT ParentID, WideData, ChildID, Amount
FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);
Table 'ParentTableWithAmounts'. Scan count 1, logical reads 1435, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As you have seen, when the parent table is wide and there are many child rows per parent, the indexed view becomes counterproductive, and the optimizer is able to recognize the fact.
Also note that this indexed view increases lock contention just as the previous one did. You can try to update a parent row and one of its child rows and see for yourself.
Exit strategy for NOEXPAND hint
If your indexed view becomes counterproductive, you may consider dropping it altogether. However, dropping the index view will break all those queries with NOEXPAND hint:
DROP INDEX dbo.ParentTableWithAmounts.ParentTableWithAmounts_CI;
GO
SELECT ParentID, WideData, ChildID, Amount
FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);
Msg 8171, Level 16, State 2, Line 1 Hint 'noexpand' on object 'dbo.ParentTableWithAmounts' is invalid.
I would recommend to be prepared for such possibility, to wrap all your selects using NOEXPAND in stored procedures, for example:
CREATE PROCEDURE dbo.SelectParentTableWithAmounts
AS
SELECT ParentID, WideData, ChildID, Amount
FROM dbo.ParentTableWithAmounts WITH(NOEXPAND);
and to provide a rollback script which alters these procedures, as follows:
DROP INDEX dbo.ParentTableWithAmounts.ParentTableWithAmounts_CI;
GO
ALTER PROCEDURE dbo.SelectParentTableWithAmounts
AS
SELECT ParentID, WideData, ChildID, Amount
FROM dbo.ParentTableWithAmounts;
You can add error handling to this script, so that either both changes deploy or none does. Make sure to test this script. If you have unit tests, include this scenario in your test harness. Of course there are other approaches, but my main point is that you need to be aware that your indexed view can be dropped, and have a working exit strategy.
Performance considerations for your exit strategy
We have just discussed how to make sure that your application does not break, but what about the performance? Clearly the performance of your selects may plunge, what can be done about it? In many cases, index covering gives you acceptable performance without too much lock contention. Also sometimes you want to roll out your own summary tables, but that sounds like a topic for another post.
|
-
Sounds trivial? Right, but different flavors of this myth still persist. Yesterday I noticed a thread on stackoverflow discussing SELECT TOP without a where clause, where it was suggested that "The order will be defined based on the clustered key in that table." http://stackoverflow.com/questions/882195/what-does-top-1-mean-in-an-sql-query/882690#882690
Because apparently many visitors agreed with this myth, I decided to post a repro script which demonstrates that this is simply not true. Here you go: CREATE SCHEMA Data AUTHORIZATION dbo
GO
CREATE TABLE Data.Numbers(Number INT NOT NULL PRIMARY KEY)
GO
DECLARE @ID INT;
SET NOCOUNT ON;
SET @ID = 1;
WHILE @ID < 100000 BEGIN
INSERT INTO Data.Numbers(Number)
SELECT @ID;
SET @ID = @ID+1;
END
CREATE TABLE Data.WideTable(ID INT NOT NULL
CONSTRAINT PK_WideTable PRIMARY KEY,
RandomInt INT NOT NULL,
CHARFiller CHAR(1000))
GO
CREATE VIEW dbo.WrappedRand
AS
SELECT RAND() AS random_value
GO
CREATE ALTER FUNCTION dbo.RandomInt()
RETURNS INT
AS
BEGIN
DECLARE @ret INT;
SET @ret = (SELECT random_value*1000000 FROM dbo.WrappedRand);
RETURN @ret;
END
GO
INSERT INTO Data.WideTable(ID,RandomInt,CHARFiller)
SELECT Number, dbo.RandomInt(), 'asdf'
FROM Data.Numbers
GO
CREATE INDEX WideTable_RandomInt ON Data.WideTable(RandomInt)
GO
SELECT TOP 100 ID FROM Data.WideTable
1407 253 9175 6568 4506 1623 581 As you have seen, the optimizer has chosen to use a non-clustered index to stisfy this SELECT TOP query.
|
-
-
I ran a quick check, and it looks like a reasonably even distribution to me. I genenerated 2M NewId values, as follows: SELECT NewID() AS RandomColumn INTO #t FROM Data.TableWith2Mrows I looked at the distribution into 256 buckets, considering the first byte of the value only. All 256 buckets had values:
SELECT COUNT(*) AS Cnt FROM ( SELECT SUBSTRING(RandomColumn,1,1) AS FirstByte, COUNT(*) AS Frequency FROM (SELECT CAST(RandomColumn AS BINARY(16)) AS RandomColumn FROM #t) AS t GROUP BY SUBSTRING(RandomColumn,1,1) ) AS Distribution ----------- 256 and the number of values in all the buckets was between 7200 and 7700: SELECT COUNT(*) AS Cnt, Frequency FROM ( SELECT SUBSTRING(RandomColumn,1,1) AS FirstByte, COUNT(*) AS Frequency FROM (SELECT CAST(RandomColumn AS BINARY(16)) AS RandomColumn FROM #t) AS t GROUP BY SUBSTRING(RandomColumn,1,1) ) AS Distribution GROUP BY Frequency ORDER BY Frequency After I had posted this, Simon Sabin told me that he also posted a similar script. He did it four years ago. Somehow I failed to google it up: http://sqlblogcasts.com/blogs/simons/archive/2005/06/13/Random-number-on-a-per-row-basis.aspx
|
-
Some T-SQL code is written under the assumption that either
a TRY block successfully completes or a CATCH block is invoked. Most likely,
this is the case. However, there is a third, although rare, possibility – the
TRY block may fail, and the CATCH one is bypassed. Let me provide some
examples. I do not intend to provide a comprehensive list of all such cases, I
only want to demonstrate that sometimes CATCH blocks are bypassed. Also I would
like to emphasize that in almost all the cases CATCH blocks do catch errors,
and the exceptions, if any, are quite rare.
KILL command and
timeout (aka attention) both stop execution without invoking CATCH blocks.
As a result of a KILL or an attention, the execution stops
immediately. You can run the following script, cancel the query or kill it from
another tab, and see for yourself:
SELECT @@SPID;
GO
BEGIN TRY
PRINT 'Before WAITFOR';
WAITFOR DELAY
'00:35:00';
PRINT 'After WAITFOR';
END TRY
BEGIN CATCH
SELECT 'Beginning CATCH block';
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
PRINT 'At the end of the same batch';
GO
PRINT 'Next batch';
You can also invoke it from ADO.Net, and you can also
decrease CommandTimeout, so that the timeout occurs faster.
As you have seen, you can only catch such errors on the
client.
Note: timeout and
lock timeout are different. Lock timeout errors are caught by CATCH blocks.
Some compile
errors cancel batch execution.
See for yourself:
BEGIN TRY
PRINT 'Beginning TRY';
BEGIN TRAN
INSERT data.SomeData(ID, AnotherID) VALUES(-1, -2);
-- make sure #t does not exists for your connection
DELETE FROM #t;
COMMIT;
PRINT 'Finishing TRY';
END TRY
BEGIN CATCH
SELECT 'Beginning CATCH block';
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
ROLLBACK;
END CATCH
PRINT 'At the end of the same batch';
GO
PRINT 'Next batch';
Beginning TRY
(1 row(s) affected)
Msg 208, Level 16, State 0,
Line 6
Invalid object name '#t'.
Next batch
This problem is more likely to happen if you work with
temporary tables. However, because at
the time of this writing we have deferred name resolution, you can get this
problem with a permanent object too. Surprisingly enough, if you wrap this
batch in a stored procedure, and invoke your procedure from another TRY block,
the second TRY block does catch the error. You can create the procedure and see
for yourself:
CREATE PROCEDURE dbo.TestProc
AS
BEGIN TRY
PRINT 'Beginning TRY';
DELETE FROM #t;
--INSERT
data.SomeData(j) SELECT 1;
PRINT 'Finishing TRY';
END TRY
BEGIN CATCH
SELECT 'Beginning CATCH block';
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
PRINT 'At the end of the same batch';
GO
BEGIN TRY
PRINT 'Beginning TRY';
EXEC dbo.TestProc;
PRINT 'Finishing TRY';
END TRY
BEGIN CATCH
SELECT 'Beginning CATCH block';
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
PRINT 'At the end of the same batch';
In my opinion it would be preferable not to start executing
a batch if there are compilation errors, and in most cases exactly this is happening:
BEGIN TRY
PRINT 'Beginning TRY';
DECLARE @i INT;
INSERT data.SomeData(j) SELECT 1;
PRINT 'Finishing TRY';
END TRY
BEGIN CATCH
SELECT 'Beginning CATCH block';
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
PRINT 'At the end of the same batch';
GO
PRINT 'Next batch';
Msg 207, Level 16, State 1,
Line 4
Invalid column name 'j'.
Next batch
If you are still reading this, also make sure you have read
Erland Sommarskog’s articles on error handling:
http://www.sommarskog.se/error-handling-I.html
http://www.sommarskog.se/error-handling-II.html
and STRICT_CHECKS:
http://www.sommarskog.se/strict_checks.html
|
-
There are three kinds of triggers: those which blow up and
those which fail silently ;). Seriously, there are three all-too-common kinds
of problems related to triggers:
- ·
Triggers can fail to work as expected
- ·
Triggers can be bypassed
- ·
Triggers can expose problems in other code, such
as T-SQL code written under the assumption that there are no triggers
Fortunately there already are lots of good articles and
posts written on this subject, so I don’t have to describe it all by myself.
Still, there is a lot to be said, and there is no way I could squeeze it all
into a single post. As usual in this series, I will concentrate only on
correctness of the results, I will not consider the performance. Also I do not
intend to list all possible kinds of problems, but only the ones about which I know
and consider them worth mentioning. As usual, comments and additions are welcome.
Disclaimer: don’t get me wrong – I do not object to using triggers,
I use them myself in my system. However, if you do not use triggers properly,
you can expose your system to several potential problems. So you need to be
aware of these potential problems when you develop robust triggers.
Triggers that fail to work as
expected
Triggers should
not assume that statements can modify only one row at a time
This may sound trivial, but this seems to be the most common
mistake. You can skip this chapter if you are already familiar with the
problem.
Some triggers are coded under the assumption that only one
row can be inserted/updated/deleted at a time. Such assumptions are usually
unfounded. In fact, I have never seen a genuine business requirement
prohibiting multi-row changes. However, apparently the most common problem with
beginners’ triggers is their (triggers) inability to handle multi-row
modifications. For example, consider the following table and a trigger which is
supposed to populate it:
CREATE TABLE ChangeLogs.SomeDataLog(ID INT, Change VARCHAR(30),
ChangedBy VARCHAR(30), ChangeTime DATETIME);
GO
CREATE TRIGGER SomeData_Ins ON
Data.SomeData
FOR INSERT
AS
DECLARE @ID INT;
SET @ID = (SELECT ID FROM
INSERTED);
INSERT INTO ChangeLogs.SomeDataLog(ID, Change,
ChangedBy,
ChangeTime)
VALUES(@ID, 'Inserted', SUSER_NAME(), Getdate());
Clearly this trigger cannot handle multi-row inserts:
DELETE FROM data.SomeData;
INSERT INTO data.SomeData(ID, AnotherID)
SELECT 1,2 UNION ALL SELECT 3, 4;
Msg 512, Level 16, State 1, Procedure SomeData_Ins,
Line 5
Subquery
returned more than 1 value. This is not permitted when
the subquery follows =, !=, <, <=
, >, >= or when the subquery is
used as an expression.
The
statement has been terminated.
Although the trigger blew up, at least you have received an
error message indicating that you have a problem. The following trigger will
silently fail, logging just one insert:
CREATE TRIGGER SomeData_Ins ON
Data.SomeData
FOR INSERT
AS
DECLARE @ID INT;
SELECT @ID = ID FROM INSERTED;
INSERT INTO ChangeLogs.SomeDataLog(ID, Change,
ChangedBy,
ChangeTime)
VALUES(@ID, 'Inserted', SUSER_NAME(), Getdate());
Apparently this is a very common, although trivial, mistake.
The correct solution is very simple:
CREATE TRIGGER SomeData_Ins ON
Data.SomeData
FOR INSERT
AS
INSERT INTO ChangeLogs.SomeDataLog(ID, Change,
ChangedBy,
ChangeTime)
SELECT ID, 'Inserted', SUSER_NAME(), Getdate() FROM INSERTED;
Understanding
RECURSIVE_TRIGGERS setting and using it properly
If RECURSIVE_TRIGGERS setting is ON, then if your trigger
modifies the same table, those modifications will also fire triggers. This can
be used to modify data recursively. For example, consider the following table
and sample data:
CREATE TABLE Data.Nodes(NodeID INT NOT NULL,
ParentID INT NULL,
Status VARCHAR(20));
GO
INSERT INTO Data.Nodes(NodeID,
ParentID,
Status)
SELECT 1, NULL, 'Sleeping' UNION ALL
SELECT 2, 1, 'Sleeping' UNION ALL
SELECT 3, 2, 'Sleeping';
GO
Suppose that if a node becomes active, all its ancestors
must also become active. The following trigger attempts to implement this
business rule:
CREATE TRIGGER Nodes_Upd ON
Data.Nodes
FOR UPDATE
AS
IF EXISTS(SELECT 1 FROM INSERTED) BEGIN
UPDATE Data.Nodes SET Status = 'Active'
WHERE
NodeID IN(SELECT
ParentID FROM INSERTED WHERE
Status = 'Active');
END
For hierarchies without cycles, and less than 32 levels
deep, this approach works, provided that RECURSIVE_TRIGGERS setting is ON:
ALTER DATABASE Test SET
RECURSIVE_TRIGGERS ON;
UPDATE Data.Nodes SET Status = 'Sleeping';
UPDATE Data.Nodes SET Status = 'Active'
WHERE NodeID = 3;
SELECT NodeID, Status FROM Data.Nodes
NodeID Status
-----------
--------------------
1 Active
2 Active
3 Active
However, if you use
this approach, you need to ensure that:
·
RECURSIVE_TRIGGERS setting is ON
·
Your data does not have cycles
·
Your hierarchies are less than 32 levels deep
The first condition
is obvious. The second one needs an example. Consider the following hierarchy
with a cycle:
INSERT INTO Data.Nodes(NodeID,
ParentID,
Status)
SELECT 4, 5, 'Sleeping' UNION ALL
SELECT 5, 4, 'Sleeping';
UPDATE Data.Nodes SET Status = 'Active'
WHERE NodeID = 5;
As one node becomes active, the trigger begins an infinite
loop and hits the 32-level limit:
INSERT INTO Data.Nodes(NodeID,
ParentID,
Status)
SELECT 4, 5, 'Sleeping' UNION ALL
SELECT 5, 4, 'Sleeping';
UPDATE Data.Nodes SET Status = 'Active'
WHERE NodeID = 5;
Msg 217, Level 16, State 1, Procedure Nodes_Upd,
Line 5
Maximum
stored procedure,
function, trigger, or view nesting level exceeded (limit
32).
You can easily fix the trigger so that it tolerates cycles,
as follows:
DROP TRIGGER Data.Nodes_Upd;
GO
CREATE TRIGGER Nodes_Upd ON
Data.Nodes
FOR UPDATE
AS
IF EXISTS(SELECT 1 FROM INSERTED) BEGIN
UPDATE Data.Nodes SET Status = 'Active'
WHERE
NodeID IN(SELECT
ParentID FROM INSERTED WHERE
Status = 'Active')
AND Status <> 'Active';
END
However, the best practice is to ensure the integrity of
your data rather than to develop workarounds because your data has problems.
This is why I am suggesting that the data should have no cycles. Describing how
to accomplish that is beyond the scope of this post.
Finally, let me provide a short example demonstrating how
this trigger blows up if the hierarchy is more than 32 levels deep:
TRUNCATE TABLE Data.Nodes;
SET NOCOUNT ON;
DECLARE @ID INT;
SET @ID = 100;
WHILE @ID<200 BEGIN
INSERT INTO Data.Nodes(NodeID,
ParentID,
Status)
SELECT @ID, CASE WHEN @ID=100 THEN NULL ELSE @ID-1 END, 'Sleeping';
SET @ID = @ID + 1;
END
UPDATE Data.Nodes SET Status = 'Active'
WHERE NodeID = 199;
Msg 217, Level 16, State 1, Procedure Nodes_Upd,
Line 4
Maximum stored procedure, function, trigger, or view nesting level exceeded (limit
32).
As you have seen, you need to be very careful when you work
with recursive triggers.
In 2005 and later
versions, triggers should take in account snapshot isolation
SQL Server MVP Hugo Kornelis has demonstrated the problem:
http://sqlblog.com/blogs/hugo_kornelis/archive/2006/07/26/Snapshot-and-integrity-part-2.aspx
and posted a
workaround:
http://sqlblog.com/blogs/hugo_kornelis/archive/2006/08/25/snapshot-isolation-a-threat-for-integrity-part-3.aspx
In 2008, triggers
should not rely on @@ROWCOUNT the way they could in 2005 and before
Recently SQL Server
MVP Steve Kass described the problem here:
http://sqlblog.com/blogs/steve_kass/archive/2009/04/29/It_2700_s-2008-and-rowcount-ain_2700_t-what-it-used-to-be.-Don_2700_t-use-it-in-triggers_2100_.aspx
Triggers can
render transactions doomed, making it impossible to rollback to a savepoint
I have demonstrated the problem in this post:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/11/15/avoid-mixing-old-and-new-styles-of-error-handling.aspx
Another trigger
can undo your trigger’s work
If you have multiple triggers which are created on the same
table and which fire on the same operation, one trigger may overwrite another
trigger’s modifications. For instance, the following trigger will deliberately
undo the changes done by the previous trigger (SomeData_Ins):
CREATE TRIGGER Undo_Changes_By_SomeData_Ins ON Data.SomeData
FOR INSERT
AS
DELETE FROM ChangeLogs.SomeDataLog
WHERE ID IN(
SELECT ID FROM
INSERTED);
Usually this problem does not manifest itself so clearly as in
this obvious example.
In my next post I will describe some situations when
triggers are bypassed and when they expose problems in other T-SQL code.
This post continues my series on defensive database programming. Here are
my previous posts from the series:
Summarizing previous posts about defensive database programming
Defensive database programming: SET vs. SELECT.
Stress
testing UPSERTs
Defensive
database programming: fun with UPDATE.
Defensive
database programming: eliminating IF statements.
Defensive
database programming: fun with changing column widths.
Avoid
mixing old and new styles of error handling.
Defensive
database programming: adding ESCAPE clauses.
Defensive
database programming: qualifying column names.
Defensive
database programming: rewriting queries with NOT IN().
|
-
We can tuck any logical expressions in CHECK constraints,
but all foreign keys are currently capable of right now is to verify that one
or more column values are equal. In many cases it would be very useful to have
foreign keys use more complex logical expressions. Also in some cases the
ability to create constraints on indexed views would be very handy too. I will
provide examples and hopefully we will come up with Connect items to vote for.
When more complex logical expressions in foreign keys are
useful
Consider the following simple common sense rule: the maximum
current of your device cannot exceed the maximum current of the circuit you
plug it into. Suppose that the following tables store data about circuits and
devices:
CREATE TABLE
Data.Curcuits(CurcuitID
INT NOT NULL
CONSTRAINT
PK_Curcuits PRIMARY KEY,
MaximumCurrent INT
NOT NULL,
Description VARCHAR(100) NOT NULL);
GO
INSERT INTO
Data.Curcuits(CurcuitID,
MaximumCurrent,
Description)
SELECT 1,
25, 'Deck and Garage';
GO
CREATE TABLE
Data.Devices(DeviceID
INT NOT NULL
CONSTRAINT
PK_Devices PRIMARY KEY,
CurcuitID INT
NULL,
MaximumCurrent INT
NOT NULL,
Description VARCHAR(100) NOT NULL,
CONSTRAINT
FK_Devices_Curcuits FOREIGN KEY(CurcuitID)
REFERENCES
Data.Curcuits(CurcuitID)
);
GO
It would be very convenient to issue a simple command and
implement this business rule:
ALTER TABLE
Data.Devices ADD
CONSTRAINT FK_Devices_Curcuits
FOREIGN KEY(CurcuitID, MaximumCurrent)
REFERENCES Data.Curcuits(CurcuitID,
MaximumCurrent)
MATCH ON((Data.Devices.CurcuitID = Data.Curcuits.CurcuitID) AND
(Data.Devices.MaximumCurrent
<= Data.Curcuits.MaximumCurrent));
However, it is not supported, so I need to use a workaround,
one more column and three constraints instead of one, as follows:
ALTER TABLE
Data.Curcuits
ADD CONSTRAINT UNQ_Curcuits UNIQUE(CurcuitID,
MaximumCurrent);
GO
ALTER TABLE
Data.Devices ADD
CurcuitMaximumCurrent INT NULL;
GO
ALTER TABLE
Data.Devices DROP
CONSTRAINT FK_Devices_Curcuits;
GO
ALTER TABLE
Data.Devices ADD
CONSTRAINT FK_Devices_Curcuits
FOREIGN KEY(CurcuitID, CurcuitMaximumCurrent)
REFERENCES
Data.Curcuits(CurcuitID, MaximumCurrent)
ON UPDATE CASCADE;
GO
ALTER TABLE
Data.Devices
ADD CONSTRAINT
CHK_Devices_SufficientCurcuitMaximumCurrent
CHECK(CurcuitMaximumCurrent >=
MaximumCurrent);
GO
You can verify that the constraints work:
INSERT INTO
Data.Devices(DeviceID,
CurcuitID,
MaximumCurrent,
CurcuitMaximumCurrent,
Description)
SELECT 1,
1, 50, 25, 'Electric car charger'
Msg 547, Level 16, State 0, Line 1
The INSERT
statement conflicted with the CHECK constraint
"CHK_Devices_SufficientCurcuitMaximumCurrent".
The conflict occurred in database "Test",
table "data.Devices".
The statement has been
terminated.
As you have seen, the implementation of a very simple and
very common business rule is quite involved, because such business rules are
not directly supported by the database engine.
When you want to create constraints on indexed views
Even when your database guarantees that “the maximum current
of your device cannot exceed the maximum current of the circuit you plug it into”,
it is not good enough. Consider the following sample data:
INSERT INTO
Data.Devices(DeviceID,
CurcuitID,
MaximumCurrent,
CurcuitMaximumCurrent,
Description)
SELECT 2,
1, 15, 25, 'ShopVac';
INSERT INTO
Data.Devices(DeviceID,
CurcuitID,
MaximumCurrent,
CurcuitMaximumCurrent,
Description)
SELECT 3, 1,
15, 25, 'Miter Saw';
The database structure allows to plug more than one device
into a circuit, which is correct, but if you turn both devices on, their
combined maximum current exceeds the circuit’s maximum current. To enforce this
business rule, it would be natural to create an indexed view, so that the
database guarantees that the totals are always correct:
CREATE VIEW
Data.TotalMaximumCurrentPerCircuit WITH SCHEMABINDING
AS
SELECT d.CurcuitID,
c.MaximumCurrent
AS CircuitMaximumCurrent,
SUM(d.MaximumCurrent) AS
TotalMaximumCurrent,
COUNT_BIG(*) AS NumDevices
FROM Data.Devices d JOIN Data.Curcuits c ON d.CurcuitID = c.CurcuitID
GROUP BY
d.CurcuitID, c.MaximumCurrent;
GO
CREATE UNIQUE
CLUSTERED INDEX
Data_TotalMaximumCurrentPerCircuit
ON Data.TotalMaximumCurrentPerCircuit(CurcuitID);
GO
If I could create a check constraint on that indexed view, I
would be all set:
ALTER VIEW
Data.TotalMaximumCurrentPerCircuit
ADD CONSTRAINT
CHK_TotalMaximumCurrentPerCircuit_ValidCurcuit
CHECK(TotalMaximumCurrent <=
CircuitMaximumCurrent)
Instead, I need to use triggers or rather contrived kludges.
A built in native support for such quite common business rules would increase
the usefulness of SQL Server.
I have not created anything on Connect yet, because the
ideas are kind of raw yet. I am very interested in your feedback. What do you
think?
This continues the series on denormalizing, which started with this post about working hours and appointments
and continued with this post about running totals.
and a post about Storing intervals of time with no overlaps
Using ROWVERSION to enforce business rules
|
|
|
|
|
|