|
|
|
|
-
Suppose that Bob can retrieve all messages, as follows, and we are not happy with it:
EXEC Exchange.ShowAllMessages;
Suppose that we don't want him to read anything related to messages, so our knee-jerk reaction is this:
REVOKE EXECUTE ON Exchange.ShowAllMessages FROM Bob ;
This does not work - Bob still can execute Exchange.ShowAllMessages, just as he could before. The reason is simple: everyone were granted the privilege to execute that procedure:
GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ;
This privilege still exists after we have run our REVOKE command, and Bob still belongs to PUBLIC. The following brute force method prevents Bob from reading messages, but at a steep price: no one else can read all messages any more:
REVOKE EXECUTE ON Exchange.ShowAllMessages FROM PUBLIC ;
The following script does a better job: Bob cannot invoke Exchange.ShowAllMessages directly any more, and all others can execute the procedure exactly as before:
GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ; DENY EXECUTE ON Exchange.ShowAllMessages TO Bob ;
Note, however, the emphasis on the word directly: in some cases Bob still can execute the procedure indirectly, if it is called by another procedure he has access to.Suppose, for example. that we have two schemas, Exchange and Sales, both created with the same authorization:
CREATE SCHEMA Sales AUTHORIZATION dbo; GO CREATE SCHEMA Exchange AUTHORIZATION dbo;
Unfortunately, Bob can run the following procedure, and he will still see all the messages:
CREATE PROCEDURE Sales.ShowMySalesWithBackDoor AS BEGIN ; -- do something here IF SUSER_NAME() = 'Bob' BEGIN ; EXEC Exchange.ShowAllMessages ; END ; END ;
So, we have spent some time trying out different approaches, and nothing works so far. Clearly we are out of our depth - otherwise we should have completed this seemingly trivial task long ago.
It is time to do some learning - should we have deep understanding of SQL Server security, we would have solved this problem easily. To improve our knowledge, I would suggest the following two resources:
Expert SQL Server 2005 Development
by Adam Machanic,and
Protecting SQL Server Data by John Magnabosco.
|
-
When a select uses scalar or multi-statement UDFs under READ_COMMITTED_SNAPSHOT, we might not get consistent results as of the time our select began - I will provide simple repro scripts. At the time of this writing MSDN clearly states the following: "Read committed isolation using row versioning provides statement-level read consistency". However, if our select invokes scalar or multi-statement UDFs, this consistency is ensured per a statement in the body of the UDF, not per the whole statement that invokes the UDF(s).
Prerequisites Before we run the repro scripts, we need the following setup: ALTER DATABASE MyTest SET READ_COMMITTED_SNAPSHOT ON ; GO
USE MyTest ; GO
CREATE TABLE dbo.Messages ( MessageID INT NOT NULL CONSTRAINT PK_Messages PRIMARY KEY , MessageBody VARCHAR(50) NOT NULL ) ; GO
INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 1, 'Hello world' ) ; GO
Getting inconsistent results from a multi-statement UDF First, we need to create this UDF: CREATE FUNCTION dbo.AllMessagesTwice ( ) RETURNS @ret TABLE ( MessageBody VARCHAR(50) NOT NULL , SelectedAt DATETIME2(5) NOT NULL ) AS BEGIN ; INSERT @ret ( MessageBody , SelectedAt ) SELECT MessageBody , GETDATE() FROM dbo.Messages ; DECLARE @d DATETIME2(5), @i INT ; SET @d = DATEADD(SECOND, 30, GETDATE()) ; WHILE GETDATE() < @d BEGIN ; SET @i = 1; END ; INSERT @ret ( MessageBody , SelectedAt ) SELECT MessageBody , GETDATE() FROM dbo.Messages ; RETURN ; END ; GO
Next, let us open a tab and cut and paste the following script, but not run it yet: SET TRANSACTION ISOLATION LEVEL READ COMMITTED ; SELECT * FROM dbo.AllMessagesTwice ( )
DELETE FROM dbo.Messages WHERE MessageId = 2 ;
In another tab, cut and paste the following script, but not run it either: INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 2, 'Hi there' ) ;
Run the select script in the first tab - it takes 30 seconds to complete. Immediately return to the second tab and insert the 'Hi there' message. When the select in the first tab completes, we shall clearly see the 'Hi there' message, which definitle was inserted after the select began, as well as the timestamp that shows when the row was selected, 30 seconds after the beginning of the statement: Hello world 2011-08-02 20:56:44.70000 Hello world 2011-08-02 20:57:14.74000 Hi there 2011-08-02 20:57:14.74000
As we have seen, a multi-statement UDF can return data that was committed after the time when the select using the UDF began. Getting inconsistent results from a scalar UDF
Here is the scalar UDF we shall be using: CREATE FUNCTION dbo.MessageWithLastId ( ) RETURNS VARCHAR(50) AS BEGIN ; DECLARE @d DATETIME2(5) , @i INT ; SET @d = DATEADD(SECOND, 30, GETDATE()) ; WHILE GETDATE() < @d BEGIN ; SET @i = 1 ; END ; RETURN (SELECT TOP(1) MessageBody FROM dbo.Messages ORDER BY MessageID DESC) ; END ; GO
When we have created the scalar UDF, let us cut, paste, and run the following script: SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;
DELETE FROM dbo.Messages WHERE MessageId = 2 ; PRINT 'Data as of the beginning of the dbo.MessageWithLastId ( ) below' SELECT MessageBody FROM dbo.Messages ORDER BY MessageID ;
PRINT 'Data returned by dbo.MessageWithLastId ( )' SELECT dbo.MessageWithLastId ( ) ;
DELETE FROM dbo.Messages WHERE MessageId = 2 ;
In the second tab we need to immediately run the following, we have 30 seconds to do so: INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 2, 'Hi there' ) ;
When the select in the first tab completes, we shall see that the scalar UDF returned the value that was not in the table when the select has started: Data AS OF the beginning OF the dbo.MessageWithLastId ( ) below MessageBody -------------------------------------------------- Hello world
(1 row(s) affected)
Data returned BY dbo.MessageWithLastId ( )
-------------------------------------------------- Hi there
As we have seen, a scalar UDF can read data that was committed after the time when the select using the UDF began. Snapshot isolation guarantees consistency
We can rerun both previous examples under snapshot isolation. All we need to do is add the following line immediately before the select that invokes our UDF: SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
In both cases we shall see that under snapshot isolation our selects return consistent data as of the time the select started. Conclusion As we have seen, if we are using scalar or multi-statement UDFs, and we need statement level consistency, we need to use snapshot isolation - READ_COMMITTED_SNAPSHOT may fail to provide the consistency we need. Naturally, inline UDFs are esentially macroes - they are flattened into the select they are invoked from. As such, READ_COMMITTED_SNAPSHOT does provide statement level consistency as long as we are only using inline UDFs, and do not use scalar and multi-statement ones.
I would like to thank Erland Sommarskog, whose feedback inspired this post - I am posting this content with his permission.
|
-
The overhead of entering a TRY block is relatively small, but the overhead of catching an exception is more substantial. If we manage to use CASE or IF to prevent an exception, we can noticeably improve performance. We shall run benchmarks and see for ourselves - in our example, catching an exception will be approximately nine times more expensive than exiting the TRY block without it. Entering TRY blocks is not free In this section we shall wrap the body of a stored procedure in a TRY CATCH block, and measure the overhead. The following procedure does not use a TRY CATCH block: CREATE ALTER PROCEDURE dbo.PerformDivision @dividend INT , @divisor INT , @Quotient INT OUT AS BEGIN; SET NOCOUNT ON ; SELECT @Quotient = CASE WHEN @divisor = 0 THEN NULL ELSE @dividend / @divisor END ; RETURN @@ERROR ; END ; GO
Let us benchmark it. Whether the divisor is zero or not, the procedure runs at the same speed: CREATE PROCEDURE dbo.PerformDivision100kTimes @dividend INT , @divisor INT AS BEGIN ; DECLARE @i INT , @Quotient INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.PerformDivision @dividend = @dividend, @divisor = @divisor, @Quotient = @Quotient OUT ; SET @i = @i + 1 ; END ; END ; GO
EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 1 -- cpu 1250 ms GO EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 0 -- cpu 1250 ms GO
Let us remove the CASE expression, so that the division can raise an exception, and wrap the division operator in a TRY CATCH block: ALTER PROCEDURE dbo.PerformDivision @dividend INT , @divisor INT , @Quotient INT OUT AS BEGIN; SET NOCOUNT ON ; BEGIN TRY SELECT @Quotient = @dividend / @divisor ; END TRY BEGIN CATCH SELECT @Quotient = NULL ; END CATCH ; RETURN @@ERROR ; END ; GO
Even when the division does not raise an exception, the procedure runs 36% slower, 1703 ms vs 1250 ms: EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 1 -- cpu 1703 ms GO
Catching exceptions is a noticeable overhead
When the divisor is zero, and division does raise an exception that is caught in the CATCH block, the procedure runs 4.36 times slower, 5453 ms vs 1250 ms: EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 0 -- cpu 5453 ms GO
As we have seen, catching exceptions in CATCH blocks is slow. In this particular scenario, it is approximately nine times more expensive than just entering the TRY block. If we manage to use CASE or IF to avoid an exception, we can noticeably improve performance.
|
-
It is well known that INSERT ... EXEC is not free, but how expensive is it as compared to its alternatives? We shall run benchmarks and see that:
- Invoking a multi-statement UDF is approximately two times faster than capturing the output of a stored procedure in a table variable via INSERT ... EXEC;
- INSERT ... EXEC into a table variable is approximately two times faster than INSERT ... EXEC into a temporary table;
- Inline UDFs are much cheaper than multi-statement UDFs;
- The cost of invoking a multi-statement UDF or running INSERT ... EXEC may be several times higher than the cost of running the DML inside the UDF or stored procedure.
We shall only run benchmarks and estimate real execution costs. Whenever we choose whether to cut-and-paste code or to reuse it, whether via INSERT...EXEC, or in a multi-statement UDF, knowing these numbers may be very useful.
For an in-depth analysis of what contributes to the overhead, make sure that you have read the following great post by Adam Machanic:
The Hidden Costs of INSERT EXEC
Also when we use INSERT...EXEC a lot, we increase the workload on tempdb, but this is beyond the scope of this post.
Determining the cost of INSERT ... EXEC into a table variable.
The following stored procedure is very fast, so that the cost of its execution should be quite low. Also it does not contain any DML and as such it does not perform any reads when it runs.
CREATE PROCEDURE dbo.GetProduct @int1 INT, @int2 INT AS BEGIN; SET NOCOUNT ON ; SELECT @int1 * @int2 AS Product ; RETURN @@ERROR ; END ; GO
Suppose also that we need the product of two integers in an output parameter, not in a result set. To accomplish that, we certainly can capture its output via INSERT ... EXEC, as follows:
CREATE PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; DECLARE @res TABLE ( product INT ) ; INSERT @res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM @res ; END ; GO
Alternatively, we can cut and paste some code and get the product of two integers directly into an output parameter:
CREATE PROCEDURE dbo.GetProductDirectly @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET @product = @int1 * @int2 ; END ; GO
To benchmark these two approaches, let us create and execute the following two procedures:
CREATE PROCEDURE dbo.LoopReuseGetProduct AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.ReuseGetProduct @int1 = 1, @int2 = 2, @product = @product OUT ; SET @i = @i + 1 ; END ; END ; GO DROP PROCEDURE dbo.LoopGetProductDirectly CREATE PROCEDURE dbo.LoopGetProductDirectly AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.GetProductDirectly @int1 = 1, @int2 = 2, @product = @product OUT ; SET @i = @i + 1 ; END ; END ; GO
EXEC dbo.LoopReuseGetProduct ; GO EXEC dbo.LoopGetProductDirectly ;
The difference in performance is dramatic: while LoopGetProductDirectly uses 1094 ms CPU and no reads, LoopReuseGetProduct uses 21000 ms CPU and performs 1.9M reads.
So the overhead of every INSERT...EXEC into a table variable in this particular case is approximately 19 reads and 0.2 ms of CPU.
Note: if you are wondering why our benchmarks are wrapped as stored procedures, read an excellent post by Linchi Shea entitled "Performance impact: stored procedures, SQL batches, and CPU usage"
Comparing INSERT ... EXEC into a table variable vs. INSERT ... EXEC into a temporary table
As we have seen, INSERT ... EXEC into a table variable is not cheap. Let us replace the table variable with a temporary table and see if it runs faster:
ALTER PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; CREATE TABLE #res( product INT ) ; INSERT #res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM #res ; END ; GO
When we rerun LoopReuseGetProduct, the performance is even worse: it uses 41600 ms CPU and performs 4.6M reads.
So the overhead of every INSERT...EXEC into a temporary table in this particular case is approximately 46 reads and 0.4 ms CPU.
Comparing INSERT ... EXEC into a table variable vs. invoking a UDF
To reuse our code, we also can wrap it in a multi-statement UDF, or in an inline one. As before, we shall wrap our benchmarks in stored procedures:
CREATE FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT ) RETURNS @ret TABLE ( Product INT ) AS BEGIN ; INSERT @ret ( Product ) SELECT @int1 * @int2 ; RETURN ; END ; GO
CREATE PROCEDURE dbo.LoopGetProduct_MultiStatement AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; SELECT @product = product FROM dbo.GetProduct_MultiStatement(1, 1) ; SET @i = @i + 1 ; END ; END ; GO
CREATE FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT ) RETURNS TABLE AS RETURN ( SELECT @int1 * @int2 AS Product ) ; GO
CREATE PROCEDURE dbo.LoopGetProduct_Inline AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; SELECT @product = product FROM dbo.GetProduct_Inline(1, 1) ; SET @i = @i + 1 ; END ; END ; GO
Here are the results of our benchmarking:
- dbo.LoopGetProduct_Inline uses 563 ms of CPU and no reads;
- dbo.LoopGetProduct_MultiStatement uses 10266 ms of CPU and performs exactly 1M reads.
Clearly the inline UDF performs much faster.
In this scenario, the overhead of invoking a multi-statement UDF is 10 reads and 0.1 ms of CPU.
Is this overhead high as compared to the cost of executing DML?
If our DML is fast, it may take less resources to execute our DML than to invoke a multi-statement UDF or to run an INSERT...EXEC. To demonstrate that, let us create a small table with just 10K rows:
CREATE TABLE dbo.Products ( Multiplier1 INT NOT NULL , Multiplier2 INT NOT NULL , Product INT NOT NULL , CONSTRAINT PK_Products PRIMARY KEY(Multiplier1, Multiplier2 ) ) ; GO
INSERT INTO dbo.Products ( Multiplier1 , Multiplier2 , Product ) SELECT n1.n , n2.n , n1.n * n2.n FROM dbo.Numbers AS n1 CROSS JOIN dbo.Numbers AS n2 WHERE n1.n <= 100 AND n2.n <= 100 ; GO
Also let us alter our procedures and UDFs to read from this table:
ALTER PROCEDURE dbo.GetProduct @int1 INT, @int2 INT AS BEGIN; SET NOCOUNT ON ; SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ; RETURN @@ERROR ; END ; GO
ALTER FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT ) RETURNS @ret TABLE ( Product INT ) AS BEGIN ; INSERT @ret ( Product ) SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ; RETURN ; END ; GO
ALTER FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT ) RETURNS TABLE AS RETURN ( SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ) ; GO
Also let us revert the INSERT...EXEC back to its original version, which uses a table variable and is two times faster than the one with a temporary table:
ALTER PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; DECLARE @res TABLE ( product INT ) ; INSERT @res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM @res ; END ; GO
Let us rerun our benchmarks:
EXEC dbo.LoopReuseGetProduct ; -- EXEC dbo.LoopReuseGetProduct ; cpu 21906 reads 2100115
GO EXEC dbo.LoopGetProduct_MultiStatement ; -- EXEC dbo.LoopGetProduct_MultiStatement ; cpu 11109 reads 1200010
GO EXEC dbo.LoopGetProduct_Inline ; --EXEC dbo.LoopGetProduct_Inline ; cpu 1079 reads 200000
Our table Products is small; it takes just two reads two perform a simple select ad look up the value in it. Because we look up the value 100K times, we use 200K reads to do the actual useful work.
In this scenario, less than 20% of CPU and reads are spent on executing the DML. The rest is the overhead of invoking a multi-statement UDF or of running an INSERT...EXEC.
Of course, if we are running more complex and/or slower DML, the ratio of useful work vs the overhead will be higher. As such, we might be more willing to reuse our code via INSERT...EXEC or via a multi-statement UDF, even though the overhead is quite substantial.
Conclusion
As we have seen, neither INSERT...EXEC nor the multi-statement UDF are cheap. Whenever we are deciding whether to use them, we need to consider their impact on performance, preferably by running benchmarks.
One more thing: I am running my benchmarks on SQL 2008 R2, on my laptop. The results I am getting on my servers are similar. However, you might get different results in other environments, on other versions of SQL Server. I encourage you to rerun my scripts, and post here whether you are getting similar results, and what is your environment.
My other posts discussing T-SQL code reuse and performance are as follows:
Will wrapping up code in a reusable stored procedure hurt performance?
Reuse Your Code with Table-Valued UDFs
|
-
I have just downloaded and started reading a free PDF book about version control systems (VCS), entitled Version Control by Example, written by Eric Sink. So far I have read a really informative chapter about the advantages of third generation VCS, such as Git, over the second generation ones, such as Subversion and TFS. So far it is a very nice read. I hope you will like it too.
|
-
It is well known that replacing a generic, one-size-fits-all query with several more specific ones may boost performance. However, in some cases these more specific queries do not perform any faster than the generic one. We shall consider several examples which demonstrate both scenarios.
Prerequisites
The following script creates a table and populates it with 4M rows, not much but enough for our test cases.
CREATE TABLE dbo.Packages ( ID INT NOT NULL IDENTITY , [length] INT NOT NULL , width INT NOT NULL , [weight] INT NOT NULL , SpaceFiller CHAR(30) NOT NULL, CONSTRAINT PK_Packages PRIMARY KEY ( ID ) ) ; GO
INSERT INTO dbo.Packages ( [length] , width , [weight], SpaceFiller ) SELECT n1.n , n2.n , 1 , '*' FROM ( SELECT n FROM dbo.Numbers WHERE n <= 2000 ) AS n1 CROSS JOIN ( SELECT n FROM dbo.Numbers WHERE n <= 2000 ) AS n2 ;
GO
CREATE INDEX Packages_length_width ON dbo.Packages([length], width) INCLUDE ( [weight], SpaceFiller ) ; GO
CREATE INDEX Packages_width_length ON dbo.Packages(width, [length]) INCLUDE ( [weight], SpaceFiller ) ; GO
When replacing a generic query with two specific ones boosts performance
The following generic query performs very poorly - it does not utilize any indexes. Instead, it scans the whole table and sorts it:
CREATE PROCEDURE dbo.GetTop1Weight @OrderBy VARCHAR(30) , @topWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY CASE WHEN @OrderBy = 'Length,Width' THEN [length] ELSE [width] END , CASE WHEN @OrderBy = 'Length,Width' THEN [width] ELSE [length] END ; END ; GO
You can run it or at least see the execution plan, and see for yourself - the query needs about 25K logical reads to complete.
Let us replace this generic query with two specific ones:
ALTER PROCEDURE dbo.GetTop1Weight @OrderBy VARCHAR(30) , @topWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF @OrderBy = 'Length,Width' BEGIN ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY [length] , [width] ; END ; ELSE BEGIN ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY [width] , [length] ; END ; END ; GO
As a result the query speeds up dramatically - for both sort orders it executes as one index seek:
DECLARE @topWeight INT ; EXEC dbo.GetTop1Weight @OrderBy = 'Length,Width', @topWeight = @topWeight OUT ; EXEC dbo.GetTop1Weight @OrderBy = 'Width,Length', @topWeight = @topWeight OUT ;
Both queries need just 3 logical reads to complete, which is a huge improvement.
As we have seen, sometimes replacing a generic, one-size-fits-all query with several more specific ones dramatically improves performance. However, in many other cases more specific queries do not perform any better.
When replacing a generic query with two specific ones does not improve performance at all
The following one-size-fits-all query performs very poorly:
CREATE PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF ( ( @length IS NULL AND @width IS NULL ) OR ( @length IS NOT NULL AND @width IS NOT NULL ) ) BEGIN ; RAISERROR('Must provide exactly one of the following parameters: @width and @length', 16, 1) ; RETURN 1 ; END ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = COALESCE(@length, [length]) AND [width] = COALESCE(@width, [width]) ; END ; GO
Note: for simplicity, let us assume that we must always provide exactly one parameter, either length or width, but never both. In all the subsequent versions we shall skip the verification of parameters, the first IF statement, just to keep our examples shorter.
We can run the following script and see that every time our stored procedure scans the whole table and performs 25K logical reads:
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight @width = 1, @totalWeight = @totalWeight OUT ;
Speeding up this query is very easy. The following query is index friendly. For brevity, we have omitted the first step, the verification that either length or width, but never both, is provided:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length OR [width] = @width ; END ; GO
Every time it uses the right index and needs just 16 logical reads:
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight @width = 1, @totalWeight = @totalWeight OUT ;
In this case more specific queries do not run any faster. The following queries show several approaches which we can try out. We can keep all the code in one procedure, as shown in the following scripts:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM ( SELECT [weight] FROM dbo.Packages WHERE [length] = @length UNION ALL SELECT [weight] FROM dbo.Packages WHERE [width] = @width ) AS T ; END ; GO
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF @length IS NOT NULL BEGIN ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length ; END ELSE BEGIN SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [width] = @width ; END END ; GO
Alternatively, we can try out two separate procedures:
CREATE PROCEDURE dbo.GetTotalWeight_ByWidth @width INT , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [width] = @width ; END ; GO
CREATE PROCEDURE dbo.GetTotalWeight_ByLength @length INT , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length ; END ; GO
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight_ByLength @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight_ByWidth @width = 1, @totalWeight = @totalWeight OUT ;
Anyway, none of the approaches we have tried out performs any faster than the optimized generic one - they all complete in 16 logical reads and use approximately the same CPU.
So, in this case replacing a generic procedure with several specific ones does not make any sense at all - it increases complexity without improving performance.
We are better off with the original generic query:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length OR [width] = @width ; END ; GO
Conclusion
Although in many cases replacing a generic query with several specific ones does boost performance, we should not use this approach in all cases as a rule of thumb. Instead, we need to prove that splitting a generic query does improve performance before actually doing it.
My other posts discussing T-SQL code reuse and performance are as follows:
Will wrapping up code in a reusable stored procedure hurt performance?
Reuse Your Code with Table-Valued UDFs
Comparing the cost of INSERT ... EXEC and its alternatives
|
-
In T-SQL programming, stored procedure calls are not free, and nested stored procedure calls cost even more. First, we
shall prove it by running benchmarks. Next, we shall determine if the
cost of calling stored procedures is high as compared to the rest of
workload.
Proving that stored procedure calls are not free
Let us begin with a very simple benchmark, using a trivial stored procedure that increments a variable:
CREATE PROCEDURE dbo.IncreaseByOne @i INT OUT AS BEGIN ; SET @i = @i + 1 ; END ; GO
In order to get more or less meaningful comparison, we shall invoke the procedure 100K times, and compare its real execution costs versus the cost of incrementing the variable directly in the body of the loop, as shown in the following script:
CREATE PROCEDURE dbo.DirectUpdateInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; SET @i = @i + 1 ; END ; SELECT @i AS [@i] ; END ; GO
CREATE PROCEDURE dbo.ProcedureCallInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.IncreaseByOne @i = @i OUT ; END ; SELECT @i AS [@i] ; END ; GO
EXEC dbo.DirectUpdateInLoop @NumIterations = 100000; GO EXEC dbo.ProcedureCallInLoop @NumIterations = 100000;
To exclude parse and compile time from our comparison, we can just invoke our stored procedures more than once, and disregard their first executions.
The second procedure, dbo.ProcedureCallInLoop , consistently runs much slower - 970 ms vs. 93 ms on my laptop, running SQL 2008 R2. Naturally, your mileage may vary.
Proving that nested stored procedure calls are even more expensive
Clearly, invoking a stored procedure is not free. To determine if nested stored procedure calls are more expensive, we shall create and run the following modules:
CREATE PROCEDURE dbo.WrappedIncreaseByOne @i INT OUT AS BEGIN ; EXEC dbo.IncreaseByOne @i = @i OUT ; END ; GO
CREATE PROCEDURE dbo.WrappedProcedureCallInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.WrappedIncreaseByOne @i = @i OUT ; END ; SELECT @i AS [@i] ; END ; GO
EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 10; GO
EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 100000;
Nested calls are consistently slower - dbo.WrappedProcedureCallInLoop runs in 1620 ms vs. 970 ms for the previous tests, dbo.ProcedureCallInLoop.
Comparing the overhead of calling procedures to the overhead of other actions
As we have seen, stored procedure calls are not free, especially nested ones. However, our benchmarks were not very realistic - our stored procedures were extremely trivial.
If a stored procedure actually executes some DML, how big is the overhead of calling it as compared to the overhead of running the DML inside it?
The following procedure executes just a little bit of DML - it updates one row:
CREATE TABLE dbo.Counters ( CounterId INT NOT NULL CONSTRAINT PK_Counters PRIMARY KEY , Amount INT NOT NULL ) ; GO INSERT INTO dbo.Counters ( CounterId, Amount ) VALUES ( 1, 0 ) ; GO CREATE PROCEDURE dbo.IncreaseCounterByOne @CounterId INT AS BEGIN ; SET NOCOUNT ON ; UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterId = @CounterId ; END ; GO
As before, we shall be invoking it in a loop, as follows:
CREATE PROCEDURE dbo.IncreaseCounterViaProcedure @CounterId INT , @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.IncreaseCounterByOne @CounterId = @CounterId ;
SET @i = @i + 1 ; END ; END ; GO
Let us cut-and-paste the body of dbo.IncreaseCounterByOne into the calling module and see if i it improves the performance:
CREATE PROCEDURE dbo.IncreaseCounterViaDirectUpdate @CounterId INT , @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterId = @CounterId ; SET @i = @i + 1 ; END ; END ; GO
Cut-and-pasting the body of dbo.IncreaseCounterByOne into the loop definitely improves the performance, but not very much: CPU goes down from 3400 ms (dbo.IncreaseCounterViaProcedure) to 2400 ms (dbo.IncreaseCounterViaDirectUpdate), and duration does not change very much either.
For more complex stored procedures, the overhead of making a call should be even less pronounced as compared to the cost of executing DML.
Conclusion
As we have seen, sometimes we can eliminate stored procedure calls and save some CPU. However, we should not expect huge savings, unless we are flattening out deeply nested modules.
Also we have seen that although wrapping up reusable code in a stored procedure does slow down the execution, usually the performance penalty is relatively small.
My other posts discussing T-SQL code reuse and performance are as follows:
More specific queries may run faster, but we should not overdo it.
Reuse Your Code with Table-Valued UDFs
Comparing the cost of INSERT ... EXEC and its alternatives
|
-
-
|
Today I will be speaking at the March 2011 meeting of Chicago user group. Attached are demos and slides.
|
-
Usually it is an honor when we get to present to a user group, but not this time, so let me explain.
I have no idea how my presentation got briefly mentioned in the invitation which went out today, without my consent. I have never asked or agreed to speak at SQL Connections February 2011 meeting in Chicago suburbs. Yet I apologize for any inconvenience it might have caused. I was going to speak at the meeting of December 2010, which was agreed by email with the person in charge. I had spent some time preparing for the presentation, and was very surprised to learn that my presentation was canceled when I received the invitation for December 2010 meeting with a different agenda. Nobody sent me a note telling me that my presentation has been canceled.
Apparently this way of canceling presentations is very unusual. I have discussed this with quite a few other speakers, and nobody has recalled anything similar. I was quite surprised when I had a closer look at the invitation for February 2011 meeting, which I just received:
You are invited to attend ...
SQL Connections February 2011
!!! SQLConnections 1st Birthday Celebration !!!
Please join us at the Microsoft Offices in Oak Brook for our 1st Birthday
BRENT OZAR
Tuning T-SQL Step by Step
If execution plans look like subway maps
of foreign cities, this session is for you. Brent will turn SQL Server
inside out by making YOU role-play as the SQL Server engine. He’ll feed
you one query at a time, make you write your own execution plans in
plain English, and then show how SQL Server draws out execution plans.
You’ll learn how sorts and joins rely on indexes, and get your first
glimpse into why statistics are so important. No tuning experience
necessary, but you do need to understand how to write SELECT statements
with joins.
Brent presented at the chapter’s first
meeting a year ago, and we are honored to have him back as the chapter’s
First Anniversary speaker!
Afterwards Alex will present the session he did not get to present in Decemeber.
Cant Wait to See You There!
Note that it does not include neither my last name nor the abstract for the presentation. Of course, Brent is a much better speaker than I. Naturally, the invitation may reflect this well known fact. However, failing to mention the speaker's last name and failing to include the abstract seems to be just a little bit over the top.
Even if I were planning to speak at this event, I would quite likely have canceled after such an invitation.
|
-
Although technically the sessions I submitted are entitled "Avoid Common Mistakes in T-SQL Programming" and "Developing Robust T-SQL for High Concurrency", the main thing you will take with you is the following: the ability to develop T-SQL that does not break in a changing environment, under real life concurrency. As such, you will not have to wake up at 3AM to make an urgent fix, and you will have your weekends for yourself. When you have mastered the art of defensive programming, you will not need to spend 80% of your time making fixes after fixes after fixes - you will enjoy your job better, and you will surely have the time, energy, and motivation to learn all those cool new features.
Vote!
|
-
When we transform initial ideas into clear requirements for databases, we typically have to make the following choices: Frequent maintenance vs doing it once. As we are clarifying the requirements, we need to determine whether we want to concinue spending considerable time maintaining the system, or if we want to finish it up and move on to other tasks. Race car maintenance vs installing electric wiring is my favorite analogy for this kind of choice.
In some cases we need to sqeeze every last bit of performance out of a highly important query, and we are ready to re-optimize as needed. This is similar to maintaining a race car, which is very labor-intensive, and is done very frequently, pretty much all the time. Another similarity is that the goal is to get the best possible performance, and we are ready to pay a high price for it.. When we deal with race cars, long term considerations are totally beyond our horizon – we need to do our best right now, whatever it takes, and we may go for another cool new approach at any moment. If this is the case, then doing anything defensively for the long term is completely irrelevant.
In other cases we really want to get the job done once, and not to return to it for as long as possible. This is similar to installing electric wiring in a house, because in order to avoid expensive re-wiring, we build in some redundancy into our solution; we have more outlets and more capacity than we actually need right now, and all outlets are grounded even if we only plug table lamps into them.
Typically our wiring solution does not perfectly fit our current needs: some outlets are not positioned exactly where they would be most convenient right now, some others are useless right now. We are likely to have to use extension cords, and we can tolerate that. Yet we are more willing to cope with the minor inconvenience of using an extension cord than to reinstall outlets every time we move furniture. When we deal with electric wiring, safety and long term stability is highly important, and it makes a lot of sense to do everything very defensively. Also it makes sense to build in more capacity than we need right now.
Of course, most real life situations are not as clearly cut as these two extreme examples – they are typically somewhere in between. Yet in many real life cases that we would rather do the query tuning just once, and we are ready for some overkill, for some redundancy – that is the price we are willing to pay for not having to return and re-optimize the query later. Best performance on average vs. acceptable performance all the time. Depending on the circumstances, the criteria for acceptable performance may be quite different in different situations. For example, as I am writing T-SQL code, book chapters, and articles, I can and do take breaks from time to time. As a result, it may take me an hour or more to finish just one sentence, or one method, or one query, and that’s OK. It is the overall, the average performance that matters, as long as I manage to meet my deadlines.
On the other hand, if I am driving, all my driving decisions must be done quickly, without any exceptions. Let us apply a similar approach to queries. For instance, let us consider the following two queries which always return identical results:
- Query A runs in 50 ms 99.8% of the time, but takes 1500 ms in 0.2% cases
- Query B always runs between 350ms and 450ms.
Which one of these two queries should we choose? It depends. Of course query A is a better citizen; on average it uses up less resources on the server. As such, in many cases we would really want to choose query A over query B. However, if our query is required to always return in 1000ms or less, then, unfortunately, query A does not meet the requirements, and query B is our only choice. As we have seen, to choose which query better fits the needs of the customers, we need to know the requirements.
Performance of reads vs. performance of writes. As we tune select queries we may need to make yet another choice: how much can we slow down our modifications to speed up our selects? As in two previous subsections, let us consider two extreme examples:
- Security video must always record real time, and we don’t expect very high quality from it. Most likely this video recording will never be viewed. Only in the unlikely event that it will be analyzed, only then we shall work on enhancing its quality.
- On the other hand, if we are making a movie, it can take several hours or even days to come up with a short episode. This is OK: because we expect a lot of views, we can afford to invest considerable time and effort in improving quality. Also modifying a DVD is not an option - if we need to change anything, we have to burn another one.
As in the previous choices, most real life situations fall somewhere between these two extreme cases, but we need to determine where we are on this scale.
|
-
In other words, if we add a month, then subtract a month, we might not get back to the date we started from. For example: SELECT DATEADD(MONTH, 1, DATEADD(MONTH, -1, '20100330')) , DATEADD(MONTH, -1, DATEADD(MONTH, 1, '20100330'))
2010-03-28 00:00:00.000 2010-03-30 00:00:00.000
If our code assumes otherwise, it has a bug.
|
-
The biggest advantage of unit testing is the ability to make changes quickly, and with confidence that we have not broken anything with our change. Whether we need to speed up a query real quick, or to fix a bug, automated testing saves us a lot of time, especially if test failure is exposed in user friendly way. This post continues the series on unit testing, the previous posts are
How to Benefit from Unit Testing T-SQL: choosing what not to test
How to Benefit from Unit Testing T-SQL: choosing what to test
How to Benefit from Unit Testing T-SQL. Part One.
How to Benefit from Unit Testing T-SQL: Reusing Manual Tests as Parts of Unit Tests
How to Benefit from Unit Testing T-SQL: Exposing Failure in a User-Friendly Way
Benefit from Unit Testing T-SQL: Reuse Unit Tests as Documentation
Benefit from Unit Testing T-SQL: Speed up Your Test Harness To ensure quick turnaround, we typically use the following approaches: Utilize version control. Use branches. When all the code, including unit tests, is in version control, the failure of our workstation does not delay us too much: we can quickly download source code on any other workstation and continue working. Similarly, when we work on a feature and use a branch, we can quickly switch context to the trunk, make a fix against it, and switch back to the branch. Run all tests off local instance. By default, build test database from scratch. If we have to share test databases, too much time can be wasted on coordinating, especially if we need to change database schema or test data. So, all the scripts needed to build the test database and populate it with test data must be downloaded from version control. Optionally, we should be able to bypass creating and populating test database - that saves time on subsequent runs. Practice, practice, practice. To act quickly, we need to be familiar with unit testing. Learning something during an emergency is not very efficient. Unit testing should be practiced frequently, so that our skills do not get rusty. Also we need to be familiar with our tools. If we use open source ones, we need to be familiar with the source code. Anyway, if our tools have gotchas, we need to learn them proactively, not when we are in a hurry and those gotchas get in our way. Needless to say, in most cases simple and rock solid tools are preferable to complex ones with gotchas.
|
-
In this post we shall see how adding test data breaks existing unit tests, and how to fix them.
This post continues the series on unit testing, the previous posts are
How to Benefit from Unit Testing T-SQL: choosing what not to test
How to Benefit from Unit Testing T-SQL: choosing what to test
How to Benefit from Unit Testing T-SQL. Part One.
How to Benefit from Unit Testing T-SQL: Reusing Manual Tests as Parts of Unit Tests
How to Benefit from Unit Testing T-SQL: Exposing Failure in a User-Friendly Way
Benefit from Unit Testing T-SQL: Reuse Unit Tests as Documentation
Benefit from Unit Testing T-SQL: Speed up Your Test Harness When changing test data breaks some tests
Suppose that we need to test another stored procedure, which returns
total amount of messages sent, grouped by user, month, and year.
Typically we prefer to reuse existing test data as much as possible.
However, to test this module we need messages sent on different months,
so we need to add more messages sent by the same user on another month.
When we do that, these additional rows may show up in the output of
the existing stored procedure, which may break some unit tests . To
accommodate for that, let us regenerate expected results, and visually
confirm the changes:

Typically it is much easier to reuse the same test data and adjust
expected results as needed, than to come up with separate test data for
each test.
|
|
|
|
|
|