|
|
|
|
-
To ensure atomicity of transactions, we can use XACT_ABORT ON or wrap the transaction in TRY block and rollback in CATCH block. In some cases, the XACT_ABORT ON approach uses noticeably less CPU. I am posting repro scripts. Please run them, tweak them, and post your findings. Environment I've run my scripts on 2008 R2 Dev Edition. Snapshot isolation is enabled, READ_COMMITTED_SNAPSHOT is not enabled.
Test data We are using the same test data as in my previous post: CREATE TABLE dbo.Toggle1 ( id INT NOT NULL PRIMARY KEY , i INT NOT NULL ) ; GO INSERT INTO dbo.Toggle1 ( id, i ) VALUES ( 1, 0 ) ; GO CREATE TABLE dbo.Toggle2 ( id INT NOT NULL PRIMARY KEY , i INT NOT NULL ) ; GO INSERT INTO dbo.Toggle2 ( id, i ) VALUES ( 1, 0 ) ;Benchmarking The following two stored procedures run the same modifications 10K times. The first one uses XACT_ABORT ON to ensure atomicity of the transaction: CREATE PROCEDURE dbo.Toggle10kTimesWithXactAbortOn AS BEGIN; SET NOCOUNT ON ; SET XACT_ABORT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < 10000 BEGIN ; BEGIN TRAN ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; COMMIT ; SET @i = @i + 1 ; END ; END ;
The second procedure uses a ROLLBACK in a CATCH block:
CREATE PROCEDURE dbo.Toggle10kTimesWithTryCatch AS BEGIN; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < 10000 BEGIN ; BEGIN TRY ; BEGIN TRAN ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; COMMIT ; END TRY BEGIN CATCH ; ROLLBACK ; END CATCH ; SET @i = @i + 1 ; END ; END ; Benchmarking Let us run these two procedures several times: EXEC dbo.Toggle10kTimesWithXactAbortOn ; GO EXEC dbo.Toggle10kTimesWithTryCatch ;
Typical results are as follows: dbo.Toggle10kTimesWithXactAbortOn
CPU: 265-280 dbo.Toggle10kTimesWithTryCatch
CPU: 406-468 Conclusion As we have seen, sometimes using XACT_ABORT ON may use less CPU as opposed to using TRY...CATCH.
I am and not making any blanket statements here - you are encouraged to run your own benchmarks and see if you are observing this effect in your situation. If you do so, can you do us all a favor and post your findings.
|
-
Wrapping related changes in a transaction is a good way to ensure data integrity. Besides, in some cases it just runs noticeably faster, using less CPU. As usual, I am posting repro scripts, which you can run, tweak, and see for yourself. Environment I've run my scripts on 2008 R2 Dev Edition. Snapshot isolation is enabled, READ_COMMITTED_SNAPSHOT is not enabled.
Test data After applying Occum's razor, all we need is two tables with one row in each: CREATE TABLE dbo.Toggle1 ( id INT NOT NULL PRIMARY KEY , i INT NOT NULL ) ; GO INSERT INTO dbo.Toggle1 ( id, i ) VALUES ( 1, 0 ) ; GO CREATE TABLE dbo.Toggle2 ( id INT NOT NULL PRIMARY KEY , i INT NOT NULL ) ; GO INSERT INTO dbo.Toggle2 ( id, i ) VALUES ( 1, 0 ) ;
Benchmarking The following two stored procedures run the same modifications 10K times. The first one does not use an explicit transaction: CREATE PROCEDURE dbo.Toggle10kTimesX3WithoutTransaction AS BEGIN; SET NOCOUNT ON ; SET XACT_ABORT OFF ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < 10000 BEGIN ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; SET @i = @i + 1 ; END ; END ; The second procedure does use an explicit transaction: CREATE PROCEDURE dbo.Toggle10kTimesX3WithXactAbortOff AS BEGIN; SET NOCOUNT ON ; SET XACT_ABORT OFF ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < 10000 BEGIN ; BEGIN TRAN ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle1 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; UPDATE dbo.Toggle2 SET i = 1 - i WHERE id = 1 ; COMMIT ; SET @i = @i + 1 ; END ; END ;
Let's run the following script a few times: EXEC dbo.Toggle10kTimesX3WithoutTransaction ; GO EXEC dbo.Toggle10kTimesX3WithXactAbortOff ;
Typical CPU and duration is as follows: dbo.Toggle10kTimesX3WithoutTransaction CPU: 450-530 Duration:1310 dbo.Toggle10kTimesX3WithXactAbortOff
CPU: 110-140 Duration: 255 Conclusion As we have seen, wrapping related changes in as transaction may improve performance.By related, I mean changes that should all succeed or all fail if we are ensuring atomicity.
However, we typically need to keep transactions short, so we should not bundle unrelated changes in transactions - that may hurt concurrency. Also I am being very careful and not making any blanket statements here. I did not research this effect with due diligence. I noticed the effect in one case and came up with a repro that works in one particular case on one server, this is all. I would really appreciate if you run the repro, tweak it as you see needed, and post the results here. TIA!
In general, we should not assume that larger transactions always mean better performance - the opposite is true in many cases. We should always benchmark the specific case at hand and decide on case by case basis.
|
-
Be careful: unlike most other languages, T-SQL does not limit variables' scope to the block where the variable has been defined. For example, the following snippet compiles and runs: -- @to is not in scope yet -- the line below would not compile --SET @to = @to + 3 ;
IF DATEPART(weekday, GETDATE()) = 3 BEGIN ; -- the scope of this DECLARE does not end when the block ends DECLARE @from INT , @to INT ; SELECT @from = 5 , @to = 7 ; SELECT Number FROM data.Numbers WHERE Number BETWEEN @from AND @to ; -- both @from and @to are defined beyond this point END ELSE BEGIN ; -- Big surprise: -- @from and @to are still in scope, this compiles and runs SET @to = @to + 3 ; SELECT Number FROM data.Numbers WHERE Number BETWEEN @from AND @to ; END ;
This is counter-intuitive for most developers that worked with other languages. IMO this is one of those features we would probably be better off without. What do you think?
|
-
Our OLTPish mixed load system has not had a single deadlock since last April, and we just love it. I would not make any blanket statements, but I think in our case being deadlock-free just makes a lot of practical sense. Of course, in other cases in might not be worth it. To accomplish deadlock-free state, we have taken the following steps: - Snapshot isolation for all readers
- If concurrent updates are not frequent, we just acquire some exclusive lock before updating. As long as collisions are not likely, this does not slow us down much. We stress test how it works.
- If concurrent updates become frequent, we solve the problem outside the database. Our application servers accumulate incoming changes and save them in bulk. Of course, these bulk saves still have to acquire some exclusive lock before updating. We stress test how it works.
I still have to see how it works out long term, but short term we save considerable effort on not troubleshooting deadlocks, and not having to implement retry logic. So far I think that being deadlock-free is worth it, but only if we can avoid high concurrency, when lots of modifications touch a single row each. If we cannot avoid high concurrency, then it might be easier to just let it deadlock from time to time.
What do you think? Are you doing anything similar?
|
-
Refactoring tables does not have to be such a big deal. We do not have to shut modifications down, migrate all data into new structure, deploy modified modules, and do it all at once, while the system is down. Doing all migration at once may be risky, and if we make a mistake, there is no easy rollback to the old version. This is why whenever we need to refactor a large table, we are using an alternative, low-risk, no-downtime, incremental approach. The method I am going to describe has been used in practice several times, and we did not have any problems with it. All our transitions from old table structure to the new one were smooth and incremental. More to the point, we were able to set aside the migration at any time and switch to some other more important task, or leave for the day, or enjoy the weekend, all while the database was fully functional. Typical refactoring scenario I am totally making this up, I have never worked on bug tracking systems. Suppose that we are storing tickets in a table dbo.Tickets, which, among other columns, has a column named AssignedTo. As such, each ticket can be assigned to only one person at any time. However, the next version of our system will allow to assign a ticket to more than one person. Our plan is to create another table, dbo.TicketAssignments(TicketID, AssignedTo), migrate existing data to it, and change appr. 50 stored procedures affected by this change. To minimize risks, we are going to finish all database changes at least a few days before the new version of our system is released. This means that the current version of our system is going to run against our refactored database exactly as it did against the old one. BTW, the table dbo.Tickets is quite large, and is heavily used all the time. Before the migration. Prerequisites. We shall need some extra disk space, approximately as much as the old table uses up. Besides, we need a server that is not struggling with its current workload, so that it can withstand some additional work for the duration of our migration.
Also we shall need a good test coverage on all modules using the table, including unit tests, stress tests, and performance baselines. Typically in our system we already have solid test coverage.
One more thing: because table structure is going to change, inserts directly into the table are not going to work any more. As a result, all test data should be populated via stored procedures that continue to work against the new table structure. For example, we might use a stored procedure dbo.SaveTicket that has a parameter @AssingedTo to populate test data for our unit tests. Creating new tables, changing modifications. As our first step, we create two new empty tables: - dbo.TicketsV2, which has all the same columns as dbo.Tickets, except it does not have AssignedTo column;
- dbo.TicketAssignments(TicketID, AssignedTo)
We also change all the procedures which modify dbo.Tickets, so that they write to both old and new tables.
This biggest risk in this step is introducing concurrency-related problems. We need to stress test our modifications well. If we have any problems, however, we can just run a rollback script, changing all the modifications back to their original version. Such rollback takes just a split second. In fact, we never actually had any problems at this step, because our stress testing harness is quite solid. Of course, our modifications get slower, but we have made sure that our hardware can handle it. We had not had actual problems with slow modifications either. Migrating existing data At this stage all the reports are running off the old table.
As all new changes get saved into both old and new tables, we are also moving over all existing data to the new structure. We want this migration to be non-intrusive, so we typically just run one WHILE loop, moving over like 1K-10K rows at a time, so that our migration does not hinder OLTP activity and reports. Modifying the reports. While the data is migrating, we can take our time changing our stored procedures to read from new tables. Because we have good unit testing coverage, we can refactor procedures with confidence - if we break something, we shall know it right away. Because not all data has been migrated yet, we do not deploy the modified procedures. Verifying that migration completed. To verify that all data migrated correctly, we need to write a SQL query. To my best knowledge, there is no GUI tool that can efficiently compare large tables. Writing a query, however, is not that difficult. This query is going to use a lot of resources. We need to be careful not to bring the server to its knees while the query is running. There are several ways to accomplish that. Deploying modified reports. Once we have complete and correct data in the new tables, we can start deploying new procedures. We do not have to deploy them all at once - we can deploy them five or ten modules at a time, even if some other procedures still access the old table. There is one more problem our team need to be very careful with - our test server is not exactly identical to our production one. As such, we encounter a risk that our procedure runs fast in test environment, but is slow in production. This is why we first deploy our changed procedures into a different schema, which (schema) is not exposed to our user. For example, instead of altering procedure dbo.GetTicketsForDay, we create a new procedure Internal.GetTicketsForDay. Only developers have privileges on Internal schema, so users cannot execute it yet. Once we have executed Internal.GetTicketsForDay in production environment and are happy with performance, we can deploy it as dbo.GetTicketsForDay. Our only risk at this stage is that we can deploy poorly performing procedures. Our rollback strategy is simple - we just roll back to the original stored procedures that read from the old table. Finalizing the migration. Once all the reports access the new table, we can change our modifications, so that they no longer write to the old table. The old table can be archived out and dropped, reclaiming the additional storage which we needed for the migration. Conclusion As we have seen, we can refactor an OLTP table without downtime and with low risks, even if it is big and data migration takes a lot of time. One more thing: a common reaction to such Agile war stories is a recommendation to "do it right the first time". It is so common that I would like to address it right now. Of course, at the time when the previous version of bug tracking system, it was quite obvious that eventually we might need to add the ability to assign a ticket to more than one person. Of course, if we had the table dbo.TicketAssignments from the very beginning, we would not have to go through this complex refactoring. In other words, we should "do it right the first time", should we not? In general, "doing it right the first time", developing a flexible database structure that we should not have to change later, makes a lot of practical sense, but not under all circumstances.
More specifically, when the previous version of bug tracking system was being designed and developed, there were hundreds of brilliant ideas, lots of great features that might be useful later on. Implementing all these hundreds of brilliant ideas would take years, and a much bigger team. It was just not possible. More to the point, it was not what our customers wanted - they wanted us to take care of their most important problems first, and quickly. They did not want to wait until we could provide more features.
Instead of trying to add as many features as possible and as a result delivering a half baked low quality product, the team concentrated on developing only must-have features, on performance, and on very high quality. As a result, the product had a very limited set of features, but it delivered everything it promised, and it was simple, very fast, and rock solid. As such, it was a success. It would be quite easy to provide examples of over-engineered projects that tried to "do it right the first time" and miserably failed, but this is beyond the scope of this post...
|
-
In SQL 2008 R2, MERGE does not implement foreign keys properly. I will show both false negatives (valid rows are rejected) and false positives - orphan rows that are allowed to save.
False Negatives
The following tables implement a very common type/subtype pattern:
CREATE TABLE dbo.Vehicles( ID INT NOT NULL, [Type] VARCHAR(5) NOT NULL, CONSTRAINT Vehicles_PK PRIMARY KEY(ID), CONSTRAINT Vehicles_UNQ_ID_Type UNIQUE(ID, [Type]), CONSTRAINT Vehicles_CHK_ValidTypes CHECK([Type] IN ('Car', 'Truck')) ); GO
CREATE TABLE dbo.Cars(ID INT NOT NULL, [Type] AS CAST('Car' AS VARCHAR(5)) PERSISTED, OtherData VARCHAR(10) NULL, CONSTRAINT Cars_PK PRIMARY KEY(ID), CONSTRAINT Cars_FK_Vehicles FOREIGN KEY(ID, [Type]) REFERENCES dbo.Vehicles(ID, [Type]) ); GO -- adding parent rows INSERT INTO dbo.Vehicles(ID, [Type]) VALUES(1, 'Car'), (2, 'Truck');
The following INSERT succeeds:
INSERT INTO dbo.Cars(ID, OtherData) VALUES(1, 'Some Data');
DELETE FROM dbo.Cars;
MERGE should be able to add the same row, but it fails:
MERGE dbo.Cars AS TargetTable USING ( SELECT 1 AS ID , 'Some Data' AS OtherData ) AS SourceData ON SourceData.ID = TargetTable.ID WHEN NOT MATCHED THEN INSERT (ID, OtherData) VALUES(SourceData.ID, SourceData.OtherData);
Msg 547, LEVEL 16, State 0, Line 1 The MERGE statement conflicted WITH the FOREIGN KEY CONSTRAINT "Cars_FK_Vehicles" The conflict occurred IN DATABASE "Test" TABLE "dbo.Vehicles" The statement has been terminated.
False Positives
Martin Smith has posted this repro on stackoverflow, in response to my scenario described above. This script successfully adds an orphan row into dbo.Cars table:
ALTER TABLE dbo.Vehicles DROP CONSTRAINT Vehicles_CHK_ValidTypes ; GO
INSERT INTO dbo.Vehicles(ID, [Type]) VALUES (3, ''); GO
MERGE dbo.Cars AS TargetTable USING ( SELECT 3 AS ID , 'Some Data' AS OtherData ) AS SourceData ON SourceData.ID = TargetTable.ID WHEN NOT MATCHED THEN INSERT (ID, OtherData) VALUES(SourceData.ID, SourceData.OtherData); GO
-- the FK constraint is enabled and trusted SELECT is_disabled,is_not_trusted FROM sys.foreign_keys WHERE name = 'Cars_FK_Vehicles'; -- yet we have an orphan row SELECT * FROM dbo.Cars SELECT * FROM dbo.Vehicles GO
is_disabled is_not_trusted ----------- -------------- 0 0
(1 row(s) affected)
ID TYPE OtherData ----------- ----- ---------- 3 Car SOME Data
(1 row(s) affected)
ID TYPE ----------- ----- 1 Car 2 Truck 3
Conclusion
In Ch 7 of my book "Defensive Database Programming", I wrote the following statement.
"As long as a constraint is trusted, then we know for sure that all our data complies with the rules that it dictates."
At the time I was writing that, I believed that was true. Unfortunately, as we have just seen, I was wrong - we can bypass trusted foreign key constraints in SQL 2008 R2. P.S. I have just learned that there is another scenario to bypass a foreign key. Google up this: "Parameterized DELETE and MERGE Allow Foreign Key Constraint Violations". Another very important thing that I have just learned that this Connect item did not get urgent priority. Quoting directly from Connect: "Since this is not a common scenario or use case i.e., doing DDL changes
and DML in the same transaction we don't see this as a big issue".
|
-
Suppose that Bob can retrieve all messages, as follows, and we are not happy with it:
EXEC Exchange.ShowAllMessages;
Suppose that we don't want him to read anything related to messages, so our knee-jerk reaction is this:
REVOKE EXECUTE ON Exchange.ShowAllMessages FROM Bob ;
This does not work - Bob still can execute Exchange.ShowAllMessages, just as he could before. The reason is simple: everyone were granted the privilege to execute that procedure:
GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ;
This privilege still exists after we have run our REVOKE command, and Bob still belongs to PUBLIC. The following brute force method prevents Bob from reading messages, but at a steep price: no one else can read all messages any more:
REVOKE EXECUTE ON Exchange.ShowAllMessages FROM PUBLIC ;
The following script does a better job: Bob cannot invoke Exchange.ShowAllMessages directly any more, and all others can execute the procedure exactly as before:
GRANT EXECUTE ON Exchange.ShowAllMessages TO PUBLIC ; DENY EXECUTE ON Exchange.ShowAllMessages TO Bob ;
Note, however, the emphasis on the word directly: in some cases Bob still can execute the procedure indirectly, if it is called by another procedure he has access to.Suppose, for example. that we have two schemas, Exchange and Sales, both created with the same authorization:
CREATE SCHEMA Sales AUTHORIZATION dbo; GO CREATE SCHEMA Exchange AUTHORIZATION dbo;
Unfortunately, Bob can run the following procedure, and he will still see all the messages:
CREATE PROCEDURE Sales.ShowMySalesWithBackDoor AS BEGIN ; -- do something here IF SUSER_NAME() = 'Bob' BEGIN ; EXEC Exchange.ShowAllMessages ; END ; END ;
So, we have spent some time trying out different approaches, and nothing works so far. Clearly we are out of our depth - otherwise we should have completed this seemingly trivial task long ago.
It is time to do some learning - should we have deep understanding of SQL Server security, we would have solved this problem easily. To improve our knowledge, I would suggest the following two resources:
Expert SQL Server 2005 Development
by Adam Machanic,and
Protecting SQL Server Data by John Magnabosco.
|
-
When a select uses scalar or multi-statement UDFs under READ_COMMITTED_SNAPSHOT, we might not get consistent results as of the time our select began - I will provide simple repro scripts. At the time of this writing MSDN clearly states the following: "Read committed isolation using row versioning provides statement-level read consistency". However, if our select invokes scalar or multi-statement UDFs, this consistency is ensured per a statement in the body of the UDF, not per the whole statement that invokes the UDF(s).
Prerequisites Before we run the repro scripts, we need the following setup: ALTER DATABASE MyTest SET READ_COMMITTED_SNAPSHOT ON ; GO
USE MyTest ; GO
CREATE TABLE dbo.Messages ( MessageID INT NOT NULL CONSTRAINT PK_Messages PRIMARY KEY , MessageBody VARCHAR(50) NOT NULL ) ; GO
INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 1, 'Hello world' ) ; GO
Getting inconsistent results from a multi-statement UDF First, we need to create this UDF: CREATE FUNCTION dbo.AllMessagesTwice ( ) RETURNS @ret TABLE ( MessageBody VARCHAR(50) NOT NULL , SelectedAt DATETIME2(5) NOT NULL ) AS BEGIN ; INSERT @ret ( MessageBody , SelectedAt ) SELECT MessageBody , GETDATE() FROM dbo.Messages ; DECLARE @d DATETIME2(5), @i INT ; SET @d = DATEADD(SECOND, 30, GETDATE()) ; WHILE GETDATE() < @d BEGIN ; SET @i = 1; END ; INSERT @ret ( MessageBody , SelectedAt ) SELECT MessageBody , GETDATE() FROM dbo.Messages ; RETURN ; END ; GO
Next, let us open a tab and cut and paste the following script, but not run it yet: SET TRANSACTION ISOLATION LEVEL READ COMMITTED ; SELECT * FROM dbo.AllMessagesTwice ( )
DELETE FROM dbo.Messages WHERE MessageId = 2 ;
In another tab, cut and paste the following script, but not run it either: INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 2, 'Hi there' ) ;
Run the select script in the first tab - it takes 30 seconds to complete. Immediately return to the second tab and insert the 'Hi there' message. When the select in the first tab completes, we shall clearly see the 'Hi there' message, which definitle was inserted after the select began, as well as the timestamp that shows when the row was selected, 30 seconds after the beginning of the statement: Hello world 2011-08-02 20:56:44.70000 Hello world 2011-08-02 20:57:14.74000 Hi there 2011-08-02 20:57:14.74000
As we have seen, a multi-statement UDF can return data that was committed after the time when the select using the UDF began. Getting inconsistent results from a scalar UDF
Here is the scalar UDF we shall be using: CREATE FUNCTION dbo.MessageWithLastId ( ) RETURNS VARCHAR(50) AS BEGIN ; DECLARE @d DATETIME2(5) , @i INT ; SET @d = DATEADD(SECOND, 30, GETDATE()) ; WHILE GETDATE() < @d BEGIN ; SET @i = 1 ; END ; RETURN (SELECT TOP(1) MessageBody FROM dbo.Messages ORDER BY MessageID DESC) ; END ; GO
When we have created the scalar UDF, let us cut, paste, and run the following script: SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;
DELETE FROM dbo.Messages WHERE MessageId = 2 ; PRINT 'Data as of the beginning of the dbo.MessageWithLastId ( ) below' SELECT MessageBody FROM dbo.Messages ORDER BY MessageID ;
PRINT 'Data returned by dbo.MessageWithLastId ( )' SELECT dbo.MessageWithLastId ( ) ;
DELETE FROM dbo.Messages WHERE MessageId = 2 ;
In the second tab we need to immediately run the following, we have 30 seconds to do so: INSERT INTO dbo.Messages ( MessageID, MessageBody ) VALUES ( 2, 'Hi there' ) ;
When the select in the first tab completes, we shall see that the scalar UDF returned the value that was not in the table when the select has started: Data AS OF the beginning OF the dbo.MessageWithLastId ( ) below MessageBody -------------------------------------------------- Hello world
(1 row(s) affected)
Data returned BY dbo.MessageWithLastId ( )
-------------------------------------------------- Hi there
As we have seen, a scalar UDF can read data that was committed after the time when the select using the UDF began. Snapshot isolation guarantees consistency
We can rerun both previous examples under snapshot isolation. All we need to do is add the following line immediately before the select that invokes our UDF: SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
In both cases we shall see that under snapshot isolation our selects return consistent data as of the time the select started. Conclusion As we have seen, if we are using scalar or multi-statement UDFs, and we need statement level consistency, we need to use snapshot isolation - READ_COMMITTED_SNAPSHOT may fail to provide the consistency we need. Naturally, inline UDFs are esentially macroes - they are flattened into the select they are invoked from. As such, READ_COMMITTED_SNAPSHOT does provide statement level consistency as long as we are only using inline UDFs, and do not use scalar and multi-statement ones.
I would like to thank Erland Sommarskog, whose feedback inspired this post - I am posting this content with his permission.
|
-
The overhead of entering a TRY block is relatively small, but the overhead of catching an exception is more substantial. If we manage to use CASE or IF to prevent an exception, we can noticeably improve performance. We shall run benchmarks and see for ourselves - in our example, catching an exception will be approximately nine times more expensive than exiting the TRY block without it. Entering TRY blocks is not free In this section we shall wrap the body of a stored procedure in a TRY CATCH block, and measure the overhead. The following procedure does not use a TRY CATCH block: CREATE ALTER PROCEDURE dbo.PerformDivision @dividend INT , @divisor INT , @Quotient INT OUT AS BEGIN; SET NOCOUNT ON ; SELECT @Quotient = CASE WHEN @divisor = 0 THEN NULL ELSE @dividend / @divisor END ; RETURN @@ERROR ; END ; GO
Let us benchmark it. Whether the divisor is zero or not, the procedure runs at the same speed: CREATE PROCEDURE dbo.PerformDivision100kTimes @dividend INT , @divisor INT AS BEGIN ; DECLARE @i INT , @Quotient INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.PerformDivision @dividend = @dividend, @divisor = @divisor, @Quotient = @Quotient OUT ; SET @i = @i + 1 ; END ; END ; GO
EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 1 -- cpu 1250 ms GO EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 0 -- cpu 1250 ms GO
Let us remove the CASE expression, so that the division can raise an exception, and wrap the division operator in a TRY CATCH block: ALTER PROCEDURE dbo.PerformDivision @dividend INT , @divisor INT , @Quotient INT OUT AS BEGIN; SET NOCOUNT ON ; BEGIN TRY SELECT @Quotient = @dividend / @divisor ; END TRY BEGIN CATCH SELECT @Quotient = NULL ; END CATCH ; RETURN @@ERROR ; END ; GO
Even when the division does not raise an exception, the procedure runs 36% slower, 1703 ms vs 1250 ms: EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 1 -- cpu 1703 ms GO
Catching exceptions is a noticeable overhead
When the divisor is zero, and division does raise an exception that is caught in the CATCH block, the procedure runs 4.36 times slower, 5453 ms vs 1250 ms: EXEC dbo.PerformDivision100kTimes @dividend = 1 , @divisor = 0 -- cpu 5453 ms GO
As we have seen, catching exceptions in CATCH blocks is slow. In this particular scenario, it is approximately nine times more expensive than just entering the TRY block. If we manage to use CASE or IF to avoid an exception, we can noticeably improve performance.
|
-
It is well known that INSERT ... EXEC is not free, but how expensive is it as compared to its alternatives? We shall run benchmarks and see that:
- Invoking a multi-statement UDF is approximately two times faster than capturing the output of a stored procedure in a table variable via INSERT ... EXEC;
- INSERT ... EXEC into a table variable is approximately two times faster than INSERT ... EXEC into a temporary table;
- Inline UDFs are much cheaper than multi-statement UDFs;
- The cost of invoking a multi-statement UDF or running INSERT ... EXEC may be several times higher than the cost of running the DML inside the UDF or stored procedure.
We shall only run benchmarks and estimate real execution costs. Whenever we choose whether to cut-and-paste code or to reuse it, whether via INSERT...EXEC, or in a multi-statement UDF, knowing these numbers may be very useful.
For an in-depth analysis of what contributes to the overhead, make sure that you have read the following great post by Adam Machanic:
The Hidden Costs of INSERT EXEC
Also when we use INSERT...EXEC a lot, we increase the workload on tempdb, but this is beyond the scope of this post.
Determining the cost of INSERT ... EXEC into a table variable.
The following stored procedure is very fast, so that the cost of its execution should be quite low. Also it does not contain any DML and as such it does not perform any reads when it runs.
CREATE PROCEDURE dbo.GetProduct @int1 INT, @int2 INT AS BEGIN; SET NOCOUNT ON ; SELECT @int1 * @int2 AS Product ; RETURN @@ERROR ; END ; GO
Suppose also that we need the product of two integers in an output parameter, not in a result set. To accomplish that, we certainly can capture its output via INSERT ... EXEC, as follows:
CREATE PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; DECLARE @res TABLE ( product INT ) ; INSERT @res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM @res ; END ; GO
Alternatively, we can cut and paste some code and get the product of two integers directly into an output parameter:
CREATE PROCEDURE dbo.GetProductDirectly @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET @product = @int1 * @int2 ; END ; GO
To benchmark these two approaches, let us create and execute the following two procedures:
CREATE PROCEDURE dbo.LoopReuseGetProduct AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.ReuseGetProduct @int1 = 1, @int2 = 2, @product = @product OUT ; SET @i = @i + 1 ; END ; END ; GO DROP PROCEDURE dbo.LoopGetProductDirectly CREATE PROCEDURE dbo.LoopGetProductDirectly AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; EXEC dbo.GetProductDirectly @int1 = 1, @int2 = 2, @product = @product OUT ; SET @i = @i + 1 ; END ; END ; GO
EXEC dbo.LoopReuseGetProduct ; GO EXEC dbo.LoopGetProductDirectly ;
The difference in performance is dramatic: while LoopGetProductDirectly uses 1094 ms CPU and no reads, LoopReuseGetProduct uses 21000 ms CPU and performs 1.9M reads.
So the overhead of every INSERT...EXEC into a table variable in this particular case is approximately 19 reads and 0.2 ms of CPU.
Note: if you are wondering why our benchmarks are wrapped as stored procedures, read an excellent post by Linchi Shea entitled "Performance impact: stored procedures, SQL batches, and CPU usage"
Comparing INSERT ... EXEC into a table variable vs. INSERT ... EXEC into a temporary table
As we have seen, INSERT ... EXEC into a table variable is not cheap. Let us replace the table variable with a temporary table and see if it runs faster:
ALTER PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; CREATE TABLE #res( product INT ) ; INSERT #res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM #res ; END ; GO
When we rerun LoopReuseGetProduct, the performance is even worse: it uses 41600 ms CPU and performs 4.6M reads.
So the overhead of every INSERT...EXEC into a temporary table in this particular case is approximately 46 reads and 0.4 ms CPU.
Comparing INSERT ... EXEC into a table variable vs. invoking a UDF
To reuse our code, we also can wrap it in a multi-statement UDF, or in an inline one. As before, we shall wrap our benchmarks in stored procedures:
CREATE FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT ) RETURNS @ret TABLE ( Product INT ) AS BEGIN ; INSERT @ret ( Product ) SELECT @int1 * @int2 ; RETURN ; END ; GO
CREATE PROCEDURE dbo.LoopGetProduct_MultiStatement AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; SELECT @product = product FROM dbo.GetProduct_MultiStatement(1, 1) ; SET @i = @i + 1 ; END ; END ; GO
CREATE FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT ) RETURNS TABLE AS RETURN ( SELECT @int1 * @int2 AS Product ) ; GO
CREATE PROCEDURE dbo.LoopGetProduct_Inline AS BEGIN ; DECLARE @i INT , @product INT ; SET @i = 1 ; WHILE @i <= 100000 BEGIN ; SELECT @product = product FROM dbo.GetProduct_Inline(1, 1) ; SET @i = @i + 1 ; END ; END ; GO
Here are the results of our benchmarking:
- dbo.LoopGetProduct_Inline uses 563 ms of CPU and no reads;
- dbo.LoopGetProduct_MultiStatement uses 10266 ms of CPU and performs exactly 1M reads.
Clearly the inline UDF performs much faster.
In this scenario, the overhead of invoking a multi-statement UDF is 10 reads and 0.1 ms of CPU.
Is this overhead high as compared to the cost of executing DML?
If our DML is fast, it may take less resources to execute our DML than to invoke a multi-statement UDF or to run an INSERT...EXEC. To demonstrate that, let us create a small table with just 10K rows:
CREATE TABLE dbo.Products ( Multiplier1 INT NOT NULL , Multiplier2 INT NOT NULL , Product INT NOT NULL , CONSTRAINT PK_Products PRIMARY KEY(Multiplier1, Multiplier2 ) ) ; GO
INSERT INTO dbo.Products ( Multiplier1 , Multiplier2 , Product ) SELECT n1.n , n2.n , n1.n * n2.n FROM dbo.Numbers AS n1 CROSS JOIN dbo.Numbers AS n2 WHERE n1.n <= 100 AND n2.n <= 100 ; GO
Also let us alter our procedures and UDFs to read from this table:
ALTER PROCEDURE dbo.GetProduct @int1 INT, @int2 INT AS BEGIN; SET NOCOUNT ON ; SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ; RETURN @@ERROR ; END ; GO
ALTER FUNCTION dbo.GetProduct_MultiStatement ( @int1 INT, @int2 INT ) RETURNS @ret TABLE ( Product INT ) AS BEGIN ; INSERT @ret ( Product ) SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ; RETURN ; END ; GO
ALTER FUNCTION dbo.GetProduct_Inline ( @int1 INT, @int2 INT ) RETURNS TABLE AS RETURN ( SELECT Product FROM dbo.Products WHERE Multiplier1 = @int1 AND Multiplier2 = @int2 ) ; GO
Also let us revert the INSERT...EXEC back to its original version, which uses a table variable and is two times faster than the one with a temporary table:
ALTER PROCEDURE dbo.ReuseGetProduct @int1 INT , @int2 INT , @product INT OUT AS BEGIN ; SET NOCOUNT ON ; DECLARE @res TABLE ( product INT ) ; INSERT @res ( Product ) EXEC dbo.GetProduct @int1 = @int1, @int2 = @int2 ; SELECT @product = Product FROM @res ; END ; GO
Let us rerun our benchmarks:
EXEC dbo.LoopReuseGetProduct ; -- EXEC dbo.LoopReuseGetProduct ; cpu 21906 reads 2100115
GO EXEC dbo.LoopGetProduct_MultiStatement ; -- EXEC dbo.LoopGetProduct_MultiStatement ; cpu 11109 reads 1200010
GO EXEC dbo.LoopGetProduct_Inline ; --EXEC dbo.LoopGetProduct_Inline ; cpu 1079 reads 200000
Our table Products is small; it takes just two reads two perform a simple select ad look up the value in it. Because we look up the value 100K times, we use 200K reads to do the actual useful work.
In this scenario, less than 20% of CPU and reads are spent on executing the DML. The rest is the overhead of invoking a multi-statement UDF or of running an INSERT...EXEC.
Of course, if we are running more complex and/or slower DML, the ratio of useful work vs the overhead will be higher. As such, we might be more willing to reuse our code via INSERT...EXEC or via a multi-statement UDF, even though the overhead is quite substantial.
Conclusion
As we have seen, neither INSERT...EXEC nor the multi-statement UDF are cheap. Whenever we are deciding whether to use them, we need to consider their impact on performance, preferably by running benchmarks.
One more thing: I am running my benchmarks on SQL 2008 R2, on my laptop. The results I am getting on my servers are similar. However, you might get different results in other environments, on other versions of SQL Server. I encourage you to rerun my scripts, and post here whether you are getting similar results, and what is your environment.
My other posts discussing T-SQL code reuse and performance are as follows:
Will wrapping up code in a reusable stored procedure hurt performance?
Reuse Your Code with Table-Valued UDFs
|
-
I have just downloaded and started reading a free PDF book about version control systems (VCS), entitled Version Control by Example, written by Eric Sink. So far I have read a really informative chapter about the advantages of third generation VCS, such as Git, over the second generation ones, such as Subversion and TFS. So far it is a very nice read. I hope you will like it too.
|
-
It is well known that replacing a generic, one-size-fits-all query with several more specific ones may boost performance. However, in some cases these more specific queries do not perform any faster than the generic one. We shall consider several examples which demonstrate both scenarios.
Prerequisites
The following script creates a table and populates it with 4M rows, not much but enough for our test cases.
CREATE TABLE dbo.Packages ( ID INT NOT NULL IDENTITY , [length] INT NOT NULL , width INT NOT NULL , [weight] INT NOT NULL , SpaceFiller CHAR(30) NOT NULL, CONSTRAINT PK_Packages PRIMARY KEY ( ID ) ) ; GO
INSERT INTO dbo.Packages ( [length] , width , [weight], SpaceFiller ) SELECT n1.n , n2.n , 1 , '*' FROM ( SELECT n FROM dbo.Numbers WHERE n <= 2000 ) AS n1 CROSS JOIN ( SELECT n FROM dbo.Numbers WHERE n <= 2000 ) AS n2 ;
GO
CREATE INDEX Packages_length_width ON dbo.Packages([length], width) INCLUDE ( [weight], SpaceFiller ) ; GO
CREATE INDEX Packages_width_length ON dbo.Packages(width, [length]) INCLUDE ( [weight], SpaceFiller ) ; GO
When replacing a generic query with two specific ones boosts performance
The following generic query performs very poorly - it does not utilize any indexes. Instead, it scans the whole table and sorts it:
CREATE PROCEDURE dbo.GetTop1Weight @OrderBy VARCHAR(30) , @topWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY CASE WHEN @OrderBy = 'Length,Width' THEN [length] ELSE [width] END , CASE WHEN @OrderBy = 'Length,Width' THEN [width] ELSE [length] END ; END ; GO
You can run it or at least see the execution plan, and see for yourself - the query needs about 25K logical reads to complete.
Let us replace this generic query with two specific ones:
ALTER PROCEDURE dbo.GetTop1Weight @OrderBy VARCHAR(30) , @topWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF @OrderBy = 'Length,Width' BEGIN ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY [length] , [width] ; END ; ELSE BEGIN ; SELECT TOP ( 1 ) @topWeight = [weight] FROM dbo.Packages ORDER BY [width] , [length] ; END ; END ; GO
As a result the query speeds up dramatically - for both sort orders it executes as one index seek:
DECLARE @topWeight INT ; EXEC dbo.GetTop1Weight @OrderBy = 'Length,Width', @topWeight = @topWeight OUT ; EXEC dbo.GetTop1Weight @OrderBy = 'Width,Length', @topWeight = @topWeight OUT ;
Both queries need just 3 logical reads to complete, which is a huge improvement.
As we have seen, sometimes replacing a generic, one-size-fits-all query with several more specific ones dramatically improves performance. However, in many other cases more specific queries do not perform any better.
When replacing a generic query with two specific ones does not improve performance at all
The following one-size-fits-all query performs very poorly:
CREATE PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF ( ( @length IS NULL AND @width IS NULL ) OR ( @length IS NOT NULL AND @width IS NOT NULL ) ) BEGIN ; RAISERROR('Must provide exactly one of the following parameters: @width and @length', 16, 1) ; RETURN 1 ; END ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = COALESCE(@length, [length]) AND [width] = COALESCE(@width, [width]) ; END ; GO
Note: for simplicity, let us assume that we must always provide exactly one parameter, either length or width, but never both. In all the subsequent versions we shall skip the verification of parameters, the first IF statement, just to keep our examples shorter.
We can run the following script and see that every time our stored procedure scans the whole table and performs 25K logical reads:
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight @width = 1, @totalWeight = @totalWeight OUT ;
Speeding up this query is very easy. The following query is index friendly. For brevity, we have omitted the first step, the verification that either length or width, but never both, is provided:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length OR [width] = @width ; END ; GO
Every time it uses the right index and needs just 16 logical reads:
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight @width = 1, @totalWeight = @totalWeight OUT ;
In this case more specific queries do not run any faster. The following queries show several approaches which we can try out. We can keep all the code in one procedure, as shown in the following scripts:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM ( SELECT [weight] FROM dbo.Packages WHERE [length] = @length UNION ALL SELECT [weight] FROM dbo.Packages WHERE [width] = @width ) AS T ; END ; GO
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; IF @length IS NOT NULL BEGIN ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length ; END ELSE BEGIN SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [width] = @width ; END END ; GO
Alternatively, we can try out two separate procedures:
CREATE PROCEDURE dbo.GetTotalWeight_ByWidth @width INT , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [width] = @width ; END ; GO
CREATE PROCEDURE dbo.GetTotalWeight_ByLength @length INT , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length ; END ; GO
DECLARE @totalWeight INT ; EXEC dbo.GetTotalWeight_ByLength @length = 1, @totalWeight = @totalWeight OUT ; EXEC dbo.GetTotalWeight_ByWidth @width = 1, @totalWeight = @totalWeight OUT ;
Anyway, none of the approaches we have tried out performs any faster than the optimized generic one - they all complete in 16 logical reads and use approximately the same CPU.
So, in this case replacing a generic procedure with several specific ones does not make any sense at all - it increases complexity without improving performance.
We are better off with the original generic query:
ALTER PROCEDURE dbo.GetTotalWeight @length INT = NULL , @width INT = NULL , @totalWeight INT OUT AS BEGIN ; SET NOCOUNT ON ; SELECT @totalWeight = SUM([weight]) FROM dbo.Packages WHERE [length] = @length OR [width] = @width ; END ; GO
Conclusion
Although in many cases replacing a generic query with several specific ones does boost performance, we should not use this approach in all cases as a rule of thumb. Instead, we need to prove that splitting a generic query does improve performance before actually doing it.
My other posts discussing T-SQL code reuse and performance are as follows:
Will wrapping up code in a reusable stored procedure hurt performance?
Reuse Your Code with Table-Valued UDFs
Comparing the cost of INSERT ... EXEC and its alternatives
|
-
In T-SQL programming, stored procedure calls are not free, and nested stored procedure calls cost even more. First, we
shall prove it by running benchmarks. Next, we shall determine if the
cost of calling stored procedures is high as compared to the rest of
workload.
Proving that stored procedure calls are not free
Let us begin with a very simple benchmark, using a trivial stored procedure that increments a variable:
CREATE PROCEDURE dbo.IncreaseByOne @i INT OUT AS BEGIN ; SET @i = @i + 1 ; END ; GO
In order to get more or less meaningful comparison, we shall invoke the procedure 100K times, and compare its real execution costs versus the cost of incrementing the variable directly in the body of the loop, as shown in the following script:
CREATE PROCEDURE dbo.DirectUpdateInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; SET @i = @i + 1 ; END ; SELECT @i AS [@i] ; END ; GO
CREATE PROCEDURE dbo.ProcedureCallInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.IncreaseByOne @i = @i OUT ; END ; SELECT @i AS [@i] ; END ; GO
EXEC dbo.DirectUpdateInLoop @NumIterations = 100000; GO EXEC dbo.ProcedureCallInLoop @NumIterations = 100000;
To exclude parse and compile time from our comparison, we can just invoke our stored procedures more than once, and disregard their first executions.
The second procedure, dbo.ProcedureCallInLoop , consistently runs much slower - 970 ms vs. 93 ms on my laptop, running SQL 2008 R2. Naturally, your mileage may vary.
Proving that nested stored procedure calls are even more expensive
Clearly, invoking a stored procedure is not free. To determine if nested stored procedure calls are more expensive, we shall create and run the following modules:
CREATE PROCEDURE dbo.WrappedIncreaseByOne @i INT OUT AS BEGIN ; EXEC dbo.IncreaseByOne @i = @i OUT ; END ; GO
CREATE PROCEDURE dbo.WrappedProcedureCallInLoop @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.WrappedIncreaseByOne @i = @i OUT ; END ; SELECT @i AS [@i] ; END ; GO
EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 10; GO
EXEC dbo.WrappedProcedureCallInLoop @NumIterations = 100000;
Nested calls are consistently slower - dbo.WrappedProcedureCallInLoop runs in 1620 ms vs. 970 ms for the previous tests, dbo.ProcedureCallInLoop.
Comparing the overhead of calling procedures to the overhead of other actions
As we have seen, stored procedure calls are not free, especially nested ones. However, our benchmarks were not very realistic - our stored procedures were extremely trivial.
If a stored procedure actually executes some DML, how big is the overhead of calling it as compared to the overhead of running the DML inside it?
The following procedure executes just a little bit of DML - it updates one row:
CREATE TABLE dbo.Counters ( CounterId INT NOT NULL CONSTRAINT PK_Counters PRIMARY KEY , Amount INT NOT NULL ) ; GO INSERT INTO dbo.Counters ( CounterId, Amount ) VALUES ( 1, 0 ) ; GO CREATE PROCEDURE dbo.IncreaseCounterByOne @CounterId INT AS BEGIN ; SET NOCOUNT ON ; UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterId = @CounterId ; END ; GO
As before, we shall be invoking it in a loop, as follows:
CREATE PROCEDURE dbo.IncreaseCounterViaProcedure @CounterId INT , @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; EXEC dbo.IncreaseCounterByOne @CounterId = @CounterId ;
SET @i = @i + 1 ; END ; END ; GO
Let us cut-and-paste the body of dbo.IncreaseCounterByOne into the calling module and see if i it improves the performance:
CREATE PROCEDURE dbo.IncreaseCounterViaDirectUpdate @CounterId INT , @NumIterations INT AS BEGIN ; SET NOCOUNT ON ; DECLARE @i INT ; SET @i = 0 ; WHILE @i < @NumIterations BEGIN ; UPDATE dbo.Counters SET Amount = Amount + 1 WHERE CounterId = @CounterId ; SET @i = @i + 1 ; END ; END ; GO
Cut-and-pasting the body of dbo.IncreaseCounterByOne into the loop definitely improves the performance, but not very much: CPU goes down from 3400 ms (dbo.IncreaseCounterViaProcedure) to 2400 ms (dbo.IncreaseCounterViaDirectUpdate), and duration does not change very much either.
For more complex stored procedures, the overhead of making a call should be even less pronounced as compared to the cost of executing DML.
Conclusion
As we have seen, sometimes we can eliminate stored procedure calls and save some CPU. However, we should not expect huge savings, unless we are flattening out deeply nested modules.
Also we have seen that although wrapping up reusable code in a stored procedure does slow down the execution, usually the performance penalty is relatively small.
My other posts discussing T-SQL code reuse and performance are as follows:
More specific queries may run faster, but we should not overdo it.
Reuse Your Code with Table-Valued UDFs
Comparing the cost of INSERT ... EXEC and its alternatives
|
-
-
|
Today I will be speaking at the March 2011 meeting of Chicago user group. Attached are demos and slides.
|
|
|
|
|
|