|
|
|
|
Aaron is a senior consultant for SQL Sentry, Inc., makers of performance monitoring and event management software for SQL Server, Analysis Services, and Windows. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at user group meetings and SQL Saturday events.
I see a lot of people suggest while loops instead of cursors in situations where row-based processing is required (or, at least, where folks think that row-based processing is required). Sometimes the justification is that constructing a while loop is simpler and more straightforward than constructing a cursor. Others suggest that a while loop is faster than a cursor because, well, it isn't a cursor. Of course the underlying mechanics still represent a cursor, it's just not explicitly stated that way using DECLARE CURSOR.
The difficulty of writing a piece of code should not be the primary factor in avoiding that type of code. I use MERGE - while I will likely never memorize that syntax, I know it is safer and less likely to cause deadlocks than typical "UPSERT" methodologies. Looking up the syntax diagram in Books Online, or using templates or snippets, are good workarounds to having to avoid coding constructs because they're "too hard." The same is true for cursors - sure the syntax is cumbersome, but templates or snippets can do most of the work for you.
Besides, is it that much simpler to set up and use a WHILE loop? And is it really that much more efficient?
Simplicity
I asked a few colleagues to take this code sample and change it from an explicit cursor to a WHILE loop:
DECLARE @schema_name SYSNAME, @object_name SYSNAME, @index_name SYSNAME @s NVARCHAR(MAX) = N'';
DECLARE indexes CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR SELECT s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i;
OPEN indexes;
FETCH NEXT FROM indexes INTO @schema_name, @object_name, @index_name;
WHILE @@FETCH_STATUS = 0 BEGIN -- we're just concatenating here, but pretend we needed to, -- say, call a stored procedure for each row in the cursor:
SET @s += CHAR(13) + CHAR(10) + N'ALTER INDEX ' + QUOTENAME(@index_name) + ' ON ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@object_name) + ' REORGANIZE;';
FETCH NEXT FROM indexes INTO @schema_name, @object_name, @index_name; END
CLOSE indexes; DEALLOCATE indexes;
|
Before you crucify me about the code sample, or tell me that I could do this without a cursor or a while loop, please understand that I wasn't really trying to solve this problem, I was only trying to come up with a simple string-building exercise that can be accomplished with a cursor. I know that a cursor isn't required to just return a concatenated string, for example you can use FOR XML PATH:
DECLARE @s NVARCHAR(MAX);
SELECT @s = ( SELECT CHAR(13) + CHAR(10) + 'ALTER INDEX ' + QUOTENAME(i) + ' ON ' + QUOTENAME(s) + '.' + QUOTENAME(o) + ' REORGANIZE;' FROM ( SELECT TOP (1000000) s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i ) AS x FOR XML PATH(''), TYPE ).value('.[1]', 'NVARCHAR(MAX)');
|
Or even simpler concatenation:
DECLARE @s NVARCHAR(MAX) = N'';
SELECT @s += CHAR(13) + CHAR(10) + 'ALTER INDEX ' + QUOTENAME(i) + ' ON ' + QUOTENAME(s) + '.' + QUOTENAME(o) + ' REORGANIZE;' FROM ( SELECT TOP (1000000) s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i ) AS x;
|
(While it's true that these alternatives don't use an explicit cursor or while loop, so may appear to be "set-based," it is important to note that in these approaches the actual output order is not guaranteed. And if you think this is accomplished under the hood without a cursor-like operation, you're only fooling yourself. :-))
Again, the point of the exercise was not to eliminate the cursor, but to see if a simpler, more straightforward while loop could achieve the same result. One of the complications of the desired result is that I want the resulting output to order the indexes by row_count descending. So there is no unique, incrementing column or index to take advantage of, like you might be able to do if you have an IDENTITY column and you don't care about order otherwise. So this was a potential wrench I threw in intentionally because, in my experience, you will often care about order and you won't always have the luxury of a column inherent in the data set that supports the desired order of processing.
I'll show the four while loops that were submitted. They're all quite similar, using either #temp tables or @table variables to hold the intermediate data set before looping. I'm not going to reveal who submitted these approaches, but rather simply label them Colleague #1, #2, #3 and #4 (both to keep them straight and to protect the innocent). I hope they aren't offended that I took a few liberties with their syntax (mostly indenting and statement termination). These are all fairly similar constructs, and in fact I received a 5th contribution just prior to publishing - I didn't include it because it didn't have any elements not present below.
Colleague #1
DECLARE @indexes TABLE ( s SYSNAME, o SYSNAME, i SYSNAME, c BIGINT, rn INT );
INSERT @indexes(s, o, i, c, rn) SELECT s, o, i, c, rn = ROW_NUMBER() OVER (ORDER BY c DESC, s, o, i) FROM ( SELECT s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name, c = s.row_count FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ) AS x;
DECLARE @min INT, @max INT, @cur INT = 0, @s NVARCHAR(MAX) = N'';
SELECT @min = MIN(rn), @max = MAX(rn) FROM @indexes;
WHILE (@cur < @max) BEGIN SET @cur += 1;
SELECT @s += CHAR(13) + CHAR(10) + 'ALTER INDEX ' + QUOTENAME(i) + ' ON ' + QUOTENAME(s) + '.' + QUOTENAME(o) + ' REORGANIZE;' FROM @indexes WHERE rn = @cur; END
|
Colleague #2
SELECT RowID = ROW_NUMBER() OVER (ORDER BY s.row_count DESC, OBJECT_SCHEMA_NAME(o.[object_id]), o.name, i.name), s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name INTO #Temp FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i;
DECLARE @CurrentRowID INT, @s NVARCHAR(MAX) = N'';
SELECT @CurrentRowID = MIN(RowID) FROM #Temp;
WHILE @CurrentRowID IS NOT NULL BEGIN SELECT @s += CHAR(13) + CHAR(10) + N'ALTER INDEX ' + QUOTENAME(i) + ' ON ' + QUOTENAME(s) + '.' + QUOTENAME(o) + ' REORGANIZE;' FROM #Temp WHERE RowID = @CurrentRowID;
SELECT @CurrentRowID = MIN(RowID) FROM #Temp WHERE RowID > @CurrentRowID; END
DROP TABLE #Temp;
|
Colleague #3
DECLARE @Temp TABLE ( RowID INT IDENTITY PRIMARY KEY, SchemaName SYSNAME, ObjectName SYSNAME, IndexName SYSNAME );
INSERT INTO @Temp (SchemaName, ObjectName, IndexName) SELECT s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i;
DECLARE @CurrentRowID INT, @s NVARCHAR(MAX) = N'';
SELECT @CurrentRowID = MIN(RowID) FROM @Temp;
WHILE @CurrentRowID IS NOT NULL BEGIN SELECT @s += CHAR(13) + CHAR(10) + N'ALTER INDEX ' + QUOTENAME(IndexName) + ' ON ' + QUOTENAME(SchemaName) + '.' + QUOTENAME(ObjectName) + ' REORGANIZE;' FROM @Temp WHERE RowID = @CurrentRowID;
SELECT @CurrentRowID = MIN(RowID) FROM @Temp WHERE RowID > @CurrentRowID; END
|
Colleague #4
DECLARE @Row INT = 1, @s NVARCHAR(MAX) = N'';
CREATE TABLE #IndexList ( RowID INT IDENTITY(1,1) ,SchemaName SYSNAME ,TableName SYSNAME ,IndexName SYSNAME );
INSERT INTO #IndexList ( SchemaName , TableName , IndexName ) SELECT s = OBJECT_SCHEMA_NAME(o.[object_id]), o = o.name, i = i.name FROM sys.objects AS o INNER JOIN sys.indexes AS i ON o.[object_id] = i.[object_id] INNER JOIN ( SELECT [object_id], index_id, row_count = SUM(row_count) FROM sys.dm_db_partition_stats GROUP BY [object_id], index_id ) AS s ON o.[object_id] = s.[object_id] AND i.index_id = s.index_id WHERE o.is_ms_shipped = 0 AND i.index_id >= 1 ORDER BY s.row_count DESC, s, o, i;
WHILE @Row <= (SELECT MAX(RowID) FROM #IndexList) BEGIN SELECT @s += CHAR(13) + CHAR(10) + N'ALTER INDEX ' + QUOTENAME(IndexName) + ' ON ' + QUOTENAME(SchemaName) + '.' + QUOTENAME(TableName) + ' REORGANIZE;' FROM #IndexList WHERE RowID = @Row;
SET @Row = @Row + 1; END
DROP TABLE #IndexList;
|
I validated that the resulting value of @s matched the cursor result in all four cases. I don't know about you, and maybe I'm just being dense, but I had to
think a lot harder about that than I would have with a cursor,
cumbersome syntax and all. As far as I'm concerned, based on these four submissions from esteemed colleagues, constructing a while loop to solve this problem isn't any easier or more straightforward than a cursor. Especially since a DECLARE CURSOR block can easily be constructed for you using Management Studio's templates (or the new snippets feature in SQL Server 2012).
But what about performance? Surely a cursor is slower than a while loop, because it's a cursor, right?
Performance
As I mentioned in a previous article, the cursor usually gets a bad rap for two reasons: (1) folks think a cursor is bad simply because it says DECLARE CURSOR, and (2) people use a cursor (or any row-by-row processing methodology) when it isn't necessary. In cases where it is the best option (e.g. running totals) or it is necessary (e.g. maintenance tasks, or cases where a stored procedure call must be made for every row or conditionally depend on processing that occurred for the previous row), people just say DECLARE c CURSOR FOR ... when they should usually be using the most efficient cursor declaration possible (LOCAL STATIC FORWARD_ONLY READ_ONLY). And yes, some will argue that FORWARD_ONLY READ_ONLY could be replaced by FAST_FORWARD, but I prefer the former, because the latter can't be combined with STATIC:
Msg 1048, Level 15, State 1, Line 1 Conflicting cursor options STATIC and FAST_FORWARD.
|
Now, you are more than welcome to do your own tests with the various
cursor options, to see which perform best in your scenario, but I've
always come back to the set of options described above. I put my cursor code above into a stored procedure, as well as the two "set-based" concatenations, and the four colleague-provided options. I then ran some performance tests where I measured just duration of each method, executing 1000 times (I ran the middle section three times, then took the average):
SET NOCOUNT ON; GO CREATE TABLE #stats ( rownum INT IDENTITY(1,1), procname SYSNAME, dt DATETIME2 ); GO
INSERT #stats(procname,dt) SELECT '-', SYSDATETIME(); GO EXEC dbo.while_cursor; GO 1000 INSERT #stats(procname,dt) SELECT 'cursor', SYSDATETIME(); GO EXEC dbo.while_xmlconcat; GO 1000 INSERT #stats(procname,dt) SELECT 'xml concat', SYSDATETIME(); GO EXEC dbo.while_simpleconcat; GO 1000 INSERT #stats(procname,dt) SELECT 'simple concat', SYSDATETIME(); GO EXEC dbo.while_colleague1; GO 1000 INSERT #stats(procname,dt) SELECT 'colleague 1 while', SYSDATETIME(); GO EXEC dbo.while_colleague2; GO 1000 INSERT #stats(procname,dt) SELECT 'colleague 2 while', SYSDATETIME(); GO EXEC dbo.while_colleague3; GO 1000 INSERT #stats(procname,dt) SELECT 'colleague 3 while', SYSDATETIME(); GO EXEC dbo.while_colleague4; GO 1000 INSERT #stats(procname,dt) SELECT 'colleague 4 while', SYSDATETIME(); GO
-- using the new LAG functionality in SQL Server 2012:
SELECT procname, duration = DATEDIFF(MILLISECOND, LAG(dt, 1, NULL) OVER (ORDER BY rownum), dt) FROM #stats ORDER BY rownum; GO
DROP TABLE #stats; GO
|
The results (the "winner" highlighted in green):
The XML and simple concatenation approaches aside, the cursor clearly outperformed all of the while loops. This may not necessarily be true in simpler scenarios, so I'm not suggesting that a cursor will always be as good as or better than a while loop. But by the same token, I think I've demonstrated that a while loop isn't always faster or easier to write and understand than a cursor.
I would love to hear your experience, especially any examples where a while loop clearly outperforms a cursor or where it is much easier to write or comprehend.
Comment Notification
If you would like to receive an email when updates are made to this post, please register here
Subscribe to this post's comments using
About AaronBertrand
...about me...
|
|
|
|
|