When you process your rows one by one in a loop, you need to
make sure that your loop never runs infinitely. I will describe three scenarios
with possible infinite loops. There have been many articles discussing feasibility
of loops vs. set-based solutions – I will not repeat any of these arguments. Let
us assume that you need to use some kind of a loop.
One more disclaimer: I am demonstrating situations when
infinite loops are possible, and I prefer to give the simplest possible
examples. I am not demonstrating scenarios where loops are preferable to set
based solutions, so please do not consider my short examples as such.
Both SET and SELECT may leave the value of your variable
unchanged.
See for yourself, the following examples demonstrates both
cases:
DECLARE @i1 INT, @i2 INT, @i3 INT;
SELECT @i1=-1, @i2=-1, @i3=-1;
PRINT 'the following SELECT will not change the value of @i1'
PRINT 'because the result set is empty'
SELECT @i1=1 WHERE 1=2;
SELECT @i1 AS [@i1];
PRINT 'the following SET will change the value of @i2 to NULL'
PRINT 'because the result set is empty'
SET @i2=(SELECT 1 WHERE 1=2);
SELECT @i2 AS [@i2];
PRINT 'the following SELECT will not change the value of @i3'
PRINT 'because it will raise an exception'
SET @i3=(SELECT 1 UNION ALL SELECT 2);
SELECT @i3 AS [@i3];
the following SELECT will not
change the value of @i1
because the result set is
empty
@i1
-----------
-1
(1 row(s) affected)
the following SET will change
the value of @i2 to NULL
because the result set is
empty
@i2
-----------
NULL
(1 row(s) affected)
the following SELECT will not
change the value of @i3
because it will raise an
exception
Msg 512, Level 16, State 1,
Line 13
Subquery returned more than 1
value. This is not permitted when the subquery follows =, !=, <, <= ,
>, >= or when the subquery is used as an expression.
@i3
-----------
-1
(1 row(s) affected)
This is expected behavior, but you need to be aware of it –
otherwise you may end up with an infinite loop.
Make sure you have read Tony Rogerson's post on the topic.
When SELECT fails to change the value of your variable and
the loop runs infinitely
Consider the following table, sample data, and stored
procedure:
CREATE TABLE [data].[Orders](
[OrderID] [int] NOT
NULL,
[OrderDate] [datetime] NOT NULL,
IsProcessed CHAR(1) NOT NULL,
CONSTRAINT
[PK_Orders] PRIMARY KEY
CLUSTERED
(
[OrderID] ASC
)WITH (IGNORE_DUP_KEY = OFF)
)
GO
DELETE FROM [data].[Orders];
INSERT [data].[Orders](
[OrderID],
[OrderDate],
IsProcessed)
SELECT 1, '20090420', 'N' UNION ALL
SELECT 2, '20090421', 'N' UNION ALL
SELECT 3, '20090422', 'N';
CREATE PROCEDURE dbo.ProcessNOrders
@IDsIntervalSize INT
AS
DECLARE @minID INT, @ID INT;
SELECT @minID=MIN([OrderID]), @ID=MIN([OrderID]) FROM [data].[Orders];
WHILE @ID<(@minID+@IDsIntervalSize) BEGIN
UPDATE [data].[Orders]
SET
IsProcessed='Y'
WHERE
[OrderID] = @ID;
SELECT TOP 1 @ID=[OrderID]
FROM [data].[Orders]
WHERE
IsProcessed='N'
ORDER BY [OrderID];
PRINT @ID;
END;
-- this call succeeds
EXEC dbo.ProcessNOrders 2;
GO
-- restore the data
UPDATE [data].[Orders]
SET
IsProcessed='N';
GO
-- this call processes 3 orders and then runs infinitely
-- cancel it
EXEC dbo.ProcessNOrders 10;
A quick fix is very simple: just make sure that your
variable always changes its value before the assignment, as follows:
ALTER PROCEDURE dbo.ProcessNOrders
@IDsIntervalSize INT
AS
DECLARE @minID INT, @ID INT;
SELECT @minID=MIN([OrderID]), @ID=MIN([OrderID]) FROM [data].[Orders];
WHILE @ID<(@minID+@IDsIntervalSize) BEGIN
UPDATE [data].[Orders]
SET
IsProcessed='Y'
WHERE
[OrderID] = @ID;
SET @ID = NULL;
SELECT TOP 1 @ID=[OrderID]
FROM [data].[Orders]
WHERE
IsProcessed='N'
ORDER BY [OrderID];
PRINT @ID;
END;
GO
UPDATE [data].[Orders]
SET
IsProcessed='N';
GO
-- this call processes 3 orders and then completes
EXEC dbo.ProcessNOrders 10;
You can also replace your SELECT with a SET – that will also
fix the problem:
ALTER PROCEDURE dbo.ProcessNOrders
@IDsIntervalSize INT
AS
DECLARE @minID INT, @ID INT;
SELECT @minID=MIN([OrderID]), @ID=MIN([OrderID]) FROM [data].[Orders];
WHILE @ID<(@minID+@IDsIntervalSize) BEGIN
UPDATE [data].[Orders]
SET
IsProcessed='Y'
WHERE
[OrderID] = @ID;
SET @ID = (
SELECT TOP 1
[OrderID]
FROM
[data].[Orders]
WHERE
IsProcessed='N'
ORDER
BY [OrderID]);
PRINT @ID;
END;
When an assignment throws an exception and fails to change
the value of your variable
Create the following objects:
CREATE VIEW dbo.LongestWaitingNotProcessedOrder
AS
SELECT
[OrderID],
[OrderDate],
IsProcessed
FROM [data].[Orders]
WHERE
[OrderDate] = (SELECT MIN([OrderDate]) FROM [data].[Orders] WHERE IsProcessed='N')
AND
IsProcessed='N'
GO
CREATE PROCEDURE dbo.ProcessOrder
@OrderID SMALLINT
AS
SET NOCOUNT ON;
UPDATE [data].[Orders] SET
IsProcessed='Y'
WHERE
[OrderID]=@OrderID;
RETURN @@ERROR;
GO
The following loop will correctly complete if no two
unprocessed orders have the same order date:
UPDATE [data].[Orders]
SET
IsProcessed='N';
DECLARE @ID INT;
SET @ID=-1;
WHILE @ID IS NOT NULL BEGIN
SET @ID=(SELECT [OrderID]
FROM dbo.LongestWaitingNotProcessedOrder);
PRINT @ID;
EXEC dbo.ProcessOrder @OrderID=@ID;
END
As soon as you have two unprocessed orders with the same
order date, the assignment will blow up and fail to change the value of the
variable, resulting in an infinite loop. Run this script, then rerun the above loop
and see for yourself that it runs infinitely:
UPDATE [data].[Orders]
SET
IsProcessed='N';
INSERT [data].[Orders](
[OrderID],
[OrderDate],
IsProcessed)
SELECT 4, '20090421', 'N';
Again there is an easy fix. You can just wrap your loop in a
TRY…CATCH block, as follows:
DECLARE @ID INT;
SET @ID=-1;
BEGIN TRY
WHILE @ID IS NOT NULL BEGIN
SET @ID=(SELECT [OrderID]
FROM
dbo.LongestWaitingNotProcessedOrder);
PRINT @ID;
EXEC dbo.ProcessOrder
@OrderID=@ID;
END
END TRY BEGIN CATCH
SELECT
ERROR_NUMBER(), ERROR_MESSAGE();
END CATCH
Now this loop will complete after the first exception. Yet
if your stored procedure sometimes fails to process an order, you can still get
an infinite loop. I will not describe this scenario in this post.
Reconsider a set-based solution
As you have seen, developing robust loops in Transact SQL
is quite involved, it does have some gotchas. You might want to reconsider a set-based solution instead of
your loop.
Use a table variable
The following approach is also robust:
DECLARE @orderIDs TABLE(n INT, ID INT);
INSERT INTO @orderIDs(n, ID)
SELECT ROW_NUMBER() OVER(ORDER BY [OrderDate]),
[OrderID]
FROM [data].[Orders] WHERE
IsProcessed='N';
DECLARE @ID INT, @n INT, @MaxN INT;
SELECT @n=1, @MaxN = MAX(n) FROM @orderIDs;
WHILE @n <= @MaxN BEGIN
SELECT @ID = ID FROM @orderIDs WHERE n=@n;
PRINT @ID;
EXEC dbo.ProcessOrder @OrderID=@ID;
SET @n=@n+1;
END
Even if your stored procedure sometimes fails to process an
order, your loop will still complete – it will try to process each order only
once.
Here is the first part:Avoiding infinite loops. Part One