Under the description for UPDATE in SQL Server 2000 and 2005 Books Online, you can find the following statement (thanks to SQL Server MVP Steve Kass for pointing me to this passage):
The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.
BOL goes on to give a simple example of such an UPDATE. However, the example doesn't give you a real feel for the nondeterministic behavior. Although the result is supposed to be undefined and indeed you don't know what value UPDATE will end up assigning, the result does come out consistently the same no matter how many times you execute it. At least during my limited number of test runs, UPDATE seems to always pick the first value. But of course you can't count on that because a different execution plan may pick a different value.
Still, it would be nice to see a nondeterministic UPDATE statement actually picks a different value to drive it home that you should not use it in your app, unless of course your app logic wants to take advantage of this particular non-determinism (which is difficult to imagine).
Recently, however, I happened to run into an interesting case where the nondeterministic behavior of UPDATE can be easily reproduced on a multi-core server when query parallelism is used. Here's the script to see the behavior.
First, run the following script to create two tables and a stored procedure in any user database on a SQL Server 2005 instance:
CREATE TABLE t1 (
c1 int NULL,
c2 varchar(50) NULL,
rate int NULL
)
go
CREATE INDEX c1_ind ON t1 (c1)
go
CREATE TABLE t2 (
c1 int NULL,
c2 varchar(50) NULL,
rate int NOT NULL
)
go
drop proc testProc
go
create proc testProc
as
set nocount on
begin tran
UPDATE t1
SET rate = t2.rate
FROM t1 With (Index (c1_ind)), t2
WHERE t1.c1 = t2.c1
and t1.c2 = t2.c2
SELECT 'count_0' = count(*) FROM t1 WHERE rate = 0;
SELECT 'count_1' = count(*) FROM t1 WHERE rate = 1;
SELECT 'rate_sum' = sum(rate) FROM t1
rollback tran
go
Second, populate the two tables with data using the following script:
WITH tmp (c1, c2,c3) as (
SELECT 1,1,1
UNION ALL
SELECT cast(rand(c3)*1000000 as int)%200,
cast(rand(c3)*1000000 as int)%200,
c3 + 1
FROM tmp
WHERE c3 < 100000
)
INSERT t1
SELECT c1, CAST(c2 as varchar(50)), NULL
FROM tmp
OPTION (maxrecursion 0);
WITH tmp (c1, c2, c3) as (
SELECT 0,0,0
UNION ALL
SELECT c1 + 1, c2 + 1, c3 + 1
FROM tmp
WHERE c1 < 200
)
INSERT t2
SELECT cast(c1 as char(6)),
cast(c2 as varchar(50)),
0 -- c3 value = 0
FROM tmp
OPTION (maxrecursion 0);
-- give each c1 value a second c3 value = 1
INSERT t2
SELECT c1, c2, 1
FROM t2;
go
Finally, run the stored procedure multiple times to see that different results are produced:
EXEC testProc;
Apparently, nondeterminism in query processing gives rise to the different results. A few observations are noteworthy:
- The non-determinstic behavior is not a result of different query execution plan. At least, the reported execution plan remains the same for all my test runs.
- If you disable query parallelism by setting max degree of parallelism to 1, the stored procedure will produce the same results.
- On a SQL Server 2000 instance, you probably won't get a parallel plan, and thus won't see different results.
- The size of the table t1 is significant. On my test server, populating table t1 with 70,000 still leads to a parallel query plan, thus different results in multiple executions of the stored procedure. But if I only populate the table with 50,000 rows, I don't get an query parallelism, and I get consistent results. So SQL Server query optimizer is sensitive to the table size in deciding whether to use a parallel plan.