THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Aaron Bertrand

Aaron is a Product Evangelist for SentryOne, makers of Plan Explorer and a performance monitoring and event management platform for the Microsoft Data Platform and VMware. He has been blogging here at sqlblog.com since 2006, focusing on manageability, performance, and new features, and also blogs at blogs.sentryone.com and SQLPerformance.com; has been a Microsoft MVP since 1997; tweets as @AaronBertrand; and speaks frequently at major conferences, user group meetings, and SQL Saturday events worldwide.

Deprecate UPDATE FROM? Not if I can help it!

Fellow MVP Hugo Kornelis (blog) has suggested that the proprietary UPDATE FROM and DELETE FROM syntax, which has worked for several SQL Server versions, should be deprecated in favor of MERGE.  Here is the Connect item he raised:

#332437 : Deprecate UPDATE FROM and DELETE FROM

As you can see, the response is quite divided (more so than any other item that I can recall) - at the time of writing, it was 11 up-votes and 12 down-votes.  I have no shame in admitting that I am one of the people who down-voted Hugo's suggestion back when it was originally posted.  And this blog post may seem like I'm replying quite late (almost three years after the fact), but the truth is that this debate was re-hashed recently due to a DELETE FROM issue that turned out to be a simple aliasing problem.  But nonetheless, it got my gears spinning, so...


Why don't I give my love to MERGE?


The proprietary syntax works fine, in most cases.
  I find UPDATE FROM to be a very elegant way to update the contents of a table based on a join to one or more other tables.  Let's say I have an Orders table and an OrderDetails table.  Since each OrderDetails item may be shipped and handled (and even canceled) separately, there is a [Status] column in the OrderDetails table.  And let's also say that we have discovered that Customer #1 would like to cancel all of his pending orders.  There is no built-in cascade option that will cover this scenario, and we want to make sure that OrderDetails isn't inspected directly to see the (now out-of-date) status on any of this customer's line items.  Before updating the parent row to 'Canceled', I would write an UPDATE FROM like this to affect all of the "child" rows:

UPDATE od
    
SET od.[Status] = 'Canceled'
    
FROM dbo.OrderDetails AS od
    
INNER JOIN dbo.Orders AS o
    
ON od.OrderID = o.OrderID
    
WHERE o.CustomerID = @CustomerID
    
AND o.[OrderStatus] = 'Pending';

ANSI (and --CELKO--, no doubt) would have me change it to this:

UPDATE dbo.OrderDetails
    
SET [Status] = 'Canceled'
    
WHERE EXISTS
    (
      
SELECT *
        
FROM dbo.Orders AS o
        
WHERE o.OrderID = dbo.OrderDetails.OrderID
        
AND o.CustomerID = @CustomerID
        
AND o.OrderStatus = 'Pending'
    
);

Or perhaps a CTE:

;WITH cte AS
(
    
SELECT od.[Status]
      
FROM dbo.Orders AS o
      
INNER JOIN dbo.OrderDetails AS od
      
ON o.OrderID = od.OrderID
      
WHERE o.CustomerID = @CustomerID
      
AND o.OrderStatus = 'Pending'
)
UPDATE cte SET [Status] = 'Canceled';

And Hugo would apparently rather see this (and wait 20 minutes for me to figure out how to write it):

MERGE dbo.OrderDetails AS od
USING
(
    
SELECT OrderID
      
FROM dbo.Orders
      
WHERE CustomerID = @CustomerID
      
AND OrderStatus = 'Pending'
) AS o
ON od.OrderID = o.OrderID
WHEN MATCHED THEN
    UPDATE SET
od.[Status] = 'Canceled';

While completely subjective, and this only represents how I feel at this exact moment in time, I would much rather write the first piece of code than ever see the latter three examples.  The primary reason is that I can comment out the first two lines of the proprietary UPDATE FROM, and change the entire query to a SELECT if I want to check that my JOIN results are correct:

SELECT *
--UPDATE od
--    SET od.[Status] = 'Canceled'
    
FROM dbo.OrderDetails AS od
    
INNER JOIN dbo.Orders AS o
    
ON od.OrderID = o.OrderID
    
WHERE o.CustomerID = @CustomerID
    
AND o.[OrderStatus] = 'Pending';


MERGE is not perfect, either.
  I will concede to Hugo that there are cases where you can write an UPDATE FROM statement that would lead to unpredictable results, but I will submit that you can render a lot of unpredictable results with ANSI-compliant syntax as well (I'll show one below). In the meantime, here are just a few *still active* Connect items (or that are closed as "Won't Fix") where users have discovered bugs in the way that MERGE has been implemented.  Most importantly, that it has problems with filtered indexes, and causes unexpected @@ROWCOUNT results in triggers:

#620367 : MERGE fails to update with filtered index in place

#298395 : Katmai : Merge does not distinguish rowcounts in triggers

#476577 : MERGE can corrupt SCOPE_IDENTITY()

#340578 : Poor error message with MERGE when source/target appear in impossible places

#539084 : Search condition on a non-key column and an ORDER BY in source derived table breaks MERGE completely

#581548 : SQL2008 R2 Merge statement with only table variables fails

#582454 : [T-SQL] Proc with invalid MERGE statement gets compiled

And here are a few cautions from Books Online about MERGE that require a lot of parsing and may raise doubt about its use in your scenario:

CAUTION
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
CAUTION
Specifying READPAST with WHEN NOT MATCHED [ BY TARGET ] THEN INSERT may result in INSERT operations that violate UNIQUE constraints

 
I'll say all this a different way: the syntax is downright cumbersome.
  You've seen my above example, and no doubt the syntax diagram and examples from Books Online; it's not for the faint of heart. I took a quick poll on twitter and, while not pretending that it is a scientific result, all but one respondent admitted that to write a MERGE statement, they would have to look up the syntax in Books Online. And I was tempted to challenge the one respondent to a closed-book test. This is syntax that has been around for three years (and longer for those of us involved in the SQL Server 2008 beta), but is not easy to master AT ALL.


Shooting yourself in the foot - it's not just for UPDATE FROM

I'm all for deprecation, when warranted (TIMESTAMP or the use of sys.objects, anyone?).  Look, Hugo is a very smart guy, and I respect him greatly, but his reasons for deprecating this syntax just don't hold up. He says that "improperly code join criteria  You can shoot yourself in the foot with any standard syntax, too. A very common example is to not properly correlate the EXISTS clause, in which case ALL rows in OrderDetails are updated:

UPDATE dbo.OrderDetails
    
SET [Status] = 'Canceled'
    
WHERE EXISTS
    (
        
SELECT *
          
FROM dbo.Orders AS o
          
WHERE o.OrderID = o.OrderID -- bad correlation: no error, no warning!
          
AND o.CustomerID = @CustomerID
          
AND o.OrderStatus = 'Pending'
    
);

And you can do something quite similar with a CTE as well:

;WITH cte AS
(
    
SELECT od.[Status]
      
FROM dbo.Orders AS o
      
INNER JOIN dbo.OrderDetails AS od
      
ON o.OrderID = o.OrderID -- bad correlation: no error, no warning!
      
WHERE o.CustomerID = @CustomerID
      
AND o.OrderStatus = 'Pending'
)
UPDATE cte SET [Status] = 'Canceled';

A major benefit of MERGE, as Hugo points out, is that an error message is produced if you accidentally program the same bad correlation:

MERGE dbo.OrderDetails AS od
USING
(
    
SELECT OrderID
      
FROM dbo.Orders
      
WHERE CustomerID = @CustomerID
      
AND OrderStatus = 'Pending'
) AS o
ON o.OrderID = o.OrderID -- bad correlation
WHEN MATCHED THEN
  UPDATE SET
od.[Status] = 'Canceled';

Result:

Msg 8672, Level 16, State 1, Line 5
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

So, MERGE will better protect you from shooting yourself in the foot.  But if you're not prone to shooting yourself in the first place...


Conclusion

Hugo also suggests  simply removing this construct is not going to magically free up hours and hours of development time that will be better spent elsewhere.  How much time does he really believe the development team is wasting on UPDATE FROM today? 

I'm not vehemently opposed to MERGE.  I think we should all become familiar with the syntax at some point, and in fact I've implemented it in a few places - but not as a replacement for UPDATE FROM, but rather to get rid of multi-statement "UPSERT" constructs like:

UPDATE ... ;
IF @@ROWCOUNT = 0
    
INSERT ... ;

... or ...

IF EXISTS (...)
    
UPDATE ... ;
ELSE
    INSERT
...;

To me, the "UPSERT" scenario is squarely in the wheelhouse of MERGE, as opposed to "UPDATE FROM."  I just think that it should present the right benefits so that you'll want to use it on your own schedule, rather than when the dev team decides it's time to deprecate the method you're currently using to achieve the same result. Just be careful to use HOLDLOCK when using MERGE for more than one operation - for the guts on why, see this blog post from Dan Guzman.

If we're going to do anything with the way UPDATE FROM works, how about narrowing it down to something more specific and troublesome, such as the "quirky update" syntax?  I think there would be a lot less opposition to having this kind of code raise an 'Incorrect syntax' error, and replace it with better windowed aggregate support:

CREATE TABLE dbo.OrderHistory
(
    
OrderID      INT PRIMARY KEY,
    
OrderTotal   DECIMAL(6,2) NOT NULL,
    
RunningTotal DECIMAL(8,2) NULL
);
GO

INSERT dbo.OrderHistory(OrderID, OrderTotal)
          
SELECT 1,   12.74
    
UNION SELECT 2,  213.55
    
UNION SELECT 3,   67.44;
GO

DECLARE @RunningTotal DECIMAL(8,2) = 0.00;

UPDATE dbo.OrderHistory
    
SET @RunningTotal = RunningTotal = @RunningTotal + OrderTotal
    
FROM dbo.OrderHistory;
GO

SELECT OrderID, OrderTotal, RunningTotal
    
FROM dbo.OrderHistory;
GO

DROP TABLE dbo.OrderHistory;
GO

Having these same running totals aggregated via an OVER() clause - now THAT is something where I can justify the dev team spending more time.

 

Published Monday, January 24, 2011 5:30 PM by AaronBertrand

Comments

 

Richard L. McCutchen said:

I personally think removing UPDATE FROM would be a horrible move on Microsoft's part. As with any language one can shoot themselves in the foot with just about everything, so should all instances where this can happen, in all languages, just be removed?

UPDATE FROM is far easier to understand and write than the MERGE syntax, and I hope Microsoft sees the light and doesn't remove it from TSQL.

January 24, 2011 4:52 PM
 

Jerry said:

"(and wait 20 minutes for me to figure out how to write it)" - LOL.  This is the biggest reason MERGE has been met with a decided lack of enthusiasm with our developers.  No one can ever remember how to use the blasted thing, and they just graviate back to the simpler, and easier-to-use methods that work just as well.

January 24, 2011 5:17 PM
 

RichB said:

Of course, there would be 2 big consequences involved with deprecating it:

1. Few organisations would get round to upgrading to that version.

2. A massive boom in short terms contract work to rewrite the exceptions to the above points code base.

Of course it might also provide an additional stimulus for folk to make the break with SQL entirely - if they have to effectively rewrite it all they may as well take the plunge to one of the competitions products like their boards golf buddies keep suggesting...

January 25, 2011 9:43 AM
 

Alejandro Mesa said:

Aaron,

I agree 100% with you about the dev team dedicating time to support the OVER clause in full, instead focusing on deprecating UPDATE FROM or any other statement / command.

It would also be interesting, and perhaps productive, if the dev team also implement the support for the ANSI standard row value constructor, which could be very handy for multiple columns updates.

UPDATE dbo.T1

SET (c1, c2, c3) = (SELECT T2.c1, T2.c2, T2.c3

                    FROM T2

                    WHERE T2.keycol = T1.keycol)

WHERE keycol = @key;

https://connect.microsoft.com/SQLServer/feedback/details/299231/add-support-for-ansi-standard-row-value-constructors

Concidently, the suggestion was also made by Hugo and it got my vote.

Cheers,

AMB

January 25, 2011 11:55 AM
 

Michael K. Campbell said:

Yup. Totally agree that it would be dump to deprecate this option. Folks that want fuller ANSI compliance can code things that way. (But they're STILL going to run the code on SQL Server and any illusions about being able to magically just port that code to another RDBMS without all sorts of extra work are just delusions.)

And @RichB nails it: Making this change would impact SOOOOO much code already out there that it would definitely hinder sales.

Great post though (and I concur: my fav reason for the 'older' syntax is that it's so much easier to switch to a SELECT and then test).

January 25, 2011 12:16 PM
 

AaronBertrand said:

Yes, validating that the code works is huge.  I can't test a MERGE without either (a) making a copy of all the tables and data involved, or (b) wrapping everything in transactions.  Arguably we should be doing (b) anyway, but it's not necessary when you can easily test your assumptions by performing a SELECT instead of affecting data.

Now, it is worth it to test "the hard way" sometimes, especially in the UPSERT scenario.  But in a lot of cases I just can't justify the time sink.

January 25, 2011 12:25 PM
 

Joshua Kincaid said:

I recommend developers use a Orm. Entity framework will do it right everytime. No need to worry. Then SQL is more of a escape hatch then a hammer for every problem. Who needs a stored proc for every piece of crud? Heaven forbid if you have to add a field to a table. In that case your resigned to an afternoon of plumbing to make sure all you queries on that table are updated.

January 26, 2011 5:13 AM
 

Rhys said:

I like MERGE but I always have to lookup the syntax every time I use it. Load of people struggle with the UPDATE FROM syntax. Let's see how they cope when we throw MERGE into the mix.

Removing or deprecating the UPDATE / DELETE FROM syntax would be a mistake IMHO.

January 31, 2011 9:33 AM
New Comments to this post are disabled

About AaronBertrand

...about me...

This Blog

Syndication

Archives

Privacy Statement