THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Tibor Karaszi

Be careful with constraints calling UDFs

You might just not get what you think. I would be surprised if this hasn't been blogged already, but if so, it would be worth repeating. Here's the deal (example from a forum,, slightly re-worked):

I want the values in one column to be unique, assuming the value in another column is 1. Can I use an UDF for that?

On the surface, yes. You can write an UDF to wich you pass the value which should be conditionally unique and in that UDF check how many rows has this value AND othercolumn = 1. If more than 1 row, then function returns 0, else 1 (or something else to signal "OK" or "Not OK"). Now, you can call this function in a CHECK constraint. Something like CHECK(myFunction(uniqueCol) = 1). this will on the surface do its job, as long as you INSERT into the table. But if you update a row and only set the otherColumn for some row from 0 to 1, then the check constraint will not be checked. The optimizer is smart enough to understand that the update doesn't change anything that we refer to in our CHECK constraint, so why bother checking the constraint? End result here is that the constraint doesn't do what we want it to do. Use a trigger instead (or some other method). Here's a repro:

USE tempdb
GO
IF OBJECT_ID('t'IS NOT NULL DROP TABLE t
IF OBJECT_ID('t_uq'IS NOT NULL DROP FUNCTION t_uq
GO

CREATE TABLE t(c0 INTc1 NVARCHAR(50), c2 bit)
GO

CREATE FUNCTION t_uq(@c1 NVARCHAR(50))
RETURNS bit
AS
BEGIN
 DECLARE 
@ret bit
 
IF (SELECT COUNT(*) FROM WHERE c1 @c1 AND c2 1) > 1
   
SET @ret 0
 
ELSE 
   SET 
@ret 1
 
RETURN @ret
END
GO

ALTER TABLE ADD CONSTRAINT t_c CHECK(dbo.t_uq(c11)

INSERT INTO t(c0c1c2VALUES(1'a'0--OK
INSERT INTO t(c0c1c2VALUES(2'a'0--OK
INSERT INTO t(c0c1c2VALUES(3'b'1--OK
INSERT INTO t(c0c1c2VALUES(4'b'1--Fails

--So far so good, but watch now:

UPDATE SET c2 WHERE c0 2
--No error, the constraint doesn't do its job!

--We have invalid data:
SELECT FROM t

Published Thursday, December 17, 2009 7:28 PM by TiborKaraszi
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alex Kuznetsov said:

Tibor,

This is wonderful! Can I include this example in my book?

December 17, 2009 1:17 PM
 

mjswart said:

One of the other methods you alluded to could be an indexed view with a unique constraint on it. Based on:

SELECT c1 FROM t WHERE c2 = 1

That seems pretty clean... What do you think?

December 17, 2009 1:19 PM
 

TiborKaraszi said:

Alex,

Thanks. Yes, of course you can. Can you share anything about the book (curious wants to know).

December 17, 2009 1:43 PM
 

Adam Machanic said:

Alex, what book are you writing?

December 17, 2009 1:45 PM
 

TiborKaraszi said:

Michael,

I don't think it is as straight foward to use indexed views (or filtered indexes in 2008) when we have two columns like in this case. The typical case for the trick you suggest is to enforce uniqueness unless we have some typical value (like unique unless the value is NULL). But here we have two values. Consider below:

1, a, 0

2, a, 0

Above is currently valid and will be allowed. But if we change "row 2" from 0 to 1, then the unique index will still allow it, since all the index will se would be the row with value 1:

2, a, 1

But above should not be allowed in combination with below:

1, a, 0

Of course, I'm all ears if I miss something here (slightly tired asfter full day of training). :-)

December 17, 2009 1:46 PM
 

Alejandro Mesa said:

Hi Tibor,

I believe we have this problem since UDFs were introduced in SQL Server 2000.

The main problem I see here, is that this constraint should be a table constraint so it could be verified for each insert / update in the table and not only when column [c1] has been affected.

There is something that I am missing here. Let us look in BOL for the definition / syntaxes of the "alter table" statement, to introduce a new constraint. To me, it looks like:

ALTER TABLE [ database_name . [ schema_name ] . | schema_name . ] table_name

ADD table_constraint

...

If we look at the "alter table" statement you used, then I would expect having a table constraint, but if we check sys.constraints:

SELECT *

FROM sys.check_constraints

WHERE parent_object_id  = object_id('t') and [name] = 't_c' and [type] = 'C';

GO

We will notice that the value for column [parent_column_id] is 2. This is telling us that the constraint is a column constraint.

No matter if we add the constraint during table creation, and we use the syntax for a table constraint:

CREATE TABLE t(

c0 INT,

c1 NVARCHAR(50),

c2 bit,

CONSTRAINT t_c CHECK(dbo.t_uq(c1) = 1)

)

GO

Still, the constraint is created as a column constraint.

- Why is SQL Server creating a column constraint in this case?

- Is there a way to tell SQL Server that we want to enforce this constraint at the table level?

AMB

December 17, 2009 2:17 PM
 

Brad Schulz said:

Maybe I'm missing something, but it seems that UPDATEing the value of c2 to 1 for the row where c0=2 is perfectly valid.  The 'a' is still unique among those rows where c2=1.

If you did the following from the beginning, the CHECK would allow it also:

INSERT 1,'a',0

INSERT 2,'a',1

The 'a' is still unique among the c2=1 rows.

I think what you meant to do in your UPDATE statement was to UPDATE WHERE c0<=2, not c0=2.  Then that would bring about the existence of two 'a' rows with c2=1.  (Or perhaps you meant to have your first INSERT of 'a' to be with a c2=1).

I get your point about the optimizer not bothering to check the constraint because there was no reference to c2 in the CHECK.  If you change the function to accept BOTH c1 and c2 as parameters (i.e. CHECK (dbo.t_uq(c1,c2)=1)), then it works as advertised.

December 17, 2009 2:20 PM
 

Adam Machanic said:

What we need are deferred constraints... The problem is that the UDF is evaluated once per affected row, rather than once for the entire set of rows, so you end up with a discrepancy. Deferred constraints would solve so many interesting data modeling problems--it's a shame that the SQL Server team considers it to be a very low priority item (no doubt due to the fact that most people using the product don't seem to use constraints at all).

December 17, 2009 2:23 PM
 

Brad Schulz said:

@Alejandro:

I think CHECK implies a constraint on a column by definition.  According to the ALTER TABLE table_constraint documentation:

"CHECK... Is a constraint that enforces domain integrity by limiting the possible values that can be entered into a COLUMN or COLUMNs."

I think the only way to enforce a true table constraint is via a trigger, as Tibor suggested.

December 17, 2009 2:27 PM
 

Alejandro Mesa said:

Brad,

> I think CHECK implies a constraint on a column by definition.

Not really, add column [c2] to the udf and you will see that now it is a table constraint.

AMB

December 17, 2009 2:35 PM
 

Alejandro Mesa said:

Adam,

Agreed, "deferred constraints" and "assertions" are two very important features that would be glad to see in future versions of SQL Server.

AMB

December 17, 2009 2:37 PM
 

Brad Schulz said:

@Alejandro:

It kind of acts like a table constraint because that's just how we defined the UDF to act... sort of.

Having a CHECK with a UDF that passes c1 as a parameter makes the optimizer think that it only needs to do the CHECK if there was some kind of change made to c1.  If you add c2 to as a parameter to the UDF, then the optimizer will do the CHECK if there was some kind of change made to EITHER c1 or c2.  If we leave c1 or c2 unchanged, and change some other column besides c1 or c2, then the optimizer wouldn't bother to do the CHECK at all.

At least I think that's what Tibor's point was.

December 17, 2009 2:45 PM
 

TiborKaraszi said:

Hi Alejandro,

Yes, this problem has been there since the 2000 days. I think that we all here agree that this is undesireable behavior by the database engine (my point in the blog was not to justify SQL Server's behaviour but more to draw attention to it). Whether or not to call it a bug, well - that would take some BOL reading and probably hot discussions with MS. I have a very strong feeling that MS are aware of this and considers it to be "by design". :-) Having deferred constraints as Adam suggests would be a nice way to handle this - perhaps deferrable at two levels (end-of-statement or end-of-transaction, where the later is AFAIK ANSI SQL).

I am not so convinced, however, that I agree with your table-level vs. column-level distinction. As I understood the difference between the two is that it is only cosmetics (SQL Syntax). I.e., it doesn't matter which one you end up with, they both do the very same job. And, unless my memory fades, I believe I did look in ANSI SQL many years ago, which supportes this (non-) distinction. Yes, here's a blurb from the ANSI SQL:1999 standard:

"

Ifa <column constraint definition> is specified that contains a <check constraint definition>

CCD, then it is equivalent to the following <table constraint definition>:

CNDCCDCA

"

So, assuming that there is no technical difference between table and column level constraints, then I have no issues with SQL Server converting one to the other...

December 17, 2009 2:46 PM
 

Alex Kuznetsov said:

Tibor,

Thanks! I'm writing a book about defensive programming with Transact SQL, it has a chapter about data integrity. In that chapter I have an example when a UDF in check constraint gives a false negative - it prohibits a valid modification. Your example of a false positive would fit nicely right after it.

I agree with others that deferred constraints and assertions would be very useful. Yet the ability to create check constraints on indexed views would solve quite a few problems too.

December 17, 2009 2:51 PM
 

TiborKaraszi said:

Hi Brad,

The rule here was that as soon as we have any row with 1, then it has to be the only row for that value. I.e., we can't have below two:

a, 0

a, 1

Now, I might have misunderstood the original poster's requrements (in the MSDN forum). If so, let's just call the requirements for this blog my prerogative to adjust requirements to suit my blog posts. ;-)

I'm also aware that having the function to accept both parameters might change things (but is it documented?, will it change over time?, etc). IMO, a very risky busniess to be in. Not to mention some other developer looking at the function and decide to "optimze" it a bit.

December 17, 2009 2:52 PM
 

Brad Schulz said:

>

>I'm also aware that having the function to accept both

>parameters might change things (but is it documented?, will

>it change over time?, etc). IMO, a very risky busniess to be in.

>Not to mention some other developer looking at the function and

>decide to "optimze" it a bit.

>

Too true... Agreed.

I didn't see the original post at the MSDN forum.  I was inferring what was desired by looking at the UDF definition... it counts the number of c2=1 rows in the table that have a c1 value equal to what we're inquiring about.

If the requirement is, as you say, "as soon as we have any row with 1, then it has to be the only row for that value", then the UDF MUST be rewritten to accept BOTH c1 and c2, because the newly-INSERTed row (or the freshly-UPDATEd row) may BECOME the row with c2=1.

If the passed @c2 is equal to 1, then the existence of ANY rows in the table with c1=@c1 makes the CHECK fail.  But if the passed @c2 is NOT equal to 1, then we must check the existence of rows in the table with c1=@c1 AND c2=1, and if any exist, then it should fail.

The UDF definition as it exists now just checks for uniqueness among the c2=1 rows and ONLY those rows.

This is why I got confused.  The UDF definition is inconsistent with the behavior of the expected output.

December 17, 2009 3:04 PM
 

TiborKaraszi said:

Alex,

I trust you will let us know when the book is getting close to release. I've enjoyed your blog posts on the topic!

(Another which can be scary is if you have a constraint which aggregates over some rows, and verifies that the sum is > 0, and you then DELETE a row (check constraint doesn't "fire" for DELETE) leaving some rows left with a negative value so that deletion of a row with positive value causes SUM to be negative.

Hmm, that wasn't a very clear description (me tired) - let me know if you want some code to show this, very easy to repro.)

December 17, 2009 3:07 PM
 

Alejandro Mesa said:

Tibor,

Try adding column [c2] to the UDF. Notice that we can not reference another column from a column constraint, so in this case the constraint is well understood as a table constraint. We do not need to change the body of the function, just passing [c2], and now the constraint works as expected, but it is doing the check row by row, for those that have been affected, which is not good.

No doubt it would be better having a deferred constraint, but there is no need to pass [c2] in order to tell SQL Server that this is a table constraint.

CREATE FUNCTION t_uq(@c1 NVARCHAR(50), @c2 bit)

RETURNS bit

AS

BEGIN

DECLARE @ret bit

IF (SELECT COUNT(*) FROM t WHERE c1 = @c1 AND c2 = 1) > 1

  SET @ret = 0

ELSE  

  SET @ret = 1

RETURN @ret

END

GO

ALTER TABLE t ADD CONSTRAINT t_c CHECK(dbo.t_uq(c1, c2) = 1);

GO

SELECT *

FROM sys.check_constraints

WHERE parent_object_id  = object_id('t') and [name] = 't_c' and [type] = 'C';

GO

...

UPDATE t SET c2 = 1 WHERE c0 <= 2

GO

Here is the error when we try to execute the update statement.

Msg 547, Level 16, State 0, Line 2

The UPDATE statement conflicted with the CHECK constraint "t_c". The conflict occurred in database "tempdb", table "dbo.t".

The statement has been terminated.

I agree that using a trigger in this case will be better.

AMB

December 17, 2009 3:12 PM
 

TiborKaraszi said:

Brad,

Ah, I just typed a long reply stating that I didn't underand your last reply. But I now see what I did. I had an incorrect example in a reply where I stated that two rows one with 0 and other with 1 should not be allowed. I think that is what you commented on - that is not consistent with my blog post requrements. So, depending on whether a row with value 1 need to be alone or can co-exist with a row with value 0, we might or might not need to pass both values to our function. Agreed.

December 17, 2009 3:21 PM
 

Alejandro Mesa said:

Tibor,

I am sorry. Please, disregard my last post.

I was testing the comment from Brad about "... where c0 <= 2" and left it in the code when testing the addition of [c2] to the udf.

AMB

December 17, 2009 3:24 PM
 

TiborKaraszi said:

Alejandro,

Yes, adding column 2 to the constraint definition (function call) will change the behavior. But that is not because the constraint is changed to a table constraint - it is because the constraint now "touches" column c2. So, SQL Server will somewhere internally now have a dependency for this constraint on column 2 and know that it need to "fire" the constraint whenever column 2 changes.

However, we're probably splitting hairs, over whether the reason is because it is a column/table level constraint or whether it is because the constraint definition "touches" column 2. I have a feeling we a agree this can be risky since there is room for usage in a way where the constraint doesn't do its job.

December 17, 2009 3:29 PM
 

Alejandro Mesa said:

Tibor,

Here is the correction.

CREATE FUNCTION t_uq(@c1 NVARCHAR(50), @c2 bit)

RETURNS bit

AS

BEGIN

DECLARE @ret bit

IF @c2 = 1 AND (SELECT COUNT(*) FROM t WHERE c1 = @c1) > 1

 SET @ret = 0

ELSE  

 SET @ret = 1

RETURN @ret

END

GO

ALTER TABLE t ADD CONSTRAINT t_c CHECK(dbo.t_uq(c1, c2) = 1);

GO

SELECT *

FROM sys.check_constraints

WHERE parent_object_id  = object_id('t') and [name] = 't_c' and [type] = 'C';

GO

...

UPDATE t SET c2 = 1 WHERE c0 <= 2

GO

Here is the error when we try to execute the update statement.

Msg 547, Level 16, State 0, Line 2

The UPDATE statement conflicted with the CHECK constraint "t_c". The conflict occurred in database "tempdb", table "dbo.t".

I am not suggesting that this is the way to go. Using a trigger in this case could be better.

AMB

December 17, 2009 3:39 PM
 

Alejandro Mesa said:

Tibor,

>  I have a feeling we a agree this can be risky since there is room for

> usage in a way where the constraint doesn't do its job.

Definitely, and it is good that you blogged about it.

AMB

December 17, 2009 3:51 PM
 

Alejandro Mesa said:

Alex,

Good luck with your book.

Looking forward to read it.

AMB

December 17, 2009 3:59 PM
 

Lubo said:

The constraint will work if the referes to column c2 as mentioned earlier or if there is no reference to any column in that table...

CREATE FUNCTION t_uq()

RETURNS bit

AS

BEGIN

DECLARE @ret bit

IF (SELECT COUNT(*) FROM t WHERE c1 = c1 AND c2 = 1) > 1

  SET @ret = 0

ELSE

  SET @ret = 1

RETURN @ret

END

GO

ALTER TABLE t ADD CONSTRAINT t_c CHECK(dbo.t_uq() = 1) --check constraint is not referencing any column

GO

INSERT INTO t(c0, c1, c2) VALUES(1, 'a', 0) --OK

INSERT INTO t(c0, c1, c2) VALUES(2, 'a', 0) --OK

INSERT INTO t(c0, c1, c2) VALUES(3, 'b', 1) --OK

INSERT INTO t(c0, c1, c2) VALUES(4, 'b', 1) --Fails

UPDATE t SET c2 = 1 WHERE c0 = 2 --Fails

--We have valid data:

SELECT * FROM t

December 18, 2009 12:20 PM
 

Adam Machanic said:

Lubo,

It might work, but it's going to be extremely expensive to evaluate the entire set every time any data modification occurs... What if you need to insert 10,000,000 rows? Do you really want to consider every row in the table that many times?

December 18, 2009 12:47 PM
 

Alex Kuznetsov said:

Tibor,

Your description is clear, thanks!

Alejandro,

Thank your for your kind words!

December 18, 2009 3:19 PM
 

Lubo said:

Well, The performance is something I did not consider.

I just wanted to point out how the constraint is working based on the input. With no parameters the constraint will check the set every time as you said. with present @c2 parameter it will check only when it's needed.

Did anybody think about indexed calculated column?

Would the performance be the same as using indexed view?

December 21, 2009 7:00 AM
 

Richard Jereb said:

Hi folks,

there are so many comments here - probably someone already mentioned all my points and I just missed it ;-)

Surely Adam is right about performance bottlenecks when using this technique. For small tables is the solution absolutely usable.

IMHO the "drawback" Tibor complains about is rather an implementation problem: the UDF allows as many records with a,0 as you wish and only one with a,1. The update IS according to this rule; the constraint does not fire because there is no reason for it to fire.

Try to change like this, this might be more close to your expectations:

CREATE FUNCTION t_uq(@c1 NVARCHAR(50), @c2 bit)

RETURNS bit

AS

BEGIN

DECLARE @ret bit

IF @c2 = 1

 IF (SELECT count(*) FROM t WHERE c1 = @c1) > 1

  SET @ret = 0

 ELSE  

  SET @ret = 1

ELSE

 SET @ret = 1

RETURN @ret

END

GO

ALTER TABLE t ADD CONSTRAINT t_c CHECK(dbo.t_uq(c1, c2) = 1)

Or did I miss the point? Cheers, Richard.

December 21, 2009 7:10 AM
 

venom_zx said:

the workaround

is to just include all the fields that need to be checked as parameters to the UDF regardless of whether the UDF uses these parameters. i've tried this and it works fine for inserts as well as updates in sql server 2008.

appearently MS they knew about it already (maybe too late for the release of sql server 2008)

https://connect.microsoft.com/SQLServer/feedback/details/301828

what i would love to see is check constraints firing for deletion.

October 28, 2010 6:04 AM
 

Laurence said:

I agree with venom_zx conclusion.  The orginal example was confusing and misleading.  If you have more than one field being checked in the contraint logic include that in the UDF.

I spent time figuring this out on my own and came to his same conclusion.  

I would change the first example so it works correctly -- so other people reading the example get the clear picture.

June 27, 2013 2:30 PM
 

Matheussi said:

Great article.

December 20, 2013 10:26 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement