
Alexander Kuznetsov

Some heap tables may be more prone to deadlocks than identical tables with clustered indexes

Apparently, at high isolation levels, some heap tables may be more prone to deadlocks than identical tables with clustered indexes. I have a simple repro script which completes successfully if the table has a clustered index, but ends in a deadlock if it runs against a heap. Here is the table:

 

CREATE TABLE dbo.NarrowTable(ID INT NOT NULL, i1 INT NOT NULL, i2 INT NULL, filler CHAR(1) NULL)
GO

SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 0;
WHILE @i < 100 BEGIN
    INSERT INTO dbo.NarrowTable(ID, i1, i2, filler) VALUES(@i, 10000-@i, 1, '?');
    SET @i = @i + 1;
END
GO

CREATE UNIQUE NONCLUSTERED INDEX UNQ_NarrowTable_ID ON dbo.NarrowTable(ID);

 

Run the following script simultaneously from two connections:

DECLARE @count INT;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET NOCOUNT ON;

SET @count = 0;
WHILE @count < 100000 BEGIN
    UPDATE dbo.NarrowTable SET i2 = 1 - i2 WHERE i1 = 500;
    SET @count = @count + 1;
END

 

I get a deadlock every time I run it:

 

Msg 1205, Level 13, State 18, Line 6

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

Yet if I drop the index and recreate it as a clustered index, there are no deadlocks:

 

CREATE UNIQUE CLUSTERED INDEX UNQ_NarrowTable_ID ON dbo.NarrowTable(ID);

 

Note that the column I am searching on, i1, is not indexed at all. If there is an index on i1, there are no deadlocks. Also, if I lower the isolation level to REPEATABLE READ, I do not get deadlocks either.
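For reference, here are the two variations that ran without deadlocks on my system (the index name is just illustrative):

-- Variation 1: index the column used in the WHERE clause
CREATE NONCLUSTERED INDEX IX_NarrowTable_i1 ON dbo.NarrowTable(i1);

-- Variation 2: keep the heap as is, but run the update loop at a lower isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;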

 

I ran these scripts on SQL Server 2005. Of course, this may or may not reproduce on other versions.

 

I strongly prefer posting repro scripts to describing the anatomy of deadlocks. I think it is much more useful to let the reader run and troubleshoot than to just read what the author has to say. Anyway, Jonathan Kehayias has recently published a very detailed post describing a similar case, in which deadlocks went away after a clustered index was created.

 

Here is the post, courtesy of Jonathan Kehayias.

 

My previous posts about deadlocks are:

 

Reproducing deadlocks involving only one table

 

When Index Covering Prevents Deadlocks

 

Published Wednesday, March 11, 2009 11:05 PM by Alexander Kuznetsov


Comments

 

Paul Nielsen said:

Fascinating. And SQL Server 2008 (SP1 CTP) has the same behavior.

March 12, 2009 5:13 PM
 


Greg Linwood said:

It is silly to perform queries against non-indexed columns on a heap, because a heap is not designed to be scanned. All scans are full scans on a heap, so a full table lock will be acquired & of course you're going to bump into deadlocks if you misuse the heap in this way.

If this were a large table on a busy server, you'd kill concurrency either way by forcing a full scan the way you have here.

It's also worth noting that the CIX requires MANY more locks to accomplish its ordered scan & this is also likely to really hurt you on a busy live system.

Again, it is nonsense to benchmark any query that scans a table without an index.

March 12, 2009 6:19 PM
 

Jim McLeod said:

It's a bit of an unfair test to run - the heap has two objects (the heap and the non-clustered index) that will be updated, while the Clustered Index test only has a single object - the Clustered Index.  It's impossible to get a deadlock on a single resource with such a simple query.

This doesn't explain why REPEATABLE READ is able to avoid the deadlock, but lowering the isolation level will reduce the level of locking required (although REPEATABLE READ and SERIALIZABLE are pretty close).

March 13, 2009 12:55 AM
 

Alexander Kuznetsov said:

Greg,

I agree it would be rather stupid to do things like this in production.

Yet this is a repro script, not an example of real-life usage. A repro script is expected to run the same way in different environments. As such, it has to be as simple as possible. If I posted a real-life case, I guess it would be so big that nobody would read it all the way through ;).

The deadlocks I am observing do not occur because a table-level lock is needed - it is also needed at lower isolation levels. But only under SERIALIZABLE does the engine first acquire a shared lock and then attempt to promote it to an exclusive one. This is what in my environment leads to a deadlock - acquiring a shared lock first, then promoting it. Does it make sense to you?
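If you want to see the details for yourself, one simple option on 2005 is to write the deadlock graph to the error log while the two loops run; a minimal sketch:

-- log detailed deadlock information (SQL Server 2005 and later)
DBCC TRACEON (1222, -1);
-- ...run the two update loops from separate connections, then review the deadlock graph in the SQL Server error log...
DBCC TRACEOFF (1222, -1);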

Jim,

There is no need to update the index on ID when you modify another column, i2.

March 13, 2009 10:35 AM
 

Greg Linwood said:

The title of your blog post does suggest this information relates to real-life usage. I urge you to correct this, as some might misinterpret the information posted as meaning that heaps are subject to more deadlocking than CIXs, which is simply not correct.

As for whether the deadlock is a cyclical or a conversion type deadlock, this would be easy to detect by tracing the locks acquired whilst running your example. I'll leave this to you though, as I consider diagnosing deliberately unindexed heaps a waste of time b/c heaps are not supposed to be queried directly / without indexes (heaps have no location awareness for rows by design).

March 13, 2009 6:44 PM
 

Jim McLeod said:

Right you are, Alexander - I missed that part.  If you remove the non-clustered index on the heap, you still get the deadlock.

March 17, 2009 2:18 AM
 

Alexander Kuznetsov said:

I have already described several deadlock scenarios that involve only one table in another post. This

March 26, 2009 12:00 PM
 

Paul White said:

I hope I'm not missing something really obvious here, but the behaviour described seems quite reasonable‽ :)

At an isolation level of SERIALIZABLE, we are asking the database engine to take range-locks to prevent other transactions altering data within the range our query examines.

That's quite easily done when an index is present on the searched column (i1), or if there is a clustered index on the table.  SQL Server is quite happy to take Range S-U locks on KEYs - but it can't do that for the unsorted RIDs of a heap.

Without a means of range-locking, SQL Server has no option but to take an exclusive lock on the table, in order to honour the isolation level.  

The deadlock I see is (from memory) where both SPIDs get (compatible) TAB-IX locks because this is an update query, then a short time later both try to get TAB-X because of the range-locking issue.  The window of 'opportunity' is quite small, which is why I believe the large number of iterations (and preferably a multi-core machine) is required.

At repeatable-read on a heap without a seekable index, the engine can just take the more usual IX table lock, IU on scanned pages, acquired-and-released U locks on RIDs, and finally a page-IX and row-X lock associated with the one row that gets changed.  (These locks just might get escalated, but a table lock is not required per se, and can be avoided with hints and trace flags as necessary).

Read committed is exactly the same, except that only the IX table and page locks and the one X RID lock are held to transaction completion.
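If anyone wants to watch these lock modes while the loops run, a quick look at sys.dm_tran_locks from a third connection is enough (just a sketch):

-- show the locks held or requested by the two looping sessions
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id, resource_type;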

As a final point of possible interest, the engine seems to take a new Range S-U lock for each iteration.  I guess this must be required for correctness, but I can't quite fully convince myself of the exact reason!

At any rate, the number of such locks grows rapidly, and will also escalate to TAB-X if the lock manager gets a moment to think.

Comments and/or corrections are most welcome :)

March 29, 2009 10:15 PM
 

Alexander Kuznetsov said:

Paul,

Your analysis makes sense to me.

March 30, 2009 9:50 AM
 

Paul White said:

Thanks for taking the time to reply Alexander, and thanks for yet another thought-provoking entry!

March 31, 2009 12:34 AM
 

Greg Linwood said:

Paul - it seems to me that you've explained the lock acquisition steps well.

The big picture, however, is that update transactions should never be performed against unindexed structures (whether heaps or CIXs), especially where a high level of transaction isolation is needed.

What I found disappointing / even misleading about this post was the topic, as it suggests that the problem is with heaps, when the problem was really the inappropriate usage of heaps in Alex's example.

Some who read this might take the blog title at face value & draw the wrong conclusion that they should never use heaps.

I know Alex well & he's a smart guy, but this post is nothing more than an illogical beat up on heaps & deserves critical analysis.

March 31, 2009 4:13 PM
 

Paul White said:

Hey Greg,

Thanks, but I think you're being a bit hard on Alex!

Naturally, it would make sense to have a non-clustered index on column i1 in this example - because it could be used to support the Range S-U lock required (avoiding the deadlock) - quite apart from its usefulness in seeking to the row to update.

The catch is that the non-clustered index must have the range column in the first key position, whereas a clustered index on any column(s) would do the range-locking job nicely.

You have to understand that the repro provided is simply there to illustrate the point made - it's not meant to be taken too seriously outside of that context.

I found the article useful and interesting - I will certainly be mindful of this behaviour in future.

It seems best to ensure that tables to be accessed at the serializable level have a clustered index.

The alternative - trying to ensure that an appropriate non-clustered index always exists on a heap for all potential update queries - seems unworkable to me.

BTW some update operations are more efficiently done against an unindexed heap than with the use of an index.  Full scans are sometimes the optimal choice!

Having said all that, there are many cases where a heap is 'better' than a clustered index, for a given value of 'better'.  Just adding a bit of balance there so you don't think I'm all anti-heap.  I'm not.  It's all about making the appropriate choice in different circumstances.

Cheers!

March 31, 2009 5:25 PM
 

Greg Linwood said:

There is no reason to say that all serializable transactions should be supported by Clustered Indexes. All that's really required is an index with the search column as a leading key, whether it be clustered or not.

Your argument about not being able to ensure appropriate non-clustered indexes always exist actually works the other way around.

Generally speaking, CIXs suit OLTP workloads where the queries are well known & the impact from bookmark / key lookups can be managed / mitigated.

On DSS databases the query workload cannot really be known in advance so the cost of bookmark / key lookups cannot be managed (as you can't know in advance which indexes to create). What you really need in these scenarios is scan efficiency which can be killed by bookmark / key lookups vs the far more efficient RID lookups performed on heaps.

In short, there's a strong case for general use of CIXs with OLTP (known) workloads, though not in all cases, and a strong case for heaps in DSS (unknown) workloads (also not in all cases).

I find it amazing that Microsoft publishes TPC-H benchmarks using CIXs when they're losing in the 10TB+ stakes vs Oracle, who always use heaps. Having SQL Server remain behind in the TPC-H benchmarks due to simple misconfiguration of the physical db is really unfortunate, as it leads to lost sales when CIOs interpret the results as meaning that SQL Server isn't a good platform for large Data Warehouses.

March 31, 2009 8:28 PM
 

Paul White said:

Greg,

The presence of a clustered index means that a range-lock can always be accommodated (thereby avoiding the deadlock scenario Alex pointed out) whereas a heap would require one non-clustered index per column to achieve the same result.  I don't understand why you say the argument works the other way around - perhaps you could go into a little more detail on that?

The other points you raise might strictly be a bit off-topic, but I hope Alex will forgive me for responding:

The whole discussion of key lookups versus RID lookups is an interesting one, but the point you raise is by no means the only or even the deciding factor, even on large DSS systems.

Your article on 'Debunking myths about clustered indexes' shows that an appropriate clustered index suitable for useful partial range-searches was the most important factor in that case, for example.

I'm not sure it is safe to generalise on when clustered indexes or heaps are the optimal solution - each case requires specific analysis bearing in mind the unique features, circumstances, and priorities of each installation.

I would like to think that there are other factors in CIOs' purchasing decisions than the very artificial TPC benchmarks - but who can say?

I look forward to reading the exchanges between you and the M$FT engineers responsible for the configuration one day - would make an interesting read, I would imagine!

</Paul>

March 31, 2009 10:21 PM
 

Greg Linwood said:

Can you describe an example of where an application would really want all rows in a table updated concurrently? If not, the difference between the behaviour of CIXs & heaps in this regard is really fairly pointless.

As for generalisations, I never said "always" about anything! In fact, this is my entire point in the discussion about CIXs vs HEAPs. Some proclaim that CIXs should be used universally & a few even go as far as to say HEAPs should be removed from SQL Server altogether. I simply say that each tool should be used for the purpose it best suits & that there are plenty of good uses for HEAPs..

April 1, 2009 4:19 AM
 

Paul White said:

Hi Greg,

As far as the generalization issue is concerned, please don't misunderstand me - I think we agree on this one to a large extent!

I was referring to the _implication_ that clustered indexes were by default best for OLTP and heaps best for DSS.  That there may be some correlation there is not something I would dispute energetically.

My point was an attempt to reinforce the 'not in all cases' message.  The casual reader could conceivably have taken a CIX = OLTP, HEAP = DSS message away, if skim-read.  If it seemed otherwise then I apologize for that.

Anyway, going back to the first part of your reply, the point is that the nonclustered index needs to be keyed on the column which appears in the search argument (e.g. i1 = 500 in Alex's script) in order to provide a path for the lock manager to provide a range-lock.

Seeing how we cannot know in advance which columns may be referenced in SARGs in all queries that will run against the table at the serializable level - the implication is that each column would need a nonclustered index, in order to provide the same protection against the deadlock that a clustered index would give.

In order for fewer nonclustered indexes to be used, we would have to assume that one or more columns must always be specified in any query at the serializable level.  That seems an unsafe assumption, and not an example of 'defensive programming'!

As a side note, I admit I am uncertain whether the engine can or will use a nonclustered index on column A (and only A) for range-locking if columns A and B and C are present in the search arguments.  I will test that now.

I hope that my logic is sound and that we can now agree that the difference in behaviour between CIXs and HEAPs, whilst not mainstream by any means, is certainly not 'pointless' :)

As always, I am happy to learn new things through correction of any faulty reasoning!

</Paul>

April 1, 2009 4:44 AM
 

Greg Linwood said:

What do you mean "we cannot know in advance which columns may be referenced in SARGs in all queries that will run against the table at the serializable level"? Of course we can!!

Anyone who writes a serializable transaction without implementing the required indexes won't get far before bumping into LOTs of problems (deadlocking, blocking etc)

Keep in mind that serializable transactions are also only used in OLTP systems & no OLTP system can scale without indexing requirements being identified & implemented.

This blog is indeed pointless b/c no sane DBA would allow a query such as the one presented here to run on a production system without the necessary index. Any discussion about access methods against unindexed HEAPs is pointless b/c HEAPs aren't intended to be used for query optimisation!

April 1, 2009 5:38 AM
 

Paul White said:

Greg,

Consider the following query, consistent with Alex's script:

UPDATE dbo.NarrowTable WITH (SERIALIZABLE)
SET i2 = 1 - i2
WHERE i1 BETWEEN 9949 AND 9951 AND ID BETWEEN 49 AND 51;

There is no non-clustered index which will allow the appropriate range of rows to be locked.  An exclusive table lock is taken after an intent-exclusive, and the deadlock occurs.

Creating a clustered index on the (daft choice to illustrate the point) filler column allows the correct range-locks to be taken, and the deadlock does not occur.
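Something along these lines (the index name is purely illustrative):

CREATE CLUSTERED INDEX CIX_NarrowTable_filler ON dbo.NarrowTable(filler);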

Unless we can restrict all SARGs to equality predicates, the non-clustered index(es) solution on a heap fails.

Therefore, this type of deadlock can only occur on heaps, not tables with clustered indexes - which was the original point ;)

Cheers,

Paul

April 1, 2009 6:52 AM
 

Greg Linwood said:

There is no point in discussing the performance behaviour of unindexed HEAPs b/c HEAPs are not designed to be used without indexes. It's not much different from discussing the effects of running a busy database without ANY indexes - it's simply a waste of time discussing blatant poor practice.

April 1, 2009 7:07 AM
 

Alexander Kuznetsov said:

Paul,

I agree that "it is not safe to generalise on when clustered indexes or heaps are the optimal solution - each case requires specific analysis bearing in mind the unique features, circumstances, and priorities of each installation".

This is why I came up with a really simple repro script, so that anyone can tweak it and play with it. The original repro was much more complex, but I kept applying Occam's razor until it was as simple as possible. I strongly believe in the see-for-yourself approach, because we learn more when we try things out than when we just read about them.

Greg,

I disagree with your classification of systems as either pure OLTP or pure DSS. I am exposed to quite a few systems which are somewhere between OLTP and DSS, and so are many other database professionals. I strongly encourage you to wrap up your comments as an article - they are almost ready to be published as such!

April 1, 2009 11:04 AM
 

Paul White said:

Hey Alex(ander),

You're so right about the see-for-yourself thing.  I had so much fun based on the simple repro script - it really got me thinking and taught me a couple of interesting new things!

So thanks again for the great article, and keep them coming!

</Paul>

April 1, 2009 3:54 PM
 

Greg Linwood said:

Alex - I never said OLTPs or DSSs are "pure" in any way. I was pretty careful to point out a few times that there are no universal truths, so I'm not sure how you read this into anything I wrote here.

My main beef with this discussion is that inexperienced readers might draw a conclusion that they should never use HEAPs out of fear of deadlocks.

This blog discusses a completely unrealistic scenario that could only be described as a deliberate misuse of a feature.

Testing what happens when you perform multiple concurrent updates on an unindexed heap is not much different from testing what happens on a database with all its primary keys, foreign keys & indexes removed - it's a meaningless test, publishing conclusions from such a test is probably misleading, and some inexperienced readers are likely to draw the illogical conclusion that they should NEVER use HEAPs.

I look forward to a rigorous discussion at the next MVP summit (c:

April 1, 2009 7:10 PM
 

Paul White said:

Hey Greg,

Seeing as I won't be able to join you for the summit, how about enlightening me as to which non-clustered indexes I should add to the heap in Alex's example to solve the range-lock problem?  

As it stands, I can't see any way, aside from adding a clustered index, to avoid the deadlock scenario I referred to before:

UPDATE dbo.NarrowTable WITH (SERIALIZABLE)
SET i2 = 1 - i2
WHERE i1 BETWEEN 9949 AND 9951 AND ID BETWEEN 49 AND 51;

If you can't, I guess the only conclusion is that "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes" ;)

</Paul>

April 1, 2009 9:12 PM
 

Greg Linwood said:

Database Design 101 - all tables should have a primary key (E.F.Codd) or a candidate key (C.J.Date) otherwise they are not actually tables.

Try adding a primary key or a unique constraint & I'd expect your deadlock problem to disappear.

April 2, 2009 2:41 AM
 

Paul White said:

Well Greg,

It may be Database Design 101, but it makes no difference to the matter at hand.  I have just created a non-clustered PK on the ID column in dbo.NarrowTable and run the example update I keep posting.
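Specifically, something like this (the constraint name is just for illustration):

ALTER TABLE dbo.NarrowTable
    ADD CONSTRAINT PK_NarrowTable PRIMARY KEY NONCLUSTERED (ID);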

Guess what?

IX on table followed by TAB-X -> deadlock as described.  It's no surprise really - Alex's original had a unique NC index!

The *non-clustered* primary key is no better from a range locking point of view, where the predicate is non-trivial over more than the PK key column.

I would encourage you to experiment with real data before making condescending remarks quoting Codd and Date!

All in good fun...

Paul

P.S. A clustered primary key works like magic lol

April 2, 2009 3:06 AM
 

Greg Linwood said:

Paul,

I meant that the non-clustered PK or unique IX should go on column [i1] rather than [ID] because [i1] is the column being filtered in the only query in this example. I'm not sure why you'd index [ID] b/c it's not being used as a filter by the update query. I also can't see a reason why Alex indexed [ID] either given that no query in the example actually uses that column.

Anyway, if you add a PK or Unique NCIX to [i1] the deadlocks will disappear.

If you don't make the index on [i1] unique, SQL Server simply optimizes the query incorrectly which results in more locking than is necessary (Object IX-TAB + Object X-TAB as you pointed out). However, it is really only an optimization error & one that only occurs if the table is tiny (two pages or less).

If you increase the number of rows inserted above ~1100, the query optimizer changes from performing a table SCAN to an index seek (assuming there's an index on [i1], whether unique or not). Try increasing the # of rows inserted to 1100 & put a NCIX on [i1] (or even better [i1],[i2]) & see for yourself that the deadlocks disappear.
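In other words, something along these lines (the index name is just for illustration):

-- bump the WHILE loop in the setup script to insert 1100+ rows, then:
CREATE NONCLUSTERED INDEX IX_NarrowTable_i1_i2 ON dbo.NarrowTable(i1, i2);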

The scan / seek decision is purely a cost-based one, and the scan in this scenario is clearly an optimisation error. I think the optimizer goes for the scan where only 100 rows exist in the table as it only takes a single page read vs three ([i1] NCIX seek + RID lookup). IMO, under serializable isolation the optimizer should be smarter & choose the seek, so this is just an optimization bug. I'm sure the optimizer guys are aware of this anomaly but probably don't consider it a high priority, given that it's manageable with hints or plan guides & also that the incidence of this actually occurring would be pretty low.

As mentioned above, it is possible to simply use an index hint or plan guide to ensure this query uses a NCIX on [i1] or [i1],[i2], though it would obviously be better if the optimizer didn't mess things up in the first place.
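For example, with an index like the one above in place, a hint such as this forces the seek (again, the index name is illustrative):

UPDATE dbo.NarrowTable WITH (INDEX(IX_NarrowTable_i1_i2))
SET i2 = 1 - i2
WHERE i1 = 500;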

Note that statistics problems can also cause (other) problems with CIXs so this type of problem isn't confined to HEAPs & using hints is not unusual to solve these sorts of problems.

So it should be noted that the obscure point Alex has brought up is limited to tiny tables (stored on heaps of less than three pages) and can be managed with hints or plan guides. There's obviously no reason not to store these types of tables on Clustered Indexes, but there's also no reason to read anything into this for tables any larger than two pages.

April 2, 2009 8:29 AM
 

Alexander Kuznetsov said:

Greg and Paul,

This was a most interesting discussion, thank you very much for your participation!

Greg,

Because you are so passionate and knowledgeable about this topic, I strongly encourage you to write about it in your own way. When you are done with it, I will embed a link to it right in my post, preceded by the following:

Note: this post is controversial. Make sure that you have read the alternative point of view as presented by Greg Linwood.

April 2, 2009 4:43 PM
 

Paul White said:

Hi Greg and many thanks for the interesting and considered reply.

I actually tested that scenario (PK on i1) a while back, when I wrote:

"As a side note, I admit I am uncertain whether the engine can or will use a non-clustered index on column A (and only A) for range-locking if columns A and B and C are present in the search arguments.  I will test that now."

What I missed was that the exclusive table lock is only taken when there are relatively few rows in the table, as you point out.

Re-running the test with 100,000 rows in the table shows that range locks are taken on the non-clustered index, even if there is no SARG on the indexed column!

Which is kind of a relief and a disappointment at the same time, if I'm honest.

[Of course the locks quickly escalate to a TAB-X, even with a ROWLOCK hint, but that is fair and occurs with a CIX too].

That just leaves us with the sensible requirement to have a unique index on a heap to support queries running at serializable, and to use an index hint or plan guide template if circumstances dictate.

Having said all that, the article is still a good one in my view - even if only because it gave us so much to think about!

Thanks Greg!

/Paul

April 2, 2009 4:46 PM
 

Paul White said:

Alex,

Our comments overlapped there.  I'm glad you enjoyed the discussion as much as I (and hopefully Greg) did!

I do feel a bit bad for spamming your comments though :(

Cheers

/Paul

April 2, 2009 4:48 PM
 

Greg Linwood said:

Paul, I also enjoyed this discussion. I hope you do make it to a summit one day (c:

Alex - I wish I had more time to blog but I'll try & get something together time permitting.

April 2, 2009 4:58 PM


