THE SQL Server Blog Spot on the Web


Alexander Kuznetsov

Myth Busting: COUNT(*) under REPEATABLE READ may return wrong results

This came up on SQL Server Central, where I read the following claim (in a discussion thread, not in an article): "COUNT(*) is only guaranteed accurate if your isolation level is REPEATABLE READ, SERIALIZABLE, or SNAPSHOT". I have a repro script which shows how COUNT(*) running under REPEATABLE READ returns wrong results under high concurrency.


We need a table with data and a function that returns random integers:

CREATE TABLE dbo.WideTable
    (
      RandomValue INT NOT NULL ,
      Filler VARCHAR(200) NOT NULL -- exact column definition assumed; the original script was garbled
    ) ;
GO

-- inserts 10,000 rows
INSERT  INTO dbo.WideTable ( RandomValue, Filler )
        SELECT  Number ,
                'just some chars'
        FROM    Data.Numbers ;
GO

-- RAND() cannot be called directly inside a user-defined function,
-- so we wrap it in a view
CREATE VIEW dbo.WrappedRandView
AS
    SELECT  RAND() AS RandomValue ;
GO

CREATE FUNCTION dbo.RandomInt ( @Multiplier INT )
RETURNS INT
AS
    BEGIN ;
        DECLARE @ret INT ;
        SET @ret = ( SELECT CAST(RandomValue * @Multiplier AS INT)
                     FROM   dbo.WrappedRandView
                   ) ;
        RETURN @ret ;
    END ;

Running the repro script

In one tab, we shall be adding 10 rows at a time, 10000 times:

DECLARE @randomInts TABLE ( randomInt INT ) ;

DECLARE @numIterations INT ,
    @iterationNumber INT ;
SET @numIterations = 10000 ;
SET @iterationNumber = 1 ;

WHILE @iterationNumber <= @numIterations
    BEGIN ;
        INSERT  INTO @randomInts
                ( randomInt
                )
                SELECT  dbo.RandomInt(10000)
                FROM    Data.Numbers
                WHERE   Number < 10 ;
        INSERT  dbo.WideTable
                ( RandomValue ,
                  Filler
                )
                SELECT  randomInt ,
                        'some chars'
                FROM    @randomInts ;
        DELETE  FROM @randomInts ;
        SET @iterationNumber = @iterationNumber + 1 ;
    END ;


In another tab, COUNT(*) should always return a multiple of 10. The loop below records each count, and the SELECT at the bottom returns all COUNT(*) results that are not multiples of 10:


-- the reader runs under REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ ;

DECLARE @counts TABLE ( cnt INT ) ;
DECLARE @numIterations INT ,
    @iterationNumber INT ;
SET @numIterations = 1000 ;
SET @iterationNumber = 1 ;

WHILE @iterationNumber <= @numIterations
    BEGIN ;
        INSERT  INTO @counts
                ( cnt )
                SELECT  COUNT(*)
                FROM    dbo.WideTable ;
        SET @iterationNumber = @iterationNumber + 1 ;
    END ;

SELECT  cnt
FROM    @counts
WHERE   cnt % 10 > 0 ;


However, when I ran the script above I got hundreds of incorrect results:




Overall, 755 times out of 10000 the COUNT(*) results were not multiples of 10.

Of course, every time we run this repro script we shall get somewhat different results, but we should still frequently get wrong totals.
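As a follow-up sketch (not part of the original repro), rerunning the reader loop under SERIALIZABLE should eliminate the wrong totals: each COUNT(*) then takes key-range locks as it scans, so the concurrent 10-row inserts cannot slip in mid-scan, at the cost of more blocking:

```sql
-- Sketch: the same reader loop, but under SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;

DECLARE @counts TABLE ( cnt INT ) ;
DECLARE @i INT ;
SET @i = 1 ;

WHILE @i <= 1000
    BEGIN ;
        INSERT  INTO @counts ( cnt )
                SELECT  COUNT(*)
                FROM    dbo.WideTable ;
        SET @i = @i + 1 ;
    END ;

-- should return no rows: every count is a multiple of 10
SELECT  cnt
FROM    @counts
WHERE   cnt % 10 > 0 ;
```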





Published Thursday, October 21, 2010 12:04 PM by Alexander Kuznetsov



Armando Prato said:

You know, this is one of the reasons I love this blog. I learned something today... I had no idea that could happen under high load.   Thanks for sharing!

October 22, 2010 9:32 AM

Alexander Kuznetsov said:

No problem, Armando.

I have known for quite a while that such things can happen. You know, when we work with high concurrency, we get to know what is happening. Yet it was not exactly trivial for me to come up with a repro script that reliably works every time, so I was holding off on this post until I had a good repro.

October 22, 2010 5:19 PM

TheSQLGuru said:

I wonder if the same results would be forthcoming if you used temporary tables instead of table variables.  Also, what is your default isolation level (i.e. the one that would be in effect for the insert executions)?

October 23, 2010 10:08 AM

AlexK said:


I'll try it out with temp tables on Monday. The default isolation is READ COMMITTED. I encourage you to tweak the scripts and run them, so that you can see for yourself.

October 23, 2010 12:10 PM

Kalen Delaney said:

Hi Alex

The article on SQL Server Central is wrong. All that READ COMMITTED guarantees is that we won't read UNCOMMITTED data. There is nothing in the definition that guarantees the accuracy of COUNT(*) or any other aggregate.

This is a great example to illustrate this behavior.

This post has been selected as blog post of the day on my SQL Server Internals FaceBook page, and I have added a couple more comments about it also.

October 24, 2010 5:20 PM

Paul White said:

Hey Alex,

Another great post.  It reminded me of your previous entries on the wider topic:

I think those should be required reading for anyone who wants to understand what isolation levels do, and do not, guarantee.

The confusion around REPEATABLE READ is puzzling to me - after all the isolation level is pretty well-named.  It guarantees that rows read inside a transaction can be re-read and they'll still be there.  It does not say anything about new rows added by concurrent activity (which may appear as phantoms on subsequent reads of the same range).
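Paul's description can be sketched as a two-session experiment (a minimal illustration, not from the original comment; the second count value is whatever the concurrent insert makes it):

```sql
-- Session 1: REPEATABLE READ keeps the rows we have read re-readable,
-- but it does not block NEW rows from appearing (phantoms).
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ ;
BEGIN TRANSACTION ;

SELECT  COUNT(*) FROM dbo.WideTable ;   -- suppose this returns N

-- Meanwhile, Session 2 runs and commits:
--   INSERT INTO dbo.WideTable ( RandomValue, Filler )
--   VALUES ( 1, 'phantom' ) ;
-- The insert is NOT blocked: REPEATABLE READ only locks rows already read.

SELECT  COUNT(*) FROM dbo.WideTable ;   -- may now return N + 1: a phantom row

COMMIT ;
```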

For my money, COUNT(*) only has a well-defined meaning if we take steps to ensure that the contents of the table are stable during the statement - such as taking a table lock, or running at SERIALIZABLE, or one of the row-versioning isolation levels.
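For example (a sketch, not from the original comment), here are two of those options applied to the repro table — a table-lock hint held for the duration of the transaction, and snapshot isolation, which reads a consistent version of the table:

```sql
-- Option 1: shared table lock, held to the end of the transaction,
-- so no concurrent insert can change the count mid-scan.
SELECT  COUNT(*)
FROM    dbo.WideTable WITH ( TABLOCK, HOLDLOCK ) ;

-- Option 2: row versioning reads a consistent snapshot of the table
-- (requires ALLOW_SNAPSHOT_ISOLATION ON for the database).
SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
SELECT  COUNT(*)
FROM    dbo.WideTable ;
```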


October 25, 2010 3:45 AM

Alexander Kuznetsov said:

Hi Kalen,

Thank you very much for featuring my post on Facebook.

Just wanted to clarify that this myth was mentioned in a discussion, not in an article. I have corrected my post to make that clear. Thanks!

October 25, 2010 10:42 AM

Alexander Kuznetsov said:

Hey Paul,

I agree that I am kind of repeating myself. However, the myths persist, so I keep sharing what I have learned, using slightly different examples.

I am not with you on "The confusion around REPEATABLE READ is puzzling to me - after all the isolation level is pretty well-named".

I find the term REPEATABLE READ very confusing - if I repeat my read, I may get different results. I think SELECTED ROWS LOCKED would be clearer to me. What do you think?

October 25, 2010 11:11 AM

Paul White said:

Oh no - I'm not accusing you of repeating yourself!  This is important stuff, so I wanted to link back to related things you had posted before for the benefit of others who may not have seen it.

I take your point about REPEATABLE READ - it hadn't occurred to me to think of it that way.  SELECTED ROWS LOCKED is perhaps better :)

October 25, 2010 8:35 PM

Kalen Delaney said:

I wouldn't want to call it SELECTED ROWS LOCKED because the isolation levels are intended to indicate acceptable behaviors; they are not supposed to indicate how those behaviors are enforced.

October 25, 2010 8:52 PM

Paul White said:


If NOLOCK is an acceptable synonym for READ UNCOMMITTED, why would SELECTED ROWS LOCKED not be acceptable as a synonym for REPEATABLE READ?


October 25, 2010 9:11 PM

Kalen Delaney said:

NOLOCK was introduced in the Sybase product way back in the early days, before there was a concerted effort to support the actual ANSI isolation levels. Using READ UNCOMMITTED is a step forward. In addition, NOLOCK is just a hint; it is not an isolation level. And as a hint, it is very misleading. Also, for me personally, it is NOT an acceptable synonym.

SELECTED ROWS LOCKED is just too limiting. It's a step backwards.  It presumes there is no other way to attain the behaviors defined by REPEATABLE READ.

October 25, 2010 10:28 PM

Paul White said:

I guess it depends on what one considers more important - conveying the technical intentions of the isolation level, or helping the majority of people understand what it means in practice.

I tend to agree (now) with Alex, in that REPEATABLE READ does not actually mean that your reads are (exactly) repeatable.  In that sense, it is potentially misleading - you certainly can repeatably read the same rows, but you might get extra ones too.  One might argue that that is obvious given that SERIALIZABLE exists, but I think 'obvious' would be too strong.


October 25, 2010 10:51 PM

AlexK said:


I like this REPEATABLE READ FOR (PREVIOUSLY) SELECTED ROWS nomenclature. I would go for it. With autocomplete we don't have to actually type it - the tools would do it for us, so the additional length is OK.

Come to think of it, SERIALIZABLE actually means "REPEATABLE READ". I guess it is too late to change now...

October 25, 2010 11:50 PM

Paul White said:

Ha!  Yes - if we are talking purely about practical considerations then SERIALIZABLE 'means' fully repeatable reads by avoiding dirty reads, nonrepeatable reads, and phantoms (as does snapshot isolation (but not read committed (snapshot))).

Strictly, SERIALIZABLE guarantees that there is a way to run transactions serially to get the same results as when run 'simultaneously'.  The current implementation of the isolation level gives us range locks and so on, but as Kalen mentioned before, these things are implementation details.

Fascinating stuff.

October 26, 2010 1:09 AM

About Alexander Kuznetsov

Alex Kuznetsov has been working with object-oriented languages, mostly C# and C++, as well as with databases, for more than a decade. He has worked with Sybase, SQL Server, Oracle, and DB2. He blogs regularly, mostly about database unit testing, defensive programming, and query optimization. Alex has written a book entitled "Defensive Database Programming with Transact-SQL" and several articles. Currently he works as an agile developer.
