THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: How Many Rows?

In my last post, I told you I'd post something technical before I left for Sweden. I almost made it. I wrote this in the Seattle airport, but wasn't able to post it until I got to Stockholm.

For as long as I've been working with SQL Server, I've been hearing/reading questions about how to quickly get a count of the number of rows in a table. We've always had the stored procedure sp_spaceused which would give us a count, but we've always be warned that the count was not guaranteed to be completely accurate. Way back in Sybase versions 3 and 4, that was an enormous understatement. Working in Tech Support, we recommended that people not rely on this value at all, and internally, we referred to the procedure as sp_space_useless.

Things have improved over the years, and in many cases, the sp_spaceused procedure, or a direct SELECT from sysindexes could give a reasonably accurate count, but still, there has never been a guarantee. Starting in SQL Server 2005, sysindexes became only a compatibility view, and the recommended system object was the catalog view sys.partitions. Sys.partitions keeps track of rows per partition, so the following query would return multiple rows for a partitioned table:

SELECT rows  FROM sys.partitions
WHERE object_id = object_id('name of table')
AND index_id < 2;

If you want one value, rather than one per partition, you could simply ask for the SUM:

SELECT SUM(rows)  FROM sys.partitions
WHERE object_id = object_id('name of table')
AND index_id < 2;

But even in SQL Server 2008, it is not guaranteed that you can get an exact count of rows without actually counting. The only guaranteed way to get an accurate value is to SELECT from the table:

SELECT count(*) FROM [name of table];

But what does it mean to get an accurate count?

And when would the count not be accurate when selecting from sys.partitions? Microsoft doesn't reveal all the cases in which you might not get an accurate value, all they will admit is that the count is not guaranteed. You can actually reproduce one case where you might not want to trust the count, by examining the count in one connection while another connection is changing the number of rows.

Start by creating a copy of a table in the AdventureWorks2008 database:

USE AdventureWorks2008;
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Header')
    DROP TABLE Header
GO
SELECT * INTO Header
FROM Sales.SalesOrderHeader;
GO

You should have 31,465 rows in the new table.

Next, get ready to start a transaction to add more rows to the table:

-- Insert more rows:

USE AdventureWorks2008;
GO

BEGIN TRAN
INSERT INTO Header
SELECT  [RevisionNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
      ,[OnlineOrderFlag]
      ,[SalesOrderNumber]
      ,[PurchaseOrderNumber]
      ,[AccountNumber]
      ,[CustomerID]
      ,[SalesPersonID]
      ,[TerritoryID]
      ,[BillToAddressID]
      ,[ShipToAddressID]
      ,[ShipMethodID]
      ,[CreditCardID]
      ,[CreditCardApprovalCode]
      ,[CurrencyRateID]
      ,[SubTotal]
      ,[TaxAmt]
      ,[Freight]
      ,[TotalDue]
      ,[Comment]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [Sales].[SalesOrderHeader];
GO 10

ROLLBACK TRAN;
GO

In another window, you can run a query to inspect the count:

USE AdventureWorks2008;
GO
SELECT rows  FROM sys.partitions
WHERE object_id = object_id('Header');

So now start the INSERT, and while it is executing, run the SELECT repeatedly.  You should see the count increasing as more rows are added, and then you can see it DECREASING as the ROLLBACK happens, and the values returned are not always an even multiple of 31,465.

So which of these counts are accurate? 

As we've been told, to get a true count, you'll need to use COUNT(*):

SELECT count(*) FROM Header;

However, this query will BLOCK when trying to get a value before the transaction is completely rolled back, at least if you're in the default READ COMMITTED isolation level. Selecting from sys.partitions will NOT block.

If your transactions are short, you won't get long term blocking, but if you have lots and lots of transactions inserting new rows continuously, what would an 'accurate' count really mean? The value returned one second would be completely different from the value returned the next second.

So you need to decide what 'accurate' really means, and maybe you'll decide that sys.partitions is accurate enough.

Inaccurately yours,

~Kalen

Published Monday, December 07, 2009 10:59 AM by Kalen Delaney

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alexander Kuznetsov said:

Very interesting, Kalen! It would be nice to see if this behaves differently under different isolation levels. It looks like your isolation level was READ COMMITTED for all your examples, correct?

December 7, 2009 2:05 PM
 

Adam Machanic said:

Agreed, interesting post. One other consideration with regard to "accurate" is READ COMMITTED SNAPSHOT - you'll get an answer back from COUNT(*), but it may be grossly inaccurate if a long transaction is running...

December 7, 2009 2:33 PM
 

Michael Zilberstein said:

Kalen, thanks for an interesting post.

Select from sys.indexes JOIN sys.partitions actually blocks something! Recently I had to create a job that does precisely this - select from sys.indexes JOIN sys.partitions where object_id = object_id('MyTable') - it was part of the workaround to suspected bug that is still open with severity A at Microsoft (actually, I'm waiting right now for a call from support engineer). This job failed couple of times on deadlock! I didn't have time to investigate the output of 1222 trace flag yet - probably it deserves separate post :-).

December 7, 2009 4:15 PM
 

Kalen Delaney said:

Michael, it will be very interesting to see what you find out from support, because I don't see any locks being acquired for this query.

December 7, 2009 4:28 PM
 

Michael Zilberstein said:

Kalen, resources in contention are table in a database (user table - not system one) and sys.sysidxstats.

First session creates primary key for TableOne.

Second session executes

SELECT i.name, p.[rows]

FROM

sys.indexes i

INNER JOIN sys.partitions p ON i.index_id = p.index_id AND i.[object_id] = p.[object_id]

WHERE i.[object_id] = OBJECT_ID('TableTwo')

First session holds Sch-M lock on TableOne and requires exclusive lock on sys.sysidxstats.

Second session holds Shared lock on sys.sysidxstats and requires Sch-S lock on TableOne.

Looks like internally Microsoft doesn't use indexes - second session has nothing to do with TableOne.

December 7, 2009 6:08 PM
 

Gent Rakacolli said:

I thought row_count in sys.dm_db_partition_stats was supposed to be "accurate but transactionally inconsistent".

The following queries would behave similarly:

select sum(p.row_count) as NO_OF_ROWS

from sys.dm_db_partition_stats  as p

where p.object_id = object_id('Header');

select count(*) as NO_OF_ROWS

from Header with (nolock);

While using row_count from sys.dm_db_partition_stats might give largely misleading results on a busy system, it would provide a way to give a quick and accurate count of rows in a table in a "stagnant" system.

I thought I had seen somewhere online that sys.dm_db_partition_stats provides accurate counts, but I am unable to find it again.

BOL is explicit in regards to rows in sys.partitions as being an "Approximate number of rows in this partition", but it is not explicit regarding the accuracy for row_count in sys.dm_db_partition_stats. It says "Number of rows within the partition".

December 9, 2009 11:10 AM
 

Brian Nordberg said:

I too am receiving Deadlocks where my row count is the victim. If I attempt to get a row count while altering a table, adding an index... It will deadlock right off. Was anyone ever able to resolve deadlocks?

My query looks like:

SELECT

        OBJECT_SCHEMA_NAME(object_id) AS SchemaName

      , OBJECT_NAME (object_id) as TableName

      , SUM(CASE WHEN index_id < 2 THEN row_count ELSE 0 END) AS TotalRows

FROM sys.dm_db_partition_stats

WHERE OBJECT_SCHEMA_NAME(object_id) <>'SYS'

GROUP BY object_id;

September 9, 2014 10:40 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement