THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Andrew Kelly

Exists Vs. Count(*) - The battle never ends...

    I am still amazed at how many of the database applications written today still disregard some basic rules of thumb when it comes to accessing the data. One in particular is the use of COUNT(*) to check to see if there are any rows that match some criteria. The technique of using EXISTS over COUNT(*) has been widely publicized and is in pretty much every best practices document I have come across. So why are database developers still using COUNT(*) instead of EXISTS all over the place?  If it is because people just don't believe or recognize the benefits of it? Or is the concept of EXISTS difficult for new programmers to grasp?  I am not really sure since I have heard both views. So to cover both bases I will show a little demo of why EXISTS is almost always a better way to code when you simply need to see if there is at least 1 row that matches some condition in the WHERE clause. Of course if you really need to know exactly how many match that condition then COUNT(*) is appropriate so hopefully this won't confuse anyone in that regard.

Lets use the Adventureworks database and turn statistics IO on so we can see the number of reads associated with each query. We will then compare COUNT(*) with EXISTS so there is no mistake on how much work is being done in relation to each other.

USE Adventureworks
GO

SET STATISTICS IO ON

GO

 

First lets look at a situation in which there is an index to satisfy the WHERE clause and there are only 2 matching rows:

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ProductID = 870 ) > 0

    Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ProductID = 870)
    Print 'Yes'

Table 'SalesOrderDetail'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

As we can see from the Logical reads there was only a difference of 1 between the two queries since the amount of matching rows was so small. So in this case there was not a significant difference although it was still 1/3 more expensive to use COUNT(*) instead of EXISTS.

 

Now lets do the same but with 4688 matching rows:

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ProductID = 897 ) > 0

    Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ProductID = 897)
    Print 'Yes'

 

Table 'SalesOrderDetail'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

Now the cost for the COUNT(*) is over 5 times the EXISTS. This still may not seem like a lot to you. But if you were calling this queries thousands of times a second this would be a big deal. And remember this index is pretty small overall still.

 

OK now lets try this on a column with no index in that same table. There are 357 rows that match but there are a total of 121,317 in the table.

IF (SELECT COUNT(*) FROM sales.salesorderdetail
WHERE ModifiedDate = '20010701 00:00:00.000' ) > 0

    Print 'Yes'

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ModifiedDate = '20010701 00:00:00.000')
    Print 'Yes'

 

Table 'SalesOrderDetail'. Scan count 1, logical reads 1241, physical reads 0, read-ahead reads 331, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderDetail'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

This is now almost 250 times more expensive to do a COUNT(*) vs. an EXISTS. Both queries scanned the table but the EXISTS was able to at least do a partial scan do to the fact it can stop after it finds the very first matching row. Where as the COUNT(*) must read each and every row in the entire table to determine if they match the criteria and how many there are. That is the key folks. The ability to stop working after the first row that meets the criteria of the WHERE clause is what makes EXISTS so efficient.  The optimizer knows of this behavior and can factor that in as well. Now keep in mind that these tables are relatively small compared to most databases in the real world. So the figures of the COUNT(*) queries would be multiplied many times on larger tables. You could easily get hundred's of thousands of reads or more on tables with millions of rows but the EXISTS will still only have just a few reads on any queries that can use an index to satisfy the WHERE clause.

Hopefully this will help to persuade those last holdouts who insist on using COUNT(*) everywhere even when EXISTS is the clear choice.  One last note, make sure to turn off the statistics IO when done.

SET STATISTICS IO OFF

Published Saturday, December 15, 2007 9:27 PM by Andrew Kelly

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Hi Andy,

It's interesting to note that in SQL Server 2005 if there is a index available to allow a seek, the COUNT(*) > 0 test will be optimized and behave the same as EXISTS.  That said, I totally agree -- never use COUNT(*)!  But at least the optimizer can help sometimes...

Following is a demo of this in case anyone is interested:

---

create index ix_modified

on sales.salesorderdetail (modifieddate)

include (CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,

UnitPrice,UnitPriceDiscount,LineTotal,rowguid)

go

set statistics io on

go

raiserror ('full count', 10, 1) with nowait

SELECT COUNT(*) FROM sales.salesorderdetail

WHERE ModifiedDate = '20010701 00:00:00.000'

go

raiserror ('if count(*)', 10, 1) with nowait

IF (SELECT COUNT(*) FROM sales.salesorderdetail

WHERE ModifiedDate = '20010701 00:00:00.000' ) > 0

   Print 'Yes'

go

raiserror ('if exists', 10, 1) with nowait

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ModifiedDate = '20010701 00:00:00.000')

   Print 'Yes'

go

drop index ix_modified on sales.salesorderdetail

go

---

December 15, 2007 8:53 PM
 

Andrew Kelly said:

Yes I debated if I should add that tidbit or not :).  It does not work that way in 2000 and to me EXISTS is much clearer as to the intent to me and makes it easier to read the code.

December 15, 2007 9:45 PM
 

Linchi Shea said:

DB2 has an interesting query rewrite feature that is built into the database engine. "IF (SELECT COUNT(*) FROM sales.salesorderdetail WHERE ModifiedDate = '20010701 00:00:00.000' ) > 0" would be a perfect candidate for such a rewrite. I wonder whether SQL2005 has started the work to add a query rewrite layer, or this improvement is built into the query optimizer. It seems to me that it would make more sense to rewrite it behind the scene before sending it to the optimizer than to tinker with the optimizer for these specific cases.

December 16, 2007 1:35 AM
 

Denis Gobo said:

December 16, 2007 6:55 AM
 

Russ Bell said:

I'm using SQL Server 9.0.3175 and the last select yields the same with either method:

Table 'SalesOrderDetail'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Yes

That said, I do like the EXISTS syntax.

Another not too well publicized fact is that selecting a constant instead of * in the COUNT function can dramatically reduce IO.  For example:  SELECT COUNT(0) FROM sales.salesorderdetail.  Any constant will do.  This is particularly helpful when querying a wide table with many columns.

December 24, 2007 9:27 AM
 

dbaduck said:

Also, to mention that it is also helpful when using EXISTS to use a constant instead of a *, especially for wide tables as was indicated above.

EXISTS (SELECCT

January 4, 2008 12:53 PM
 

dbaduck said:

Let me try that again.

Also, to mention that it is also helpful when using EXISTS to use a constant instead of a *, especially for wide tables as was indicated above.

EXISTS (SELECT 1 FROM sales.salesorderdetail WHERE conditionexists)

January 4, 2008 12:54 PM
 

Andrew Kelly said:

Actually a * is preferred in this case.  It does not mean the same as in a select list where it returns all columns. It gives the optimizer it's choice of how to best run the query. A constant like 1 should work fine as well but never use a column name.

January 4, 2008 3:33 PM
 

ML said:

Actually, the SELECT clause of the query used in the EXISTS predicate is irrelevant.

Compare the execution plans. The optimizer is smart enough.

January 8, 2008 4:54 AM
 

Andrew Kelly said:

ML is correct.  Inside the EXISTS sub query it has little meaning.  I was still thinking of the outer query and count(x).

January 8, 2008 9:05 AM
 

TheSQLGuru said:

As you say, this is indeed a common practice by the unknowing.  Good examples, and I agree with your choice to not mention a few of the deeper issues pointed out by posters.

One very minor point is that 2 reads to 3 reads is a 50% increase as opposed to a 33% increase - which makes your point even more strongly. :-))

January 12, 2008 1:46 PM
 

Snik said:

Why not use

EXISTS (SELECT TOP 1 1 FROM tbl....)

June 10, 2009 1:17 PM
 

Baffled said:

Why do I frequently find (in SQL 2000, 2005, or 2008) that

SELECT <column list> FROM <table1> INNER JOIN <table2> WHERE <condition>

IF @@ROWCOUNT > 0

<Do Something>

is orders of magnitude faster than

IF EXISTS (SELECT * FROM <table1> INNER JOIN <table2> WHERE <condition>)

<Do Something>

Both return the same execution plan, but CPU, reads, and duration may be 600 - 3000X greater for the IF EXISTS.

December 10, 2009 12:48 PM
 

Andrew Kelly said:

Can you produce an example?  That should definately not be the case and I have not seen that trend before.

December 10, 2009 1:33 PM
 

Jim S. Carmelo said:

Snik said:

Why not use

EXISTS (SELECT TOP 1 1 FROM tbl....)

Well why not? because the sentence "exists" comes true when found the first row in the select query, and finish the statement, not continued searching.

Well sorry wheter my english is bad.

Greetings

December 31, 2009 10:53 AM
 

John said:

For application developers, you want a value returned to see if the condition passed:

SELECT COUNT(ID) FROM sales.salesorderdetail WHERE ProductID = 870

IF EXISTS (SELECT * FROM sales.salesorderdetail WHERE ProductID = 870)

   RETURN 1

And in code I would check value > 0.

Also, does this issue still apply to SQL Server 2005+? My understanding was this was fixed.

December 11, 2011 2:19 AM
 

Andrew Kelly said:

John, the principle behind this has not changed as it is basically related to the amount of work that needs to be done for a COUNT(*) vs. EXISTS().

December 11, 2011 6:46 PM
 

CadentOrange said:

What principle would that be? The first comment by Adam Machanic already demonstrates that this is count(*) and exists() are equivalent performance wise because count(*) gets optimized to exists() by the query optimizer.

If count(*) is clearer and more familiar to the developer, why use exists()?

February 1, 2012 6:48 AM
 

Shawkat said:

I have tried myself the following:

SET STATISTICS IO ON;

GO

begin

select COUNT(*) Total from CMCONTRACT where CONTRACTDATE>'1900-01-01';

if EXISTS(select * from CMCONTRACT where CONTRACTDATE>'1900-01-01')

print 'Existing yes..';

else

print 'Existing nooo';

SET STATISTICS IO off;

end

And read the result:

(1 row(s) affected)

Table 'CMCONTRACT'. Scan count 3, logical reads 104174, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'CMCONTRACT'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Existing yes..

January 16, 2013 12:33 PM
 

Vova said:

Hi, Andrew !

Please look at 2 tables in

http://forum.foxclub.ru/read.php?28,580685

In table 2. COUNT(*) faster than exists. Why ?

January 28, 2013 10:37 AM
 

QMaster said:

Thanks man, This is very helpful and below link is worth to find some truth about Exist may help too.

http://bradsruminations.blogspot.com/2009/09/age-old-select-vs-select-1-debate.html

February 13, 2014 7:11 AM
 

Andrew Kelly said:

Vova, it is hard to tell from that link what these tables actually consist of and thus impossible to say. However if the tables are very small all bets are off.  And speed (duration) is not always a good indicator of what is happening as many factors can affect duration. Take a look at the amount of work that is being done such as pages read via SET STATISTICS IO ON.

February 13, 2014 8:59 AM
 

Claire said:

Here are a couple of reasons I can think of why COUNT is still used:  1) Believe it or not, there are still many legacy databases in production (yes, even in the 21st Century), that do not support EXISTS.  2) Higher priority tasks have not warranted going back through the code to retrofit new methods as they have become available.  To sum it up, sometimes when something seems so obvious, we should question if there is more to the story.  In other words, you have assumed the "hold-outs' are being obstinate or backward, but perhaps that is not the case, just sayin'.

April 4, 2014 4:32 PM
 

Andrew Kelly said:

Claire, I specifically stated "database applications written today" in hopes to avoid that argument. If people are still writing code today because that is how they did it in the past I don't see that as a valid reason to keep doing it. We are only talking about SQL Server and applications written today. Maybe I wasn't clear enough about it but I really was referring to newly written code. Sorry for the confusion.

April 4, 2014 4:47 PM
 

Mysql Stored way with cursor | Search C said:

May 6, 2014 6:13 PM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement