THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Page Free Space

See also my SQL Server technical articles on SQLperformance.com

Fun with Aggregates

Published Monday, March 12, 2012 5:49 AM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

PassedBI said:

GJ! It was interesting to follow your examples. Dangerous subtle nuance, though I never skip GROUP BY clause.

March 11, 2012 2:28 PM
 

Alejandro Mesa said:

Paul,

Nice reading for a Monday morning :)

Talking about fun, I wonder why the cardinality estimate, out from the filter, changes if we use SUM(1) instead COUNT_BIG(*) in the following query.

SELECT

p.Name

FROM

(

SELECT th.ProductID

FROM Production.TransactionHistory AS th

GROUP BY th.ProductID

HAVING SUM(1) < 10

) AS q1

JOIN

Production.Product AS p

ON p.ProductID = q1.ProductID;

--

AMB

March 12, 2012 10:05 AM
 

Paul White said:

PassedBI,

I'm not sure I see what you mean about never skipping the GROUP BY clause.  There are certainly times when a scalar aggregate is what is required (counting all the rows in a table for example).  There are also times when the scalar semantic is necessary for correct results.  It's all about being aware of the differences.

Paul

March 12, 2012 10:26 AM
 

Paul White said:

Hi AMB,

With COUNT_BIG(*), cardinality estimation calculates an estimate from (derived) statistics.  Substituting SUM(1) results in a guess of 132.3 rows (30% of the estimated 441 rows from the aggregate).

Paul

March 12, 2012 10:33 AM
 

Alejandro Mesa said:

Thanks, Paul!

March 12, 2012 11:29 AM
 

PassedBI said:

Paul White,

I never skip GROUP BY clause

- when using HAVING condition - should I write.

Just never had the case when need use HAVING without GROUP BY.

March 12, 2012 2:48 PM
 

Rishabh K said:

Hi Paul,

I really enjoy reading your post. I think that with Sum(1) the cardinality estimate is calculated as All density for ProductID * No of rows after stream aggregation ,i.e, 0.002267574*441~ 1. Its fine  but I wonder why there is difference between count_Big and sum with regard to cardinality estimation. I tried with count(1) but still the same estimations

SELECT th.ProductID,COUNT_big(1),SUM(1)

FROM Production.TransactionHistory AS th

GROUP BY th.ProductID

having count_big(1)<10

Thanks for all your great articles

March 13, 2012 7:57 AM
 

AlexK said:

Paul,

I enjoyed reading your post, but I think HAVING clause is redundant. As such, it should be eliminated - that would simplify and improve SQL. There would be less things to learn, less chances to shoot ourselves in the foot.

What do you think?

March 13, 2012 10:01 AM
 

Paul White said:

Rishabh K,

Thanks.  It's the other way around: the cardinality estimate for COUNT is derived from statistics but SUM(1) results in a 30% guess.

Anyway, the reason for the difference is just that cardinality estimation knows how to predict a value for COUNT, but not for SUM(1), so it guesses.

I suppose cardinality estimation could treat the special case of SUM(1) as COUNT (though they produce different results on empty sets) but it doesn't.  More generally, SUM(1) is just a special case of SUM(constant) and there's no sensible way to handle statistics given a comparison on that result.

Finally, COUNT(1) and COUNT(*) are the same.  When you write COUNT(1) in a query, the query plan will show the aggregate as count star.

Paul

March 13, 2012 10:05 AM
 

Paul White said:

Hi Alex,

Good to hear from you.  I'm starting to think I over-emphasised the HAVING examples :)  Forgive me if I am misunderstanding you, but this post is not about HAVING - it is about the subtle semantic differences between aggregate queries with and without GROUP BY:

-- Try with the GROUP BY commented and uncommented

-- 23 versus 86 rows

-- No HAVING in sight!

SELECT

   p.Name

FROM Production.Product AS p

WHERE

(

   SELECT COUNT_BIG(*)

   FROM Production.TransactionHistory AS th

   WHERE th.ProductID = p.ProductID

   --GROUP BY p.ProductID

) < 10;

Anyway, yes I agree HAVING is pure sugar; we can always replace it with a filter over a derived table.  Nevertheless, it is standard SQL and hardly likely to be removed, whatever I think about it.

Paul

March 13, 2012 10:15 AM
 

Praveen kumar pddi said:

Hi Paul,

Can you try this variation of solution and provide your valuable inputs  by comparing existing solutions?

SELECT p.Name--,rn,cnt

FROM

(

select *

from

(

select *, row_number() over (partition by ProductID, cnt order by ProductID, cnt) as rn

from

(

   SELECT

       th.ProductID,

       cnt = COUNT_BIG(th.ProductID) over (partition by th.ProductID)

   FROM Production.TransactionHistory AS th

) AS q1

WHERE

   q1.cnt < 10 -- here , we are interested in mere 10  or less

) AS q1

WHERE

--above query returns multiple records for each productID , as we are fetching the data from transactional table

and rn=1

-- rn=1 will fetch one row per each product, in a way,  we are removing duplicates

) as q1

JOIN Production.Product AS p

   ON p.ProductID = q1.ProductID

March 19, 2012 9:45 AM

Leave a Comment

(required) 
(required) 
Submit
Privacy Statement