I'm not sure I see what you mean about never skipping the GROUP BY clause. There are certainly times when a scalar aggregate is what is required (counting all the rows in a table for example). There are also times when the scalar semantic is necessary for correct results. It's all about being aware of the differences.

With COUNT_BIG(*), cardinality estimation calculates an estimate from (derived) statistics. Substituting SUM(1) results in a guess of 132.3 rows (30% of the estimated 441 rows from the aggregate).

I really enjoy reading your post. I think that with Sum(1) the cardinality estimate is calculated as All density for ProductID * No of rows after stream aggregation ,i.e, 0.002267574*441~ 1. Its fine but I wonder why there is difference between count_Big and sum with regard to cardinality estimation. I tried with count(1) but still the same estimations

I enjoyed reading your post, but I think HAVING clause is redundant. As such, it should be eliminated - that would simplify and improve SQL. There would be less things to learn, less chances to shoot ourselves in the foot.

Thanks. It's the other way around: the cardinality estimate for COUNT is derived from statistics but SUM(1) results in a 30% guess.

Anyway, the reason for the difference is just that cardinality estimation knows how to predict a value for COUNT, but not for SUM(1), so it guesses.

I suppose cardinality estimation could treat the special case of SUM(1) as COUNT (though they produce different results on empty sets) but it doesn't. More generally, SUM(1) is just a special case of SUM(constant) and there's no sensible way to handle statistics given a comparison on that result.

Finally, COUNT(1) and COUNT(*) are the same. When you write COUNT(1) in a query, the query plan will show the aggregate as count star.

Good to hear from you. I'm starting to think I over-emphasised the HAVING examples :) Forgive me if I am misunderstanding you, but this post is not about HAVING - it is about the subtle semantic differences between aggregate queries with and without GROUP BY:

-- Try with the GROUP BY commented and uncommented

-- 23 versus 86 rows

-- No HAVING in sight!

SELECT

p.Name

FROM Production.Product AS p

WHERE

(

SELECT COUNT_BIG(*)

FROM Production.TransactionHistory AS th

WHERE th.ProductID = p.ProductID

--GROUP BY p.ProductID

) < 10;

Anyway, yes I agree HAVING is pure sugar; we can always replace it with a filter over a derived table. Nevertheless, it is standard SQL and hardly likely to be removed, whatever I think about it.

## Comments

## PassedBI said:

GJ! It was interesting to follow your examples. Dangerous subtle nuance, though I never skip GROUP BY clause.

## Alejandro Mesa said:

Paul,

Nice reading for a Monday morning :)

Talking about fun, I wonder why the cardinality estimate, out from the filter, changes if we use SUM(1) instead COUNT_BIG(*) in the following query.

SELECT

p.Name

FROM

(

SELECT th.ProductID

FROM Production.TransactionHistory AS th

GROUP BY th.ProductID

HAVING SUM(1) < 10

) AS q1

JOIN

Production.Product AS p

ON p.ProductID = q1.ProductID;

--

AMB

## Paul White said:

PassedBI,

I'm not sure I see what you mean about never skipping the GROUP BY clause. There are certainly times when a scalar aggregate is what is required (counting all the rows in a table for example). There are also times when the scalar semantic is necessary for correct results. It's all about being aware of the differences.

Paul

## Paul White said:

Hi AMB,

With COUNT_BIG(*), cardinality estimation calculates an estimate from (derived) statistics. Substituting SUM(1) results in a guess of 132.3 rows (30% of the estimated 441 rows from the aggregate).

Paul

## Alejandro Mesa said:

Thanks, Paul!

## PassedBI said:

Paul White,

I never skip GROUP BY clause

- when using HAVING condition - should I write.

Just never had the case when need use HAVING without GROUP BY.

## Rishabh K said:

Hi Paul,

I really enjoy reading your post. I think that with Sum(1) the cardinality estimate is calculated as All density for ProductID * No of rows after stream aggregation ,i.e, 0.002267574*441~ 1. Its fine but I wonder why there is difference between count_Big and sum with regard to cardinality estimation. I tried with count(1) but still the same estimations

SELECT th.ProductID,COUNT_big(1),SUM(1)

FROM Production.TransactionHistory AS th

GROUP BY th.ProductID

having count_big(1)<10

Thanks for all your great articles

## AlexK said:

Paul,

I enjoyed reading your post, but I think HAVING clause is redundant. As such, it should be eliminated - that would simplify and improve SQL. There would be less things to learn, less chances to shoot ourselves in the foot.

What do you think?

## Paul White said:

Rishabh K,

Thanks. It's the other way around: the cardinality estimate for COUNT is derived from statistics but SUM(1) results in a 30% guess.

Anyway, the reason for the difference is just that cardinality estimation knows how to predict a value for COUNT, but not for SUM(1), so it guesses.

I suppose cardinality estimation could treat the special case of SUM(1) as COUNT (though they produce different results on empty sets) but it doesn't. More generally, SUM(1) is just a special case of SUM(constant) and there's no sensible way to handle statistics given a comparison on that result.

Finally, COUNT(1) and COUNT(*) are the same. When you write COUNT(1) in a query, the query plan will show the aggregate as count star.

Paul

## Paul White said:

Hi Alex,

Good to hear from you. I'm starting to think I over-emphasised the HAVING examples :) Forgive me if I am misunderstanding you, but this post is not about HAVING - it is about the subtle semantic differences between aggregate queries with and without GROUP BY:

-- Try with the GROUP BY commented and uncommented

-- 23 versus 86 rows

-- No HAVING in sight!

SELECT

p.Name

FROM Production.Product AS p

WHERE

(

SELECT COUNT_BIG(*)

FROM Production.TransactionHistory AS th

WHERE th.ProductID = p.ProductID

--GROUP BY p.ProductID

) < 10;

Anyway, yes I agree HAVING is pure sugar; we can always replace it with a filter over a derived table. Nevertheless, it is standard SQL and hardly likely to be removed, whatever I think about it.

Paul

## Praveen kumar pddi said:

Hi Paul,

Can you try this variation of solution and provide your valuable inputs by comparing existing solutions?

SELECT p.Name--,rn,cnt

FROM

(

select *

from

(

select *, row_number() over (partition by ProductID, cnt order by ProductID, cnt) as rn

from

(

SELECT

th.ProductID,

cnt = COUNT_BIG(th.ProductID) over (partition by th.ProductID)

FROM Production.TransactionHistory AS th

) AS q1

WHERE

q1.cnt < 10 -- here , we are interested in mere 10 or less

) AS q1

WHERE

--above query returns multiple records for each productID , as we are fetching the data from transactional table

and rn=1

-- rn=1 will fetch one row per each product, in a way, we are removing duplicates

) as q1

JOIN Production.Product AS p

ON p.ProductID = q1.ProductID