THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Page Free Space

See also my SQL Server technical articles on SQLperformance.com

SQL Server Bug: Slow T-SQL Sums and Averages

Published Sunday, February 27, 2011 1:07 PM by Paul White

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alejandro Mesa said:

Paul,

Great post, as always.

There is no doubt about your last statement, and I always encourage database developers to kick it up a notch by learning about the internals as much as they can.

--

AMB

February 27, 2011 2:52 PM
 

Fabiano Amorim said:

So far the best post I read in the year, thanks for that.

February 28, 2011 2:55 PM
 

Fabiano Amorim said:

Paul, I just realized that it only happens with "Scalar Aggregations"... because it has to deal with the NULL return... so the question is, WHY scalar aggregations has to return NULL if null don't means nothing:-( ?

If it needs to return something why not return zero?

Do you know if it is a a Standard thing?

Thanks

February 28, 2011 3:13 PM
 

Alexander Kuznetsov said:

Very interesting read, Paul, thanks for writing! I was wondering if it was a real life problem that prompted this research?

Regarding the following Fabiano's observation: "because it has to deal with the NULL return", maybe COALESCE(SUM(...),0) instead of SUM(...) might eliminate this counting, because there is no need to return a NULL?

February 28, 2011 10:31 PM
 

Paul White said:

Hey Fabiano,

Thanks very much - yes, it is in the SQL standard that a SUM of an empty set is NULL.  There's probably a good reason for it, but perhaps it is just a 'convenience' thing, I'm not really sure.

Paul

February 28, 2011 10:55 PM
 

Paul White said:

Hi Alex,

No, it wasn't a real-life problem - it came out of some research I was doing for an article on parallelism.  One test (with MIN or MAX) ran twice as fast as when I used SUM, so I started to look deeper...

You would think that ISNULL or COALESCE would eliminate the unnecessary count, but it doesn't.  This is something I tested at the time - the optimizer does not realize that the count is just there to decide about a NULL return, and simply wraps the final scalar computation with ISNULL (or the CASE expression COALESCE expands to).

As I mentioned in the main text, a better solution might have been for the aggregate to set a flag to true as soon as it encounters the first row.  There's no need to perform a full count - the final Compute Scalar just needs to know if at least one row was seen.

There is no current build of SQL Server that does not suffer from this bug, so it is something to be aware of when using SUM or AVG on a computed expression, especially if it is CPU-intensive.

Thanks for the comment.

Paul

February 28, 2011 11:02 PM
 

Brad Schulz said:

I've been incredibly busy during the last several weeks and have not done any blog reading at all, until today.  Naturally yours was the first one I caught up on, and your last 6 articles are brilliant as always.

Interesting about the Compute Scalar not really executing its expressions as rows flow through it... perhaps this is why estimated query plans always have the thinnest arrow possible flowing out of the left of the Compute Scalar even if the flow coming into its right is huge... it's really just kind of an "adjunct" of the upline operator.

Thanks again for all the incredible content!

Best...

--Brad

March 1, 2011 3:30 PM
 

Kendra Little said:

I always enjoy reading your posts--  I learn not only about the specific topic, but also about how to read plans better.

Your code samples and images are really presented beautifully, by the way. Not only looks good, but makes it easier to follow along.

March 1, 2011 8:35 PM
 

Paul White said:

Hi Brad,

Compute Scalar is a bit of an odd-ball, it doesn't really exist in the executable plan in the same way as more 'real' iterators like Sort or Join, so there aren't any statistics for actual rows, so you only ever see the estimated rows inherited from the operator below it in the tree.

Paul

March 2, 2011 4:30 AM
 

Paul White said:

Hey Kendra - I just noticed I forgot to reply to you!  I'm becoming more aware of the need to explain how execution plans work in my posts, so thanks for mentioning that; I'll certainly bear it in mind for the future.

The comments about presentation are especially appreciated - I am still in total awe of the talent you showed in the amazing Isolation Levels broadsheet you did!

Paul

March 3, 2011 7:12 AM

Leave a Comment

(required) 
(required) 
Submit
Privacy Statement