THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Peter DeBetta's SQL Programming Blog

Peter DeBetta's blog about programming in SQL Server using technologies such as T-SQL, .NET, CLR, C#, VB, Visual Studio, and SQL Server Management Studio.

Medians, Actual Query Cost, and Statistics

I was going to comment on Adam's post "Medians, ROW_NUMBERs, and performance" (which was a follow-up to Aaron's post "Take the high road to middle ground") and I realized that I needed to show some code and alas, comments are not so nice about formatting code, so I am posting a follow-up post...

At the PASS Summit back in November, Itzik and I were examining this issue of the QP doing the table spool when one would think it shouldn't need to and we were both stumped as to why. But further examination revealed some strangeness: Even though the logical reads for Ben-Gan's solution is quite high, the actual query cost is less than Celko's solution. It makes one wonder: Which is the better indicator of better perfromance - actual query cost or statistics such as time and IO? I have seen people use actual query cost as the basis for choosing one solution over another. Is using actual query cost a valid way of deciding which solution to choose? Like I said, it makes one wonder....

For Ben-Gan's solution, I also was dissatisfied with the QP's choice in doing the table spool. And so I decided to create a variation on Ben-Gan's solution that calculated the count for each CustomerID separately. Although the logical reads is higher than Celko's solution, it is only 748 as compared to the 703 of Celko's solution; and better yet, the execution time is [on average] just slightly better and the query cost is always better (37% versus 63%). Put a supporting index in place (CustomerID, TotalDue) and this solution's logical reads drops to 121 (versus 76 for Celko's) but is consistently better in execution time and actual query cost (12% versus 88%)

Here is the query:

;with c as
  (SELECT CustomerId , TotalDue, 
    Row_Number()
      OVER (PARTITION BY CustomerId ORDER BY TotalDue) as rn
  FROM Sales.SalesOrderHeader
)
,
d as
  (SELECT CustomerId ,
    (COUNT(*)+1) /2 as cnt1,
    (COUNT(*)+2) /2 as cnt2
  FROM Sales.SalesOrderHeader
  GROUP BY CustomerID
)
SELECT c.CustomerId, Avg(c.TotalDue) AS Median
FROM C
  INNER JOIN d
    ON c.CustomerId = d.CustomerId AND c.rn IN (d.cnt1, d.cnt2)
GROUP BY c.CustomerId

I agree with Adam that cost-based optimization is far from perfect. But I don't like the inconsistency in the various results of STATISTICS IO, STATISTICS TIME, and actual query cost. And so what is the best way to decide which query performs better and which will scale better: STATISTICS IO, STATISTICS TIME, actual query cost, a combination of all of these, or perhaps something else (such as a detailed analysis of the actual query plans)?

Published Wednesday, December 20, 2006 5:48 AM by Peter W. DeBetta
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Hi Peter,

I do not believe there is any such metric as "actual query cost".  Query cost is nothing more than the estimated time, in seconds, that the QO thinks it will take to run the query on a certain computer that was used in the MS labs during the SQL Server 7.0 development phase. So to get "actual cost" you'd have to get that computer and run the query there.  That's why the cost and the actual runtime / number of I/Os are so different--estimates can be quite misleading, especially in the case of nested loops and table spools.  If the optimizer doesn't have quite enough information, it might think it will only have to do a certain loop a few times, and in reality may have to do millions of iterations (resulting in a huge amount of I/O and horrible perf)... In one case a couple of years ago, I recall tuning a query that was estimated at 0.09, and that was taking hours to run.  The solution I came up with gave an estimate in the 2000-range, yet ran in a couple of seconds!

In most query tuning situations I encounter, I/O is the most important metric.  It's much more common to see I/O-bound servers than CPU-bound servers, especially with badly-designed, under-normalized databases that have a lot of needless repetition.
December 20, 2006 10:19 AM
 

JayMin said:

How well does the sql2005 stack up to the sql 2000 rownumber() ???

What is really the fastest paging mechanism and under which cases???

Can ANYONE REALLY ANSWER THIS?

April 24, 2007 7:12 AM

Leave a Comment

(required) 
(required) 
Submit

About Peter W. DeBetta

Peter DeBetta works for Microsoft as an Application Platform Technology Strategist, specializing in strategies, design, implementation, and deployment of Microsoft SQL Server and .NET solutions. Peter writes courseware, articles, and books – most recently the title Introducing SQL Server 2008 from Microsoft Press. Peter speaks at conferences around the world, including TechEd, SQL PASS Community Summit, DevTeach, SQL Connections, DevWeek, and VSLive!

When Peter isn’t working, you can find him singing and playing guitar (click here to hear an original song by Peter), taking pictures, or simply enjoying life with his wife, son, and daughter.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement