In the previous post, I mentioned that in an earlier post on IO cost structure, the cost of a table scan was approximately 1 CPU-micro-sec per page, plus approximately 0.05 micro-sec per row, for just a SELECT COUNT(*).
This was calibrated on a 2-way quad-core Xeon 5430 at 2.66GHz. I should also add that the test tables have a clustered primary key, with between 3 and 18 columns, all fixed-length not null, and row density varying from 1 to 323 rows per page. Without hints, the default locking level should be TABLOCK (?), and Max Degree of Parallelism was set to 1.
I just ran a table scan query against the TPC-H LineItem table at SF10 and saw significant differences. At SF10, using the SQL Server 2008 version of the table (i.e., DATE type instead of DATETIME), there are 59,986,052 rows and 1,127,206 leaf-level pages, averaging 53.2 rows per page. By my formula, I would expect SELECT COUNT(*) FROM LineItem with a forced clustered index scan to cost around 3.5 micro-sec per page; instead I observed 5.8 micro-sec per page.
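The gap can be put in numbers. A minimal sketch, assuming the calibrated constants from the earlier post (~1 micro-sec per page plus ~0.05 micro-sec per row; the function name and exact figures are my own framing, not measurements from this test):

```python
# Sanity check: the earlier calibration applied to the LineItem numbers above.
# Assumed constants: ~1 us/page overhead plus ~0.05 us/row for COUNT(*).

def scan_cost_us_per_page(rows_per_page, us_per_page=1.0, us_per_row=0.05):
    """Expected CPU cost per leaf page for a SELECT COUNT(*) scan."""
    return us_per_page + rows_per_page * us_per_row

rows = 59_986_052
pages = 1_127_206
rows_per_page = rows / pages                     # ~53.2 rows per page
expected = scan_cost_us_per_page(rows_per_page)  # ~3.66 us/page by straight arithmetic
observed = 5.8                                   # measured us/page on LineItem

print(expected, observed / expected)             # the scan runs ~1.6x over the model
```

So the observed cost is roughly 60% above what the calibration predicts, which is what makes the LineItem result worth chasing down.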
So why is the scan on LineItem so much more expensive than my calibration tests? Is it because the LineItem table has a varchar(44) column? Or because the LineItem clustered index is not unique? I need to find out; you heard what I said about people who speculate without investigating.
Some additional information on aggregation: SUM(float) costs the same as SUM(int), and there is nearly no difference between SELECT COUNT(*), SUM(float) and a plain SELECT SUM(float), so if you aggregate any column, the count is free. The first SUM is about 0.20 micro-sec per row on top of the COUNT(*). A SELECT MAX() is just slightly less expensive than a SUM(), around 0.185 micro-sec per row. Adding a second, third or fourth SUM() or AVG() contributes about 0.05 micro-sec per column aggregated. However, SELECT SUM(Col1), AVG(Col1) does not incur the extra column penalty, while SELECT SUM(Col1), MAX(Col1) does.
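The aggregation rules above can be collected into a small cost model. A sketch using the post's approximate per-row figures (the helper function and dictionary names are mine):

```python
# Per-row CPU cost additions on top of COUNT(*), per the measurements above.
# These are the post's approximate figures, not exact constants.
COSTS = {
    "first_sum_or_avg": 0.20,   # first SUM() or AVG(), us per row
    "first_max":        0.185,  # MAX() is slightly cheaper than SUM()
    "extra_column":     0.05,   # each additional aggregated column
}

def agg_cost_per_row(n_columns, first="first_sum_or_avg"):
    """Approximate extra us/row for aggregating n_columns distinct columns."""
    if n_columns == 0:
        return 0.0
    return COSTS[first] + (n_columns - 1) * COSTS["extra_column"]

# SELECT SUM(Col1), SUM(Col2), SUM(Col3): ~0.20 + 2 * 0.05 = 0.30 us/row.
# SELECT SUM(Col1), AVG(Col1) counts as one column (no extra penalty),
# while SELECT SUM(Col1), MAX(Col1) counts as two.
```

The SUM/AVG exception makes sense if AVG is computed from the SUM and the free COUNT, so no additional column has to be touched.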
Thanks to Mario (http://sqlinternals.com/) for looking at what the SQL Server engine actually does in SELECT COUNT(*) FROM Table, and to Kevin, who is big into what the code paths should be and likes to count CPU cycles (possibly an indication of a lack of social life; of course, if chicks liked to count CPU cycles, I might get out more myself). Three billion cycles per second, and he still wants to nickel-and-dime the 133 cycles per row.
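Kevin's 133 cycles per row actually lines up with the per-row figure above. A quick conversion (my arithmetic, not from the post):

```python
# Converting the cycle count to time: 133 cycles/row at ~3 GHz is about
# 0.044 us/row, and at the 2.66 GHz calibration clock it is exactly
# 0.05 us/row -- consistent with the calibrated per-row cost.
cycles_per_row = 133
us_at_3ghz = cycles_per_row / 3.0e9 * 1e6    # ~0.0443 us/row
us_at_266ghz = cycles_per_row / 2.66e9 * 1e6  # 0.05 us/row

print(us_at_3ghz, us_at_266ghz)
```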
So that's why I fired up my little debugger and did some tracing (on my AMD Athlon PC, 32-bit Windows, running SQL Server 2005 SP2).
And I was surprised to find you are actually right.
SQL Server *does* touch every row; to optimize this process it uses hardware prefetching, and it scans the rows using the row lengths, starting from the bottom of the page.
That is just for counting.
For a MAX or AVG it drags in all the column data using memcpys and applies the max or avg functions; actually referencing the data of course makes the story much more expensive. Here it depends on whether prefetching does what it should, how long the memory latencies are, whether the accesses are local (in NUMA cases), and how costly the cache coherency traffic behind this is.
So the cost of accessing data also depends (a lot) on the architecture of your box and the number of sockets, and not only on what you measure on one CPU. But since you say you are damn careful and this is not your first day doing CPU measurements, I guess you are aware of all that.
The parts that I highlighted above imply that scaling is not linear, i.e., a test on one processor, or with OPTION (MAXDOP 1), does not scale with the number of processors, whether in a single query with a parallel execution plan or across multiple concurrent non-parallel execution plans.