
Joe Chang

SQL Server Fast Track Data Warehouse

This came out on the Microsoft website. I have not had time to look over the hardware configurations carefully. Of course the recommended storage is a SAN; vendors love to sell very high margin products when there are perfectly good merely high margin alternatives. Well, at least this one looked at sequential disk IO bandwidth, unlike past configurations that were totally silly. The HP config for the DL385G5p does 1.5GB/s, the DL585 3GB/s and the DL785 6GB/s. The Dell config for the PE2950 does 1.6GB/s and the R900 4.8GB/s. My gut feeling is the 2-socket systems are underpowered storage-wise, but that's what happens when you spec expensive storage solutions.

http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx

Ok, I withdraw my initial complaint on the MSA 2000, depending on which model. I love direct-attach storage in non-cluster environments for brute force capability. So on the HP side I like the MSA60 with LFF drives or the MSA70 with SFF drives (because of the high drive count, 25, on a single x4 SAS connection, the MSA70 is really for 8K random IO, not large block or sequential). The MSA60 base unit is $2,999. The MSA 2012sa single controller base unit is $4,499. The MSA60 requires a RAID controller in the server. The MSA 2012sa has a built-in RAID controller, requiring only a SAS HBA in the server. So the MSA 2000 SAS variant is only slightly more expensive than the pure DA solution. The MSA 2000 FC variants are far more expensive and (while suitable for cluster environments, which this is not) do not contribute to performance.

Intel vs. AMD

The Dell reference systems are on Intel Core 2 architecture processors and the HP systems are on AMD Shanghai. For many people, the choice of Intel vs. AMD is highly emotional. Some feel they are Luke Skywalker battling the evil empire. Others feel compelled to quash rebellion in the ranks. I am of the Han Solo school ("I ain't in it for your rebellion, honey, I'm in it for the money"). From the technical perspective, the Core 2 has the more powerful processor core for CPU intensive operations. The Opteron has the integrated memory controller for faster serialized round-trip memory operations. At the 4-socket level, the AMD system has 8 memory channels versus 4 for systems based on the Intel 7300 chipset. The AMD 4-way systems also have more IO bandwidth than the 4-way Intel 7300 chipset. One reason HP may have referenced the AMD Opteron line is to have a uniform line from 2-way to 8-way. Unisys now has an ES7000 for the Core 2 architecture; I would like to see the detailed system architecture for that, as well as run some IO bandwidth calibration tests. The HP Integrity line has outstanding IO capability, but is saddled with Itanium 2 on 90nm (soon to be 65nm?).

 

SAN versus Direct-Attach

I strongly prefer DA over SAN for DW because DA can achieve very high bandwidth at low cost. A pair of Dell PowerVault MD1000s can support 1.6GB/s for about $12K. I am inclined to think the EMC CLARiiON CX4-240 with 2 dual-port 4Gbit/s FC HBAs and 2 DAEs will cost in the range of $30K. Yes, the SAN has lots of features, useful for clustered and transactional environments, but not really essential for DW. However, if one were intent on using a SAN for DW, I would agree with this approach of achieving very high bandwidth using multiple entry/mid-level SANs rather than a single high-end SAN.

 

Data Consumption Calculations

The reference configurations seem to be built around a calculated table scan data consumption rate of 200MB/sec per core. It is commendable that the configurations are built around calculations. Also, most people cannot handle complexity, so calculations must be kept as simple as possible. However, it is grossly irresponsible to state that the table scan consumption rate is linear in the number of cores. It is highly dubious that the parallel execution plan consumption rate was tested at all, given the cited SELECT * OPTION (MAXDOP 1) method. A SELECT * query returning all rows is very unlikely to generate a parallel execution plan. This has to do with the formulas used by the SQL Server cost based optimizer (the execution plan cost formulas): in a parallel plan only the CPU costs are reduced, not the IO costs, and the extra cost of the “Parallelism (Gather Streams)” operation inhibits the parallel plan.

Also, SELECT * FROM Table is not a proper test of data consumption; it is as much a test of the ability of the client to receive data.
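
To actually measure server-side consumption and give the optimizer a reason to consider a parallel plan, I would aggregate instead, so only one row returns to the client. A minimal sketch, assuming a hypothetical large table dbo.BigTable with an integer column Col1:

SET STATISTICS TIME ON;
SELECT COUNT(*), AVG(Col1) FROM dbo.BigTable OPTION (MAXDOP 1);  -- serial: per-core rate
SELECT COUNT(*), AVG(Col1) FROM dbo.BigTable OPTION (MAXDOP 8);  -- parallel: does it scale with cores?
SET STATISTICS TIME OFF;

The aggregate discards the rows server-side, so the client cannot be the bottleneck, and the aggregate carries CPU cost that the parallel plan costing does reduce, unlike the IO cost.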

 

Anyways, I had previously reported precise measurements of the cost structure of SQL Server table scans. A rough model is that the cost of the base table scan (SELECT COUNT(*) or equivalent) depends on the number of pages and rows involved, and on whether disk access is required. The cost per page (measured on a 2-way system with the Intel Xeon 5430, Core 2 architecture at 2.66GHz) is around 1 CPU-microsecond for in-memory data. The test table's columns were all fixed-length NOT NULL.

The cost per row per page is roughly 0.05 CPU-microseconds. So the in-memory table scan for a table with 20 rows per page works out to approximately 2 CPU-microseconds per page. This applies for an unhinted SQL query, for which the default at this table size should be a table lock. The cost of a table scan using row locks is much higher, possibly 1000 CPU-cycles per row (I reported on this long ago, in the SQL Server 2000 days, on the Pentium III, Pentium 4 and Itanium architectures).
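
The measurement itself is simple. A minimal sketch, assuming a hypothetical test table dbo.TestTable that fits in the buffer cache:

SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.TestTable;                 -- unhinted: table lock at this size
SELECT COUNT(*) FROM dbo.TestTable WITH (ROWLOCK);  -- forced row locks: far higher CPU
SET STATISTICS TIME OFF;

Divide the reported CPU time by the page and row counts, over several table shapes, to separate the per-page and per-row components.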

 

Let me emphasize that these are measured values, and are known to be consistent over a wide range of conditions. It does not matter what one imagines the code executing the above looks like.

 

For large block disk access, the additional cost per page is approximately 4.3 CPU-microseconds. So the base table scan at 20 rows per page will run in the range of 4GB/sec in memory and 1.2GB/sec from disk on a single core. Note that I said for large block disk access. I think this amortizes the disk IO cost over many pages (32-64?) and of course includes the cost of evicting a page from the buffer cache and entering the new page.
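
The throughput figures follow directly from the per-page costs. The arithmetic for the 20 rows per page case, written as T-SQL (the constants are the measured values above):

SELECT 8192.0 / (1.0 + 20 * 0.05)       AS bytes_per_cpu_us_in_memory, -- ~4096, about 4GB/s
       8192.0 / (1.0 + 20 * 0.05 + 4.3) AS bytes_per_cpu_us_from_disk; -- ~1300, about 1.3GB/s

One 8KB page costs 2 CPU-microseconds in memory and roughly 6.3 CPU-microseconds with large block disk access, hence about 4GB/s and 1.2-1.3GB/s per core.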

 

Notice I said base table scan, meaning more or less SELECT COUNT(*) or equivalent, so that the execution plan is a table scan or clustered index scan, not a nonclustered index scan (though the formulas are approximately valid there too). It turns out that the cost of logic, i.e., AVG, SUM and other calculations, can be quite expensive. A single aggregate on an integer column appears to cost about 0.20 CPU-microseconds per row, meaning the 20 rows per page in-memory table scan cost is now 6 CPU-microseconds per page versus 2 for the base. TPC-H Query 1 has 7 aggregates, 3 multiplies and 3 adds (or subtracts), yielding a net data consumption rate of around 140MB/sec on a single core.
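
For reference, here is TPC-H Query 1, the aggregate-heavy scan in question, transcribed to T-SQL with the validation value (90 days) for the date substitution parameter:

SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice) AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity) AS avg_qty,
       AVG(l_extendedprice) AS avg_price,
       AVG(l_discount) AS avg_disc,
       COUNT(*) AS count_order
FROM LINEITEM
WHERE l_shipdate <= DATEADD(dd, -90, '1998-12-01')
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;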

 

Storage Configuration

So should each application be tested for the actual data consumption rate of its most important aggregate query? I think this is too complicated and narrow. Even if the main query is complicated, it still helps to have brute force capability in the storage system. This is why my recommendation is to simply fill the available PCI-E slots with controllers, which are relatively inexpensive, and distribute many disks across the controllers, with consideration for the cost of the disk enclosures. This means avoiding the big capacity drives, i.e., skip the 300GB 15K drives. Last year, I recommended 73GB 15K drives. Today, there is little price difference between the 73GB and 146GB 15K drives, so go for the 146GB 15K drives.

Memory

On memory, forget the stupid 4GB per core rule. I see no justification for it. Memory is cheap. Fill the DIMM sockets with the largest capacity memory module for which the cost per GB is still essentially linear. Two years ago, 2GB ECC DIMMs were around $200 and 4GB ECC around $1000, so 2GB DIMMs were the recommendation then. I just checked on Crucial: today 4GB ECC is less than $200 each, while 8GB is around $850, so 4GB DIMMs are the recommendation for now.

Processors 

Finally, a note on recommended processors: large data warehouse queries do benefit from high clock rate (comparable only within a micro-architecture; do not equate Opteron to Core 2 frequency). DW queries do not benefit from cache size. So if the same frequency is available with different cache sizes at different prices, just be aware that processor cache size does not impact DW. A large processor cache does significantly benefit high call volume transactional queries, so the higher price is more than justified there.

 

Reference Configurations with Direct Attach Storage

Below are my reference configurations on the assumption the data warehouse is not clustered. I have included prices obtained from the Dell and HP websites on 26 February 2009. If you can get Dell or HP to quote the price of the SAN based storage in the Microsoft/Dell/HP reference configurations, compare it against the direct-attach pricing below.

 

Dell PowerEdge 2950 III, 2x3.33GHz Xeon X5470, 32GB memory

2 PERC6/E controllers, $850 each

2-4 MD 1000 w/ 15 x 146GB 15K SAS drives, $7K each

Total cost: $22K with 30 disks, $36K with 60 disks

 

PowerEdge R900 4 x Six Core Intel Xeon X7460 2.67GHz, 16M, 128GB, $21,366

6 PERC6/E controllers, $5,100

6 MD 1000 w/15x146G 15K, $43K

Total cost: $70K.

 

While I do agree in principle with the concept of a balanced configuration, I ask the question: which is better? Each configuration below is approximately $36K.

1 PE 2950 w 2x3.33GHz X5470, 32GB memory, 2 PERC6, 60 x 146G 15K disks

or 1 PE R900, 4x2.67GHz X7460, 128GB memory, 2 PERC6, 30 x 146G 15K disks

 

HP ProLiant configurations with Opteron (Shanghai, 6M L3) processors.

DL385G5p, 2 x Opteron 2384 2.7GHz, 32GB, 2 x 72GB 15K HDD, $7,282

DL385G5p, 2 x Opteron 2384 2.7GHz, 32GB, 16x72GB 15K HDD $12,500

DL 585G5, 4 x Opteron 8384 2.7GHz, 128GB, 2 x 72GB 15K, $21,500

DL 785G5, 8 x Opteron 8384 2.7GHz, 256GB, 2 x 72GB 15K, $60,600

MSA60, 12 x 146GB 15K, $7K ($5,765 bundle price?)

Smart Array P800 $949

Smart Array P411 $649

 

The lowest cost DL385G5p DW configuration is to populate all 16 internal SFF drive bays, then attach 2 MSA60s, each with 12 15K disks. The preferred DL585G5 configuration is to populate 6 RAID controllers, with 2 MSA60s (directly connected, not daisy chained) on each controller in the 3 x8 PCI-E slots, and 1 MSA60 on each controller in the x4 PCI-E slots.

On the DL785G5, place RAID controllers in each of the 3 x16 and 3 x8 PCI-E slots, each connecting 2 MSA60s, and connect a single MSA60 to each RAID controller in the x4 PCI-E slots. Defer to Gunter Zink’s team for any more specific slot-to-disk configuration.

Published Tuesday, February 24, 2009 4:46 PM by jchang


Comments

 

Kevin said:

I have a question about your computations of cost with SELECT COUNT(*). I am not a SQL Server internals expert, but it seems odd to me that any modern DBMS would have to peek at each row to know how many are in the block. Are you sure SQL Server doesn't just peek at the block header to get the count of rows in that block? If so, then each block read should be followed by at most 1 cache line load.

February 25, 2009 4:07 PM
 

jchang said:

Apparently, SQL Server, like most transactional database engines, does not cheat, so there is nearly no difference between COUNT(*) and COUNT(colA). If you really wanted a cheap estimate of the row count, just look it up in sys.dm_db_partition_stats.
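
A minimal sketch of that lookup (the table name is a placeholder):

SELECT SUM(row_count) AS approx_rows        -- metadata only, not transactionally consistent
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.BigAssTable')
AND index_id IN (0, 1);                     -- heap or clustered index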

I think there is a valid reason for this.

Suppose your transaction depended on the count; then you don't want uncommitted rows, right?

Like I said about the aggregates being very expensive, far more expensive than compiled code, I would like to see how it is actually implemented.

Anyways, welcome to the world of transactional databases

February 25, 2009 4:25 PM
 

Kevin said:

Thanks for the hospitality and quick reply! I hope it is OK if I submit another (stupid) follow-up if I need to....

February 25, 2009 5:25 PM
 

Kevin said:

jchang,

At first I think I was speed-reading your comment. I don't think we are talking about the same thing. I agree that COUNT(*) and COUNT(colA) will have the same cost. I don't believe that proves SQL Server needs to peek at every row in a block to know how many rows there are in the block. It should only have to look at the block header for that information. Every DBMS I've touched has that optimization.

February 25, 2009 6:40 PM
 

jchang said:

Ok, sure there are optimizations that could be done to improve the cpu efficiency of an approximate SELECT COUNT(*)

but the answer is no longer reliable for transactions, as you would be counting rows from uncommitted transactions.

Also, the SQL Server page structure has a header and then the row offsets at the end, so you would read the header, then count the number of offsets at the end.

February 25, 2009 6:45 PM
 

Kevin said:

Thanks for answering my question, jchang. So, in your original post where SELECT COUNT(*) was the topic at hand you wrote:

"The cost per page (referenced to the Intel Core 2 architecture at 2.66GHz) is around 1 CPU-microsecond for in-memory. The cost per row per page is roughly 0.05 CPU-microseconds. "

Surely the header of a SQL Server page (block) fits in a single cacheline (64 bytes on x86_64 and AMD64 systems). Once the line is installed, surely counting the row pieces (in the header) is a matter of single-clock operations, no? It sounds to me like you are costing nearly 133 clock cycles (50ns) per row per page in a SELECT COUNT(*). I can see the 50+ns to load the line upon first reference, but walking a cacheline and doing a register-add is not a 133 CPI effort.

Again, I'd like to point out that you wrote:

"A rough model is that the base table scan (SELECT COUNT(*) or equivalent) depends on the number of pages and rows involved,"

I'm asking whether you really think SELECT COUNT(*) costs more based on the number of rows in a page given your follow-up here about the fact that the number of rows is indexed in the header of the page (block).

I say it must surely cost about the same CPU to count rows with nearly empty pages (blocks) as fully packed pages. What am I missing?

February 25, 2009 7:09 PM
 

jchang said:

You keep thinking in terms of single-threaded code.

SQL Server is a transactional database engine, think in terms of multi-threaded code where each row access requires either a call to lock, or interlock. I have not looked at the SQL Server engine source code myself, you might kindly ask someone on the SQL engine team if you could take a wee little peek. Or you could look at an open source database.

But try writing multi-threaded C code, figure out the cost of Interlock and Lock.

February 25, 2009 11:06 PM
 

jchang said:

See if you can dig up an email for Yavor, I have never met him, but he is in my LinkedIn network

Yavor Angelov

Senior Program Manager

SQL Core RDBMS Team

if not, try the email link on:

http://blogs.msdn.com/sqlqueryprocessing/default.aspx

February 25, 2009 11:14 PM
 

Kevin said:

jchang,

 Thanks for the info. Since the topic at hand was the SQL Server 2008 Fast Track Data Warehouse program I'm surprised the thread ends with a caveat about the "transactional" nature of SQL Server as that attribute has nothing to do with block header row count optimizations.

February 25, 2009 11:24 PM
 

Kevin said:

"You keep thinking in terms of single-threaded code."

jchang,

 I'm not thinking about parallelism or serialized code execution. The topic at hand, which is whether count(*) touches every row piece in a SQL Server page, has nothing to do with whether an execution plan is parallel or serial. Thanks for the tips about investigating open source databases but that won't answer the question at hand which still revolves around your assertion that count(*) cost increases based on the number of rows in a page (block). Perhaps you can ask them. That will help substantiate your "hypothesis" about the 133 clock cycles per row counted in a count(*) operations.

February 25, 2009 11:30 PM
 

jchang said:

Just because the title says Data Warehouse does not mean the transactional nature of the SQL Server engine can be turned off. This is why specialized non-transactional database engines can achieve much higher DW performance.

February 25, 2009 11:32 PM
 

Mike said:

Hi Joe,

Does it surprise you that Itanium didn't make it into the reference list?

Mike

February 26, 2009 12:17 PM
 

jchang said:

On Kevin's remark: my numbers are carefully measured results. No one in 9 years has found substantial errors in my quantitative analysis. Many of my non-intuitive findings have been substantiated by others. I am not pulling numbers out of my behind, nor am I asserting what the SQL Server engine source code looks like. You are the one speculating about what the code should be without ever looking at the real code, and you fail to understand that the transactional nature of SQL Server, Oracle, etc. cannot be disabled for DW queries.

Mike: I would not regard the omission of Itanium as a slight. Right now, Itanium is still dual core, which puts it at a disadvantage relative to the quad-core Opteron and Xeon. Unisys now has a 16-socket Xeon capable of 64 cores (72 with R2) posting a better TPC-H than a 64-core Itanium. There will probably remain a small niche in the ultra-high end for Itanium after Intel releases Tukwila, mostly because HP has a very good chipset which can drive 20GB/s to disk. In which case, don't bother with a reference configuration; for a machine that expensive, do the full custom analysis.

For a moderately exorbitant fee, I can do this analysis completely independent of HP.

February 26, 2009 1:50 PM
 

Mario said:

I think Kevin's assumption is very valid.

A SQL Server data page has a (96 byte) page header with a 'slot count' that represents the number of rows; this is where SQL Server would look, I assume. The next cheapest thing would be counting the row offsets in the offset table at the end of the data page. Visiting each row just to count it would require SQL Server to calculate the position of each row on the data page using this offset table.

So it's highly unlikely that SQL Server would visit each row in order to count them.

February 26, 2009 2:13 PM
 

jchang said:

as in the Seagal movie, "Assumption is the mother of all f#@kups"

February 26, 2009 2:43 PM
 

Kevin said:

jchang,

You could actually take a measurement and put the speculation to rest. If you run a query that you **know** touches each row and contrast the CPU utilization with that of a simple count(*), you'll be out of conjecture and into science. Just make sure you measure CPU accurately.

So, how about a select count(*) from tableA versus a select max(colA) from tableA? And, no, the difference in CPU will not be due to the *high cost* of the max() function, as the code for max() is nothing more than a conditional compare-and-swap. The difference in cost will be from loading the cachelines hosting colA in each row.
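
Written out as T-SQL (tableA and colA are the placeholder names above):

SET STATISTICS TIME ON;
SELECT COUNT(*)  FROM tableA;   -- no column data needed if a header fast path exists
SELECT MAX(colA) FROM tableA;   -- must load the cacheline holding colA in every row
SET STATISTICS TIME OFF;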

February 26, 2009 2:49 PM
 

jchang said:

The cost structure definitely is not consistent with just looking at the header. I have no idea what the actual code does. However reasonable it is to assume that counting the offsets would be sufficient, that does not mean that is how the code is actually implemented.

Perhaps we could work with Microsoft to help them improve the performance of SELECT COUNT(*) FROM BigAssTable

I would rather focus on the performance of

SELECT AVG(Col1) FROM BigAssTable relative to the COUNT(*)-only aggregate. The count only costs me roughly 0.05 us per row; I think AVG is around 0.15-0.20 us per row. It gets a little tricky when there are multiple aggregates. If you want to bitch to MS, I would focus there.

I will now bitch about the nature of the SQL Server page. My understanding is that after the header comes the actual row data, filling from beginning to end. The row offsets start at the end and fill in backwards. Yes, this is efficient from the administrative point of view, but modern microprocessors have deep pipelines, and it is absolutely essential to avoid memory stalls. Does the processor prefetch know when a code sequence reads in reverse order?

February 26, 2009 3:08 PM
 

jchang said:

like I have been trying to get through your skull,

these are measurements, and not just a single measurement, but across a wide range of rows/page, multiple variations on the aggregate.

speculation is for losers

and for your information,

SELECT MAX(Col1) costs about the same as SELECT COUNT(*), MAX(Col1).

I am damn careful about my CPU measurements. Today is not the first day I took a CPU measurement.

February 26, 2009 3:13 PM
 

Kevin said:

jchang,

 That pingback from my blog is in error. I just cut and pasted in the wrong URL. Please forgive.

February 26, 2009 4:28 PM
 

mario said:

Joe, regarding:

as in the Segal movie, "Assumption is the mother of all f#@kups"

and speculation is for losers

I agree.

So that's why I fired up my little debugger and did some tracing (on my AMD Athlon PC, win x32, running SS2005 SP2).

And I was surprised to find you are actually right.

SQL Server *does* touch every row. To optimize this process it does use hardware prefetching, and it scans the rows using the row lengths, starting from the bottom of the page.

This is for counting.

For a max or avg it drags in all the column data using memcpys and applies the max or avg functions; actually referencing the data of course makes the story much more expensive. Here it depends on whether prefetching does what it should do, how long the memory latencies are, whether they are local (in NUMA cases), and how costly the cache coherency process behind this is.

So the cost of accessing data also depends (much) on the architecture of your box and the number of sockets, and not only on what you measure using one CPU. But since you say that you are damn careful and this is not your first day doing CPU measurements, I guess you are aware of all that.

February 26, 2009 4:58 PM
 

Kevin said:

Thanks Mario.

OK, I'm perfectly fine to accept the notion that SQL Server has no fast-path optimizations for count(*). No problem here.

February 26, 2009 7:19 PM
 

jchang said:

I am sorry; all of this aggravated me so much that I was getting really pissy. Good thing you were not in front of me. Anyways, I am not right, because I never said SQL Server touches every row; I only said I measured approximately 0.05 micro-sec per row on a 2-way quad-core Xeon 5430 2.66GHz.

And I am very careful because you need to be sure what it is you are really measuring.

Let me further add that I was testing a table with 10-18 fixed-length NOT NULL columns. I just looked at the TPC-H Lineitem table and got different numbers; I will report this in another post.

Let me restate myself, as I am never wrong (it was just the answer to a different question): speculation without verification is for losers. You can guess how much I trust other people.

Mario: I do have one question, if you can recall. What happens when the code issues a read of the last 2 bytes in an 8KB page (i.e., the offset)? Does the prefetch in fact get the last 64 bytes (on the assumption each 8KB page is aligned to 8KB or something), so that the load of the 2nd offset is already in cache?

Kevin: I looked at your site, so it is obvious you are a serious performance person. However, I have been burned repeatedly by making assumptions, however logical.

So the question is now: should SQL Server have a fast-path optimization per your description? See my next post regarding where effort should be placed.

February 27, 2009 3:21 PM
 

Kevin said:

No blood, no foul. This whole topic of whether SQL Server has a count(*) fast path optimization (like Oracle) was a dovetail from the real topic, which was my assertion that the Fast Track Warehouse program is most likely based on lightweight scans, because if it were based on complex, concurrent DW/BI queries they'd discover that 200MB/s will totally hose down Harpertown Xeons. And I'm not talking trick, ueber-optimized H queries. I'm talking real-world style DW/BI queries that do rich filtration, sorting, wide joins, agg, etc.

Your apology about your aggravation level is accepted. No worries. Now that I have ample warning that you are given to violence I assure you I won't go out of my way to be within shooting distance.

Also, "speculation without verification" is not "for losers." In fact, that's essentially what a hypothesis is. Believing a hypothesis without validation is a dumb move though. There is nothing wrong with hypothesizing, as I did, that SQL Server must surely have a count(*) fast path optimization (like Oracle and most other databases I've dealt with). Dismissing the rhetorical question or hypothesis as absurd, on the other hand, was an irrational play, but you've stepped up an apologized.

Finally, regarding this quote from you:

"Let me restate myself as I am never wrong..."

After that, I know I'll never visit this blog again.

http://kevinclosson.wordpress.com/2009/02/25/1037/

February 27, 2009 5:03 PM
 

mario said:

Regarding your prefetch question: I cannot see anything that points to prefetching the row offset table in the debugger, but it can happen out of my sight somewhere before. The prefetch I ran into (and the actual usage of that data later on) is a prefetch of the actual row itself, based on the calculation: page offset in db cache (8KB aligned) + row offset.

Your observation about going through the row offset table in reverse order is indeed interesting, in the light of 'prefetching'.

You give an address to a prefetch, and that is what it prefetches (for the length of a cache line), as far as I know.

February 27, 2009 5:54 PM
 

George Walkey said:

Glad Kevin self-terminated.

Guessing IS a waste of time because guessing is not a line item sales people can sell. Numbers actually are deliverables;

hypotheses are not. The TPC numbers are not TPC "guesses".

If someone at this level complains that there are no quick answers to easily discoverable questions... well, now you are wasting my time. In the past, when I asked someone a question I'd first get an hourly rate quote for "exploratory surgery".

iow, Kevin, do it your own self, mo dean.

Try Kalen Delaney's blog for the underlying reasons behind MS engine code decisions. She wrote 3 books on the engine,

and she seems to have the email addresses of the coders over there, but then again she always gives the company line....

Joe uses the stuff so his experience is more respected.

at least by me, the small guy, the user.

You can't turn off certain parts of the engine.

If you never run .NET code, does that part of the engine not load?

Possibly, if it's in a DLL.

But the code for select count(cola) is probably the same code as for select *, n'est-ce pas?

March 7, 2009 3:11 PM
 

TS said:

Right now is a horrible time to buy servers because Nehalem-EP is about to be launched.  2S Nehalem-EP is likely going to be about the same as 4S Dunningtons and 4S AMD Opterons.  The only disadvantage is the 18 DIMM memory limit on Nehalem-EP vs 32 DIMMs on the 4S Dunningtons and AMDs.  But the price and power difference is too much to ignore.

Within a month, you can buy 1U 2S Nehalem-EPs with 2x L5520s and 18x4GB DDR3 registered = 72GB of respectable memory. (At 18 DIMMs, you will get 6 channels of DDR3-800 bandwidth, which is more than the 8 channels of DDR2-533 from AMD.) This server will only consume 1.5 amps of power, which is much better than the 5-6 amps from the 4S solutions.

Storage systems should be migrating to 2.5 inch drives for less power and space consumption. Dell MD1120s and HP MSA70s are much better alternatives than their 3.5 inch cousins.

In terms of OS, I advocate using OpenSolaris with PostgreSQL 8.3 but I know you are a MS SQL Server guy, so I will shut up now.

Anyways, nice blog.

March 9, 2009 8:45 PM
 

Mario said:

@frank: regarding: "but the code for select count(cola) is probably the same code as for select *, n'est-ce pas?"

Mais Non. I think that's what this whole story is about: don't assume (as in 'probably').

If 'cola' (in the count(cola)) doesn't have a NOT NULL constraint, the code has to actually examine the content of the row. A count(*) on the exact same table is faster. On my test box (SQL Server 2005 build 3052) it takes about 3 seconds to logically read 151k blocks in memory to do the count(cola). Accessing the same blocks with a count(*) takes me somewhere between 1 and 2 seconds.
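
The comparison, written out (the table name is a placeholder):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SELECT COUNT(cola) FROM dbo.TestTable;  -- cola nullable: row data must be examined
SELECT COUNT(*)    FROM dbo.TestTable;  -- row presence only, no column access
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;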

The actual codepath taken is different:

select count(*) code path starting in GetRow:

Tracing sqlservr!CQScanTableScanNew::GetRow to return address 01057382

  23     0 [  0] sqlservr!CQScanTableScanNew::GetRow

  13     0 [  1]   sqlservr!RowsetNewSS::ReleaseRows

  13     0 [  2]     sqlservr!HeapDataSetSession::ReleaseRow

  16    13 [  1]   sqlservr!RowsetNewSS::ReleaseRows

  32    29 [  0] sqlservr!CQScanTableScanNew::GetRow

  25     0 [  1]   sqlservr!CQScanRowsetNew::GetRowWithPrefetch

  28     0 [  2]     sqlservr!RowsetNewSS::GetNextRows

  14     0 [  3]       sqlservr!HeapDataSetSession::GetNextRowValues

 220     0 [  4]         sqlservr!HeapDataSetSession::GetNextRowValuesInternal

   5     0 [  5]           sqlservr!__security_check_cookie

 223     5 [  4]         sqlservr!HeapDataSetSession::GetNextRowValuesInternal

  16   228 [  3]       sqlservr!HeapDataSetSession::GetNextRowValues

  46   244 [  2]     sqlservr!RowsetNewSS::GetNextRows

  35   290 [  1]   sqlservr!CQScanRowsetNew::GetRowWithPrefetch

  36   354 [  0] sqlservr!CQScanTableScanNew::GetRow

390 instructions were executed in 389 events (0 from other threads)

compared with:

select count(c1) (nullable column)

Tracing sqlservr!CQScanTableScanNew::GetRow to return address 010438b3

  23     0 [  0] sqlservr!CQScanTableScanNew::GetRow

  13     0 [  1]   sqlservr!RowsetNewSS::ReleaseRows

  13     0 [  2]     sqlservr!HeapDataSetSession::ReleaseRow

  16    13 [  1]   sqlservr!RowsetNewSS::ReleaseRows

  32    29 [  0] sqlservr!CQScanTableScanNew::GetRow

  23     0 [  1]   sqlservr!CQScanRowsetNew::GetRowWithPrefetch

  26     0 [  2]     sqlservr!RowsetNewSS::FetchNextRow

  14     0 [  3]       sqlservr!HeapDataSetSession::GetNextRowValues

 306     0 [  4]         sqlservr!HeapDataSetSession::GetNextRowValuesInternal

   1     0 [  5]           sqlservr!memcpy

  32     0 [  5]           MSVCR80!memcpy

 324    33 [  4]         sqlservr!HeapDataSetSession::GetNextRowValuesInternal

   5     0 [  5]           sqlservr!__security_check_cookie

 327    38 [  4]         sqlservr!HeapDataSetSession::GetNextRowValuesInternal

  16   365 [  3]       sqlservr!HeapDataSetSession::GetNextRowValues

  43   381 [  2]     sqlservr!RowsetNewSS::FetchNextRow

  39   424 [  1]   sqlservr!CQScanRowsetNew::GetRowWithPrefetch

  36   492 [  0] sqlservr!CQScanTableScanNew::GetRow

528 instructions were executed in 527 events (0 from other threads)

Note the extra memcpy and the extra 150 or so instructions..

..not all counts are equal.

March 10, 2009 4:54 PM
 

mario said:

And now I am even confusing names: @frank must be @george..sorry!

March 10, 2009 4:55 PM
 

George Walkey said:

no problemo

my earlier point was that kevin was a SQL guy with his own blog,

asked a question he should have at least TRIED to get the answer to himself, then didn't like the truthful answer Joe gave..

he was lazy

you are not

you traced the calls...cool

funny how the engine behaves huh?

how we have to amass a set of tips to use for each engine version...

March 10, 2009 8:38 PM
 

DavidBojsen said:

Dear Joe

Thank you for your nice informative blog.

We are about to engage in a Fast Track data warehouse setup with Microsoft, and are looking at DAS alternatives to the SAN storage.

Quite like you explain here...

But we are finding it hard to get the storage vendor (HP) to help us with the performance reviews.

You suggest taking an HP DL585 G5 (we will be using the G6) and populating it with RAID controllers in all available PCI-E slots, attaching 2 MSA60s (one on each port) to the RAID controllers in the x8 slots and only one MSA60 in the x4 slots. This is exactly what we have suggested to HP, but they refuse to guarantee that this will give throughput equal to or better than the MSA2212fc.

What is your take on this ?

/ David

September 14, 2009 4:23 AM
 

jchang said:

Is he joking? The MSA2212fc? Actually, I cannot find an MSA 2212; I see a 2012 and a 2312.

Anyways, the 2300 is the controller, the 2012 is the 12-bay LFF, and the 2024 is the 24-bay SFF, but HP MSA part numbering is really confusing.

Ask them

1. What is their guarantee good for?

2. How many GB/sec will their proposed solution deliver?

3. Is it fully refundable if they cannot deliver it?

4. Have they actually configured a DW like this before?

5. What technical data can they provide to substantiate their guarantee? (Not that I do not implicitly and completely trust their word, any more than I trust the next sales rep.)

6. Why does their TPC-H benchmark team not use the MSA 2000 Gen 2 SAN?

I would suggest populating the 3 x8 slots, plus one of the x4 slots, with RAID controllers for data, and possibly one more x4 slot with a controller dedicated to logs.

Then use the 2 remaining x4 slots for dual-port GbE or even 10GbE, as you will want to move data on and off really fast.

PS: compare the price of the 2 solutions.

Take a guess at the commission they will make for selling direct-attach MSA versus MSA SAN, and guess which has higher gross margins.

Funny how this may skew recommendations in the complete absence of technical data.

September 14, 2009 12:22 PM


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
