THE SQL Server Blog Spot on the Web


Joe Chang

Venting at SAN vendors

I was just about ready to unleash a long-accumulating stream of rants against SAN vendors for pushing seriously obsolete computers as powerful storage systems. Of course, in a final check of product specs, I saw that EMC just announced the new Clariion CX4 line. The previous CX3 line was built around the Intel E7520 chipset, a 2H 2004 product for the old NetBurst processors. A SAN does not need top-line CPU power, but I wanted the Intel 5000P or, even better, the 5400 chipset, with 2 x 1066 or 1333MHz FSB and 20GB/s memory bandwidth, not the single 800MHz FSB and 6.4GB/sec memory bandwidth of the E7520. A low-voltage Core2 Xeon would be a good match to keep power down (never mind, 50 or 65W does not matter). A SAN should not need a full-blown 3GHz quad-core; dual-core is fine.

The new CX4 line uses the current-generation Core2 architecture processors. The low-end CX4-120 has a single 1.2GHz dual-core, the CX4-240 has 1 x 1.6GHz DC, the CX4-480 1 x 2.2GHz DC, and the top-of-the-line CX4-960 has 2 x 2.33GHz quad-core. As I said above, a SAN's CPU needs are modest, but where did the 1.2GHz come from? The lowest Xeon bottoms out at 1.6GHz. The very good E5205 1.86GHz has an Intel list price of $177. Not only that, for single-socket systems the E3110 3GHz is listed at $167, and the desktop E7200 2.53GHz at $113. Considering that the CX4-480 is not a chump-change system, it should have 2 dual-core processors (1333MHz FSB) to be able to drive the full bandwidth of 4 FB-DIMM 667MHz memory channels. Consider what happens on a disk read. The data block read is not sent straight from disk to host: it is first written to memory regardless of cache settings (?), then read from memory, and finally sent to host. For this, I want both processor sockets populated. A quad-core on one socket will have only half the FSB bandwidth of 2 dual-cores, one attached to each of the two FSBs on the 5000/5400 chipset.
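The FSB bandwidth figures above follow directly from bus width and transfer rate. A quick sketch of the arithmetic (the 8-byte bus width and the per-chipset rates are the standard Intel figures, not anything from EMC's specs):

```python
# Back-of-envelope FSB bandwidth: 64-bit (8-byte) data bus times the
# effective transfer rate in MT/s.
def fsb_bandwidth_gb_s(mt_per_s, buses=1, width_bytes=8):
    """Peak front-side bus bandwidth in GB/s (decimal)."""
    return mt_per_s * width_bytes * buses / 1000.0

e7520 = fsb_bandwidth_gb_s(800)            # single 800MT/s FSB on the E7520
i5400 = fsb_bandwidth_gb_s(1333, buses=2)  # dual independent 1333MT/s FSBs

print(f"E7520: {e7520:.1f} GB/s")  # 6.4 GB/s
print(f"5400:  {i5400:.1f} GB/s")  # ~21.3 GB/s
```

Note also that the disk-read path described above (write to memory, then read back) costs every block two trips across the memory bus, which is why the bandwidth numbers matter more than clock speed here.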

The 960 does have two processor sockets populated to utilize the full memory bandwidth. I am not really sure why the CX4-960 needs quad-core. Does the Clariion line have some capability to do compression? It would be nice to off-load this from the host server. I did work on the LiteSpeed compression engine, and I always thought it would be good to build a fully multi-threaded version of WinZip, with control of intermediate file placement, the buffering flags, and special capability for super-fast network transfers.

I was going to really gripe hard about the memory in the CX3, given that a pair of 1GB DDR2 ECC DIMMs now costs $99, and a pair of 2GB $240. Back when the CX3 was launched, memory was more expensive, but they could have done a product refresh, especially considering what vendors typically charge for SANs. It is really stupid that the CX3-10 only had 1GB per SP, and the CX3-20 only 2GB per SP, especially considering that of that, only 310MB and 1053MB respectively are available for cache; the rest is required for the SAN OS and other software. The new high-end CX4-960 is listed at 32GB, meaning 16GB in each SP. The CX4-480 has 8GB per SP, the CX4-240 4GB per SP, and the CX4-120 3GB per SP.

The SAN is a computer system in itself, with an operating system etc. that needs memory. The memory actually available for cache, from high to low, is 10.7GB, 4.5GB, 1.2GB and 600MB. Now, I have said before that read cache is worthless except for a small amount for read-ahead in sequential ops. I do like write caching, to handle T-Log backups, checkpoints and tempdb surges, so allocating cache by LUN is important. Given that 4 x 2GB FB-DIMMs today are $440, the low-end should have started at 4GB per SP, with 8GB in the 240, 16GB in the 480, and 32-64GB per SP on the high-end. At least a proper performance analysis should be done.
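Running the arithmetic on the figures above shows why the low-end configurations sting: the fixed SAN OS footprint eats most of the memory on the small boxes. This is just total-per-SP minus cache-per-SP, using the numbers quoted in this post:

```python
# Per-SP memory vs what is left for cache, per the CX4 figures above.
total_gb = {"CX4-960": 16, "CX4-480": 8, "CX4-240": 4, "CX4-120": 3}
cache_gb = {"CX4-960": 10.7, "CX4-480": 4.5, "CX4-240": 1.2, "CX4-120": 0.6}

for model, total in total_gb.items():
    overhead = total - cache_gb[model]         # SAN OS + software footprint
    usable = cache_gb[model] / total           # fraction left for cache
    print(f"{model}: {overhead:.1f}GB overhead, {usable:.0%} usable as cache")
```

The CX4-120 keeps only 20% of its memory for cache, versus roughly two-thirds on the 960, which is the argument for starting the low end at 4GB per SP.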

On back-end FC ports, the 120 has 2 total (1 per SP), the 240 has 2 per SP, the 480 4 per SP, and the 960 supports a maximum of 8 per SP (not all of which are populated in the base configuration). The 960 doubles the back-end FC ports over the previous high-end CX3-80. The maximum front-end FC ports is 8 in the 480 and 12 in the 960. This means each 960 SP has a maximum of 20 FC ports, 12 to the front and 8 to the back. I find this interesting because each x4 PCI-E slot should be matched with one dual-port 4Gb/s FC controller.

The Intel 5000P/X has 28 PCI-E lanes, which can be configured as 7 x4 PCI-E ports. The 480 uses one x4 PCI-E channel for inter-SP communication and write-cache mirroring; the 960 uses an x8 PCI-E. This means 7 x4 PCI-E slots for the 480, and 10 x4 + 1 x8 PCI-E slots (or 48 lanes) for the 960. The 480 could use the 5000P chipset. I am wondering if the 960 uses the Intel 5400 chipset with 40 PCI-E lanes. If there are 8 back-end ports, then there is no need to support the full bandwidth of 12 front-end ports; a PCI-E expander, i.e., 2 slots sharing the bandwidth of 1, could be used for the extra slots.
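The slot-to-HBA matching and the lane totals above can be sanity-checked with a little arithmetic. This sketch uses the standard gen-1 figures (2.5Gbit/s per lane, 8b/10b coding) and the rough ~400MB/s usable rate of a 4Gb/s FC port; those rates are general knowledge, not from EMC's documentation:

```python
LANE_GBIT = 2.5     # PCI-E gen 1 signaling rate per lane
ENCODING = 8 / 10   # 8b/10b line coding overhead

def slot_gb_s(lanes):
    """Usable one-direction bandwidth of a gen-1 PCI-E slot, in GB/s."""
    return lanes * LANE_GBIT * ENCODING / 8

x4_slot = slot_gb_s(4)   # 1.0 GB/s per x4 slot
dual_4g_fc = 2 * 0.4     # dual-port 4Gb/s FC HBA, ~0.4 GB/s usable per port
print(x4_slot, dual_4g_fc)  # the x4 slot comfortably covers the HBA

# Lane totals, per the slot counts discussed above
cx4_480_lanes = 7 * 4        # 28 -> fits the 5000P's 28 lanes
cx4_960_lanes = 10 * 4 + 8   # 48 -> exceeds even the 5400's 40 lanes
print(cx4_480_lanes, cx4_960_lanes)
```

The 48-lane total for the 960 exceeding the 5400's 40 lanes is exactly why an expander (two slots sharing one link) would make sense for the extra front-end slots.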

Another item is that EMC says the CX4 will support future 8Gbit/sec FC. It helps that the FC module can be changed without opening the box, like the industrial PC devices. Of course, it's not like people can't be trusted to open a box and change an FC HBA? (Negate this, I see their point.) Oh yeah, the PCI-E gen 1 (2.5Gbit/sec) x4 slot is nicely matched with a dual-port 4Gb/s FC HBA; a dual-port 8Gb/s FC would overwhelm the x4 PCI-E gen 1 slot. Is the complete CX4 line on the Intel 5400 chipset? That would be really nice. Intel lists the 5400 as supporting PCI-E gen 2 (5Gbit/s) only in the x16 slot configuration. It could be that only x16 graphics cards were available at the time of the 5400 validation, and PCI-E gen 2 FC HBAs have since been validated.

I do like that flash disks are now an option; EMC is listing 30X the IOPS of a 15K drive. I assume the reference point should be the full-stroke, queue-depth 1 IOPS for a 15K drive, which is 225/sec, meaning on the order of 6,750 IOPS per flash drive. That's nice. The listed response time is 1ms. The response time for a bare SLC NAND flash drive is in the range of 10-30 microseconds. I will assume that EMC is using such drives, and that the overhead of crossing from host to switch to SAN (via front-end FC HBA to memory to CPU to back-end FC to flash drive and back again) is responsible for the 1ms latency. I do want the sub-200-microsecond latency of my flash drive. The main reason I want a flash drive is for logs. We all know that a single pair of hard drives in RAID 1 can handle most log loads (50-100MB/sec). But what do we do about multiple highly active databases (>200 tx/s) on a single system? Many people mistakenly take the rule of separate disks for data and logs to mean all data on one set and all logs on another set. It is not practical to have something like 50 pairs of disks dedicated to logs, so I wanted to look into flash drives for this. Flash would also solve the problem of T-Log backups, but I think a decent write-cache algorithm could do that too.
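The IOPS and latency estimates above reduce to simple arithmetic. A sketch, using the 225 IOPS baseline and the 10-30 microsecond raw-flash figure from the text (the 25us midpoint is my assumption):

```python
HDD_15K_IOPS = 225              # full-stroke, queue-depth 1, per the text
flash_iops = 30 * HDD_15K_IOPS  # EMC's claimed 30X multiplier
print(flash_iops)               # 6750

raw_flash_us = 25               # ~10-30us for a bare SLC NAND read (midpoint)
san_listed_us = 1000            # EMC's listed 1ms response time
stack_overhead_us = san_listed_us - raw_flash_us
print(stack_overhead_us)        # ~975us lost in the HBA/switch/SP path
```

In other words, roughly 97% of the listed 1ms is the fabric and SP stack, not the flash itself, which is why sub-200-microsecond latency would require trimming that path.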

BTW, it does seem that EMC now has a halfway-decent set of papers discussing proper configuration for SQL Server (and other RDBMS), unlike the rubbish SAN vendors spewed in the 2002/3 timeframe, when almost everyone who followed vendor recommendations ended up with horribly crippled performance. However, it is still clear that the SQL papers do not show deep database-specific performance expertise; it is as if the authors have never run a TPC-C, E or H, particularly given the differences in C/E versus H strategy. Even more important is examining configurations for best flat-out performance versus best price-performance, i.e., short-stroke low-queue ops versus high-queue (still short-stroke). What this means is that there is not one set of best-practice rules, but 2 or 3 depending on the intent.

A final note: several years ago, I noted that each 2Gb/s FC port could drive 195MB/sec between host and SAN cache, but only 165MB/sec for the round trip from host to SAN to back-end port to disks, back to SAN, and finally back to host, with maybe a stay in the cache. The 4Gbit/s FC could drive 330MB/sec to disk. I thought it was just the cost of the complete round trip. EMC discusses their UltraPoint technology DAE, which has a star topology instead of a loop. This apparently allows reaching the full rated bandwidth of FC.
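Checking those observed numbers against the usual rule of thumb (after 8b/10b coding, N Gbit/s FC carries roughly N x 100 MB/s of payload; the 200/400MB/s rated figures here are my assumption, not from the original measurements) makes the round-trip point concrete:

```python
def fc_payload_mb_s(nominal_gbit):
    """Rule of thumb: N Gbit/s FC carries ~N*100 MB/s of payload."""
    return nominal_gbit * 100

# (nominal Gbit/s, path, observed MB/s) from the measurements above
observed = [
    (2, "host<->cache", 195),
    (2, "host<->disk", 165),
    (4, "host<->disk", 330),
]
for gbit, path, mb_s in observed:
    rated = fc_payload_mb_s(gbit)
    print(f"{gbit}Gb {path}: {mb_s}/{rated} MB/s = {mb_s / rated:.1%}")
```

Both disk-path results land at exactly 82.5% of rated, at both FC speeds, which is consistent with a fixed fractional round-trip cost rather than a per-port ceiling, and with the idea that a star-topology DAE could claw that fraction back.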

Published Wednesday, August 6, 2008 5:40 PM by jchang
About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
