<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tag 'Hardware'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Hardware&amp;orTags=0</link><description>Search results matching tag 'Hardware'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Storage Performance</title><link>http://sqlblog.com/blogs/joe_chang/archive/2013/03/24/storage-performance.aspx</link><pubDate>Mon, 25 Mar 2013 03:52:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48393</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;Storage has changed dramatically over the last three years driven by SSD developments.  
Most of the key components necessary for a powerful storage system are available
and the cost is highly favorable for direct placement of data files. 
Some additional infrastructure elements could greatly enhance the flexibility of storage systems with SSDs. There is still some discussion on whether SSD should interface directly to PCI-E 
or continue using the SAS/SATA interfaces originally designed for hard disks. 
New products coming this year include Express Bay, an ecosystem of connectors allowing 
both PCI-E and SAS/SATA to co-exist until a clear direction is established. 
Also expected in the coming year are PCI-E SSDs based on the NVM Express interface.
&lt;/p&gt;

&lt;h4&gt;System Level&lt;/h4&gt;
&lt;p&gt;
The Intel Xeon E5 processors, codename Sandy Bridge-EP, have 40 PCI-E gen 3 lanes&amp;nbsp;on each processor socket. Even though PCI-E gen 3 signals at 8GT/s versus 5GT/s for gen 2, the change in encoding from 8b/10b to 128b/130b means that the usable bandwidth is essentially double that of PCI-E gen 2. 
The net realizable bandwidth of a PCI-E gen 3 x8 slot is 6.4GB/s versus 3.2GB/s for gen 2.
&lt;/p&gt;
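
&lt;p&gt;
As a rough check on these figures, the sketch below derives per-lane and x8 slot bandwidth from the signaling rate and the line encoding. The 80% protocol efficiency is an illustrative assumption, chosen because it lands near the net figures quoted above; it is not a number from the PCI-E specification. The function names are made up for this example.
&lt;/p&gt;

```python
# Rough PCI-E bandwidth arithmetic. The 80% protocol efficiency is an
# illustrative assumption, not a figure from any specification.

def raw_lane_gbytes_per_s(gigatransfers_per_s, payload_bits, encoded_bits):
    """Raw one-direction bandwidth of a single lane in GB/s after line encoding."""
    return gigatransfers_per_s * payload_bits / encoded_bits / 8


def slot_gbytes_per_s(lane_gbytes_per_s, lanes, protocol_efficiency=0.8):
    """Estimated usable bandwidth of a slot after protocol overhead."""
    return lane_gbytes_per_s * lanes * protocol_efficiency


gen2_lane = raw_lane_gbytes_per_s(5.0, 8, 10)      # 8b/10b encoding
gen3_lane = raw_lane_gbytes_per_s(8.0, 128, 130)   # 128b/130b encoding

gen2_x8 = slot_gbytes_per_s(gen2_lane, 8)
gen3_x8 = slot_gbytes_per_s(gen3_lane, 8)

print(f"gen 2 x8: {gen2_x8:.1f} GB/s")
print(f"gen 3 x8: {gen3_x8:.1f} GB/s")
```

&lt;p&gt;
Under that assumption the gen 2 x8 slot comes out at 3.2GB/s and the gen 3 x8 slot at about 6.3GB/s, close to the 6.4GB/s used in this article.
&lt;/p&gt;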

&lt;p&gt;
The unfortunate aspect is that the major server vendors all implement a mix of x16 and x4 slots, 
while the HBA vendors seem to be concentrating on products for PCI-E x8. 
Only Supermicro has a system with 10 &lt;strike&gt;8&lt;/strike&gt; PCI-E gen 3 x8 slots. 
Could a vendor put 2 HBA/RAID Controllers designed for x8 onto a single card for a x16 slot? 
Perhaps the new Express Bay form factor will have some means to use x16 slots?
&lt;/p&gt;

&lt;p&gt;&lt;img alt="Sandy Bridge EP 2-socket" src="http://www.qdpma.com/SystemArchitecture_files/SandyBridgeEP2f.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
Another disappointment is that the 4-socket Xeon E5-46xx systems only connect half of the PCI-E lanes. This might be because the base system configuration is 2-socket populated. 
If a full set of slots were provided, there would be no connection to half of the slots unless all four sockets were populated. But this is also an issue on the 2-socket systems if only 1 socket is populated.
&lt;/p&gt;
&lt;h4&gt;Direct-Attach&lt;/h4&gt;
&lt;p&gt;For the most part, I will discuss direct-attach storage configuration, 
as we can pick and choose among the latest components available. 
Technically, direct-attach with SAS can support  a 2-node cluster, 
but few system vendors promote this configuration. 
Dell sells the MD3200 as direct-attach storage supporting 4 hosts in a cluster (or not), 
but technically it is a SAN that just happens to use&amp;nbsp;SAS interfaces on both the front-end and back-end.
&lt;/p&gt;

&lt;p&gt;The objective in the baseline storage configuration below is to achieve 
very high IO bandwidth even in the low capacity configuration. 
Of course it will also have very high IOPS capability because the main elements are SSD. 
My recommended storage system has both SSD and HDD in each IO channel.
&lt;/p&gt;

&lt;p&gt;&lt;img width="600" height="300" alt="PCIe" src="http://www.qdpma.com/Storage_files/Config_2013b.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
The intent is to place the main databases on SSD and use the HDD for backups and for restore verification. For an Inmon style data warehouse, the HDD might also be used for older data. 
The reason for having both SSD and HDD on each IO channel is to take advantage of simultaneous bi-directional IO. On a database backup, the IO system reads from SSD and simultaneously writes to HDD.
&lt;/p&gt;

&lt;p&gt;
4 RAID Controllers, 4GB/s per controller, 16GB/s total IO bandwidth&lt;br&gt;
4 Disk enclosures (yes, I am showing 8 enclosures in the diagram above)&lt;br&gt;
4 x 16 = 64 SSD&lt;br&gt;
4 x 8 = 32 (10K) HDD 
&lt;/p&gt;

&lt;p&gt;
The standard 2U enclosure has 24 x 15mm bays. My preference is for a 16 SSD and 8 HDD mix. 
With the small capacity 100GB SSD, there will be 1.6TB per enclosure and 6.4TB over 4 enclosures before RAID. In 7+1 RAID 5 groups, there will be 5.6TB net capacity, and 4.8TB in 3+1 RG across 4 units. The goal is 4GB/s per controller because the SAS infrastructure is still 6Gbps, 
supporting 2.2GB/s on each x4 port. With 16 SSDs per controller, each SSD needs to support 250MB/s. Most of the recent enterprise class SSDs are rated for well over 300MB/s per unit, 
allowing for a large degree of excess capability. 
Another option is to configure 12 SSDs per controller, expecting each SSD to support 333MB/s.
&lt;/p&gt;
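
&lt;p&gt;
The capacity and per-SSD bandwidth numbers above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures from the text (4 enclosures, 16 x 100GB SSDs each, 4GB/s per controller); the function names are hypothetical.
&lt;/p&gt;

```python
# Capacity and per-SSD bandwidth for the baseline configuration in the text.
ENCLOSURES = 4
SSD_PER_ENCLOSURE = 16
SSD_GB = 100
CONTROLLER_MB_PER_S = 4000   # 4GB/s target per controller


def net_capacity_tb(data_disks, parity_disks):
    """Net RAID 5 capacity in TB across all enclosures."""
    raw_gb = ENCLOSURES * SSD_PER_ENCLOSURE * SSD_GB
    return raw_gb * data_disks / (data_disks + parity_disks) / 1000


def mb_per_s_per_ssd(ssd_per_controller):
    """Bandwidth each SSD must sustain to fill one controller."""
    return CONTROLLER_MB_PER_S / ssd_per_controller


print(net_capacity_tb(7, 1))           # 5.6 (7+1 groups)
print(net_capacity_tb(3, 1))           # 4.8 (3+1 groups)
print(mb_per_s_per_ssd(16))            # 250.0
print(round(mb_per_s_per_ssd(12)))     # 333
```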

&lt;p&gt;
The cost structure for the above is as follows:&lt;br&gt;
&amp;nbsp; RAID controller $1K &lt;br&gt;
&amp;nbsp; 2U Enclosure $3K &lt;br&gt;
&amp;nbsp; Intel SSD DC 3700 100GB SSD $235 x&amp;nbsp;16 = $3760, RAID 5 7+1: 1.6TB&lt;br&gt;
&amp;nbsp; Seagate Savvio 600GB 10K HDD $400 x 8 = $3200.&lt;/p&gt;

&lt;p&gt;
This works out to $11K per unit or $44K for the set of 4. 
The set of 16 x 100GB SSDs contributes $3760. With the 800GB SSD, the R5 7+1 capacity is 44.8TB at a cost of $148K. &lt;/p&gt;
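
&lt;p&gt;
For anyone who wants to rerun the numbers with their own prices, here is a minimal sketch of the cost arithmetic using only the prices listed above. The 800GB SSD price is not given in the text, so only the capacity of that option is computed.
&lt;/p&gt;

```python
# Per-unit cost from the price list in the text (US dollars).
RAID_CONTROLLER = 1000
ENCLOSURE = 3000
SSD_100GB = 235
HDD_600GB = 400

ssd_cost = 16 * SSD_100GB
hdd_cost = 8 * HDD_600GB
unit_cost = RAID_CONTROLLER + ENCLOSURE + ssd_cost + hdd_cost
set_cost = 4 * unit_cost

# Net capacity of 64 x 800GB SSDs in RAID 5 7+1 groups, in TB.
net_tb_800gb = 64 * 800 * 7 / 8 / 1000

print(ssd_cost, hdd_cost)   # 3760 3200
print(unit_cost)            # 10960, about $11K
print(set_cost)             # 43840, about $44K
print(net_tb_800gb)         # 44.8
```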

&lt;p&gt;At maximum expansion of 4 enclosures per RAID controller, capacity is 170TB at a cost of $590K. Of course at this level, I would elect a system with more PCI-E slots for greater IO bandwidth. Another option is a RAID controller with 4 x4 SAS ports. 
Unfortunately none of these have 4 external ports.
&lt;/p&gt;

&lt;p&gt;
While the Intel SSD DC 3700 drew reviews for pioneering consistency of IO performance 
over peak performance, it is only available with a SATA interface.
Micron &lt;strike&gt;Crucial&lt;/strike&gt; has &lt;strike&gt;announced&lt;/strike&gt; the P410m with similar specifications but with a SAS interface. This is listed on the Micron website as in production, but is probably only available to&amp;nbsp;OEM customers.&amp;nbsp;There are other enterprise grade high endurance MLC SSDs with SAS interface as well.
&lt;/p&gt;

&lt;p&gt;
Note: I do not recommend anything less than 10K HDD even to support database backups. 
The 10K HDDs are not particularly expensive as direct-attach components ($400 for the 600GB model). Only SAN vendors sell $400 HDDs for $2K or more. 
&lt;/p&gt;
&lt;h4&gt;SAS 12Gbps Enclosures&lt;/h4&gt;
&lt;p&gt;
Disk enclosures supporting SAS at 12Gbps might become available as early as this year. 
Each of the 12Gbps SAS x4 uplink and downlink ports would then support 4GB/s. 
The RAID controller (HBA) can support 6GB/s+ in a PCI-E gen 3 x8.  
The system with 4 RAID controllers could then deliver 24GB/s instead of 16GB/s. 
At 16 SSDs per controller, this would require 400MB/s per SSD. While SSDs are rated as high as 550MB/s, achieving the full aggregate bandwidth in an array&amp;nbsp;is not necessarily practical. So&amp;nbsp;400MB/s per SSD in an array is a more reasonable expectation. Also, enterprise SAS SSDs
may only be rated to 400MB/s.&lt;/p&gt;
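
&lt;p&gt;
The per-SSD requirement follows directly from the controller bandwidth. The sketch below tries both 6.0GB/s (the "6GB/s+" figure) and 6.4GB/s (the PCI-E gen 3 x8 ceiling used earlier); which one a real controller sustains is an open question, so treat both as assumptions.
&lt;/p&gt;

```python
# Per-SSD bandwidth needed to fill a 12Gbps SAS controller.
# Controller figures come from the text; 6.4 is the PCI-E gen 3 x8 ceiling.
CONTROLLERS = 4
SSD_PER_CONTROLLER = 16


def per_ssd_mb_per_s(controller_gb_per_s, ssd_count=SSD_PER_CONTROLLER):
    """MB/s each SSD must sustain so the controller link is saturated."""
    return controller_gb_per_s * 1000 / ssd_count


for controller_gb_per_s in (6.0, 6.4):
    total = CONTROLLERS * controller_gb_per_s
    need = per_ssd_mb_per_s(controller_gb_per_s)
    print(f"{controller_gb_per_s} GB/s per controller: "
          f"{total:.1f} GB/s total, {need:.0f} MB/s per SSD")
```

&lt;p&gt;
At 6.0GB/s the requirement is 375MB/s per SSD, and at 6.4GB/s it is 400MB/s, so the round 400MB/s figure leaves little slack at 16 SSDs per controller.
&lt;/p&gt;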

&lt;p&gt;
We should not need 12Gbps SAS SSDs or HDDs in the near future (but 8 NAND channels&amp;nbsp;is a good match for a 1.1GB/s interface).&amp;nbsp;The internal wires in the enclosure connect through a SAS expander. 
The IO from each device bay can signal at 6Gbps, then uplink to the HBA at 12Gbps, 
assuming that packets are buffered on the expander.
&lt;/p&gt;

&lt;p&gt;&lt;img width="600" height="520" alt="PCIe" src="http://www.qdpma.com/Storage_files/Enclosure_b.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
The standard 2U disk enclosure today supports 24 or 25 2.5in (SFF) bays, with 15mm thickness. 
This is the dimension of an enterprise class 10K or 15K HDD with up to 3 platters. 
The older full size notebook used a 9mm HDD supporting 2 platters. 
Thin notebooks used a 7mm HDD restricted to a single platter.  
There is no particular reason for an SSD to be more than 7mm. 
&lt;/p&gt;

&lt;p&gt;
 It would be better if the new 12Gbps SAS enclosures supported more than 24 bays. 
My preference is  for 16 x 15mm and 16 x 7mm bays. The key&amp;nbsp;is to match the practically realizable aggregate bandwidth of SSDs to the uplink with some&amp;nbsp;degree of excess.
Personally, I would like to discard the SSD case to further reduce thickness.
&lt;/p&gt;

&lt;p&gt;&lt;img width="600" height="221" alt="PCIe" src="http://www.qdpma.com/Storage_files/Enclosure.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
Another option is to employ the NGFF, perhaps a 1U stick, at 5mm or less. 
There could be 2 rows of 24 for SSD, and the 16 x 15mm bays.
&lt;/p&gt;

&lt;p&gt;
I believe that the all-SSD idea is misguided. SSDs are wonderful, but HDDs still have an important role. One example is having the HDDs available for backups and restores. 
I want local HDD for backups because so very few people know how to configure for multiple parallel 10GbE network transmission, not to mention IO bandwidth on the backup system.
&lt;/p&gt;

&lt;p&gt;
A database backup that has not been actually verified to restore (with recovery) is a potentially useless backup. 
Having HDDs for backup and restore verification preserves the write endurance on the SSD. 
This allows the use of high-endurance MLC instead of SLC. 
In some cases, it might even be possible to use consumer grade MLC if and only if&amp;nbsp;the database organization maintenance strategy&amp;nbsp;is architected to minimize wear on the SSD.
&lt;/p&gt;
&lt;h4&gt;PCI-E SSD&lt;/h4&gt;
&lt;p&gt;
Some of the discussion on PCI-E versus SATA/SAS interface for the NAND/Flash controller 
incorrectly focuses on the bandwidth of a single 6Gbps lane versus 4 or 8 lanes on PCI-E. 
It is correct that PCI-E was designed to distribute traffic over multiple lanes 
and that hard drives were never expected to exceed the bandwidth of a single lane 
at the contemporary SATA/SAS signaling rate. 
The transmission delay across an extra silicon trip, 
from a NAND controller with SATA interface to a SATA to PCI-E bridge chip, 
on the order of 50ns, 
is inconsequential compared with the 25-50µsec access time of NAND.
&lt;/p&gt;

&lt;p&gt;
The more relevant matter is matching NAND bandwidth to the upstream bandwidth. 
All (or almost all?) the SATA interface flash controllers have 8 NAND channels. 
Back when SATA was 3Gbps and NAND was 40-50MB/s, 8 channels was a good match for the 280MB/s net bandwidth of SATA 3G. 
About the time SATA moved to 6Gbps, NAND at 100-133MB/s became available so 8 channels was still a good choice.
&lt;/p&gt;

&lt;p&gt;&lt;img alt="PCIe" src="http://www.qdpma.com/Storage_files/SSD_SATA.png"&gt;&lt;br&gt;NAND is now at 200 and 333MB/s, while SATA is still 6Gbps. 
The nature of silicon product cost structure is such that there is only minor cost reduction in building a 4 channel flash controller. 
The 8 channel controller only requires a 256-pin package.
&lt;/p&gt;
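
&lt;p&gt;
To make the mismatch concrete, the sketch below compares the aggregate NAND bandwidth behind an 8-channel controller with the SATA link for each generation mentioned above. The 560MB/s net figure for SATA 6Gbps is an assumption (simply twice the 280MB/s quoted for SATA 3Gbps), and the labels are only for illustration.
&lt;/p&gt;

```python
# Aggregate NAND bandwidth of an 8-channel controller versus the SATA link.
# The 560MB/s net figure for SATA 6Gbps is an assumption: twice the 280MB/s
# that the text gives for SATA 3Gbps.
CHANNELS = 8


def aggregate_mb_per_s(per_channel_mb_per_s, channels=CHANNELS):
    """Combined NAND bandwidth behind one flash controller."""
    return per_channel_mb_per_s * channels


cases = [
    # (label, slowest NAND MB/s, fastest NAND MB/s, net link MB/s)
    ("SATA 3Gbps era", 40, 50, 280),
    ("early SATA 6Gbps era", 100, 133, 560),
    ("current NAND on SATA 6Gbps", 200, 333, 560),
]

for label, low, high, link in cases:
    nand_low = aggregate_mb_per_s(low)
    nand_high = aggregate_mb_per_s(high)
    print(f"{label}: NAND {nand_low}-{nand_high} MB/s, "
          f"link {link} MB/s, ratio up to {nand_high / link:.1f}x")
```

&lt;p&gt;
In the first two cases the NAND side exceeds the link by a modest margin. With 200-333MB/s NAND, 8 channels deliver several times what a 6Gbps SATA link can carry.
&lt;/p&gt;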

&lt;p&gt;
The PCI-E flash controllers have been designed with 32 NAND channels. 
The IDT 32-channel controller has 1517 pins, 
which is not excessively difficult or expensive for a high-end server product. 
Note that the Intel Xeon processors have 2011 pins. 
As noted earlier a PCI-E gen 3 x8 port supports 6.4GB/s. 
Over 32 channels, each channel needs to provide 200MB/s. 
The new 333MB/s NAND is probably a better fit to sustain the full PCI-E gen 3 x8 bandwidth after RAID (now RAIN because disks are replaced by NAND).
&lt;/p&gt;
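
&lt;p&gt;
The per-channel requirement is simple division, sketched below with the 6.4GB/s link figure from earlier. The "headroom" ratio is a hypothetical helper that shows how much aggregate NAND bandwidth is left to absorb RAIN and other overhead.
&lt;/p&gt;

```python
# Per-channel NAND bandwidth needed to fill a PCI-E gen 3 x8 link.
LINK_MB_PER_S = 6400     # 6.4GB/s net, as quoted in the text
NAND_CHANNELS = 32

required_per_channel = LINK_MB_PER_S / NAND_CHANNELS


def headroom(nand_mb_per_s):
    """Ratio of aggregate NAND bandwidth to the upstream link."""
    return nand_mb_per_s * NAND_CHANNELS / LINK_MB_PER_S


print(required_per_channel)          # 200.0
print(f"{headroom(200):.3f}")
print(f"{headroom(333):.3f}")
```

&lt;p&gt;
With 200MB/s NAND there is no headroom at all, while 333MB/s NAND provides roughly two thirds more aggregate bandwidth than the link needs.
&lt;/p&gt;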

&lt;p&gt;&lt;img alt="PCIe" src="http://www.qdpma.com/Storage_files/SSD_PCIE.png"&gt;
&lt;/p&gt;
&lt;p&gt;
Based on 64Gbit die, and 8 die per package, a package has 64GB raw capacity. The 32-channel PCI-E with 1 package per channel would have 2TB raw capacity (net capacity with 0.78 for over-provisioning and 0.875 for RAIN would be 1400GB) versus 512GB on an 8-channel SATA/SAS SSD.
The IDT document states capacity is 4TB raw for their 32-channel controllers, so perhaps it allows 2 packages per channel? The Micron datasheet mentions 32-channel and 64 placements.
&lt;/p&gt;
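
&lt;p&gt;
The capacity figures can be checked the same way. This sketch uses the die size, over-provisioning, and RAIN factors from the paragraph above; the 2 packages per channel case is only the guess made in the text, not a confirmed IDT or Micron specification.
&lt;/p&gt;

```python
# Raw and net capacity of the NAND behind each controller type.
DIE_GBIT = 64
DIE_PER_PACKAGE = 8
OVER_PROVISIONING = 0.78   # usable fraction after over-provisioning
RAIN = 0.875               # usable fraction after RAIN parity

package_gb = DIE_GBIT // 8 * DIE_PER_PACKAGE   # 64 GB per package


def raw_gb(channels, packages_per_channel=1):
    """Raw NAND capacity in GB."""
    return channels * packages_per_channel * package_gb


def net_gb(channels, packages_per_channel=1):
    """Capacity left after over-provisioning and RAIN."""
    return raw_gb(channels, packages_per_channel) * OVER_PROVISIONING * RAIN


print(raw_gb(32))            # 2048 (2TB)
print(round(net_gb(32)))     # 1398, roughly the 1400GB in the text
print(raw_gb(8))             # 512
print(raw_gb(32, 2))         # 4096 (4TB), if 2 packages per channel
```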

&lt;p&gt;
As it stands today, a PCI-E SSD can achieve maximum bandwidth at lower NAND capacity and in a more compact form factor than with SAS SSD.
On the other hand, SAS infrastructure provides flexible expansion. 
Capacity can be increased without replacing existing devices. 
Some systems support hot swap PCI-E slots. 
However the orientation of the connector in the system chassis makes this a complicated matter. 
The implications are that PCI-E slot SSDs are highly suitable for high density requirements with limited expansion needs. One database server example is tempdb on SSD.
&lt;/p&gt;
&lt;h4&gt;NVM Express&lt;/h4&gt;
&lt;p&gt;
The new generation of PCI-E SSDs may employ the NVMe interface standard. 
There is a standard driver for Windows and other operating systems, 
which will later be incorporated into the OS distribution media 
allowing boot from an NVMe device, as with SATA devices today. 
This is mostly a client side feature.
&lt;/p&gt;

&lt;p&gt;
For the server side, the NVMe driver is designed for massive bandwidth and IOPS. 
There can be up to 64K queues, each holding up to 64K commands (outstanding IO requests). 
The driver is designed for IO to be both super-efficient in cpu-cycles 
and scalable on NUMA systems with very many processor cores. 
&lt;/p&gt;

&lt;p&gt;&lt;img width="600" height="228" alt="EMC VNX" src="http://www.qdpma.com/Storage_files/NVME_ScalableQueuingInterface.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;h4&gt;Express Bay&lt;/h4&gt;
&lt;p&gt;
To promote the growth of SSD without betting on which interface, 
the Express Bay standard defines a connector that can support both PCI-E and SATA or SAS. 
Some Dell servers today support PCI-E to SSDs in the 2.5in HDD form factor (SFF), 
but I am not sure if this is Express Bay. 
This form factor will allow PCI-E devices to be hot-swapped with the same ease as SAS devices today.
&lt;/p&gt;

&lt;p&gt;&lt;img alt="PCIe" src="http://www.qdpma.com/Storage_files/ExpressBay.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;h4&gt;PCI-E Switches&lt;/h4&gt;
&lt;p&gt;
As mentioned earlier, the PCI-E slot arrangement in server systems does not facilitate hot-add,
even if it is supported.
Existing PCI-E SSDs also do not provide a mechanism for capacity expansion, 
aside from adding a card to an empty slot or replacing an existing card.
&lt;/p&gt;

&lt;p&gt;
Of course, there are PCI-E switches, just like the SAS expanders.
A 64 lane PCI-E switch could connect 5 x8 PCI-E devices over a x8 upstream link.
Other possibilities are a x16 link supporting 12.8GB/s to host with 4 ports for SSD,
or 8 x4 ports to SSD for finer grain expansion.
It may also be possible to support multiple hosts, as in a cluster storage arrangement?
&lt;/p&gt;

&lt;p&gt;&lt;img alt="PCIe" src="http://www.qdpma.com/Storage_files/PCI-E_switch.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;h4&gt;SAN Configuration&lt;/h4&gt;
&lt;p&gt;
Below is a representation of a typical configuration sold to customers by the SAN vendor. 
I am not joking in that it is common to find 2 ports FC or FCOE on each host. 
The most astounding case was a SAN with 240 x 15K disks and 2 single port FC HBAs in the server. 
Even though the storage system service processors had 4 FC ports each (and the FC switches had 48 ports), only 1 on each SP was connected. 
Obviously the storage engineer understood single component and path failure fault-tolerant design. It was just too bad he built a fault-tolerant garden hose system when a fire hose was needed.
&lt;/p&gt;

&lt;p&gt;&lt;img width="465" height="558" alt="SAN_Configuration1c" src="http://www.qdpma.com/Storage_files/SAN_Configuration1c.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
As I understand it, what happened was the SAN engineer asked how much space is needed for the databases accounting for growth, and then created one volume for it. 
The Windows OS does support multi-path IO. 
Originally the storage vendor provided the MPIO driver, but now it is managed by Microsoft. 
Apparently it was not considered that even with MPIO, all IO for a single volume has a primary path. The secondary path is only used when the primary is not available. 
&lt;/p&gt;
&lt;h4&gt;High-bandwidth SAN Configuration&lt;/h4&gt;
&lt;p&gt;
A proper SAN configuration for both OLTP and DW database servers is shown below. 
Traditionally, a transaction processing database generates small block&amp;nbsp;random IO (2KB in the old days, 8KB since SQL Server 7). As it&amp;nbsp;was difficult to get 10K IOPS (x8KB = 80MB/s),&amp;nbsp;it was thought that&amp;nbsp;IO bandwidth was not a requirement. 
This was 15 years ago. Apparently the SAN vendors read documents from this period, but not more recently, hence their tunnel vision on IOPS, ignoring bandwidth. &lt;/p&gt;

&lt;p&gt;For the last 10 or more years, people have been running large queries on the OLTP system. I have noticed that report queries that saturate the storage&amp;nbsp;IO channels can essentially shut down&amp;nbsp;transaction processing. This is because the report query generates asynchronous IO at high queue depth, while the transaction queries issue synchronous IO at queue depth 1.&amp;nbsp;And the report may escalate to a table lock (or it may use nolock).&amp;nbsp;Furthermore, it is desirable to be able to back up and restore the transaction database quickly. This means bandwidth. 
&lt;/p&gt;

&lt;p&gt;
Note that the system below shows 8Gbps FC, not 10Gbps FCOE. 
A single 10Gbps FCOE may have more bandwidth than a single 8Gbps FC port. 
But no serious storage system will have less than 4 or even 8 ports. 
Apparently FCOE currently does not scale well over multiple ports, due to the overhead of handling Ethernet packets? 
An Intel IDF 2012 topic mentions that this will be solved in the next generation.&lt;/p&gt;

&lt;p&gt;&lt;img width="428" height="635" alt="SAN_Configuration2e" src="http://www.qdpma.com/Storage_files/SAN_Configuration2e.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
The above diagram shows 8 x 8Gbps FC ports between host and storage system.
Each 8Gbps FC port can support 700MB/s for a system IO bandwidth target of 5.6GB/s.
An OLTP system that handles&amp;nbsp;very high transaction volume may benefit from a dedicated HBA and  
FC ports for log traffic. This would allow the log HBA to be configured for low latency,
and the data HBA to be configured with interrupt moderation and high throughput.
&lt;/p&gt;
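
&lt;p&gt;
To put the port count in perspective, the sketch below compares the SAN bandwidth at 2, 4, and 8 FC ports against the 16GB/s direct-attach target from earlier in this article, using the 700MB/s per port figure. The function name is hypothetical.
&lt;/p&gt;

```python
# Host-to-SAN bandwidth for different numbers of 8Gbps FC ports,
# using the 700MB/s per port figure from the text.
FC_PORT_MB_PER_S = 700
DIRECT_ATTACH_GB_PER_S = 16   # the 4-controller direct-attach target


def san_gb_per_s(ports):
    """Aggregate FC bandwidth in GB/s, assuming IO is spread over all ports."""
    return ports * FC_PORT_MB_PER_S / 1000


for ports in (2, 4, 8):
    bandwidth = san_gb_per_s(ports)
    share = bandwidth / DIRECT_ATTACH_GB_PER_S
    print(f"{ports} ports: {bandwidth:.1f} GB/s "
          f"({share:.0%} of the direct-attach target)")
```

&lt;p&gt;
The 2-port configuration tops out at 1.4GB/s, and even 8 ports reach only 5.6GB/s, about a third of the direct-attach target.
&lt;/p&gt;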

&lt;p&gt;
An alternate SAN configuration for SQL Server 2012 is shown below with local SSD for tempdb.
&lt;/p&gt;

&lt;p&gt;&lt;img width="485" height="633" alt="SAN_Configuration3b" src="http://www.qdpma.com/Storage_files/SAN_Configuration3b.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
The write cache on a SAN must be mirrored for fault tolerance. 
There is very little detail on the bandwidth capability of the link between controllers 
(or SP) on SAN systems, beyond what can be deduced from the fact that the sustained write bandwidth is much lower than the read bandwidth.
So keeping tempdb off the SAN should preserve IO write bandwidth for traffic that
actually needs protection.
&lt;/p&gt;


&lt;p&gt;
The number of volumes for data and temp should be some multiple of 8.
It would be nice to have 1 volume for each FC path.
However we do need to consider how SQL Server places extents over multiple files.
This favors RAID groups of 4 disks.
&lt;/p&gt;
&lt;h4&gt;File Layout&lt;/h4&gt;
&lt;p&gt;
In a HDD storage system, 
the objective for bandwidth is to simultaneously issue large block IO
to all data disks across all IO channels.
A 256K block could be sufficiently large to generate 100MB/s per disk (400 IOPS, not random).
If this were issued at low queue depth (2?), 
then the storage system would not only generate high IO bandwidth
but also remain perceptibly responsive to other requests for small block IO.
&lt;/p&gt;

&lt;p&gt;
For small block random IO, it is only necessary to distribute IO over all hard disks with reasonable uniformity.
&lt;/p&gt;

&lt;p&gt;
The file layout strategy has two objectives. 
One is to not overwhelm any single IO channel.
In direct-attach, this is not a problem as the smallest pipe is x4 SAS for 2GB/s.
In a SAN, even using 8Gbps FC, this is a concern as 8Gb FC can support only 700-760MB/s.
Although 10Gb FCoE seems to have higher bandwidth, this may not scale with the number of channels
as well as straight FC.
The new Intel Xeon E5 (Sandy-Bridge EP) processors may be able to scale 10Gb FCoE 
with Data Direct IO (DDIO) - but this needs to be verified.
&lt;/p&gt;

&lt;p&gt;
The second is to ensure IO goes to every disk in the RAID Group (or volume).
By default, SQL Server allocates a single 64K extent from each file
before round-robin allocating from the next file.
This might be the reason that many SAN systems generate only 10MB/s per disk (150 IOPS at 64K),
along with no read-ahead.
&lt;/p&gt;
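
&lt;p&gt;
The per-disk figures quoted in this section are just block size multiplied by IOs per second. A minimal sketch, taking 1MB as 1024KB (the helper name is hypothetical):
&lt;/p&gt;

```python
# Per-disk bandwidth as block size times IOs per second.
def disk_mb_per_s(block_kb, iops):
    """Bandwidth in MB/s, taking 1MB = 1024KB."""
    return block_kb * iops / 1024


print(disk_mb_per_s(256, 400))             # 100.0
print(round(disk_mb_per_s(64, 150), 1))    # 9.4
```

&lt;p&gt;
The 256K case gives the 100MB/s per disk target, while 64K at 150 IOPS gives a little under 10MB/s, the figure seen on many SAN systems.
&lt;/p&gt;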

&lt;p&gt;&lt;img width="463" height="496" alt="FileLayout_1" src="http://www.qdpma.com/Storage_files/FileLayout_1.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
The -E startup flag instructs SQL Server to allocate up to 4 consecutive extents
before proceeding to the next file.
See James Rowland-Jones 
&lt;a href="http://consultingblogs.emc.com/jamesrowlandjones/archive/2010/04/25/focus-on-fast-track-understanding-the-e-startup-parameter.aspx" target="blank"&gt;
Focus on Fast Track : Understanding the –E Startup Parameter&lt;/a&gt;
for more on this.
In a 4-disk RAID group with stripe-size 64K, a 256K IO to the file
would generate a 64K IO to each disk.
&lt;/p&gt;
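
&lt;p&gt;
The effect can be illustrated with a toy model. This is only a sketch: it treats allocation as strict round robin (real SQL Server uses proportional fill), ignores RAID 5 parity rotation, and takes the 4 consecutive extents figure from the text above. All function names are hypothetical.
&lt;/p&gt;

```python
# Toy model of extent placement and RAID striping. Real SQL Server uses
# proportional fill and RAID 5 rotates parity; both are ignored here.
EXTENT_KB = 64


def allocation_order(file_count, extents_per_turn, total_extents):
    """File index that receives each successive extent, round robin."""
    return [(n // extents_per_turn) % file_count for n in range(total_extents)]


def kb_per_disk(io_kb, stripe_kb, data_disks):
    """KB landing on each data disk for one stripe-aligned IO."""
    per_disk = [0] * data_disks
    for chunk in range(io_kb // stripe_kb):
        per_disk[chunk % data_disks] += stripe_kb
    return per_disk


print(allocation_order(4, 1, 8))           # [0, 1, 2, 3, 0, 1, 2, 3]
print(allocation_order(4, 4, 8))           # [0, 0, 0, 0, 1, 1, 1, 1]
print(kb_per_disk(4 * EXTENT_KB, 64, 4))   # [64, 64, 64, 64]
print(kb_per_disk(1 * EXTENT_KB, 64, 4))   # [64, 0, 0, 0]
```

&lt;p&gt;
With one extent per turn, each 64K read touches a single disk in the RAID group. With 4 consecutive extents, a single 256K read is spread evenly over all 4 disks.
&lt;/p&gt;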

&lt;p&gt;&lt;img width="463" height="496" alt="FileLayout_2" src="http://www.qdpma.com/Storage_files/FileLayout_2.png"&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;
It would be necessary to rebuild indexes before this scheme takes effect.
Somewhere it was mentioned that it is important to build indexes with max degree of parallelism 
limited to either 4 or 8. It might be in the 
&lt;a href="http://msdn.microsoft.com/en-us/library/hh918452.aspx" target="blank"&gt;
Microsoft Fast Track Data Warehouse Reference Architecture&lt;/a&gt;.
Start with version 4 for SQL Server 2012, and work backwards?
&lt;/p&gt;

&lt;p&gt;
Trace Flag 1117 (-T1117) causes all files in a filegroup to grow together.
See
&lt;a href="http://blogs.technet.com/technet_blog_images/b/sql_server_sizing_ha_and_performance_hints/archive/2012/02/09/sql-server-2008-trace-flag-t-1117.aspx" target="blank"&gt;
SQL Server 2008 Trace Flag -T 1117&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
With SSD, the second objective may not be important as the SQL Server read-ahead strategy
(1024 pages?) should generate IO to all units.
On hard disks, generating close-to-sequential IO was important.
On SSD, it is sufficiently beneficial just to generate large block IO, with 64K being large.
&lt;/p&gt;
&lt;h4&gt;Summary&lt;/h4&gt;
&lt;p&gt;
The old concept of distributing IO over both devices and channels still applies.
The recent pricing of SSD is sufficiently low to warrant serious consideration ($2-3K/TB eMLC).
While there is more flexibility in SSD configuration, 
it is still necessary to validate performance characteristics
with real queries to an actual database. 
SQLIO or other synthetic tests are not sufficient.
If the SAN vendor advised on the configuration, 
then chances are IO bandwidth will not be good.
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Addendum&lt;/strong&gt;&lt;br&gt;
If anyone thinks I am being unfair to or&amp;nbsp;overly critical of SAN vendors, do the following test.&lt;br&gt;
Find the biggest table in your database, excluding LOB fields.&lt;br&gt;
Run this:&lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;DBCC &lt;/font&gt;
&lt;font color="#008080" face="Consolas"&gt;DROPCLEANBUFFERS&lt;/font&gt;&lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;GO&lt;/font&gt;&lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;SET STATISTICS&lt;/font&gt;
&lt;font color="#0000ff" face="Consolas"&gt;IO ON&lt;/font&gt; &lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;SET STATISTICS&lt;/font&gt;
&lt;font color="#0000ff" face="Consolas"&gt;TIME ON&lt;/font&gt;&lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;GO&lt;/font&gt; &lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;SELECT&lt;/font&gt;
&lt;font color="#ff00ff" face="Consolas"&gt;COUNT&lt;/font&gt;&lt;font color="#808080" face="Consolas"&gt;(*)&lt;/font&gt;
&lt;font color="#0000ff" face="Consolas"&gt;FROM&lt;/font&gt;
&lt;font color="#008080" face="Consolas"&gt;TableA&lt;/font&gt;
&lt;font color="#0000ff" face="Consolas"&gt;WITH&lt;/font&gt;
&lt;font color="#808080" face="Consolas"&gt;(&lt;/font&gt;&lt;font color="#0000ff" face="Consolas"&gt;INDEX&lt;/font&gt;&lt;font color="#808080" face="Consolas"&gt;(&lt;/font&gt;&lt;font face="Consolas"&gt;0&lt;/font&gt;&lt;font color="#808080" face="Consolas"&gt;))&lt;/font&gt; &lt;br&gt;
&lt;font color="#0000ff" face="Consolas"&gt;GO&lt;/font&gt;
&lt;/p&gt;

&lt;p&gt;
Then compute 8 (KB/page) * (physical reads + read-ahead reads)/(elapsed time in ms)&lt;br&gt;Is this closer to 700 MB/s or 4GB/s? What did your SAN vendor tell you?&lt;/p&gt;
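
&lt;p&gt;
The calculation works because KB per millisecond is numerically about the same as MB per second. Below is a small sketch of it; the read counts and elapsed times are hypothetical values of the kind reported by SET STATISTICS IO and SET STATISTICS TIME, not measurements from any real system.
&lt;/p&gt;

```python
def scan_mb_per_s(physical_reads, read_ahead_reads, elapsed_ms, page_kb=8):
    """Scan throughput. KB per millisecond is numerically about MB per second."""
    return page_kb * (physical_reads + read_ahead_reads) / elapsed_ms


# Hypothetical readings for a table of 12.5 million pages (about 100GB).
print(round(scan_mb_per_s(2_500, 12_497_500, 140_000)))   # 714
print(round(scan_mb_per_s(2_500, 12_497_500, 25_000)))    # 4000
```

&lt;p&gt;
In this made-up example, a scan that takes 140 seconds is running at about 700MB/s, while the same scan finishing in 25 seconds is running at 4GB/s.
&lt;/p&gt;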

&lt;p&gt;I am also not a fan of SSD caching or auto-tiering on critical databases, meaning the database that runs your business, that is managed by one or more full-time DBAs. In other applications, there may not be a way to segregate the placement of hot data differently from inactive data. In SQL Server, there are filegroups and partitioning. We have all the necessary means of isolating and placing hot data wherever we want it. SSD caching or auto-tiering will probably require SLC NAND. With active management using database controls, we should be able to use HET MLC or&amp;nbsp;even MLC.&amp;nbsp;&lt;/p&gt;

&lt;p&gt;I stress the importance of analyzing the complete system and how it will be used instead of over-focusing on the components. There are criteria that might be of interest when there is only a single device or even single HBA. Today it is possible to over-configure the storage performance without unwarranted expense, and&amp;nbsp;this is best accomplished by watching the big picture.&lt;/p&gt;

&lt;p&gt;Adaptec reports that their Series 7 SAS RAID Controller&amp;nbsp;(72405 - PCI-E gen 3 x8 on the upstream side and 6 x4 SAS 6Gbps) using the PMC&amp;nbsp;PM8015 controller can do&amp;nbsp;500K IOPS and 6.6GB/s.&lt;/p&gt;

&lt;p&gt;I will keep this topic up to date on &lt;a href="http://www.qdpma.com"&gt;www.qdpma.com&lt;/a&gt; 
&lt;a href="http://www.qdpma.com/Storage/Storage2013.html" target="blank"&gt;Storage 2013&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;related posts on storage:&lt;br&gt;
&lt;a href="http://sqlblog.com/blogs/joe_chang/archive/2010/10/18/io-queue-depth-strategy.aspx"&gt;io-queue-depth-strategy&lt;/a&gt; (2010-08)
&lt;br&gt;
&lt;a href="http://sqlblog.com/blogs/joe_chang/archive/2010/03/23/data-log-and-temp-file-placement.aspx"&gt;data-log-and-temp-file-placement&lt;/a&gt; (2010-03)
&lt;br&gt;
&lt;a href="http://sqlblog.com/blogs/joe_chang/archive/2008/09/04/io-cost-structure-preparing-for-ssd-arrays.aspx"&gt;io-cost-structure-preparing-for-ssd-arrays&lt;/a&gt; (2008-09)
&lt;br&gt;
&lt;a href="http://sqlblog.com/blogs/joe_chang/archive/2008/03/04/storage-performance-for-sql-server.aspx"&gt;storage-performance-for-sql-server&lt;/a&gt; (2008-03)
&lt;/p&gt;

&lt;p&gt;
ps,&lt;br&gt;
If you are using SQL Server 2012 clustering on a SAN, 
I do suggest placing tempdb on local SSD, making use of the new 2012 feature 
that does not require tempdb to be on shared storage.
Keep in mind on the SAN, writes must be mirrored between two storage processors for fault recovery, and this is not a cheap thing to do.
We should simply plan to redo whatever was using tempdb at the time of a failure.
&lt;/p&gt;</description></item><item><title>The Zombie PerfMon Counter That Never Dies! Quick Tip</title><link>http://sqlblog.com/blogs/kevin_kline/archive/2012/10/08/the-zombie-perfmon-counter-that-never-dies-quick-tip.aspx</link><pubDate>Mon, 08 Oct 2012 11:55:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:45480</guid><dc:creator>KKline</dc:creator><description>&lt;h2 style="font-family:Georgia, 'Times New Roman', 'Bitstream Charter', Times, serif;line-height:19px;"&gt;&lt;/h2&gt;&lt;h2&gt;The PerfMon Counters That Just Won't Die&lt;/h2&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;img class="alignright size-medium wp-image-2093" title="zombie-baby1" width="300" height="296" style="border:1px solid black;cursor:default;float:right;font-size:13px;font-weight:normal;margin:2px;" src="http://kevinekline.com/wp-content/uploads/2012/10/zombie-baby1-300x296.jpg"&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;One of the things that's simultaneously great and horrible about the Internet is that once something gets posted out in the ether, it basically never goes away. &amp;nbsp;(Some day, politicians will realize this. &amp;nbsp;We can easily fact check their consistency). &amp;nbsp;Because of the longevity of content posted to the Internet, a lot of performance tuning topics become "zombies". &amp;nbsp;We shoot 'em dead, but they keep coming back!&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;In other words, those old recommendations&amp;nbsp;&lt;em&gt;were&amp;nbsp;&lt;/em&gt;suggested best practices long ago, for a specific version of SQL Server, but are now inappropriate for newer versions. 
&amp;nbsp;It's not uncommon for me, when speaking at a conference, to encounter someone who's still clinging to settings and techniques which haven't been good practice since the days of SQL Server 2000. &amp;nbsp;Here's an example of&amp;nbsp;&lt;a href="http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlops6.mspx"&gt;Microsoft SQL Server 2000 Best Practices that are very version-specific&lt;/a&gt;.&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;So here's an example. &amp;nbsp;The %Disk Time counter and the Disk Queue Length were heavily recommended as a key performance indicator for IO performance. &amp;nbsp;SQL Server throws a lot of IO at the disks using scatter/gather to maximize the utilization of the disk-based IO subsystem. &amp;nbsp;This approach leads to short bursts of long queue depths during checkpoints and readaheads for an instance of SQL Server. &amp;nbsp;Sometimes the server workload is such that your disk can't keep up with the IO shoved at it and when that happens, you'll see long queue lengths too.&amp;nbsp; The short burst scenario isn't a problem. &amp;nbsp;The lengthening queue length scenario usually is a problem. &amp;nbsp;&amp;nbsp;So is that a good practice?&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;strong&gt;In a word, not-so-much.&lt;/strong&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;Those counters can still be of some use on an instance of SQL Server which only has one hard disk drive. &amp;nbsp;But that's&amp;nbsp;&lt;em&gt;exceedingly&lt;/em&gt;&amp;nbsp;rare these days. 
&amp;nbsp;Why?&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;The PerfMon counter %Disk Time is a bogus performance metric for several reasons. &amp;nbsp;It does not take into account&amp;nbsp;asynchronous&amp;nbsp;I/O requests. &amp;nbsp;It can't tell what the real performance profile of an underlying&amp;nbsp;RAID set may be, since a RAID set contains multiple disk drives. &amp;nbsp;The PerfMon counter Disk Queue Length is also mostly useless, except on SQL Servers with a single physical disk, because the hard disk controller cache obfuscates how many IO operations are actually pending on the queue. &amp;nbsp;In fact, some hard disks even have tiny write caches as well, which further muddies the water as to whether the IO is truly queued, in a cache somewhere between the OS and the disk, or has finally made it all the way to the&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Cmos"&gt;CMOS&lt;/a&gt;&amp;nbsp;on the disk.&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;h2&gt;Better IO PerfMon Counters&lt;/h2&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;Instead of using those PerfMon counters, use Disk Reads/sec, Disk Writes/sec, and Disk Transfers/sec&amp;nbsp;to track the performance of disk subsystems. &amp;nbsp;These counters track the average number of read IOs, write IOs, and combined read and write IOs that occurred in the last second. &amp;nbsp;Occasionally, I like to track the same metrics by volume of data rather than the rate of IO operations. 
&amp;nbsp;So, to get that data, you may wish to give these volume-specific PerfMon counters a try:&amp;nbsp;Disk Bytes/sec, Disk Read Bytes/sec, and Disk Write Bytes/sec.&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;h2&gt;For SQL Server IO Performance, Use Dynamic Management Views (DMV)&lt;/h2&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;And unless you've been living in a cave, you should make sure to use SQL Server's Dynamic Management Views (DMVs) to check on IO performance for recent versions of SQL Server. &amp;nbsp;Some of my favorite DMVs for IO include:&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;ul&gt;&lt;li&gt;Sys.dm_os_wait_stats&lt;/li&gt;&lt;li&gt;Sys.dm_os_waiting_tasks&lt;/li&gt;&lt;li&gt;Sys.dm_os_performance_counters&lt;/li&gt;&lt;li&gt;Sys.dm_io_virtual_file_stats&lt;/li&gt;&lt;li&gt;Sys.dm_io_pending_io_requests&lt;/li&gt;&lt;li&gt;Sys.dm_db_index_operational_stats&lt;/li&gt;&lt;li&gt;Sys.dm_db_index_usage_stats&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;Many of these DMVs are fully documented in the Books Online article&amp;nbsp;&lt;a href="http://msdn.microsoft.com/en-us/library/ms187974.aspx"&gt;Microsoft SQL Server 2012&amp;nbsp;Index Related Dynamic Management Views and Functions&lt;/a&gt;.&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;So how are you tracking IO performance metrics? 
&amp;nbsp;Which ones are you using?&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;I look forward to hearing back from you!&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;Enjoy,&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;br&gt;&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;-Kev&lt;/div&gt;&lt;div style="font-size:13px;font-weight:normal;"&gt;&lt;p&gt;-&lt;a href="http://twitter.com/kekline"&gt;Follow me on Twitter!&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Intel Xeon E5 (Sandy Bridge-EP) and SQL Server 2012 Benchmarks</title><link>http://sqlblog.com/blogs/joe_chang/archive/2012/03/07/intel-xeon-e5-sandy-bridge-ep-and-sql-server-2012-benchmarks.aspx</link><pubDate>Wed, 07 Mar 2012 23:53:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:42176</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;
Intel officially announced the Xeon E5 2600 series processors, based on the Sandy Bridge-EP&amp;nbsp;variant, with
up to 8 cores and 20MB LLC per socket.
Only one TPC benchmark accompanied the product launch; a summary is below.
&lt;br&gt;&lt;/p&gt;
&lt;table cellSpacing="1" cellPadding="3"&gt;

&lt;tr align="center"&gt;&lt;th&gt;Processors&lt;/th&gt;&lt;th&gt;Cores per socket&lt;/th&gt;&lt;th&gt;Frequency&lt;/th&gt;&lt;th&gt;Memory&lt;/th&gt;&lt;th&gt;SQL Server&lt;/th&gt;&lt;th&gt;Vendor&lt;/th&gt;&lt;th&gt;TPC-E (tpsE)&lt;/th&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;2 x Xeon E5-2690&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;2.9GHz&lt;/td&gt;&lt;td&gt;512GB (16x32GB)&lt;/td&gt;&lt;td&gt;2012&lt;/td&gt;&lt;td&gt;IBM&lt;/td&gt;&lt;td&gt;1,863.23&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;2 x Xeon E7-2870&lt;/td&gt;&lt;td&gt;10&lt;/td&gt;&lt;td&gt;2.4GHz&lt;/td&gt;&lt;td&gt;512GB (32x16GB)&lt;/td&gt;&lt;td&gt;2008R2&lt;/td&gt;&lt;td&gt;IBM&lt;/td&gt;&lt;td&gt;1,560.70&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;2 x Xeon X5690&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;3.46GHz&lt;/td&gt;&lt;td&gt;192GB (12x16GB)&lt;/td&gt;&lt;td&gt;2008R2&lt;/td&gt;&lt;td&gt;HP&lt;/td&gt;&lt;td&gt;1,284.14&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;Note: the HP report lists SQL Server 2008 R2 Enterprise Edition licenses at $23,370 per socket.&lt;br&gt;The first IBM report lists SQL Server 2012 Enterprise Edition licenses at $13,473 per pair of cores(?), or $53,892 per 8-core socket. All results used SSD storage.&amp;nbsp;The IBM E7 result used eMLC SSDs; the IBM E5 result showed more expensive SSDs, but did not explicitly say whether they were SLC.&lt;/p&gt;&lt;p&gt;The Xeon E5 supersedes 2-socket systems based on both the Xeon 5600 (Westmere-EP) and Xeon E7 (Westmere-EX).
It is evident that Sandy Bridge improves performance over Westmere at both the socket and core levels and also on a GHz basis.
&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="3"&gt;

&lt;tr align="center"&gt;&lt;th&gt;Architecture&lt;/th&gt;&lt;th&gt;Total Cores&lt;/th&gt;&lt;th&gt;Frequency&lt;/th&gt;&lt;th&gt;Core-GHz&lt;/th&gt;&lt;th&gt;TPC-E&lt;/th&gt;&lt;th&gt;tps-E per core-GHz&lt;/th&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;Sandy Bridge-EP&lt;/td&gt;&lt;td&gt;2 x 8 = 16&lt;/td&gt;&lt;td&gt;2.9GHz&lt;/td&gt;&lt;td&gt;46.4&lt;/td&gt;&lt;td&gt;1,863.23&lt;/td&gt;&lt;td&gt;40.16&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;Westmere-EX&lt;/td&gt;&lt;td&gt;2 x 10 = 20&lt;/td&gt;&lt;td&gt;2.4GHz&lt;/td&gt;&lt;td&gt;48.0&lt;/td&gt;&lt;td&gt;1,560.70&lt;/td&gt;&lt;td&gt;32.51&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td&gt;Westmere-EP&lt;/td&gt;&lt;td&gt;2 x 6 = 12&lt;/td&gt;&lt;td&gt;3.46GHz&lt;/td&gt;&lt;td&gt;41.52&lt;/td&gt;&lt;td&gt;1,284.14&lt;/td&gt;&lt;td&gt;30.93&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;
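
&lt;p&gt;The derived columns in the table above can be reproduced with a few lines of arithmetic. This is only a minimal sketch in Python: the function and variable names are illustrative assumptions, not anything from the TPC reports. Core-GHz is taken as sockets x cores per socket x frequency, and the last column as tps-E divided by core-GHz.&lt;/p&gt;
&lt;pre&gt;
```python
# Rows copied from the tables above:
# (architecture, sockets, cores per socket, frequency in GHz, TPC-E tpsE)
RESULTS = [
    ("Sandy Bridge-EP", 2, 8, 2.9, 1863.23),
    ("Westmere-EX", 2, 10, 2.4, 1560.70),
    ("Westmere-EP", 2, 6, 3.46, 1284.14),
]


def core_ghz(sockets, cores_per_socket, ghz):
    """Aggregate core-GHz: total core count times clock frequency."""
    return sockets * cores_per_socket * ghz


def tps_per_core_ghz(tps_e, sockets, cores_per_socket, ghz):
    """TPC-E throughput normalized by aggregate core-GHz."""
    return tps_e / core_ghz(sockets, cores_per_socket, ghz)


for name, sockets, cores, ghz, tps_e in RESULTS:
    total = core_ghz(sockets, cores, ghz)
    ratio = tps_per_core_ghz(tps_e, sockets, cores, ghz)
    print(f"{name}: {total:.2f} core-GHz, {ratio:.2f} tps-E per core-GHz")
```
&lt;/pre&gt;
&lt;p&gt;Normalizing by core-GHz is a rough yardstick only; it ignores differences in memory configuration, storage and SQL Server version between the three results.&lt;/p&gt;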

&lt;p&gt;One advantage of the Xeon E7 (Westmere-EX) system is that the memory expanders&amp;nbsp;support&amp;nbsp;4 DIMMs per channel or 16 DIMMs per socket (4 memory channels). However, a two-socket Sandy Bridge-EP&amp;nbsp;system supports 256GB with 16 (8 per socket) of the lower price (per GB)&amp;nbsp;16GB DIMMs. And really, 256GB is more than enough for most situations, so it is quite reasonable not to burden the large majority with the requirements of outlier configurations.&lt;/p&gt;&lt;p&gt;A later version of the Xeon E5 will support 4-socket systems. 
There is no explanation as to whether glue-less 8-socket systems will be supported in the future.
It was previously discussed that there would be an EN variant of Sandy Bridge with 3 memory channels and fewer PCI-E lanes.
&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Hardware Strategy for SQL Server 2012 per core licensing&lt;/strong&gt;&lt;br&gt;Top frequency on the 6 core E5-2667 is 2.9GHz, the same as the 8 core (excluding the 8 core 2687W model at 3.1GHz). Top frequencies for the 4 core E5-2643 and 2 core E5-2637 are 3.3 and 3.0GHz respectively. The desktop i7-2830 is 3.6GHz with 4 cores, so Intel is deliberately constraining the top frequency on the 2 &amp;amp; 4 core versions of the server parts, apparently to favor interest in the 8 core part.&lt;/p&gt;&lt;p&gt;Given the SQL Server 2012 per core licensing, there should be interest in a system with fewer cores per socket running at higher frequency, while taking advantage of the high memory and IO bandwidth of the E5 system. Consider also that SQL Server write operations (Insert, Update, Delete, the final stage of index builds) and even certain SELECT operations are not parallel (the Sequence Project operator that supports the ROW_NUMBER function).&lt;/p&gt;&lt;p&gt;I think it would also make sense for Intel to allow cores to be disabled in BIOS (now UEFI) on the top-of-the-line E5-2690 like the&amp;nbsp;desktop extreme edition unlocked processors. 
Large corporate customers&amp;nbsp;can buy a batch of identical systems,&amp;nbsp;disabling cores that are not needed on individual systems.&amp;nbsp;&lt;/p&gt;&lt;p&gt;It would also be of value to engage a (no longer quite so, relative to core licenses) exorbitantly priced consultant to tune SQL Server to run on fewer cores. (Not to be construed as a solicitation for services)&lt;/p&gt;</description></item><item><title>Accelerate OLTP with HP and Microsoft's New High Performance Reference Architecture</title><link>http://sqlblog.com/blogs/kevin_kline/archive/2012/03/06/accelerate-oltp-with-hp-and-microsoft-s-new-high-performance-reference-architecture.aspx</link><pubDate>Tue, 06 Mar 2012 15:25:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:42126</guid><dc:creator>KKline</dc:creator><description>&lt;p&gt;If you haven't started to read Shashank Pawar (&lt;a title="Shashank Pawar's Blog" href="http://blogs.technet.com/b/sqlman/"&gt;blog&lt;/a&gt;), you're missing out.  Shashank is part of Microsoft Australia and has been writing some very good content lately.  Here's an example from the Reference Architecture for High Performance SQL Server:&lt;/p&gt;&lt;p style="padding-left:30px;"&gt;&lt;span&gt;HP and Microsoft engineering teams have worked together to create a reference architecture to Accelerate Online Transaction Processing (OLTP) database workloads with a fully-flash based HP/Microsoft architecture and achieve significant performance increases, simplified database manageability, and industry leading TCO.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;The details come in a torrent after that leading paragraph with lots of pretty pictures and charts to help explain.  This is great stuff, especially for competitive platforms such as Oracle Exadata. 
&lt;/p&gt;&lt;p&gt;Read more about the new &lt;a title="High Performance SQL Server 2012" href="http://blogs.technet.com/b/sqlman/archive/2012/02/16/reference-architecture-for-high-performance-sql-server.aspx"&gt;HP High Performance Reference Architecture for SQL Server 2012&lt;/a&gt; here.&lt;/p&gt;&lt;p&gt;And just out of curiosity, are any of you using high performance architectures such as Oracle Exadata, IBM Netezza, or Teradata?  I'd love to hear your feedback, questions, and comments.&lt;/p&gt;&lt;p&gt;Enjoy,&lt;/p&gt;&lt;p&gt;-Kev &lt;/p&gt;&lt;p&gt;-&lt;a title="Kevin Kline's Twitter Feed" href="http://twitter.com/kekline"&gt;Follow me on Twitter&lt;/a&gt;&lt;/p&gt;&lt;p&gt;-&lt;a title="Kevin Kline's Blog" href="http://KevinEKline.com"&gt;More on my KevinEKline.com&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Intel Server Strategy Shift with Sandy Bridge EN &amp;amp; EP</title><link>http://sqlblog.com/blogs/joe_chang/archive/2011/11/29/intel-server-strategy-shift-with-sandy-bridge-en-ep.aspx</link><pubDate>Tue, 29 Nov 2011 19:04:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:40055</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;
The arrival of the Sandy Bridge EN and EP processors, expected in early 2012, will mark 
the completion of a significant shift in Intel server strategy.
For the longest time, 1995-2009, the strategy had been to focus on producing a premium processor 
designed for 4-way systems that might also be used in 8-way systems and higher.
The objective for 2-way systems was to use the desktop processor, which later had a separate brand 
and different package &amp;amp; socket, to leverage the low cost structure in driving volume.
The implication was that components would be constrained by desktop cost requirements.
&lt;/p&gt;
 
&lt;p&gt;
The Sandy Bridge collection will comprise one group for single processor systems 
designed for low cost, and one premium processor.
The premium processor will support both the EN and EP product lines,
the EN limited to 2-way, and the EP for both 2-way and 4-way systems,
with more than adequate memory and IO in each category. 
The cost structure of both 2-way and 4-way&amp;nbsp;increased&amp;nbsp;from Core 2 to Nehalem,&amp;nbsp;along with a significant boost in CPU, memory and IO capability.
With quad-core available in 1P, the more price sensitive environments should move
down to single processor systems.
This allows 2 &amp;amp; 4-way systems to&amp;nbsp;be built with&amp;nbsp;balanced&amp;nbsp;compute, memory and IO unconstrained by desktop cost requirements.
&lt;/p&gt;
 
&lt;p&gt;
In other blogs, I had commented that the default system choice for a database server,
which for a long time had been a 4-way system, should now be a 2-way since the introduction of Nehalem in mid-2009.
Default choice means the choice made in the absence of detailed technical analysis, basically a rough guess.
The Sandy Bridge EP, with 8 cores, 4 memory channels and 40 PCI-E lanes per socket
(80 lanes in a 2-way system), provides&amp;nbsp;even stronger support for this strategy.
&lt;/p&gt;&lt;p&gt;The glue-less 8-way capability of the Nehalem and Westmere EX line is not continued.
One possibility is that 8-way systems do not need to be glue-less. 
The other is that 8-way systems are being abandoned, 
but I am inclined to think this is not the case.
&lt;/p&gt;
&lt;h4&gt;The Master Plan&lt;/h4&gt;
&lt;p&gt;
The foundation of the premium processor strategy, even though it may have been forgotten in the mists of time,
not to mention personnel turnover, was that a large cache improves scaling at the 4-way multi-processor level
for the shared bus SMP system architectures of the Intel Pentium to Xeon MP period.
The 4-way server systems were typically deployed for important applications that could easily 
justify a far higher cost structure than that of desktop components, but required critical capabilities
not necessary in personal computers. 
Often systems in this category were fully configured with top line components whether needed or not.
&lt;/p&gt;
 
&lt;p&gt;
Hence the Intel large cache strategy was an ideal match between premium processors and high budget
systems for important applications. 
One aspect that people with an overly technical point of view have difficulty fathoming
is that the non-technical VP's don't want their mission critical applications running on a cheap box.
In fact, more expensive means that it must be better, and the most expensive is the best, right?
From the Intel perspective, a large premium is necessary to amortize the substantial effort necessary to 
produce even a derivative processor in volumes small relative to desktop processors.
&lt;/p&gt;
 
&lt;p&gt;
The low cost 2-way strategy was to explore demand for multi-processor systems in the desktop market.
Servers were expected to be a natural fit for 2-way systems. 
Demand for 2-way servers exploded to such an extent
that it was thought for a brief moment there would be no further interest for single processor servers.
Eventually, the situation sorted itself out, in part with the increasing power of processors.
Server unit volume settled to a 30/60/10 split between single, dual and quad processors
(this is old data, I am not sure what the split is today).
The 8-way and higher unit volume is low, 
but potentially of importance in having a complete system lineup. 
&lt;/p&gt;
 
&lt;p&gt;
AMD followed a different strategy based on the characteristics of their platform. 
The Hyper-Transport (HT) interconnect and integrated memory controller architecture 
did not have a hard requirement for large cache to support 4-way and above.
So AMD elected to pursue a premium product strategy on the number of HT links.
Single processor systems require one HT to connect the IO hub.
Two HT is required in a 2-way system, one HT connecting to IO, and another to the second processor.
Three HT could support 4-way and higher with various connection arrangements.
The pricing structure is based on the number of HT links enabled,
on the theory that the processor has higher value in big systems than in small systems. 
&lt;/p&gt;
 
&lt;h4&gt;What Actually Happened&lt;/h4&gt;
&lt;p&gt;
Even with the low cost structure Intel enabled in 2-way, desktop systems remained 
and actually became defined as single processor.
Instead, the 2-way systems at the desk of users became the workstation category.
This might have been because the RISC/UNIX system vendors sold workstations.
The Intel workstations quickly obliterated RISC workstations,
and there have been no RISC workstations for some time.
Only two RISC architectures are present today, having retreated to the very high-end server space, 
where Intel does not venture. 
&lt;/p&gt;
 
&lt;p&gt;
Itanium was supposed to participate in this space, but the surviving RISC vendors optimized at 8-way and higher. 
Intel would not let go of the 4-way system volume and Itanium was squeezed by Xeon at 4-way and below,
yet could not match IBM Power in high SMP scaling.
To do so would incur a high price burden on 4-way systems.
One other aspect of Intel server strategy of the time was the narrow minded focus on optimizing for a single platform.
&lt;/p&gt;
 
&lt;p&gt;
Most of the time, this was the 4-way server.
There was so much emphasis on 4-way that there were actually 2 reference platforms, almost to the exclusion of all else.
For a brief period in 1998 or so, there was an incident of group hysteria that 8-way 
would become the standard high volume server.
But this phase wore off eventually.
The SPARC was perhaps the weakest of the RISC at the processor level. 
Yet the Sun strategy to design for a broad range of platforms from 2-way to 30-way, 
(then with luck 64-way via acquisition of one of the Cray spin-offs) was successful
until their processor fell too far behind.
&lt;/p&gt;
 
&lt;p&gt;
After the initial implementation of the high volume 2-way strategy,
desktop systems became intensely price sensitive.
The 2-way workstations and server system were in fact not price sensitive even though it was thought they were. 
It became clear that desktops could not incur any burden to support 2-way capability.
The desktop processor for 2-way systems was put into a different package and socket,
and was given the Xeon brand. 
&lt;/p&gt;
 
&lt;p&gt;
Other cost reduction techniques were implemented over the next several generations
as practical on timing and having the right level of maturity.
The main avenue is integration of components to reduce part count.
This freed 2-way system from desktop cost constraints, but as with desktops,
it would take several generations to evolve into a properly balanced architecture.
&lt;/p&gt;
 
&lt;p&gt;
The 4-way capable processors remained on a premium derivative,
given the Xeon MP brand in the early Pentium 4 architecture (or NetBurst) period. 
To provide job security for marketing people, 2-way processors then became the Xeon 5000 series,
and 4-way the Xeon 7000 series in the late NetBurst to 2010 period.
In 2011, the new branding scheme is E3 for 1P servers, E5 for 2-way and E7 for 4-way and higher. 
Presumably each branding adjustment requires changes to thousands of slide decks.
&lt;/p&gt;
 
&lt;p&gt;
At first, Intel thought both 2-way and 4-way systems had high demand versus cost elasticity.
If cost could be reduced, there would be substantially higher volume.
Chipsets (MCH and IOH) had overly aggressive cost objectives that limited memory and IO capability.

In fact, 4-way systems had probably already fallen below the boundary of demand elasticity.
&lt;/p&gt;
 
&lt;p&gt;
The same may have been true for 2-way systems, as people began to realize that single processor
systems were just fine for entry server requirements.
For Pentium II and III 2-way systems, Intel only had a desktop chipset.
In 2005-6, Intel was finally able to produce a viable chipset for 2-way systems (E7500? or 5000P) that provided
memory and IO capability beyond desktop systems. 
Previously, the major vendors elected for chipsets from ServerWorks.
&lt;/p&gt;
 

&lt;p&gt;
It was also thought at the time that there was not a requirement for premium processors in 2-way server systems.
The more correct interpretation was that the large (and initially faster) cache of premium processors 
did not contribute sufficient value for 2-way systems.
A large cache does improve performance in 2-way systems, but not to the degree that it does at the 4-way level.
So the better strategy by far on performance above the baseline 2-way system with standard desktop
processors was to step up to a 4-way system with the low-end premium processors instead of a 2-way system with 
the bigger cache premium processors.
&lt;/p&gt;
 
&lt;p&gt;
And as events turned out, the 4-way premium processors lagged desktop processors in transitions to
new microarchitectures and manufacturing processes by 1 full year or more.
The 2-way server on the newer technology of the latest desktop processors
was better than a large cache processor of the previous generation, 
especially one that carried a large price premium.
So the repackaged desktop processor was the better option for 2-way systems.
&lt;/p&gt;
 
&lt;p&gt;
The advent of multi-core enabled premium processors to be a viable concept for 2-way systems.
A dual-core processor has much more compute capability than a single core, and the same holds for a quad-core over a dual-core,
in any system, not just 4-way, 
provided that there is not too much difference in frequency. 
The power versus frequency characteristics of microprocessors clearly favor multiple cores for code 
that scales with threads, as in any properly architected server application.
&lt;/p&gt;
 
&lt;p&gt;
However, multi-core at the dual and quad-core level was employed for desktop processors. 
So the processors for 2-way servers did not have a significant premium in capability relative to desktops.
The Intel server strategy remained big cache processors. There was the exception of Tigerton, 
when two standard desktop dual-core processor dies in the Xeon MP socket were employed for the 4-way system, 
until a large cache variant was readied in the next generation Dunnington processor.
This also happened for the Paxville and Tulsa.
&lt;/p&gt;
 
&lt;h4&gt;System Architecture Evolution from Core 2 to Sandy Bridge&lt;/h4&gt;
&lt;p&gt;
The figure below shows 4-way and 2-way server architecture evolution relative to single processor desktops (and servers too)
from 45nm Core 2 to Nehalem &amp;amp; Westmere and then to Sandy Bridge. Nehalem systems are not shown for space considerations,
but are discussed below.
&lt;/p&gt;
 
&lt;p&gt;&lt;img alt="" src="http://www.qdpma.com/SystemArchitecture_files/IntelServerTransition2.png" width="627" height="434"&gt;
&lt;br&gt;&lt;strong&gt;System architecture from Penryn to Westmere to Sandy Bridge&lt;/strong&gt;, (Nehalem not shown)
&lt;/p&gt;

&lt;p&gt;
The Core 2 architecture was the last Intel processor to use the shared bus, which allows 
multiple devices, processors and bridge chips, to share a bus with a protocol to arbitrate 
for control of the bus. It was called the front-side bus (FSB) because there was once a back-side bus for cache.
When cache was brought on-die more than 10 years ago, the BSB was no more.
By the Core 2 period, to support higher bus frequency, the number of devices was reduced to 2,
but the shared bus protocol was not changed.
The FSB was only pushed to 1066MHz for Xeon MP, 1333MHz for 2-way servers, and 1600MHz for 2-way workstations.
&lt;/p&gt;

&lt;p&gt;
Nehalem was the first Intel processor with a true point-to-point protocol, Quick Path Interconnect (QPI),  
at a 6.4GT/s transfer rate, achieving much higher bandwidth to pin efficiency than possible over shared bus.
Intel had previously employed a point-to-point protocol for connecting nodes of an Itanium system back in 2002.
(AMD implemented point-to-point with HT for Opteron in 2003? at an initial signaling rate of 1.6GHz?)
Shared bus also has bus arbitration overhead in addition to lower frequency of operation. 
The other limitation of Intel processors up to Core 2, was the concentration of signals on the 
memory controller hub (also known as North Bridge) for processors, memory and PCI-E.
The 7300 MCH for the 4-way Core 2 has 2013 pins, which is at the practical limit, 
and yet the memory and IO bandwidth is somewhat inadequate.
&lt;/p&gt;

&lt;p&gt;
Nehalem and Westmere implement a massive increase in memory and PCI-E bandwidth (number of channels or ports) for 
the 2-way and 4-way systems compared to their Core 2 counterparts.
Both Nehalem 2-way and 4-way systems have significantly higher cost structure than Core 2.
Previously, Intel had been mindlessly obsessed with reducing system cost to the detriment of balanced memory and IO.
This shows Intel recognized that their multi-processor systems were already below the price-demand elasticity point,
and it was time to rebalance memory and IO bandwidth, now possible with point to point interconnect
and the integrated memory controller.
&lt;/p&gt;

&lt;p&gt;
QPI in Nehalem required an extra chip to bridge the processor to PCI-E.
This was not an issue for multi-processor systems,
but was undesirable for the hyper sensitive cost structure of desktop systems.
The lead quad-core 45nm Nehalem processor with 3 memory channels and 2 QPI ports in a LGA 1366 socket 
was followed by a quad-core, 2-memory channel derivative (Lynnfield) with 16 PCI-E plus DMI replacing QPI in a LGA 1156 socket. 
The previously planned dual-core Nehalem on 45nm was cancelled.
Nehalem with QPI was employed in the desktop extreme line, 
while the quad-core without QPI was employed in the high-end of the regular desktop line.
&lt;/p&gt;

&lt;p&gt;
The lead 32nm Westmere was a dual-core with the same LGA 1156 socket (memory and IO) as Lynnfield.
Per the desktop and mobile objective, cost structure was reduced with integration, 
with 1 processor die and potentially a graphics die in the same package, 
and just 1 other component, the PCH.
&lt;/p&gt;

&lt;p&gt;
The follow-on Westmere derivative was a six-core using the same LGA 1366 socket as Nehalem, 
i.e., 3 memory channels and 2 QPI. 
This began the separation process of desktop and other single processor systems from 
multi-processor server and workstation systems. 
Extreme desktops employ the higher tier components designed for 2-way, but are still single-socket systems.
I suppose that a 2-way extreme system is a workstation. 
Gamers will have to settle for the mundane look of a typical workstation chassis.
&lt;/p&gt;

&lt;p&gt;
With the full set of Sandy Bridge derivatives, the server strategy transition will be complete. 
Multi-processor products, even for 2-way, are completely separated from desktops without the requirement 
to meet desktop cost structure constraints.
With desktops interested only in dual and quad-core, 
a premium product strategy can be built for 2-way and above around both the number of cores and QPI links.
&lt;/p&gt;

&lt;p&gt;
The Sandy Bridge premium processor has 8 cores, 4 memory channels, 2 QPI, 40 PCI-E lanes and DMI
(that can function as x4 PCI-E). 
The high-end EP line in a LGA 2011 socket will have full memory, QPI and PCI-E capability.
The EN line in LGA 1356 socket will have 3 memory channels, 1 QPI and 24 PCI-E lanes plus DMI
to support up to 2-way systems, and will be suitable for lower priced systems. 
Extreme desktops will use the LGA 2011 socket, but without QPI.
&lt;/p&gt;

&lt;p&gt;
What is interesting is that the 4-way capable Sandy Bridge EP line is targeted at both 2-way and 4-way systems. 
This is a departure from the old Intel strategy of premium processors for 4-way and up.
Since the basis of the old strategy is no longer valid, of course a new strategy should be formulated.
But too often, people only remember the rules of the strategy, not the basis.
And hence blindly follow the old strategy even when it is no longer valid (does this sound familiar?)
&lt;/p&gt;

&lt;p&gt;
This element of a premium 2-way system actually started with the Xeon 6500 line based on Nehalem-EX.  
Nehalem-EX was designed for 4-way and higher with eight-cores, 
4 memory channels supporting 16 DIMMs per processor and 4 QPI links.
A 2-way Nehalem-EX with 8 cores, 16 DIMMs per socket might be viable versus Nehalem at 4 cores, 9 DIMMs per socket,
even though the EX top frequency is 2.26GHz versus 2.93GHz and higher in Nehalem.
The more consequential hindrance was that Nehalem-EX did not enter production until Westmere-EP was also in production, 
with 6 cores per socket at 3.33GHz.
So the Sandy Bridge EP line will provide a better indicator for premium 2-way systems.
&lt;/p&gt;
 
&lt;h4&gt;The Future of 8-way and the EX line&lt;/h4&gt;
&lt;p&gt;
There is no EX line with Sandy Bridge.
Given the relatively low volume of 8-way systems, it is better not to burden the processor used by 4-way systems
with glue-less 8-way capability.
Glue-less means that the processors can be directly connected without the need for additional bridge chips.
This both lowers cost and standardizes multi-processor system architecture, 
which is probably one of the cornerstones for the success Intel achieved in MP systems.
I am expecting that 8-way systems are not being abandoned, 
but rather a system architecture with "glue" will be employed. 
&lt;/p&gt;
 
&lt;p&gt;
Since 8-way systems are a specialized very high-end category,
this would suggest a glued system architecture is more practical in terms of effort than a subsequent 22nm Ivy Bridge EX.
Below are two of my suggestions for 8-way Sandy Bridge, or perhaps Ivy Bridge, depending on when components could be available.
The first has two 4-port QPI switches (or crossbars or routers) connecting four nodes with 2 processors per node.
&lt;/p&gt;

&lt;p&gt;&lt;img alt="" src="http://www.qdpma.com//SystemArchitecture_files/SandyBridgeEP8a.png" width="578" height="475"&gt;&lt;/p&gt;

&lt;p&gt;
The second system below has two 8-port QPI switches connecting single processor nodes.
&lt;/p&gt;

&lt;p&gt;&lt;img alt="" src="http://www.qdpma.com//SystemArchitecture_files/SandyBridgeEP8b.png" width="578" height="475"&gt;&lt;/p&gt;

&lt;p&gt;
The 2 processor node architecture would be economical, but I am inclined to recommend building the 8-port QPI switch.
Should the 2 processor node prove to be workable,
then a 16-way system would be possible.
Both are purely speculative as Intel does not solicit my advice on server system architecture and strategy, 
not even back in 1997-99.
&lt;/p&gt;

&lt;p&gt;
In looking at the HP DL980 diagram, I am thinking that the HP node controllers
would support Sandy Bridge EP in an 8-way system.
&lt;/p&gt;

&lt;p&gt;
&lt;img alt="DL980" src="http://www.qdpma.com//SystemArchitecture_files/DL980_xnc.png" width="604" height="239"&gt;
&lt;/p&gt;

&lt;p&gt;
There are cache coherency implications (directory-based versus snoop) that are beyond the scope of a database server oriented topic.
There was an IBM or Sun discussion of transactional memory.
I would really like to see some innovation on handling locks. 
This is critical to database performance and scaling.
For example, the database engine ensures exclusive access to a row, i.e., memory, before allowing access.
Then why does the system architecture need to do a complex cache coherency check when the application has already done so?
I had also previously discussed SIMD instructions to improve handling of page and row based storage,
&lt;a href="http://sqlblog.com/blogs/joe_chang/archive/2011/02/07/simd-extensions-for-the-database-storage-engine.aspx"&gt;SIMD Extensions for the Database Storage Engine&lt;/a&gt;
(same &lt;a href="http://www.qdpma.com/CPU/SIMD_Extensions.html"&gt;here&lt;/a&gt;).
&lt;/p&gt;

&lt;p&gt;
If that were not enough, I had also called for splitting the memory system.
Over the period of Intel multi-processor systems 1995 to 2011, 
practical system memory has increased from 2GB to 2TB.
Most of the new memory capacity is used for data buffers.
The exceptionally large capacity of the memory system also means that it cannot be 
brought very close to the processor, as in the same package/socket.
&lt;/p&gt;

&lt;p&gt;
So the memory architecture should be split into a small segment 
that needs super low latency byte addressability.
The huge data buffer portion could be changed to block access.
If so, then perhaps the database page organization should also be changed to make the metadata
access more efficient in terms of modern processor architecture to reduce 
the impact of off-die memory access by making
full use of cache line organization.
The NAND people are also arguing for Storage Class Memory, something along the lines
of NAND used as memory.
&lt;/p&gt;

&lt;p&gt;
More on &lt;a href="http://www.qdpma.com/SystemArchitecture/SystemArchitecture_2011Q3.html"&gt;QDPMA System Architecture&lt;/a&gt;
and &lt;a href="http://www.qdpma.com/SystemArchitecture/SystemArchitecture_SandyBridge.html"&gt;Sandy Bridge&lt;/a&gt;.
&lt;/p&gt;</description></item><item><title>TPC-H Benchmarks - Westmere-EX versus RISC</title><link>http://sqlblog.com/blogs/joe_chang/archive/2011/10/10/tpc-h-benchmarks-westmere-ex-versus-risc.aspx</link><pubDate>Mon, 10 Oct 2011 20:03:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:38969</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;
There has been relatively little activity in TPC benchmarks recently, with the exception of the raft of Dell TPC-H results with Exa Solutions. 
It could be that systems today are so powerful that few people feel the need for benchmarks.
IBM published an 8-way Xeon E7 (Westmere-EX) TPC-E result of 4593 in August, slightly higher
than the Fujitsu result of 4555, published in May 2011.
Both systems have 2TB memory. IBM prices 16GB DIMMs at $899 each, $115K for 2TB or $57.5K per TB. (I think a 16MB DIMM was $600+ back in 1995!)
The Fujitsu system has 384 SSDs of the 60GB SLC variety, $1014 each, 
and IBM employed 143 SSDs of the 200GB eMLC variety, $1800 each, for 24TB and 28TB raw capacity respectively.
Except for unusually write intensive situations, eMLC or even regular MLC is probably 
good enough for most environments.
&lt;/p&gt;
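&lt;p&gt;
As a rough check on the memory and SSD figures above, the arithmetic can be sketched as follows.
Only the prices and counts quoted in this post are used. The variable names are illustrative, and the unit conventions
(1TB = 1024GB for memory, 1TB = 1000GB for SSD) are assumptions.
&lt;/p&gt;
&lt;pre&gt;
```python
# Memory: 2TB built from 16GB DIMMs at $899 each (IBM pricing quoted above).
DIMM_GB = 16
DIMM_PRICE = 899
dimm_count = 2 * 1024 // DIMM_GB          # number of DIMMs needed for 2TB
memory_cost = dimm_count * DIMM_PRICE     # total dollars, roughly $115K
cost_per_tb = memory_cost / 2             # dollars per TB, roughly $57.5K

# SSD capacity and cost, using the nominal drive capacities quoted above.
fujitsu_tb = 384 * 60 / 1000              # nominal TB for the Fujitsu system
ibm_tb = 143 * 200 / 1000                 # nominal TB for the IBM system
fujitsu_ssd_cost = 384 * 1014             # dollars
ibm_ssd_cost = 143 * 1800                 # dollars

print(dimm_count, memory_cost, cost_per_tb)
print(fujitsu_tb, ibm_tb, fujitsu_ssd_cost, ibm_ssd_cost)
```
&lt;/pre&gt;
&lt;p&gt;
The nominal Fujitsu capacity works out to about 23TB; the 24TB figure would follow if raw capacity is 64GB per drive, which is an assumption here.
&lt;/p&gt;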
&lt;p&gt; 
HP published a TPC-H 1TB of 219,887.p QphH
for their 8-way ProLiant DL980 G7 with the Xeon E7-4870,
26% higher in the overall composite score than the IBM x3850 X5 with the Xeon E7-8870 (essentially the same processor).
The HP scores 16% higher in power and 37.7% higher in throughput.
Both throughput tests were with 7 streams.
The HP system had Hyper-Threading enabled (80 physical cores, 160 logical) 
while the IBM system did not. 
Both systems had 2TB memory, more than sufficient to hold the entire database, data and indexes in memory.
The IBM system had 7 PCI-E SSDs and 
the HP system had 416 HDDs over 26 D2700 disk enclosures, 10 LSI SAS RAID controllers,
3 P411 and 1 dual-port 8Gbps FC controller.
&lt;/p&gt;
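&lt;p&gt;
The three percentages above are consistent with each other, because the TPC-H composite (QphH) is the geometric mean of the power and throughput metrics.
A minimal sketch of that check, using only the 16% and 37.7% figures quoted above:
&lt;/p&gt;
&lt;pre&gt;
```python
import math

# Ratios of the HP scores to the IBM scores, from the percentages quoted above.
power_ratio = 1.16
throughput_ratio = 1.377

# The composite is the geometric mean of power and throughput,
# so the composite ratio is the geometric mean of the two ratios.
composite_ratio = math.sqrt(power_ratio * throughput_ratio)
composite_pct = (composite_ratio - 1) * 100

print(round(composite_pct, 1))
```
&lt;/pre&gt;
&lt;p&gt;
The result comes out at roughly 26%, matching the composite difference stated above.
&lt;/p&gt;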
&lt;p&gt;Also of interest are TPC-H 1TB reports published for the 16-way SPARC M8000 (June 2011)
with SPARC64 VII+ processors and the 4-way SPARC T4-4 (Sep 2011).
The table below shows configuration information for recent TPC-H 1000GB results.
&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="4"&gt;

&lt;tr&gt;&lt;th&gt;TPC-H 1000GB&lt;/th&gt;&lt;th&gt;IBM x3850 X5&lt;/th&gt;&lt;th&gt;HP ProLiant DL980 G7&lt;/th&gt;&lt;th&gt;IBM Power 780&lt;/th&gt;&lt;th&gt;SPARC M8000&lt;/th&gt;&lt;th&gt;SPARC T4-4&lt;/th&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;DBMS&lt;/td&gt; &lt;td&gt;SQL 2K8R2 EE&lt;/td&gt;&lt;td&gt;SQL 2K8R2 EE&lt;/td&gt;&lt;td&gt;Sybase IQ ASE 15.2&lt;/td&gt;&lt;td&gt;Oracle 11g R2&lt;/td&gt;&lt;td&gt;Oracle 11g R2&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;Processors&lt;/td&gt;&lt;td&gt;8 Xeon E7&lt;/td&gt;&lt;td&gt;8 Xeon E7&lt;/td&gt;&lt;td&gt;8 POWER7&lt;/td&gt;&lt;td&gt;16 SPARC64 VII+&lt;/td&gt;&lt;td&gt;4 SPARC T4&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;Cores Threads&lt;/td&gt;  &lt;td&gt;80-80&lt;/td&gt;&lt;td&gt;80-160&lt;/td&gt;&lt;td&gt;32-128&lt;/td&gt;&lt;td&gt;64-128&lt;/td&gt;&lt;td&gt;32-256&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;Memory&lt;/td&gt; &lt;td&gt;2048GB&lt;/td&gt;&lt;td&gt;2048GB&lt;/td&gt;&lt;td&gt;512GB&lt;/td&gt;&lt;td&gt;512GB&lt;/td&gt;&lt;td&gt;512GB&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;IO Controllers&lt;/td&gt; &lt;td&gt;7&lt;/td&gt;&lt;td&gt;13&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;4 Arrays&lt;/td&gt;&lt;td&gt;4 Arrays&lt;/td&gt;&lt;/tr&gt;
&lt;tr align="center"&gt;&lt;td align="left"&gt;HDD/SSD&lt;/td&gt;&lt;td&gt;7 SSD&lt;/td&gt;&lt;td&gt;416 HDD&lt;/td&gt;&lt;td&gt;52 SSD&lt;/td&gt;&lt;td&gt;4x80 SSD&lt;/td&gt;&lt;td&gt;4x80 SSD&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;
The figure below shows TPC-H 1000GB power, throughput and QphH composite scores for 4 x Xeon 7560 (32 cores, 64 threads),
two 8 x Xeon E7 (80 cores, 80 and 160 threads) systems, 8 x POWER7 (32 cores, 128 threads),
16 x SPARC64 VII+ (64 cores, 128 threads) and 4 x SPARC T4 (32 cores, 256 threads).
&lt;/p&gt;
&lt;p&gt;
&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_summary_201109.png" width="573" height="316"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 Results&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
The HP 8-way Xeon and both Oracle/Sun systems, one with 16 sockets
and the newest with 4 SPARC T4 processors, are comparable, within 10%.
&lt;/p&gt;
&lt;p&gt;
An important point is that both the Oracle/Sun and the IBM Power systems are configured with 512GB memory
versus 2TB for the 8-way Xeon E7 systems, which is enough to keep all data and indexes in memory.
There is still disk IO for the initial data load and tempdb intermediate results.
This is a good indication that Oracle and Sybase have been reasonably optimized on IO, in particular,
when to use an index and when not to. 

I had previously raised the issue that the SQL Server query optimizer should consider
the different characteristics of in-memory, DW optimized HDD storage (100MB/s per disk sequential)
and SSD. 
&lt;/p&gt;
&lt;p&gt;
Sun clearly made tremendous improvements from the SPARC64 VII+ to the T4,
with the new 4-way system essentially matching the previous 16-way. 
Of course, Sun had been lagging at the individual processor socket level until now.
The most&amp;nbsp;interesting aspect is that the SPARC T4 has 8 threads per core.
The expectation is that server applications have a great deal of pointer chasing code,
that is: fetch memory which determines next address to fetch with inherently poor locality.
&lt;/p&gt;
&lt;p&gt;
A modern microprocessor with core frequency 3GHz corresponds to a 0.33 nano-second clock cycle.
Local node memory access time might be 50ns, or 150 CPU-clocks.
Remote node memory access time might be 100ns for a neighboring node to over 250ns for multi-hop nodes
after cache-coherency is taken into account.
So depending on how many instructions are required for each non-cached memory access,
we can expect each thread or logical core to have many dead cycles, possibly enough to justify 8 threads per core.
What is surprising is that Oracle published a TPC-H benchmark with their new T4-4
and not a TPC-C/E which is more likely to emphasize the pointer chasing code than DW.
&lt;/p&gt;
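&lt;p&gt;
The latency arithmetic above can be sketched as follows. The 3GHz clock and the 50ns, 100ns and 250ns latencies are the figures from the text.
The thread-count model and the value of 20 work cycles per memory access are purely illustrative assumptions, not measurements.
&lt;/p&gt;
&lt;pre&gt;
```python
def latency_in_cycles(latency_ns, core_ghz=3.0):
    """Convert a memory latency in nanoseconds to CPU clock cycles."""
    # At f GHz there are f clock cycles per nanosecond.
    return latency_ns * core_ghz


def threads_to_hide_latency(stall_cycles, work_cycles):
    """Threads per core needed to keep the core busy.

    Simplistic model (an assumption): each thread alternates work_cycles of
    execution with stall_cycles of waiting on a non-cached memory access.
    """
    return 1 + stall_cycles / work_cycles


for label, ns in [("local node", 50), ("neighbor node", 100), ("multi-hop", 250)]:
    print(label, latency_in_cycles(ns))

# Hypothetical: 20 cycles of useful work per local memory access.
print(threads_to_hide_latency(latency_in_cycles(50), 20))
```
&lt;/pre&gt;
&lt;p&gt;
Under these assumed numbers the model gives between 8 and 9 threads per core, which is the sense in which 8 threads per core could be justified.
&lt;/p&gt;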
&lt;p&gt;Below are the 22 individual query times for the above systems in the power test (1 stream).&lt;/p&gt;
&lt;p&gt;
&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_201109a_log.png" width="720" height="299"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 Queries 1-22&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
Below are the 22 individual query power times for just the two 8 Xeon E7 systems.
Overall, the HP system (with HT enabled) has a 16% higher TPC-H power score, but the IBM system without HT 
is faster or comparable in 9 of the 22 queries.
Not considering the difference in system architecture, the net difference might be attributed to HT?
&lt;/p&gt;
&lt;p&gt;
&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_201109b_log.png" width="719" height="298"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 IBM and HP 8-way Xeon E7&lt;/b&gt;
&lt;/p&gt;

&lt;p&gt;Below are the 22 individual query power times for the HP 8 Xeon E7 and Oracle SPARC T4-4 systems.&lt;/p&gt;
&lt;p&gt;
&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_201109c_log.png" width="717" height="295"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 8-way HP Xeon E7 and 4-way SPARC T4&lt;/b&gt;
&lt;/p&gt;</description></item><item><title>New Fusion ioDrive2 and ioDrive2 Duo</title><link>http://sqlblog.com/blogs/joe_chang/archive/2011/10/04/new-fusion-iodrive2-and-iodrive2-duo.aspx</link><pubDate>Tue, 04 Oct 2011 22:06:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:38850</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;Fusion-iO just announced the new ioDrive2 and ioDrive2 Duo on Oct 2011 (at some conference of no importance). 
The MLC models will be available late November and the SLC models afterwards.
See the Fusion-iO &lt;a href="http://www.fusionio.com/press-releases/new-storage-class-memory-beats-fusion-iodrive-on-all-metrics/" target="_blank"&gt;
press release&lt;/a&gt; for more info.
&lt;/p&gt;

&lt;p&gt;
Below are the Fusion-IO ioDrive2 and ioDrive2 Duo specifications. 
The general idea seems to be for the ioDrive2 to match the realizable bandwidth of a PCI-E gen2 x4 slot (1.6GB/s)
and for the ioDrive2 Duo to match the bandwidth of a PCI-E gen2 x8 slot (3.2GB/s).
I assume that there is a good explanation why most models have specifications slightly below the corresponding PCI-E limits.
&lt;/p&gt;
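&lt;p&gt;
For reference, the slot limits above can be related to the PCI-E gen 2 signaling rate. The sketch below uses 5GT/s per lane with 8b/10b encoding,
and compares the result with the realizable 1.6GB/s and 3.2GB/s figures quoted above. The function name is illustrative.
&lt;/p&gt;
&lt;pre&gt;
```python
GEN2_GT_PER_S = 5.0      # billions of transfers per second, per lane
ENCODING = 8 / 10        # 8b/10b: 8 data bits for every 10 bits on the wire


def gen2_raw_gb_per_s(lanes):
    """Per-direction GB/s after encoding, before protocol overhead."""
    # Divide by 8 to convert bits to bytes.
    return GEN2_GT_PER_S * ENCODING / 8 * lanes


for lanes, realizable in [(4, 1.6), (8, 3.2)]:
    raw = gen2_raw_gb_per_s(lanes)
    # The last value is the fraction of raw bandwidth that is realizable.
    print(lanes, raw, realizable / raw)
```
&lt;/pre&gt;
&lt;p&gt;
The realizable figures in the text correspond to about 80% of the post-encoding bandwidth, the remainder presumably going to protocol overhead.
&lt;/p&gt;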

&lt;p&gt;
The exception is the 365GB model, at about 50% of the PCI-E g2 x4 limit.
Suppose that the 785GB model implements parallelism with 16 channels and 4 die per channel.
Rather than building the 365GB model with the same 16 channels 
but a different NAND package with 2 die each, they just implemented 8 channels using the same 4 die per package.
Let's see if Fusion explains this detail.
&lt;/p&gt;
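&lt;p&gt;
The supposition above can be put in numbers. The channel and die counts below are the hypothetical values from the preceding paragraph, not published Fusion-io specifications;
the bandwidth figures are the read bandwidth values from the specification table.
&lt;/p&gt;
&lt;pre&gt;
```python
def die_count(channels, die_per_channel):
    """Total NAND die that can be worked in parallel."""
    return channels * die_per_channel


# Hypothetical layouts from the text above.
die_785 = die_count(16, 4)
die_365 = die_count(8, 4)
parallelism_ratio = die_365 / die_785

# Read bandwidth from the spec table: 710 MB/s (365GB) versus 1.2 GB/s (785GB).
bandwidth_ratio = 710 / 1200

print(die_785, die_365, parallelism_ratio, round(bandwidth_ratio, 2))
```
&lt;/pre&gt;
&lt;p&gt;
Halving the channel count would halve the parallelism, which is roughly in line with the bandwidth ratio between the two models.
&lt;/p&gt;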

&lt;p&gt;&lt;strong&gt;Fusion-IO ioDrive2&lt;/strong&gt;&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="4"&gt;

&lt;tr&gt;&lt;th&gt;ioDrive2 Capacity&lt;/th&gt;&lt;th&gt;400GB&lt;/th&gt;&lt;th&gt;600GB&lt;/th&gt;&lt;th&gt;365GB&lt;/th&gt;&lt;th&gt;785GB&lt;/th&gt;&lt;th&gt;1.2TB&lt;/th&gt;&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;td align="left"&gt;NAND Type&lt;/td&gt;
&lt;td colSpan="2"&gt;SLC (Single Level Cell)&lt;/td&gt;
&lt;td colSpan="3"&gt;MLC (Multi Level Cell)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read Bandwidth (64kB)&lt;/td&gt;
 
&lt;td&gt;1.4 GB/s&lt;/td&gt;
&lt;td&gt;1.5 GB/s&lt;/td&gt;
&lt;td&gt;710 MB/s&lt;/td&gt;
&lt;td&gt;1.2 GB/s&lt;/td&gt;
&lt;td&gt;1.3 GB/s&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write Bandwidth (64kB)&lt;/td&gt;
&lt;td&gt;1.3 GB/s&lt;/td&gt;
&lt;td&gt;1.3 GB/s&lt;/td&gt;
&lt;td&gt;560 MB/s&lt;/td&gt;
&lt;td&gt;1.0 GB/s&lt;/td&gt;
&lt;td&gt;1.2 GB/s&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read IOPS (512 Byte)&lt;/td&gt;
  
&lt;td&gt;351,000&lt;/td&gt;
&lt;td&gt;352,000&lt;/td&gt;
&lt;td&gt; 84,000&lt;/td&gt;
&lt;td&gt; 87,000&lt;/td&gt;
&lt;td&gt; 92,000&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write IOPS (512 Byte)&lt;/td&gt;
 
&lt;td&gt;511,000&lt;/td&gt;
&lt;td&gt;514,000&lt;/td&gt;
&lt;td&gt;502,000&lt;/td&gt;
&lt;td&gt;509,000&lt;/td&gt;
&lt;td&gt;512,000&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read Access Latency&lt;/td&gt;
 
&lt;td&gt;47 µs&lt;/td&gt;
&lt;td&gt;47 µs&lt;/td&gt;
&lt;td&gt;68 µs&lt;/td&gt;
&lt;td&gt;68 µs&lt;/td&gt;
&lt;td&gt;68 µs&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write Access Latency&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;td align="left"&gt;Bus Interface&lt;/td&gt;
&lt;td colSpan="5"&gt;PCI-E Gen 2 x4&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Price&lt;/td&gt;
&lt;td&gt;$?&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;td&gt;$5,950?&lt;/td&gt;
&lt;td&gt;$?&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Fusion-IO ioDrive2 Duo&lt;/strong&gt;&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="4"&gt;

&lt;tr&gt;&lt;th&gt;ioDrive2 Duo Capacity&lt;/th&gt;&lt;th&gt;1.2TB&lt;/th&gt;&lt;th&gt;2.4TB&lt;/th&gt;&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;td align="left"&gt;NAND Type&lt;/td&gt;
&lt;td colSpan="1"&gt;SLC (Single Level Cell)&lt;/td&gt;
&lt;td colSpan="1"&gt;MLC (Multi Level Cell)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read Bandwidth (64kB)&lt;/td&gt;
 
&lt;td&gt;3.0 GB/s&lt;/td&gt;
&lt;td&gt;2.6 GB/s&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write Bandwidth (64kB)&lt;/td&gt;
&lt;td&gt;2.6 GB/s&lt;/td&gt;
&lt;td&gt;2.4 GB/s&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read IOPS (512 Byte)&lt;/td&gt;
  
&lt;td&gt;702,000&lt;/td&gt;
&lt;td&gt;179,000&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write IOPS (512 Byte)&lt;/td&gt;
 
&lt;td&gt;937,000&lt;/td&gt;
&lt;td&gt;922,000&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Read Access Latency&lt;/td&gt;
 
&lt;td&gt;47 µs&lt;/td&gt;
&lt;td&gt;68 µs&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Write Access Latency&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;td&gt;15 µs&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;td align="left"&gt;Bus Interface&lt;/td&gt;
&lt;td colSpan="2"&gt;PCI-E Gen 2 x8&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="right"&gt;
&lt;td align="left"&gt;Price&lt;/td&gt;
&lt;td&gt;$?&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;SLC versus MLC NAND&lt;/strong&gt;&lt;br&gt;
Between the SLC and MLC models, the SLC models have much better 512-byte read IOPS than the MLC models,
with only moderately better bandwidth and read latency.
Not mentioned, but common knowledge, is that SLC NAND has much greater write-cycle endurance than MLC NAND.
&lt;/p&gt;

&lt;p&gt;It is my opinion that most databases, both transaction processing and DW, can accommodate MLC NAND
characteristics and limitations in return for the lower cost per TB.
I would consider budgeting a replacement set of SSDs if analysis shows that the MLC life-cycle does not match
the expected system life-cycle. 
Of course, I am also an advocate of replacing the main production database server on a 2-3 year cycle
instead of the&amp;nbsp;traditional (bean-counter)&amp;nbsp;5-year practice.
&lt;/p&gt;
&lt;font size="2"&gt;
&lt;p&gt;The difference in read IOPS at 512B is probably not important. If the ioDrive2 MLC models can drive 70K+ read IOPS at 8KB, then it does not matter what the 512B IOPS is.&lt;/p&gt;
&lt;/font&gt;
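&lt;p&gt;
To see why the 512B figure matters little, IOPS can be converted to bandwidth. The sketch below uses the 70K IOPS at 8KB from the remark above
and the 84,000 read IOPS at 512B from the 365GB MLC column of the table; the function name is illustrative.
&lt;/p&gt;
&lt;pre&gt;
```python
def iops_to_mb_per_s(iops, io_bytes):
    """Bandwidth in MB/s (decimal) implied by an IOPS rate at a given IO size."""
    return iops * io_bytes / 1_000_000


at_8kb = iops_to_mb_per_s(70_000, 8192)    # SQL Server page-sized reads
at_512b = iops_to_mb_per_s(84_000, 512)    # spec-sheet small-block reads

print(at_8kb, at_512b)
```
&lt;/pre&gt;
&lt;p&gt;
At 8KB, 70K IOPS is already over 570MB/s, which is close to the bandwidth limit of the smallest MLC model, while the 512B rating amounts to only about 43MB/s.
&lt;/p&gt;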

&lt;p&gt;&lt;strong&gt;Post-RAID?&lt;/strong&gt;&lt;br&gt;
One point from the press release:
"new intelligent self-healing feature called Adaptive FlashBack provides complete chip level fault tolerance, 
which enables ioMemory to repair itself after a single chip or a multi chip failure without interrupting business continuity."
For DW systems, I would like to completely do away with RAID when using SSDs,
instead having two systems without RAID on SSD units.
By this, I mean fault-tolerance should be pushed into the SSD at the unit level.&amp;nbsp;Depending on the failure rate of the controller,&amp;nbsp;perhaps there could be two controllers on each SSD unit.&lt;/p&gt;

&lt;p&gt;
For a critical transaction processing system, it would be nice if Fusion could provide failure statistics
for units that have been in production for more than 30 days
(or whatever the infant mortality period is) on the assumption that most environments will spend a certain amount
of time to spin up a new production system.
If the failure rate for a system with 2-10 SSDs is less than 1 per year,
then perhaps even a transaction processing system using mirroring for high-availability can also do 
without RAID on the SSD?
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ioDrive2 and ioDrive2 Duo&lt;/strong&gt;&lt;br&gt;
I do think that it is a great idea for Fusion to offer both the ioDrive2 and ioDrive2 Duo product lines
matched to PCI-E gen2 x4 and x8 bandwidths respectively.
The reason is that server systems typically have a mix of PCI-E x4 and x8 slots
with no clear explanation of the reasoning for the exact mix, 
other than perhaps that being demanded by the customer complaining the loudest.
&lt;/p&gt;

&lt;p&gt;
By having both the ioDrive2 and Duo, it is possible to fully utilize the bandwidth from all available slots
balanced correctly.
It would have been an even better idea if the Duo were actually a daughter card that plugs onto the 
ioDrive2 base unit, so the base model can be converted to a Duo,
but Fusion apparently neglected to solicit my advice on this matter.
&lt;/p&gt;
&lt;p&gt;
I am also inclined to think that there should also be an ioDrive2 Duo MLC model
at 1.2TB, on the assumption that the performance will be similar to the 2.4TB model,
as the ioDrive2 785GB and 1.2TB models have similar performance specifications.
The reason is that a database server should be configured with serious brute force IO capability,
that is, all open PCI-E gen 2 slots should be populated.
But not every system will need the x8 slots populated with the 2.4TB MLC model,
hence the viability of a 1.2TB model as well.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ps&lt;/strong&gt;&lt;br&gt;If Fusion should be interested in precise quantitative analysis for SQL Server performance, instead of the rubbish whitepapers put out by typical system vendors,
well I can turn a good performance report very quickly. Of course I would need to keep the cards a while for continuing analysis...&lt;/p&gt;</description></item><item><title>Laptop for database performance consultants</title><link>http://sqlblog.com/blogs/joe_chang/archive/2011/09/02/laptop-for-database-performance-consultants.aspx</link><pubDate>Fri, 02 Sep 2011 21:39:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:38247</guid><dc:creator>jchang</dc:creator><description>&lt;p style="font-size:11pt;"&gt;
Today, it is actually possible to build a highly capable database
system in a laptop form factor. There is no point to running a production database
on a laptop.&amp;nbsp;The purpose of this is so that consultants (i.e., me), can
investigate database performance issues without direct access to a full sized
server. It is only necessary to have the characteristics of a proper
database server, rather than be an exact replica. 
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
Unfortunately, the commercially available laptops do not
support the desired configuration, so I am making an open appeal to laptop
vendors. What I would like is: 
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
1) Quad-core processor with hyper-threading (8 logical processors), 
&lt;br&gt;2) 8-16GB memory&amp;nbsp;(4&amp;nbsp;SODIMM so we do not need really expensive 8GB single rank DIMMs)&amp;nbsp;
&lt;br&gt; 3) 8x64GB (raw capacity) SSDs on a PCI-E Gen 2 x8
interface (&lt;strong&gt;for the main database&lt;/strong&gt;, not the OS)
&lt;br&gt;- alternatively, 2-4 x4 externally accessible PCI-E ports&amp;nbsp;for external SSDs&lt;br&gt;- or 2 x4 SAS 6Gbps ports for external SATA SSDs&amp;nbsp;
&lt;br&gt;4) 2-3 SATA ports for HDD/SSD/DVD etc&amp;nbsp;&lt;strong&gt;for OS boot&lt;/strong&gt; etc
&lt;br&gt;5) 1-2 e-SATA&lt;br&gt;6) 2 1GbE 
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
Below is a representation of the system, if this helps clarify.
&lt;br&gt;
&lt;img style="width:301px;height:365px;" title="thunderbolt" alt="SandyBridgeLaptop" src="http://www.qdpma.com/SystemArchitecture_files/SandyBridgeLaptop3.png" width="301" height="365"&gt;
&lt;br&gt;
&lt;img style="width:692px;height:459px;" title="thunderbolt" alt="SandyBridgeLaptop" src="http://www.qdpma.com/SystemArchitecture_files/SandyBridgeLaptop2.png" width="692" height="459"&gt;
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
The Sandy-Bridge integrated graphics should be sufficient, but
high-resolution 1920x1200 graphics and dual-display are desired. (I could live
with 1920x1080). 
&lt;br&gt;There should also be a SATA hard disk for the OS (or SATA SSD
without the 2.5in HDD form factor if space constrained) as the primary SSD array should be dedicated to the database. 
&lt;br&gt;Other desirable elements would be 1 or 2 e-SATA ports, to&amp;nbsp;support&amp;nbsp;backup and restores without consuming the valuable main SSD array,&lt;br&gt;and 2x1GbE
ports (so I can test code for parallel network transfers).
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
The multiple processor cores allow parallel execution plans.
Due to a quirk of the SQL Server query optimizer, 8 or more logical processors are more
likely to generate a parallel execution plan in some cases. &lt;br&gt;Ideally, the main
SSD array is comprised of 2 devices, one on each&amp;nbsp;PCI-E x4 channel.
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
The point of the storage system is to demonstrate 2GB/sec+
bandwidth, and 100-200K IOPS. One of the sad facts is that even today storage
vendors promote $100K+ storage systems that end up&amp;nbsp;delivering less
than 400-700MB/s bandwidth and less than 10K IOPS. 
So it is important to demonstrate what a proper database storage system should be capable of. 
&lt;br&gt;Note that it is not necessary to have massive memory.&amp;nbsp;
A system with sufficient memory and a powerful storage system can run any query, while a system with very large memory but weak storage can only run read queries that fit in memory. And even if data fits in memory, the performance&amp;nbsp;could still&amp;nbsp;fall off a cliff on tempdb IO.
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
Based on component costs, the
laptop without PCI-E SSD should be less than $2000, and the SSD array should be
less than $1000 per PCI-E x4 unit (4x64GB).
&lt;br&gt;It would really help if the PCI-E SSD could be powered off from SW, i.e., without having to remove it. This is why I want to boot off the SATA port, be&amp;nbsp;it HDD or SSD.&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
&lt;strong&gt;NAND notes&lt;/strong&gt;
&lt;br&gt;Per below, &lt;strong&gt;2&amp;nbsp;SSDs on SATA ports do not cut the mustard&lt;/strong&gt;. &lt;br&gt;
The spec above calls for 8 SSDs. Each SSD is comprised of 8 NAND packages, and each
package is comprised of 8 die. So there are 64 die in one SSD, and IO is
distributed over 8 SSDs, or a total of 512 individual die. &lt;br&gt;
The performance of a single NAND die is nothing special and even pathetic on
writes. However, a single NAND die is really small and really cheap. That is
why it is essential to employ high parallelism at the SSD unit level. And then,
employ parallelism over multiple SSD units.
&lt;br&gt;
An alternative solution is for the laptop to expose 2-4 PCI-E x4 ports (2 Gen 2
or 4 Gen 1) to connect to something like the OCZ IBIS, or an SAS controller
with 2 x4 external SAS ports.&lt;/p&gt;
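&lt;p style="font-size:11pt;"&gt;
The parallelism argument above can be put in numbers. The sketch below uses the 8 x 8 x 8 organization from the notes above and the
2GB/s and 100-200K IOPS targets stated earlier in this post, and works out what each individual die would have to deliver. The variable names are illustrative.
&lt;/p&gt;
&lt;pre&gt;
```python
SSDS = 8
PACKAGES_PER_SSD = 8
DIE_PER_PACKAGE = 8

die_per_ssd = PACKAGES_PER_SSD * DIE_PER_PACKAGE
total_die = SSDS * die_per_ssd

# Array-level targets from the text: 2GB/s bandwidth and 100-200K IOPS.
mb_per_s_per_die = 2000 / total_die
iops_per_die_low = 100_000 / total_die
iops_per_die_high = 200_000 / total_die

print(die_per_ssd, total_die)
print(mb_per_s_per_die, iops_per_die_low, iops_per_die_high)
```
&lt;/pre&gt;
&lt;p style="font-size:11pt;"&gt;
Spread over 512 die, the targets come to about 4MB/s and a few hundred IOPS per die, which is why unremarkable individual die can still add up to a serious storage system.
&lt;/p&gt;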

&lt;p style="font-size:11pt;"&gt;
&lt;strong&gt;System notes&lt;/strong&gt;
&lt;br&gt;The laptop will have 1 Intel quad-core Sandy-Bridge processor, which has 2 memory channels supporting 16GB dual-rank DDR3 memory. The processor has 16 PCI-E gen 2 lanes, DMI g2 (essentially&amp;nbsp;4 PCI-E g2 lanes) and integrated graphics. There must be a 6-series (or C20x) PCH, which connects upstream on the DMI. Downstream, there are 6 SATA ports (2 of which can be 6Gbps), 1 GbE port, and 8 PCI-E g2 lanes. So on the PCH, we can attach 2 HDD or SSD at 6Gbps, plus support 2 eSATA connections. There is only a single 1GbE port, so if we want 2, we have to employ a separate GbE chip.&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
While the total PCH downstream bandwidth exceeds the upstream, it is OK for our purposes to support 2&amp;nbsp;internal SATA&amp;nbsp;SSDs at 6Gbps, 2 eSATA ports and 2 GbE, plus USB etc. The key is how the 16 PCI-E gen 2 lanes are employed. In the available high-end laptops, most vendors attach a high-end graphics chip (to all 16 lanes?).&amp;nbsp;We absolutely need 8 PCI-E lanes for&amp;nbsp;our high performance SSD storage&amp;nbsp;array. I would be happy with the integrated graphics, but if the other 8 PCI-E lanes were attached to graphics, I could live with it.
&lt;/p&gt;
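&lt;p style="font-size:11pt;"&gt;
The mismatch between PCH downstream and upstream bandwidth can be estimated from the port counts in the system notes above.
This is a rough sketch using nominal per-direction rates after 8b/10b encoding, and assuming the 4 SATA ports that are not 6Gbps run at 3Gbps. USB and other devices are ignored.
&lt;/p&gt;
&lt;pre&gt;
```python
# Nominal per-direction bandwidth in MB/s after 8b/10b encoding.
PCIE_G2_LANE = 500   # 5 GT/s per lane
SATA_6G = 600
SATA_3G = 300        # assumed rate of the remaining SATA ports
GBE = 125

# DMI gen 2 is treated as 4 PCI-E gen 2 lanes, as described above.
upstream = 4 * PCIE_G2_LANE

# Downstream ports on the PCH as listed in the system notes.
downstream = 2 * SATA_6G + 4 * SATA_3G + 8 * PCIE_G2_LANE + 1 * GBE

oversubscription = downstream / upstream
print(upstream, downstream, oversubscription)
```
&lt;/pre&gt;
&lt;p style="font-size:11pt;"&gt;
On these nominal figures the downstream ports add up to more than three times what the DMI can carry, which is why the main SSD array belongs on the processor PCI-E lanes.
&lt;/p&gt;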

&lt;p style="font-size:11pt;"&gt;The final comment (for now) is that even though it is possible to attach more than 2 SSDs off the PCH, we need the bandwidth on the main set of PCI-E ports. It is insufficient for all storage to be clogging the DMI and PCH.&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
&lt;strong&gt;Thunderbolt&lt;/strong&gt;
&lt;br&gt;Thunderbolt is 2x2 PCI-E g2 lanes, so technically that's almost what I need (8 preferred, but 6 acceptable). 
&lt;br&gt;&lt;strike&gt;What is missing from the documentation is where Thunderbolt attaches.&lt;br&gt;If directly to the SandyBridge processor (with bridge chip for external?), then that's OK,&lt;br&gt;if off the PCH, then that is not good enough for the reasons I outlined above. 
&lt;/strike&gt;
&lt;br&gt;Also, we need serious SSDs to attach off TB, does the Apple SSD cut mustard?&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
The diagram below shows the Thunderbolt controller connected to the PCH, 
but also states that other configurations are possible.
The problem is that most high-end laptops are designed with high-end graphics,
which we do not want squandering all 16 PCI-E lanes.
&lt;/p&gt;

&lt;p&gt;
&lt;img style="width:414px;height:524px;" title="thunderbolt" alt="thunderbolt" src="http://www.qdpma.com/SystemArchitecture_files/thunderbolt.png" width="414" height="524"&gt;&lt;/p&gt;


&lt;p style="font-size:11pt;"&gt;
A Thunderbolt controller attached to the PCH&amp;nbsp;is capable of supporting x4 PCI-E gen 2, but cannot also simultaneously support saturation volume traffic from internal storage (SATA ports), and network (not to mention eSATA). I should add that I intend to place the log on the SATA port HDD/SSD, along with the OS, hence I do not want the main SSD array generating traffic&amp;nbsp;over the DMI-PCH connection.
&lt;/p&gt;

&lt;p style="font-size:11pt;"&gt;
A Thunderbolt SDK is supposed to be released very soon, so we can find out more.
I am inclined to think that Thunderbolt is really a docking station connector, being able to route both video and IO over a single connector. If we only need to route IO traffic, then there&amp;nbsp;are already 2 very suitable protocols for this, i.e., eSATA for consumer, and SAS for servers, each with a decent base of products. Of course, I might like a 4 bay disk enclosure for 2.5in SSDs on 1x4 SAS, or an 8-bay split over 2 x4 ports. Most of the existing disk enclosures carry over from hard disk environment, with either 12-15 3.5in bays or 24-25 2.5in bays.&lt;/p&gt;</description></item><item><title>IBM System x3850 X5 TPC-H Benchmark</title><link>http://sqlblog.com/blogs/joe_chang/archive/2011/03/04/ibm-system-x3850-x5-tpc-h-benchmark.aspx</link><pubDate>Fri, 04 Mar 2011 21:04:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:33911</guid><dc:creator>jchang</dc:creator><description>&lt;p&gt;IBM just published a TPC-H SF 1000 result for their 
&lt;a href="http://www.qdpma.com\SystemArchitecture\SystemArchitecture_IBM.html"&gt;x3850 X5&lt;/a&gt;, 
4-way Xeon 7560 system featuring a special MAX5 memory expansion board to support 1.5TB memory. In Dec 2010, IBM also published a TPC-H SF1000 for their Power 780 system, 8-way, quad-core, (4 logical processors per physical core).
&lt;/p&gt;

&lt;p&gt;
The figure and table below show TPC-H SF 1000 results for the 8-way 6-core Opteron 8439 on SQL Server and Sybase, the 16-way quad-core Itanium 9350 on Oracle, the 4-way Xeon 7560 on SQL Server and the 8-way POWER7 on Sybase. On TPC-H Power (single stream), the 4-way Xeon on SQL Server is competitively placed relative to the 16-way Itanium and 8-way POWER7 systems. In other words, an 8-way Xeon might be comparable to the 8-way POWER7. If there is a weak point in SQL Server, it is in the throughput test (multiple concurrent query streams). This aspect is probably something that could be corrected. Unfortunately, it is probably not a priority for the SQL Server team at this time.
&lt;/p&gt;

&lt;p&gt;&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_summary_2011.png" width="665" height="323"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 Results for HP DL785 and Integrity Superdome servers&lt;/b&gt;&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="3"&gt;

&lt;tr align="center"&gt;&lt;th&gt;System&lt;/th&gt;&lt;th&gt;Processor&lt;/th&gt;&lt;th&gt;Total&lt;br&gt;Cores&lt;/th&gt;&lt;th&gt;Memory&lt;/th&gt;&lt;th&gt;SQL&lt;/th&gt;&lt;th&gt;Power&lt;/th&gt;&lt;th&gt;Throughput&lt;/th&gt;&lt;th&gt;Composite&lt;br&gt;QphH&lt;/th&gt;&lt;th&gt;Streams&lt;/th&gt;&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;HP DL785 G6&lt;/td&gt;

&lt;td&gt;Opt 8439&lt;/td&gt;

&lt;td&gt;48&lt;/td&gt;

&lt;td&gt;512&lt;/td&gt;

&lt;td&gt;2008 rtm&lt;/td&gt;

&lt;td&gt;95,789.1&lt;/td&gt;

&lt;td&gt;69,367.6&lt;/td&gt;

&lt;td&gt;81,514.8&lt;/td&gt;

&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;HP DL785 G6&lt;/td&gt;

&lt;td&gt;Opt 8439&lt;/td&gt;

&lt;td&gt;48&lt;/td&gt;

&lt;td&gt;384&lt;/td&gt;

&lt;td&gt;Sybase 15.1&lt;/td&gt;

&lt;td&gt;108,436.8&lt;/td&gt;

&lt;td&gt;96,652.7&lt;/td&gt;

&lt;td&gt;102,375.3&lt;/td&gt;

&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;HP Superdome2&lt;/td&gt;

&lt;td&gt;It 9350&lt;/td&gt;

&lt;td&gt;64&lt;/td&gt;

&lt;td&gt;512&lt;/td&gt;

&lt;td&gt;O11g R2&lt;/td&gt;

&lt;td&gt;139,181.0&lt;/td&gt;

&lt;td&gt;141,188.1&lt;/td&gt;

&lt;td&gt;140,181.1&lt;/td&gt;

&lt;td&gt;64&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;IBM x3850 X5&lt;/td&gt;

&lt;td&gt;Xeon 7560&lt;/td&gt;

&lt;td&gt;32&lt;/td&gt;

&lt;td&gt;1536&lt;/td&gt;

&lt;td&gt;2008 R2&lt;/td&gt;

&lt;td&gt;127,676.1&lt;/td&gt;

&lt;td&gt;81,039.6&lt;/td&gt;

&lt;td&gt;101,719.3&lt;/td&gt;

&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;IBM Power 780&lt;/td&gt;

&lt;td&gt;POWER7&lt;/td&gt;

&lt;td&gt;32&lt;/td&gt;

&lt;td&gt;512&lt;/td&gt;

&lt;td&gt;Sybase 15.2&lt;/td&gt;

&lt;td&gt;170,206.1&lt;/td&gt;

&lt;td&gt;159,463.1&lt;/td&gt;

&lt;td&gt;164,747.2&lt;/td&gt;

&lt;td&gt;9&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
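&lt;p&gt;
The Composite column can be cross-checked from the other two: the TPC-H composite metric (QphH) is the geometric mean of the Power and Throughput metrics. The sketch below, in Python, recomputes it from the table values and also prints the Throughput-to-Power ratio, which is one simple way to see how much each result gives up in the multi-stream test. The function names and the ratio are illustrative choices, not part of any TPC tooling.
&lt;/p&gt;

```python
import math

# (Power, Throughput) pairs copied from the table above.
results = {
    "DL785 G6 / SQL Server 2008": (95789.1, 69367.6),
    "DL785 G6 / Sybase 15.1": (108436.8, 96652.7),
    "Superdome2 / Oracle 11g R2": (139181.0, 141188.1),
    "x3850 X5 / SQL Server 2008 R2": (127676.1, 81039.6),
    "Power 780 / Sybase 15.2": (170206.1, 159463.1),
}


def composite_qphh(power, throughput):
    """TPC-H composite: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)


def throughput_to_power(power, throughput):
    """Illustrative ratio: how much of the single-stream rate survives under concurrency."""
    return throughput / power


for name, (power, throughput) in results.items():
    qphh = composite_qphh(power, throughput)
    ratio = throughput_to_power(power, throughput)
    print(f"{name:32s} QphH {qphh:10.1f}   Throughput/Power {ratio:.2f}")
```

&lt;p&gt;
With these inputs the ratio is roughly 0.6 to 0.7 for the two SQL Server results and roughly 0.9 to 1.0 for the Sybase and Oracle results, which is the throughput weakness noted above.
&lt;/p&gt;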

&lt;p&gt;
Additional details are below.
The two IBM results employ SSD storage. The older results are on HDD storage.
In addition, the IBM x3850 X5 (with Xeon 7560) system is configured with 1.5TB memory.
The total size of the TPC-H SF 1000 database, all tables and indexes, should be 1.4TB. 
A storage system (7 SSDs) capable of very high IO rates is still required to handle the intense tempdb activity.
&lt;/p&gt;

&lt;table cellSpacing="1" cellPadding="3"&gt;

&lt;tr align="center"&gt;&lt;th&gt;System&lt;/th&gt;&lt;th&gt;DL785 G6&lt;/th&gt;&lt;th&gt;DL785 G6&lt;/th&gt;
 &lt;th&gt;Superdome 2&lt;/th&gt;&lt;th&gt;x3850 X5&lt;/th&gt;&lt;th&gt;Power 780&lt;/th&gt;&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Database&lt;/td&gt;

&lt;td&gt;SQL Server&lt;/td&gt;

&lt;td&gt;Sybase 15.1&lt;/td&gt;

&lt;td&gt;Oracle 11g R2&lt;/td&gt;

&lt;td&gt;SQL Server&lt;/td&gt;

&lt;td&gt;Sybase 15.2&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Processor&lt;/td&gt;

&lt;td&gt;Opteron 8439&lt;/td&gt;

&lt;td&gt;Opteron 8439&lt;/td&gt;

&lt;td&gt;Itanium 9350&lt;/td&gt;

&lt;td&gt;Xeon 7560&lt;/td&gt;

&lt;td&gt;POWER7&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Sockets-Cores&lt;/td&gt;

&lt;td&gt;8 x 6 = 48&lt;/td&gt;

&lt;td&gt;8 x 6 = 48&lt;/td&gt;

&lt;td&gt;16 x 4 = 64&lt;/td&gt;

&lt;td&gt;4 x 8 = 32&lt;/td&gt;

&lt;td&gt;8 x 4 = 32&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Hyper-Threading&lt;/td&gt;

&lt;td&gt;no&lt;/td&gt;

&lt;td&gt;no&lt;/td&gt;

&lt;td&gt;disabled&lt;/td&gt;

&lt;td&gt;2 per&lt;/td&gt;

&lt;td&gt;4 per&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Frequency&lt;/td&gt;

&lt;td&gt;2.8GHz&lt;/td&gt;

&lt;td&gt;2.8GHz&lt;/td&gt;

&lt;td&gt;1.73GHz&lt;/td&gt;

&lt;td&gt;2.26GHz&lt;/td&gt;

&lt;td&gt;4.1GHz&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Memory&lt;/td&gt;

&lt;td&gt;512GB&lt;/td&gt;

&lt;td&gt;384GB&lt;/td&gt;

&lt;td&gt;512GB&lt;/td&gt;

&lt;td&gt;1536GB&lt;/td&gt;

&lt;td&gt;512GB&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Storage Controllers&lt;/td&gt;

&lt;td&gt;6 P800&lt;/td&gt;

&lt;td&gt;8 x 8Gbps&lt;br&gt; dual-port FC&lt;/td&gt;
 
&lt;td&gt;48 x 8Gbps &lt;br&gt;dual-port FC&lt;/td&gt;

&lt;td&gt;7 PCI-E SSD&lt;/td&gt;

&lt;td&gt;12 PCI-E SAS&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Storage Ext&lt;/td&gt;

&lt;td&gt;12 MSA70&lt;/td&gt;

&lt;td&gt;4 MSA2324fc&lt;/td&gt;

&lt;td&gt;24 MSA2324&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;

&lt;td&gt;4 EXP 12&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Data disks&lt;/td&gt;

&lt;td&gt;240 HDD&lt;/td&gt;

&lt;td&gt;96 HDD&lt;/td&gt;

&lt;td&gt;576 HDD&lt;/td&gt;

&lt;td&gt;7 SSD&lt;/td&gt;

&lt;td&gt;52 SAS SSD&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Controller-Disks&lt;/td&gt;

&lt;td&gt;3x50, 25, 30, 35&lt;/td&gt;

&lt;td&gt;1 per 24&lt;/td&gt;

&lt;td&gt;1 per 24&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;LUNs-disks&lt;/td&gt;

&lt;td&gt;48x5?&lt;/td&gt;

&lt;td&gt;?&lt;/td&gt;

&lt;td&gt;3 per 6&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;OS&lt;/td&gt;

&lt;td&gt;2008 R2 EE&lt;/td&gt;

&lt;td&gt;RHEL 5.3&lt;/td&gt;

&lt;td&gt;HP-UX 11&lt;/td&gt;

&lt;td&gt;2008 R2 EE&lt;/td&gt;

&lt;td&gt;RHEL 6&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="center"&gt;
&lt;td&gt;Database&lt;/td&gt;

&lt;td&gt;2008 EE&lt;/td&gt;

&lt;td&gt;Sybase 15.1&lt;/td&gt;

&lt;td&gt;Oracle 11g R2&lt;/td&gt;

&lt;td&gt;2008 R2&lt;/td&gt;

&lt;td&gt;Sybase 15.2&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;Below are the individual query run times. &lt;/p&gt;

&lt;p&gt;&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch1000_2011_log.png" width="722" height="341"&gt;
&lt;br&gt;&lt;b&gt;TPC-H SF 1000 individual query execution times&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;
Note the wide variation in each query between different systems and database engines.
This could reflect differences in any of: &lt;br&gt;
&amp;nbsp;1) processor and system architecture,&lt;br&gt;
&amp;nbsp;2) memory versus disk, HDD and SSD&lt;br&gt;
&amp;nbsp;3) execution plans&lt;br&gt;
&amp;nbsp;4) the relative efficiency of component operations (scan, index seek, hash, sort, etc.) &lt;br&gt;
and probably other factors as well.
It would be interesting to compare the execution plans between different database engines,
even to force the SQL Server execution plan to one as close as possible to the plans 
employed on the other database engines.
&lt;/p&gt;

&lt;p&gt;
The main point of interest is not moderate differences in the overall (geometric mean) performance, but rather the very large differences in certain queries.
The long run time for Q18 should probably be investigated.
&lt;/p&gt;

&lt;p&gt;
Another view: the 4-way Xeon 7560 at SF 1TB with 1.5TB memory and SSD versus the 8-way Xeon 7560 at SF 3TB with 0.5TB memory and HDD.
The number of processors&amp;nbsp;is&amp;nbsp;doubled, but the database is 3 times larger. 
On this alone, we might expect query times to be about 50% longer, with the caveat that&amp;nbsp;there are complications in projecting TPC-H performance across different scale factors. 
There are also significant differences in the memory-to-data ratio,
and storage performance characteristics.
&lt;/p&gt;
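&lt;p&gt;
The 50% figure comes from a deliberately naive model, sketched here in Python: run time is assumed to grow linearly with data volume and shrink linearly with processor count. Both the model and the function name are assumptions for illustration only; memory-to-data ratio and storage differences are ignored.
&lt;/p&gt;

```python
def projected_time_ratio(data_ratio, processor_ratio):
    """Naive projection: time scales with data volume and inversely with processor count."""
    return data_ratio / processor_ratio


# SF 3TB on 8 sockets versus SF 1TB on 4 sockets.
ratio = projected_time_ratio(data_ratio=3.0, processor_ratio=2.0)
print(f"projected run time: {ratio:.2f}x, or {(ratio - 1) * 100:.0f}% longer")
```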
&lt;p&gt;
&lt;img alt="tpch100" src="http://www.qdpma.com/SSD_Memory_TPCH_files/tpch_Xeon7560.png" width="722" height="340"&gt;
&lt;br&gt;&lt;b&gt;TPC-H individual query execution times for 4-way 1TB and 8-way 3TB&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;
On the 8-way system at SF 3TB, Q18 actually runs faster than on the 4-way system at SF 1TB.
But the other larger queries, Q1, 9, and 21, show the expected pattern.
Overall, it does appear that the 3TB query run times are&amp;nbsp;on the order&amp;nbsp;of 50% higher.&lt;/p&gt;</description></item><item><title>Benchmark Update - Astounding Fujitsu RX900 8-way Xeon 7560 TPC-E Scaling</title><link>http://sqlblog.com/blogs/joe_chang/archive/2010/09/28/benchmark-update-astounding-fujitsu-8-way-xeon-7560-tpc-e-scaling.aspx</link><pubDate>Tue, 28 Sep 2010 21:13:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:29024</guid><dc:creator>jchang</dc:creator><description>&lt;P&gt;Fujitsu just published an astounding TPC-E benchmark result of 3,800 tpsE for their 8-way Xeon 7560 system, the Primergy RX900 S1. Fujitsu had previously published a TPC-E result of 2,046.96 tpsE for their 4-way Xeon 7560 system, the Primergy RX600 S5. The new result shows a gain of 85.6% (1.856X scaling) from 4-socket to 8-socket. &lt;/P&gt;
&lt;P&gt;Microsoft Windows Server 2008 R2 introduced core OS improvements that not only increased the number of logical processors supported from 64 to ???, but also removed many locks, including the dispatch scheduler lock. This improved high-end scaling (64 to 128 cores?) from 1.5X to 1.7X, based on tests with the HP Superdome and Itanium processors. At the time of this announcement, the Xeon 7500 processors were not yet available. &lt;/P&gt;
&lt;P&gt;When the Xeon 7500 did become available in early 2010, the first TPC-E benchmarks were 2,022.64 and 3,141.76 tpsE for the 4-way and 8-way Xeon 7560 systems respectively. The scaling from 4S to 8S was 1.55X, well below the expectation of 1.7X set by Microsoft's 2008 R2 announcement. This was understandable, as the 8-way result was probably rushed to align with the product launch. Perfect benchmark results are ready on their own schedule, which is not always in time for marketing blitzes. (Of course, considering that the marketing budget may be paying for the benchmarks, it would be advisable to try really really hard to have a good result for product launch.) &lt;/P&gt;
&lt;P&gt;There are two apparent differences between the new Fujitsu and original NEC 8-way Xeon 7560 TPC-E reports. One is that the Fujitsu system uses SSD while the NEC system used HDD storage. The SSD configuration yields much better average response times, mostly in the Trade-Lookup and Trade-Update transactions, with reductions from 500/560ms to 130/140ms respectively. In the 4-way Xeon 7560 TPC-E reports, the use of SSD over HDD yields &lt;STRIKE&gt;1%&lt;/STRIKE&gt; improvement &lt;/P&gt;
&lt;P&gt;(My mistake, I should compare the Fujitsu RX600 2,046.96 tpsE result with the Dell 1,933.96 tpsE result, both systems at 512GB, for a 5.8% performance gain attributed to SSD over HDD. The 1% gain was relative to the IBM at 2,022.64 tpsE with 1TB memory and HDD storage). &lt;/P&gt;
&lt;P&gt;The other difference is that the Fujitsu system distributes network traffic over 6 GbE ports compared with 2 for the NEC system. There are 24 or so(?) RPC calls per TPC-E transaction, so the extra network ports might provide another minor improvement. &lt;/P&gt;
&lt;P&gt;Nothing apparent can explain the 4S to 8S scaling improvement from 1.55X to 1.86X. This is certainly not impossible, as IBM figured out how to do this and better with their POWER4 line some years ago. At the time, I thought this was mostly due to the massive inter-processor bandwidth of the POWER4. Now it is clearer that the OS and database engine all contribute to nearly perfect scaling. &lt;/P&gt;
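&lt;P&gt;The scaling figures quoted in this post follow directly from the published tpsE values. A minimal sketch in Python, using only the numbers cited in the text; the function name and labels are purely illustrative: &lt;/P&gt;

```python
def scaling_factor(tps_4way, tps_8way):
    """Ratio of 8-socket to 4-socket throughput."""
    return tps_8way / tps_4way


# (4-way tpsE, 8-way tpsE) as cited in the text.
pairs = {
    "early 2010 results": (2022.64, 3141.76),
    "Fujitsu RX600 S5 to RX900 S1": (2046.96, 3800.00),
}

for label, (tps4, tps8) in pairs.items():
    factor = scaling_factor(tps4, tps8)
    gain = (factor - 1) * 100
    print(f"{label}: {factor:.3f}X, a {gain:.1f}% gain from doubling sockets")
```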
&lt;P&gt;My thinking is that someone at Microsoft has been watching the performance traces and finally figured out the most critical points of contention. (Such persons are always nameless to the outside world, as this would upstage more established egos.)&amp;nbsp;So I believe this is a new build of Windows and SQL Server, but build numbers do not seem to be obvious in the TPC reports, even though full disclosure is required. It is never some magic registry entry like Turbo Mode: ON. &lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;ps&lt;/STRONG&gt; The more serious impact of SSD may be evident in the Maximum response times, which ran as high as 68.7 seconds for Trade-Result on the NEC system with HDD, and topped out at 7.1 seconds on the Fujitsu system with SSD. I am thinking that having an open transaction for 68 sec can have serious repercussions on an OLTP system. Curious though the 4-way Fujitsu with SSD&amp;nbsp;could not keep&amp;nbsp;max response&amp;nbsp;similarly low (18.53 sec on Trade-Lookup),&amp;nbsp;while the 4-way IBM kept&amp;nbsp;max response&amp;nbsp;to 17 sec with HDD.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Transaction Response Times&lt;/STRONG&gt; &lt;BR&gt;The table below shows transaction response times, average and maximum, for the 8-way NEC with HDD and the Fujitsu with SSD storage. The SSD storage system has the better weighted average response time, with the biggest impact in Trade-Lookup and Trade-Update. The reduction in maximum response is more dramatic, in 6 of the 10 transactions. &lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Transaction Response Times&lt;/STRONG&gt;, Average and Maximum&lt;/P&gt;
&lt;TABLE cellSpacing=1 cellPadding=3&gt;

&lt;TR align=middle&gt;
&lt;TH&gt;&amp;nbsp;&lt;/TH&gt;
&lt;TH colSpan=2&gt;Avg Response (sec)&lt;/TH&gt;
&lt;TH colSpan=2&gt;Max Response (sec)&lt;/TH&gt;
&lt;TH&gt;&amp;nbsp;&lt;/TH&gt;
&lt;TH&gt;&amp;nbsp;&lt;/TH&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TH&gt;System&lt;/TH&gt;
&lt;TH&gt;NEC&lt;BR&gt;A1080&lt;/TH&gt;
&lt;TH&gt;Fujitsu&lt;BR&gt;RX900&lt;/TH&gt;
&lt;TH&gt;NEC&lt;/TH&gt;
&lt;TH&gt;Fujitsu&lt;/TH&gt;
&lt;TH&gt;weight&lt;/TH&gt;
&lt;TH&gt;frames&lt;/TH&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Storage&lt;/TD&gt;
&lt;TD&gt;HDD&lt;/TD&gt;
&lt;TD&gt;SSD&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Broker-Volume&lt;/TD&gt;
&lt;TD&gt;0.05&lt;/TD&gt;
&lt;TD&gt;0.06&lt;/TD&gt;
&lt;TD&gt;2.88&lt;/TD&gt;
&lt;TD&gt;6.72&lt;/TD&gt;
&lt;TD align=right&gt;4.9%&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Customer-Position&lt;/TD&gt;
&lt;TD&gt;0.02&lt;/TD&gt;
&lt;TD&gt;0.05&lt;/TD&gt;
&lt;TD&gt;43.55&lt;/TD&gt;
&lt;TD&gt;3.49&lt;/TD&gt;
&lt;TD align=right&gt;13%&lt;/TD&gt;
&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Market-Feed&lt;/TD&gt;
&lt;TD&gt;0.03&lt;/TD&gt;
&lt;TD&gt;0.03&lt;/TD&gt;
&lt;TD&gt;48.81&lt;/TD&gt;
&lt;TD&gt;3.48&lt;/TD&gt;
&lt;TD align=right&gt;1%&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Market-Watch&lt;/TD&gt;
&lt;TD&gt;0.03&lt;/TD&gt;
&lt;TD&gt;0.05&lt;/TD&gt;
&lt;TD&gt;2.77&lt;/TD&gt;
&lt;TD&gt;2.83&lt;/TD&gt;
&lt;TD align=right&gt;18%&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Security-Detail&lt;/TD&gt;
&lt;TD&gt;0.01&lt;/TD&gt;
&lt;TD&gt;0.02&lt;/TD&gt;
&lt;TD&gt;2.89&lt;/TD&gt;
&lt;TD&gt;3.79&lt;/TD&gt;
&lt;TD align=right&gt;14%&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Trade-Lookup&lt;/TD&gt;
&lt;TD&gt;0.50&lt;/TD&gt;
&lt;TD&gt;0.13&lt;/TD&gt;
&lt;TD&gt;49.09&lt;/TD&gt;
&lt;TD&gt;3.30&lt;/TD&gt;
&lt;TD align=right&gt;8%&lt;/TD&gt;
&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Trade-Order&lt;/TD&gt;
&lt;TD&gt;0.07&lt;/TD&gt;
&lt;TD&gt;0.10&lt;/TD&gt;
&lt;TD&gt;45.96&lt;/TD&gt;
&lt;TD&gt;3.74&lt;/TD&gt;
&lt;TD align=right&gt;10.1%&lt;/TD&gt;
&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Trade-Result&lt;/TD&gt;
&lt;TD&gt;0.07&lt;/TD&gt;
&lt;TD&gt;0.13&lt;/TD&gt;
&lt;TD&gt;68.73&lt;/TD&gt;
&lt;TD&gt;7.10&lt;/TD&gt;
&lt;TD align=right&gt;10%&lt;/TD&gt;
&lt;TD&gt;6&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Trade-Status&lt;/TD&gt;
&lt;TD&gt;0.02&lt;/TD&gt;
&lt;TD&gt;0.03&lt;/TD&gt;
&lt;TD&gt;60.23&lt;/TD&gt;
&lt;TD&gt;6.51&lt;/TD&gt;
&lt;TD align=right&gt;19%&lt;/TD&gt;
&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Trade-Update&lt;/TD&gt;
&lt;TD&gt;0.56&lt;/TD&gt;
&lt;TD&gt;0.14&lt;/TD&gt;
&lt;TD&gt;3.46&lt;/TD&gt;
&lt;TD&gt;3.79&lt;/TD&gt;
&lt;TD align=right&gt;2%&lt;/TD&gt;
&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Data-Maintenance&lt;/TD&gt;
&lt;TD&gt;0.11&lt;/TD&gt;
&lt;TD&gt;0.07&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;&lt;STRONG&gt;weighted&lt;BR&gt;Avg Response&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;0.0812&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;0.0635&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;&lt;STRONG&gt;Average&lt;BR&gt;tx in flight&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;2526.5&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;2389.1&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;
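&lt;P&gt;The weighted Avg Response row can be reproduced from the per-transaction averages and the weight column. The Python sketch below does this with the values from the table; the dictionary layout and function name are illustrative assumptions, and Data-Maintenance is left out because the table gives it no weight. &lt;/P&gt;

```python
# name: (weight in percent, NEC avg response, Fujitsu avg response), times in seconds.
transactions = {
    "Broker-Volume": (4.9, 0.05, 0.06),
    "Customer-Position": (13.0, 0.02, 0.05),
    "Market-Feed": (1.0, 0.03, 0.03),
    "Market-Watch": (18.0, 0.03, 0.05),
    "Security-Detail": (14.0, 0.01, 0.02),
    "Trade-Lookup": (8.0, 0.50, 0.13),
    "Trade-Order": (10.1, 0.07, 0.10),
    "Trade-Result": (10.0, 0.07, 0.13),
    "Trade-Status": (19.0, 0.02, 0.03),
    "Trade-Update": (2.0, 0.56, 0.14),
}


def weighted_avg_response(rows, column):
    """Weight-averaged response time; column 1 is NEC, column 2 is Fujitsu."""
    total_weight = sum(row[0] for row in rows.values())
    weighted_sum = sum(row[0] * row[column] for row in rows.values())
    return weighted_sum / total_weight


nec = weighted_avg_response(transactions, 1)
fujitsu = weighted_avg_response(transactions, 2)
print(f"NEC (HDD):     {nec:.4f} s")
print(f"Fujitsu (SSD): {fujitsu:.4f} s")
```

&lt;P&gt;With these inputs the two values come out at about 0.0812 and 0.0635 seconds. Trade-Lookup alone contributes roughly half of the NEC figure, which is why the SSD gain shows up mainly there. &lt;/P&gt;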
&lt;P&gt;&lt;STRONG&gt;System configuration&lt;/STRONG&gt; &lt;/P&gt;
&lt;TABLE cellSpacing=1 cellPadding=3&gt;

&lt;TR align=middle&gt;
&lt;TH&gt;System&lt;/TH&gt;
&lt;TH&gt;NEC&lt;BR&gt;A1080&lt;/TH&gt;
&lt;TH&gt;Fujitsu&lt;BR&gt;RX900&lt;/TH&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Processor&lt;/TD&gt;
&lt;TD&gt;Xeon 7560&lt;/TD&gt;
&lt;TD&gt;Xeon 7560&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Sockets-Cores&lt;/TD&gt;
&lt;TD&gt;8 x 8 = 64&lt;/TD&gt;
&lt;TD&gt;8 x 8 = 64&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Hyper-Threading&lt;/TD&gt;
&lt;TD&gt;yes&lt;/TD&gt;
&lt;TD&gt;yes&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Frequency&lt;/TD&gt;
&lt;TD&gt;2.26GHz&lt;/TD&gt;
&lt;TD&gt;2.26GHz&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Memory&lt;/TD&gt;
&lt;TD&gt;1024GB&lt;/TD&gt;
&lt;TD&gt;1024GB&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;IO&lt;/TD&gt;
&lt;TD&gt;7 FC&lt;/TD&gt;
&lt;TD&gt;14 SAS&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Storage&lt;/TD&gt;
&lt;TD&gt;1872 HDD&lt;/TD&gt;
&lt;TD&gt;336 SSD&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;OS&lt;/TD&gt;
&lt;TD&gt;2008 R2 DC&lt;/TD&gt;
&lt;TD&gt;2008 R2 DC&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;Database&lt;/TD&gt;
&lt;TD&gt;2008 R2 DC&lt;/TD&gt;
&lt;TD&gt;2008 R2 DC&lt;/TD&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;&lt;STRONG&gt;tps-E&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;3,141.76&lt;/STRONG&gt;&lt;/TD&gt;
&lt;TD&gt;&lt;STRONG&gt;3,800.00&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;
&lt;H3&gt;Benchmark Summary 2010-09-28&lt;/H3&gt;
&lt;P&gt;Below is a summary of the best available TPC benchmark results for recent Intel Xeon and AMD Opteron server systems. Note that Westmere-EP 32nm and Nehalem-EX 45nm have been consolidated.&lt;/P&gt;
&lt;TABLE cellSpacing=1 cellPadding=3&gt;

&lt;TR align=middle&gt;
&lt;TH&gt;Processor&lt;BR&gt;Architecture&lt;BR&gt;Process&lt;/TH&gt;
&lt;TH&gt;TPC&lt;/TH&gt;
&lt;TH&gt;2-way&lt;/TH&gt;
&lt;TH&gt;4-way&lt;/TH&gt;
&lt;TH&gt;8-way&lt;/TH&gt;
&lt;TH&gt;16-way&lt;/TH&gt;&lt;/TR&gt;
&lt;TR align=middle&gt;
&lt;TD&gt;W-32nm/N-EX45&lt;BR&gt;Xeon 5600 6C&lt;BR&gt;7500 8C&lt;/TD&gt;
&lt;TD&gt;TPC-C&lt;BR&gt;TPC-E&lt;BR&gt;TPC-H&lt;/TD&gt;
&lt;TD&gt;803,068&lt;BR&gt;1,110&lt;BR&gt;73,974.6@100G&lt;/TD&gt;
&lt;TD&gt;1,807,347&lt;BR&gt;2,022.64&lt;BR&gt;121,346@300G&lt;/TD&gt;
&lt;TD&gt;-&lt;BR&gt;3,800.00&lt;BR&gt;162,601@3TB&lt;/TD&gt;
&lt;TD&gt;future&lt;BR&gt;future&lt;BR&gt;future&lt;/TD&gt;&lt;/TR&gt;
&lt;TR style="BACKGROUND:lightgray;" align=middle&gt;
&lt;TD&gt;Magny-Cours&lt;BR&gt;45nm&lt;BR&gt;12C&lt;/TD&gt;
&lt;TD&gt;TPC-C&lt;BR&gt;TPC-E&lt;BR&gt;TPC-H&lt;/TD&gt;
&lt;TD&gt;705,652&lt;BR&gt;887.4&lt;BR&gt;71,438.3@100G&lt;/TD&gt;
&lt;TD&gt;1,193,472&lt;BR&gt;1,464&lt;BR&gt;107,561@300G&lt;/TD&gt;
&lt;TD&gt;n/a&lt;BR&gt;n/a&lt;BR&gt;n/a&lt;/TD&gt;
&lt;TD&gt;n/a&lt;BR&gt;n/a&lt;BR&gt;n/a&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;2010 Sep 19&lt;/H4&gt;
&lt;P&gt;A TPC-H result was finally published for the 4-way Xeon 7500 @300GB on 14 Sep. A TPC-C result was also published for the 4-way 7500 on 27 Aug. There will probably not be a TPC-C result for the 8-way DL980, as SQL Server may be limited by its ability to write to a single log file. HP seems to be the only vendor active in TPC-H. This could be because other companies have cut staff. Benchmarking is a specialized skill. It usually takes a dedicated person for each benchmark and environment. It is not the benchmark result that is important. It is the investigation into the root cause of bottlenecks to improve performance in the next iteration that is important. So this means only HP will be making contributions in DW. &lt;/P&gt;</description></item></channel></rss>