A recently published TPC-H benchmark result, along with previously published results provide some insight into to the relative impact of SSD to disk drives, and SSD versus system memory. The TPC-H configurations are shown below. All results are at SF 100. The first two are on Windows Server 2003 sp1, and SQL Server 2005 sp1 and sp2 respectively use processors based on the Intel Core 2 architecture and HDD storage. The two recent results are both on Windows Server 2008 and SQL Server 2008 sp1 using the Intel Xeon 5570 processors (Nehalem architecture) and Fusion-IO SSD storage.
||2 Xeon 5355
||4 Xeon 7350
||2 Xeon 5570
||2 Xeon 5570
The system configurations above are not the best to make this comparison. Ideally we would like to be able see the TPCH SF 100 result for Dell T610 with HDD storage. This would provide two systems identical except for the storage, and two similar systems that differ in system memory with the large memory configuration sufficient to place the entire database in the buffer cache. But we can make estimates for the expected differences between the two processor architectures based on available information.
First a quick word about TPC-H. Scale Factor 100 means the LINEITEM table data (leaf level) should be 100GB, or did when the table definition employed the 8 byte DATETIME data type. With the 4 byte DATE type employed in SQL Server 2008, the SF 100 LINEITEM table drops to 90GB. The full TPC-H SF100 database with all tables and indexes was approximately 170GB in SQL Server 2005, decreasing to 150GB with the DATE type. With compression, the database size probably fits in the buffer cache with SQL Server maximum memory set to 135GB out of the system memory of 144GB.
TPC-H is comprised of 22 scored queries, and has two test sequences. The Power sequence runs the 22 queries sequentially using a single stream, which is essentially a test of the database engine parallel execution plan capability. The Throughput sequences run multiple concurrent streams. All systems in this group use 5 streams in the throughput test at this SF. It would be helpful to see the Power sequence query time at degree of parallelism 1, 2, 4, 8 and so on to the number of physical and logical cores, but this is not part of the TPC-H. The top level results are shown below.
|2 Xeon 5355
|4 Xeon 7350
|2 Xeon 5570
|2 Xeon 5570
|2x 5355 to 2x 5570
|2x 5570 to 4x 7350
The T610 with 2 Xeon 5570 result is very impressive, being 83% higher than the 2-way Xeon 5355 quad-core 2.66GHz Core 2 architecture (8 physical cores on both systems) with other difference being the SSD storage. The T610 is just 21% lower than the 4-way Xeon 7350 system with 16 Core 2 architecture cores and 128GB memory. The typical TPC-H storage configuration with hard drives can deliver substantial IO bandwidth, with 4-8 PCI-E SAS controllers and 80-200 disk drives. The SSD configuration, however, can also deliver spectacularly more pseudo-random small block IOPS than 200 disk drives and at much lower latency.
The DL380G6 with 144GB memory is even more impressive scoring 78% higher in the composite QphH score, 87% in Power and 69 % higher in overall than the T610 with 48GB.
The chart below the run time for each of the 22 TPC-H queries in the Power sequence (stream 0) on the Dell T610 with Xeon 5570 processors and SSD storage relative to the Dell PowerEdge 2900 with Xeon 5355 processors and HDD storage.
We can use the SPEC CPU 2005 integer benchmark to make a rough estimate for the difference in the processor performance between the Core 2 and Nehalem architectures. The SPEC CPU 2005 integer for the Xeon 5355 (2.6GHz) is 19.4 and for the Xeon 5570 31.5, with a very large gain in 2 components, which means some of the gain should be de-rated due to the nature of SPEC CPU where the very latest compiler improvements are incorporated. The 60% difference in performance implies that the execution time of the faster processor should be approximate 0.625 times that of the baseline processor. Since the reduction in query time is visibly better than 0.625 for many queries, we can attribute this to the SSD.
The chart below shows the TPC-H Power sequence query times for the DL380G6 with 144GB memory relative to the T610 with 48GB. Five queries, 1, 9, 13, 15 and 16, show only minor differences with query 13 actually being 15% faster with less memory, and another two, 3, 14, show 20% gain for the extra memory. Nine queries show gains of 2.5X or more and another shows 2X gain. So overall, there is a significant benefit to having data in system memory, i.e., buffer cache, even with the extraordinarily high IOPS and low latency of SSD storage.
The reason for this is that the CPU cost for the operating system to perform an IO operation, regardless of whether this is to disk or SSD storage devices, and then to evict an existing page in the buffer cache is high relative to the cost of many common SQL operations. See IO Cost Structure (or my blog on SQLblog) for more on this. The duration or elapsed time on disks drives is much higher than SSD, but the CPU cost cannot be evaded.
While the published performance data is not ideal, needing a result for a 2-way Xeon 5570 with 48GB memory and HDD storage for SQL Server 2008, the available results are sufficient to indicate that SSD can significantly improve performance relative to HDD even in data warehouse type applications, and the system memory is still best for database performance.
The chart below shows the stream 0 (Power) runtime for the 16-core (4xXeon 7350 Core 2) 128GB and hard drives, the 8-core (Xeon 5570 Nehalem) with 48G and SSD, and the 8-core (X5570) with 144GB and SSD relative to the 8-core (Xeon 5355 Core 2) 48GB memory with hard drives. In query 3, the 16-core with 128G is actually 20% slower than the 8-core with 48GB. This goes to show parallel execution plans can be very tricky. Without thorough investigation of how each query behaves at degree of parallelism 1, 2, 4, 8, 16 and so on, nasty surprises can and do occur. Big-iron system can be very useful, but a simple migration from a 4-way system can result in minimal or negative performance change. With expert investigation and troubleshooting, very respectable performance gains can usually be achieved.
For the majority of the queries, the 16-core 128G memory system has double or more improvement relative to 8-cores 48GB memory. Doubling the number cores alone should yield 30-40 reduction in query time as the gain is entirely due to the increase degree of parallelism from 8 to 16, which is a difficult aspect to scale. So it is apparent the additional memory probably contributes to performance significantly.
The original HP 2xX5570 publication has been withdrawn. Three results were later published, all with 2 x Xeon 5570 processors, 144GB memory. One is with the 4 Fusion IO drives, another uses 12 SATA SSD drives spread across 3 SAS controllers, and the third with 100 SAS hard disk drives.
|12 SATA SSD
|4 FusionIO SSD
The chart below shows the TPC-H power results on the 22 individual queries for the SATA SSD and FusionIO SSD relative to the SAS HDD. Most queries are either indentical or very close. The SSD is much faster for certain queries, but slower in others.
The two queries where SSD is much faster are 2 and 12. The execution plan for these at SF10 show nested loops joins to the medium tables ORDERS and PARTSUPP (I do not have the execution plan for SF 100 handy). HDD is faster on Q9, which shows scans and hash match joins between several tables.