
Joe Chang

Solid State Drive versus Memory, TPC-H Nehalem

A recently published TPC-H benchmark result, along with previously published results, provides some insight into the relative impact of SSD versus disk drives, and of SSD versus system memory. The TPC-H configurations are shown below. All results are at SF 100. The first two are on Windows Server 2003 sp1 with SQL Server 2005 sp1 and sp2 respectively, using processors based on the Intel Core 2 architecture and HDD storage. The two recent results are both on Windows Server 2008 and SQL Server 2008 sp1, using Intel Xeon 5570 (Nehalem architecture) processors and Fusion-IO SSD storage.

System OS SQL Processors Memory Storage
Dell PE2900 W2K3 sp1 2005 sp1 2 Xeon 5355 48GB HDD
HP DL580G5 W2K3 sp1 2005 sp2 4 Xeon 7350 128GB HDD
Dell T610 W2K8 2008 sp1 2 Xeon 5570 48GB SSD
HP DL380G6 W2K8 2008 sp1 2 Xeon 5570 144GB SSD

The system configurations above are not ideal for making this comparison. Ideally we would like to see a TPC-H SF 100 result for the Dell T610 with HDD storage. That would give us two systems identical except for the storage, plus two similar systems that differ only in system memory, with the large memory configuration sufficient to hold the entire database in the buffer cache. But we can make estimates for the expected differences between the two processor architectures based on available information.

First a quick word about TPC-H. Scale Factor 100 means the LINEITEM table data (leaf level) should be 100GB, or it was when the table definition employed the 8-byte DATETIME data type. With the 4-byte DATE type employed in SQL Server 2008, the SF 100 LINEITEM table drops to 90GB. The full TPC-H SF 100 database with all tables and indexes was approximately 170GB in SQL Server 2005, decreasing to 150GB with the DATE type. With compression, the database probably fits in the buffer cache with SQL Server maximum memory set to 135GB out of the 144GB of system memory.

TPC-H comprises 22 scored queries and has two test sequences. The Power sequence runs the 22 queries sequentially in a single stream, which is essentially a test of the database engine's parallel execution plan capability. The Throughput sequence runs multiple concurrent streams. All systems in this group use 5 streams in the throughput test at this scale factor. It would be helpful to see the Power sequence query times at degree of parallelism 1, 2, 4, 8 and so on up to the number of physical and logical cores, but this is not part of the TPC-H specification. The top-level results are shown below.

Processor Memory QphH Power Throughput
2 Xeon 5355 48GB 15,723.9 20,587.9 12,009.1
4 Xeon 7350 128GB 34,989.9 46,708.6 26,211.3
2 Xeon 5570 48GB 28,772.9 38,550.7 21,475.1
2 Xeon 5570 144GB 51,110.6 72,134.0 36,313.6
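
The composite QphH@Size metric is the geometric mean of Power@Size and Throughput@Size, so the scores above can be cross-checked directly. A minimal sketch in Python, using the figures from the table (small differences from the published QphH are rounding):

```python
from math import sqrt

# QphH@Size is the geometric mean of Power@Size and Throughput@Size.
results = {
    "2x Xeon 5355, 48GB":  (20587.9, 12009.1),
    "4x Xeon 7350, 128GB": (46708.6, 26211.3),
    "2x Xeon 5570, 48GB":  (38550.7, 21475.1),
    "2x Xeon 5570, 144GB": (72134.0, 36313.6),
}

for system, (power, throughput) in results.items():
    qphh = sqrt(power * throughput)
    print(f"{system}: QphH ~ {qphh:,.1f}")
```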

 

Transition Memory QphH Power Throughput
2x 5355 to 2x 5570 48GB (both) 83% 87% 79%
2x 5570 to 4x 7350 48GB to 128GB 21.6% 21% 22%
2x 5570 (memory only) 48GB to 144GB 78% 87% 69%
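
The percentages above fall straight out of the raw scores; a quick sketch reproducing them (values copied from the results table):

```python
# Relative gains between configurations, computed from the published figures.
# Tuples are (QphH, Power, Throughput).
pe2900_5355_48g = (15723.9, 20587.9, 12009.1)
dl580_7350_128g = (34989.9, 46708.6, 26211.3)
t610_5570_48g   = (28772.9, 38550.7, 21475.1)
dl380_5570_144g = (51110.6, 72134.0, 36313.6)

def gain(base, other):
    """Percent improvement of 'other' over 'base' for each metric."""
    return [round((o / b - 1) * 100, 1) for b, o in zip(base, other)]

print("2x5355 -> 2x5570 (48GB):      ", gain(pe2900_5355_48g, t610_5570_48g))
print("2x5570 -> 4x7350 (48->128GB): ", gain(t610_5570_48g, dl580_7350_128g))
print("2x5570, 48GB -> 144GB:        ", gain(t610_5570_48g, dl380_5570_144g))
```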

 

The T610 with 2 Xeon 5570 result is very impressive, being 83% higher than the 2-way Xeon 5355 quad-core 2.66GHz Core 2 architecture result (8 physical cores on both systems), with the other difference being the SSD storage. The 4-way Xeon 7350 system, with 16 Core 2 architecture cores and 128GB memory, is only about 21% higher than the T610. The typical TPC-H storage configuration with hard drives can deliver substantial IO bandwidth, with 4-8 PCI-E SAS controllers and 80-200 disk drives. The SSD configuration, however, can deliver spectacularly more pseudo-random small-block IOPS than 200 disk drives, and at much lower latency.

The DL380G6 with 144GB memory is even more impressive, scoring 78% higher on the composite QphH metric, 87% higher in Power, and 69% higher in Throughput than the T610 with 48GB.

The chart below shows the run time for each of the 22 TPC-H queries in the Power sequence (stream 0) on the Dell T610 with Xeon 5570 processors and SSD storage relative to the Dell PowerEdge 2900 with Xeon 5355 processors and HDD storage.

T610 vs PE2900

We can use the SPEC CPU 2005 integer benchmark to make a rough estimate of the difference in processor performance between the Core 2 and Nehalem architectures. The SPEC CPU 2005 integer result for the Xeon 5355 (2.66GHz) is 19.4 and for the Xeon 5570 it is 31.5, with a very large gain in two of the component benchmarks, which means some of the gain should be de-rated given the nature of SPEC CPU, where the very latest compiler improvements are incorporated. A roughly 60% difference in performance implies that the execution time on the faster processor should be approximately 0.625 times that of the baseline processor. Since the reduction in query time is visibly better than 0.625 for many queries, we can attribute the difference to the SSD.
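
The arithmetic behind that estimate, as a small sketch (the 1.6x de-rated ratio is the judgment call described above, not a measured figure):

```python
# Rough estimate of the expected CPU-only speedup from Xeon 5355 -> Xeon 5570,
# using the SPEC CPU2005 integer results cited above.
specint_5355 = 19.4
specint_5570 = 31.5

raw_ratio = specint_5570 / specint_5355        # ~1.62x
derated_ratio = 1.6                            # de-rated for compiler-driven gains (assumption)

expected_runtime_fraction = 1 / derated_ratio  # ~0.625 of the baseline query time

print(f"raw SPECint ratio: {raw_ratio:.2f}x")
print(f"expected runtime fraction (CPU only): {expected_runtime_fraction:.3f}")
# Queries that drop well below ~0.625x the baseline time got additional help
# from the storage (SSD) rather than from the processor alone.
```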

The chart below shows the TPC-H Power sequence query times for the DL380G6 with 144GB memory relative to the T610 with 48GB. Five queries (1, 9, 13, 15 and 16) show only minor differences, with query 13 actually being 15% faster with less memory, and another two (3 and 14) show a 20% gain for the extra memory. Nine queries show gains of 2.5X or more, and another shows a 2X gain. So overall, there is a significant benefit to having data in system memory, i.e., the buffer cache, even with the extraordinarily high IOPS and low latency of SSD storage.

144 vs 48GB memory

The reason for this is that the CPU cost for the operating system to perform an IO operation, regardless of whether the target is a disk or an SSD device, and then to evict an existing page from the buffer cache, is high relative to the cost of many common SQL operations. See IO Cost Structure (or my blog on SQLblog) for more on this. The duration, or elapsed time, on disk drives is much higher than on SSD, but the CPU cost cannot be evaded either way.
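
To make the point concrete, here is a toy model of CPU cost per page; the per-operation costs are purely illustrative placeholders, not measured values, but they show why even zero-latency storage cannot match data already in the buffer cache:

```python
# Toy model (illustrative numbers only): CPU cost per 8KB page when the page is
# already in the buffer cache versus when it must be read from storage.
cpu_us_logical_read = 1.0    # hypothetical CPU microseconds to process a cached page
cpu_us_physical_io  = 10.0   # hypothetical CPU microseconds for the OS/engine IO path
cpu_us_evict_page   = 2.0    # hypothetical CPU microseconds to evict a page from cache

pages = 10_000_000  # e.g. a large scan at SF 100

cached_cpu_s   = pages * cpu_us_logical_read / 1e6
uncached_cpu_s = pages * (cpu_us_logical_read + cpu_us_physical_io + cpu_us_evict_page) / 1e6

print(f"all pages cached:   {cached_cpu_s:,.0f} CPU-seconds")
print(f"all pages from SSD: {uncached_cpu_s:,.0f} CPU-seconds")
# Even with zero storage latency, the extra CPU spent on the IO path and on
# eviction cannot be avoided, which is why in-memory data still wins.
```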

 

While the published performance data is not ideal (a result for a 2-way Xeon 5570 with 48GB memory and HDD storage on SQL Server 2008 is still needed), the available results are sufficient to indicate that SSD can significantly improve performance relative to HDD even in data warehouse type applications, and that system memory is still the best resource for database performance.

 

The chart below shows the stream 0 (Power) run time for the 16-core (4 x Xeon 7350, Core 2) system with 128GB and hard drives, the 8-core (Xeon 5570, Nehalem) system with 48GB and SSD, and the 8-core (X5570) system with 144GB and SSD, all relative to the 8-core (Xeon 5355, Core 2) system with 48GB memory and hard drives. In query 3, the 16-core system with 128GB is actually 20% slower than the 8-core with 48GB. This goes to show that parallel execution plans can be very tricky. Without thorough investigation of how each query behaves at degree of parallelism 1, 2, 4, 8, 16 and so on, nasty surprises can and do occur. Big-iron systems can be very useful, but a simple migration from a 4-way system can result in minimal or even negative performance change. With expert investigation and troubleshooting, very respectable performance gains can usually be achieved.

HDD SSD Memory

For the majority of the queries, the 16-core 128GB memory system shows a 2X or better improvement relative to 8 cores with 48GB memory. Doubling the number of cores alone should yield only a 30-40% reduction in query time, as that gain comes entirely from increasing the degree of parallelism from 8 to 16, which is a difficult aspect to scale. So it is apparent that the additional memory probably contributes significantly to performance.
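
A back-of-envelope check: perfect scaling from 8 to 16 cores would halve query times, so a 30-40% reduction corresponds to only a 1.4-1.7X speedup from parallelism, and anything beyond that on the 16-core 128GB system has to come from elsewhere, most plausibly the memory. A small sketch of that arithmetic:

```python
# If doubling the cores (DOP 8 -> 16) only reduces query time by 30-40%,
# the implied speedup from parallelism alone is modest.
for reduction in (0.30, 0.40, 0.50):  # 0.50 = perfect scaling
    speedup = 1 / (1 - reduction)
    print(f"{reduction:.0%} reduction in query time -> {speedup:.2f}x speedup")

# Observed: many queries on the 16-core / 128GB system are 2x or better than
# the 8-core / 48GB system, i.e., well beyond what parallelism alone explains,
# so a large share of the gain plausibly comes from the additional memory.
```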

 

 

Update 2009-10-11

The original HP 2 x X5570 publication has been withdrawn. Three results were later published, all with 2 x Xeon 5570 processors and 144GB memory. One uses 4 Fusion-IO drives, another uses 12 SATA SSDs spread across 3 SAS controllers, and the third uses 100 SAS hard disk drives.

Storage Memory QphH Power Throughput
100 HDD 144GB 50,738.4 67,712.9 38,019.1
12 SATA SSD 144GB 51,422.4 70,048.5 37,749.1
4 FusionIO SSD 144GB 51,085.6 72,110.5 36,190.8
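
With the database effectively cached in 144GB, the three storage options land within a very narrow band. A quick computation of the spread (QphH values from the table above):

```python
# QphH for the three 2x Xeon 5570 / 144GB configurations, from the table above.
qphh = {
    "100 SAS HDD":     50738.4,
    "12 SATA SSD":     51422.4,
    "4 Fusion-IO SSD": 51085.6,
}

lo, hi = min(qphh.values()), max(qphh.values())
print(f"spread: {(hi / lo - 1) * 100:.1f}% between best and worst")
# ~1.3% -- with the database cached in memory, the storage choice barely
# moves the composite score.
```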

The chart below shows the TPC-H Power results on the 22 individual queries for the SATA SSD and Fusion-IO SSD relative to the SAS HDD. Most queries are either identical or very close. The SSD is much faster for certain queries, but slower in others.

HDD SSD Memory

The two queries where SSD is much faster are 2 and 12. The execution plans for these at SF 10 show nested loops joins to the medium-sized ORDERS and PARTSUPP tables (I do not have the execution plans for SF 100 handy). HDD is faster on Q9, which shows scans and hash match joins between several tables.
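
A rough sketch of why a high row count nested loops join favors SSD while a scan does not; every row count, IOPS and bandwidth figure below is an illustrative assumption, not taken from the published reports:

```python
# Illustrative comparison: random 8KB reads driven by a nested loops join
# versus a large sequential scan, on HDD and SSD (all numbers are assumptions).
lookups      = 1_000_000   # hypothetical outer rows, each probing an inner table page
hdd_iops     = 200         # rough random 8KB reads/s per 15K drive
hdd_drives   = 100
ssd_iops     = 70_000      # rough figure for a PCIe SSD

scan_bytes   = 90e9        # ~90GB LINEITEM scan
hdd_seq_mb_s = 1500        # sequential MB/s from a well-configured HDD array
ssd_seq_mb_s = 1400        # roughly comparable sequential bandwidth

print(f"loop join on HDD array:  {lookups / (hdd_iops * hdd_drives):6.1f} s")
print(f"loop join on SSD:        {lookups / ssd_iops:6.1f} s")
print(f"table scan on HDD array: {scan_bytes / (hdd_seq_mb_s * 1e6):6.1f} s")
print(f"table scan on SSD:       {scan_bytes / (ssd_seq_mb_s * 1e6):6.1f} s")
# Random IO is where SSD pulls far ahead; sequential scans are roughly a wash.
```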

Published Wednesday, August 12, 2009 1:06 AM by jchang


Comments

 

Cyborg said:

Thanks for the analysis. One issue not explored here is the difference in tempdb configuration for the SSD configurations. The Dell config puts some tempdb files (including the log) on FusionIO and some on spinning media, while the HP SSD config is purely on the FusionIO boards (except the tempdb log, which is on HDD). Consider the relative performance graph for Q3. If this query was memory bottlenecked, you'd see the "16c 128GB H" bar below 1.0. Now, with SSDs in the picture, somehow the HP config is now better than the Dell config. Q3 is a relatively large join that includes lineitem and sorts at the end, so tempdb config is pretty relevant. Seems like Dell misconfigured tempdb for their SSD result.

August 16, 2009 10:13 AM
 

Joe Chang said:

I am more inclined to think the Q3 anomaly is a parallelism issue rather than a tempdb issue. The HP advantage over the Dell is most definitely due to the 144GB memory, not a Dell tempdb issue.

I am inclined to think the tempdb space is mostly for the index builds; I have never seen TPC-H consume particularly large amounts of tempdb space. Of course, this would also mean that Dell should have dropped the tempdb_4 and 5 files after the index build, and there is no mention of this.

August 17, 2009 12:04 PM
 

Alex Kaplun said:

Hi Joe,

I am extremely impressed with your knowledge. We have a mid-size (1TB) SQL Server 2000 database that we are really interested in optimizing in terms of hardware and db configuration. If you do consulting and would be interested in working with us, please contact me at akaplun@aveta.com with your info.

Thanks a lot,

Alex

September 17, 2009 10:39 AM
 

Chris B said:

Which HP benchmark was that w/ the SSDs?  I see 2 (looks like maybe a re-try; one of them shows as being "In Review").

http://www.tpc.org/tpch/results/tpch_result_detail.asp?id=109082801

The Dell used what looks like 4 FusionIO cards, although the price/order shows 6.  The HP system (at least the one referenced above) uses a dozen 60GB SATA SSDs spread across 3 entry/mid-level P410 controllers.  

After doing some analysis of one of our own servers (with behavior closest to TPC-H), shuffling LUNs around & aligning partitions, I'm doing a bit of homework on SSDs.  With OCZ's Z-Drive offering, I'm hoping the prices continue to drop... although their 256GB SLC model should hit retail for something close to $1600, beating the FusionIO on size, price & throughput.

What are your thoughts re: moving just TempDB to SSD storage?  Seems the W/R nature is many sequential writes, but if you have many concurrent threads, wouldn't they hit "random" locations within the TempDB files?  esp. if you have it spread across several files?

October 2, 2009 3:22 PM
 

jchang said:

The HP DL380G6 result with Fusion IO SSDs I discussed above was published on Aug 3, but appears to have been withdrawn, with a substitute report dated Sep 4. The difference between the Aug 3 and Sep 4 reports appears to be 2 x 146GB 15K drives for backup space, so perhaps the Aug 3 report did not meet TPC-H benchmark requirements.

In between, they published another report on Aug 27 with 12 SATA SSD.

Keep in mind, all the recent HP DL380G6 results are with 144GB system memory, and the TPC-H SF 100 database is approximately 140GB total with all data and indexes. So the SSDs probably only contributed a little.

Right now, it is unclear to me just what the advantages of SSD over HDD are for DW type apps. The Fusion IO drives can do 700MB/s, while the Duo drives can do 1.5GB/s. With PCI-E Gen 2 adapters, I can get 1.5GB/s from one rack of 12-15 HDDs in pure sequential.

However, not all DW is table scans, so anytime there is a high row count loop join or key lookup, SSD wins.

I am also not certain the tempdb absolutely requires SSD.

Most SSDs now have wear leveling, meaning the drive will try to distribute the writes over the entire array for maximum life. But what happens if you first place static data on 80% of the SSD? You can only do wear leveling over the remaining 20%.

The tempdb activity is not truly random over the entire disk, which would mean 2ms rotational latency plus a 3.4ms seek. With multiple threads, the writes might be mixed, but the pattern is sector, skip, sector, so the accesses are more like track-to-track seeks, and you will get more IOPS than with true random access.
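
As a rough illustration of that last point (the track-to-track seek time below is an assumed figure):

```python
# Rough per-disk IOPS estimate for a 15K RPM drive (latency and average seek as
# discussed above; the track-to-track seek time is an assumed figure).
rotational_latency_ms = 2.0   # roughly half a rotation at 15K RPM
full_random_seek_ms   = 3.4   # average seek, as cited above
track_to_track_ms     = 0.5   # assumed short-seek time

random_iops     = 1000 / (rotational_latency_ms + full_random_seek_ms)
short_seek_iops = 1000 / (rotational_latency_ms + track_to_track_ms)

print(f"true random:       ~{random_iops:.0f} IOPS per drive")
print(f"track-to-track:    ~{short_seek_iops:.0f} IOPS per drive")
```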

October 3, 2009 9:42 AM
 

john said:

Can I kind of draw the conclusion that using SSD instead of HDD will boost performance by nearly a factor of 2? However, this does not sound like the ultimate performance gain of electronic seeking over mechanical seeking, so what do you think is the biggest factor preventing higher performance? Is it the I/O bottleneck or something else?

October 11, 2009 3:29 PM
 

jchang said:

Not really. It might be roughly correct to say that upgrading from a 2 x Xeon 5355 to a 2 x Xeon 5570 will boost performance by approximately 2X, or that moving from a 4 x 7350 to a 2 x 5570 with the same memory might be approximately equal.

On the HDD-SSD transition, if your system is bottlenecked on the disk system, i.e., your storage system can deliver 5K IOPS but the workload needs 10K, and the SSD can do 70K, then you will get a 2X boost. If a specific query is bottlenecked on disk IO, then that query will benefit from SSD.

In the examples above, from 2 x 5355 w/HDD to 2 x 5570 w/SSD, the performance difference was mostly due to improved CPU and memory bandwidth, and somewhat due to the SSD. The HDD configuration had more than adequate sequential IO capability. However, some queries still generate pseudo-random IO, where the SSDs have much better performance. So it depends.

What is clearer is that getting the entire DB in memory really helps (about 2X).

October 12, 2009 10:03 AM
 

Sam said:

Is it possible that in the scenarios above, the SSDs (both Fusion-io and the other SATA SSDs) did not get to contribute all that they could have, due to the fact that the entire LINEITEM table data (100GB or less) fits in RAM (144GB)?  It would be reasonable to assume that in most enterprise scenarios, the actual database size exceeds the total amount of RAM involved.  This implies that there will be page faults as a normal part of operations.  So then the question becomes, in which scenario are the page faults handled faster?

a)     The RAM to spinning disks, or

b)     The RAM to SATA SSDs, or

c)      The  RAM to Fusion-io disks

I am inclined to say the Fusion-io solution will shine.  It is because their media connects to the server using the PCIe bus architecture that sits very close to the CPU and does not have to deal with the overhead of SAS/SATA protocols.  It bypasses the southbridge controller completely.

I would love to know what others think about this.  At any rate, what is required is that more TPC-type tests be done with bigger data sets, which are a much more realistic approximation of the enterprise world.

October 27, 2009 1:52 PM
 

jchang said:

The LINEITEM table itself fitting in memory is not a big deal; properly configured HDDs, i.e., direct attached and distributed over multiple controllers, do just fine on sequential IO.

Notice the 2 queries where SSD is much better, Q2 & Q12, do not involve LINEITEM, and the execution plan may generate random IO (depending on the difference in execution plans between SF 10 and SF 100).

Of the 144GB system memory, 135GB is for SQL Server, which is just about the size of the entire TPC-H SF 100 database with compression.

Disk IO still occurs because of tempdb for intermediate results.

Page faults should not occur because of this, because the SQL Server engine should just discard clean buffers; no page-out is required. Even if a page is dirty, it is not paged out, it is written back to its original location.

Notice that Fusion has the best Power, while HDD has the best throughput, with the SATA SSD being middle in both.

The Fusion IO may seem to be better in theory, having fewer hops.

However, per Gunter's point in the other blog on TPC benchmarks, the Fusion IO driver is less CPU efficient, i.e., it takes deep driver expertise and time to get it nearly perfect.

I do not want to say that people do not have sub-100GB data warehouses, but it would be desirable for 2-way systems to be tested at 300GB, and 4-way systems at 1000GB.

October 28, 2009 11:55 AM
