THE SQL Server Blog Spot on the Web

Joe Chang

Benchmark Omissions for the Six-Core Intel Xeon and AMD Opteron Processors

To date, no 4-way or 8-way TPC-H data warehouse benchmark result has been published for the six-core Xeon X7460, and no TPC-C or TPC-E OLTP benchmark result has been published for the six-core Opteron. Usually, the absence of published results means the results are not competitive, in one manner or another.

TPC-C, TPC-E and TPC-H results were published for the previous generation quad-core Intel Xeon X7350. TPC-C and TPC-E results were published for the follow-on six-core Xeon X7460, but there are no 4-way or 8-way TPC-H results. Unisys did publish a 10TB TPC-H result for their 16-way ES7000 with the Xeon X7460, but there is no simple way to compare this with 4-way or 8-way results at 100 or 300GB scale factors.

There are very impressive 4-way quad-core Opteron 8384 2.7GHz TPC-C and TPC-E results of 579,814 tpm-C and 635.43 tpsE respectively, but none for the six-core Opteron. For TPC-H, there is a series of 8-way results at scale factor 300GB for the quad-core and six-core Opteron processors, though curiously no 4-way results for the Opteron after the dual-core 8220.

One suspected reason for the lack of 4-way or 8-way TPC-H results on the Intel Xeon X7460 is that it cannot achieve meaningful performance gains over the quad-core Xeon X7350. The large 16M L3 cache on the X7460 helps on high call volume (>10,000 RPC/sec) benchmarks like TPC-C and TPC-E, but not in the high row count TPC-H queries with parallel execution plans, where the lower 2.66GHz frequency is also a liability.

In the dual-core era, 4-way Intel Xeon (with the Pentium 4 based NetBurst core) and AMD Opteron systems were very close on TPC-H. It might be that the 4-way quad-core Opteron processors were not competitive with the Xeon 7300 series (Core2 architecture cores), so no TPC-H results were published. In TPC-C and TPC-E, the quad-core Opteron was very competitive with, and even significantly better than, the Xeon X7350 without a large shared cache. The Opteron architecture does not need as large a cache as the Xeon, but does benefit from the large 6M L3 cache in Shanghai compared with the 2M L3 cache in Barcelona.

TPC-H results were published for quad-core and six-core Opteron in the HP ProLiant DL785 8-way systems at scale factor 300GB. Previously, IBM had published an 8-way SF 300 TPC-H result for the Xeon X7350. The first 8-way Opteron quad-core had a better result, so it is possible that the Xeon bus architecture could not scale to 8-way for DW type workloads.

Notably, the 8-way six-core Opteron 8439 2.8GHz shows a very large TPC-H performance gain over the quad-core Opteron 8384 2.7GHz (91,558.2 QphH@300GB versus 57,684.7). This is more than would be suggested by the 50% increase in the number of cores and the nominal frequency increase, especially since scaling is normally less than linear. Presumably, this should be attributed to micro-architecture improvements between Shanghai and Istanbul.
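The arithmetic behind that comparison can be checked with a quick sketch. It uses only the figures quoted above; the variable names are illustrative, and the "linear ceiling" assumes (optimistically) that throughput scales perfectly with both core count and clock frequency.

```python
# Figures quoted in the post; variable names are illustrative only.
quad_qphh = 57_684.7   # 8-way quad-core Opteron, 2.7GHz, QphH@300GB
six_qphh = 91_558.2    # 8-way six-core Opteron, 2.8GHz, QphH@300GB

# Measured throughput ratio between the two published results.
measured_gain = six_qphh / quad_qphh

# Assumption: perfect scaling with cores and clock, which real
# systems do not achieve. This is an upper bound, not a prediction.
core_ratio = 6 / 4
freq_ratio = 2.8 / 2.7
linear_ceiling = core_ratio * freq_ratio

print(f"measured gain:      {measured_gain:.3f}x")
print(f"linear ceiling:     {linear_ceiling:.3f}x")
print(f"excess over linear: {(measured_gain / linear_ceiling - 1) * 100:.1f}%")
```

The measured gain comes out at roughly 1.59x against a perfect-scaling ceiling of roughly 1.56x, so the six-core result beats even ideal linear scaling by about 2%.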

The most significant improvement cited by AMD is HT-assist, which is essentially a snoop filter for maintaining cache coherency in the Hyper-Transport architecture. Ever since the introduction of the Opteron with integrated memory controller and HT, AMD has crowed about how memory and inter-processor bandwidth scaled with the number of processors, unlike the Intel architecture, where memory and inter-processor bandwidth was bottlenecked by the front-side bus.

Well, AMD neglected to mention that their scalable bandwidth was also offset by increased inter-processor communication to maintain cache coherency (see the article by Johan De Gelas on Anandtech http://it.anandtech.com/IT/showdoc.aspx?i=3571). So now that AMD has the snoop filter capability and a very good TPC-H result for the Opteron with HT-assist, why are there no published TPC-C or TPC-E OLTP benchmark results?

Note that Intel had difficulty with the Snoop Filter in the 5000P/X chipset. The snoop filter improved some benchmarks, and caused degradation in others. So it would be no surprise if it takes one or two generations to work out the issues. The expectation is that AMD will need to work out these issues if Magny-Cours is expected to compete with Nehalem-EX systems.

For Intel, the lack of competitive Xeon 7400 series DW benchmark results will be a moot point once the next-generation Nehalem-EX systems become available.

Anyways, these are my suspicions. System vendors are welcome to refute any of my opinions by publishing results. I was surprised by the 2-way Xeon 5500 Nehalem TPC-H results.

Published Sunday, September 13, 2009 5:23 PM by jchang


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
