Both Opteron TPC-H systems have 256GB of memory, 206 disks, and 7 disk controllers. The Xeon system runs S2K5, while the Opteron results are on S2K8 using the date data type in place of datetime, saving 12 bytes per row on LineItem. Page compression is also enabled, so the entire 300GB LineItem table plus indexes and the other tables essentially fits in memory. For these reasons the Opteron and Xeon results are not properly comparable.
The performance gain from Barcelona to Shanghai is 9% on the combined QphH, 11.7% on power, and 6.6% on throughput.
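As a quick sanity check on these three numbers: the TPC-H composite QphH is the geometric mean of the power and throughput metrics, so the combined gain should follow from the other two. A minimal sketch, using only the percentages quoted above (the function name is my own):

```python
import math

# Gains quoted in the text (Barcelona -> Shanghai), expressed as fractions.
power_gain = 0.117
throughput_gain = 0.066

def composite_gain(power, throughput):
    """Gain in the composite metric, taken as the geometric mean of the two ratios."""
    return math.sqrt((1 + power) * (1 + throughput)) - 1

qphh_gain = composite_gain(power_gain, throughput_gain)
print(f"implied QphH gain: {qphh_gain:.1%}")
```

This works out to roughly 9.1%, in line with the 9% combined figure.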
The gains can be divided between increased cache size, frequency and core improvements.
TPC-C is known to benefit from large cache, so the majority of the gain from Barcelona to Shanghai is probably due to cache. The Xeon 7460 has the best performance due to the combination of additional cores and the very large L3 cache.
TPC-H is known to be essentially independent of cache size. So the fact that TPC-H improves by more than the clock frequency increase says some of the gain is due to core improvements. For an 8% frequency increase, an expected performance gain of 5-6% is reasonable.
So 3-4% can probably be attributed to core improvements (see below). There is no published TPC-H result for the Intel six-core Dunnington. This is probably an indication that the Intel FSB architecture cannot properly feed 24 cores with a single MCH. The Intel Core 2 significantly outperforms Opteron at the single-core level (non-parallel large queries), but Opteron catches up at high degrees of parallelism with better memory bandwidth (more memory channels).
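The arithmetic behind the 3-4% estimate can be sketched as follows. This assumes the TPC-H systems ran at the 2.5GHz and 2.7GHz clocks listed for the Opteron 8360 and 8384 in the SPEC list further down, and it takes the 5-6% frequency-scaling range as given rather than deriving it. Variable names are my own.

```python
# Assumed clocks in GHz (Opteron 8360 vs 8384, from the SPEC list in this post).
barcelona_ghz = 2.5
shanghai_ghz = 2.7

qphh_gain_pct = 9.0            # combined QphH gain quoted in the text
freq_scaling_pct = (5.0, 6.0)  # the post's estimate of gain from clock alone

freq_increase_pct = (shanghai_ghz / barcelona_ghz - 1) * 100

# Simple subtraction, as in the text: low residual pairs with high clock estimate.
residual_pct = tuple(qphh_gain_pct - f for f in reversed(freq_scaling_pct))

print(f"frequency increase: {freq_increase_pct:.1f}%")
print(f"left over for core improvements: {residual_pct[0]:.0f}-{residual_pct[1]:.0f}%")
```

Any benchmark tuning between the two publications would come out of that residual as well, so it is better read as an upper bound on core improvements.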
There is still no SPEC CPU Integer 2006 Base result for Shanghai (there is one for SPEC CPU rate, but I still like looking at the non-rate). Core i7 results look decent, though not spectacular: a 10% gain from the QX9770 3.2GHz to the i7 3.2GHz. The goal of each tick and tock is 40%, but this is very difficult to achieve on SPEC INT. Hopefully we will see Core i7 (45nm) at 3.5-3.8GHz in 2009.
Prior to the Shanghai launch, AMD made noise that Shanghai would have 20-30% better performance than Barcelona at the same clock. In a complex processor architecture, it would not be unusual for a minor design mistake, when corrected, to yield a significant performance gain on a specific operation but only a modest gain on a complete suite of operations. AMD was vague as to whether the 20-30% was specific or broad. Anyways, I just talked to the person who did the AMD results. In the past he had made subtle contributions to achieving best-in-class performance. Between the Barcelona and Shanghai results, there was a minor change that contributed 2%. So this would reduce the contribution from any core improvements in Shanghai.
On Scott's comments, there are only TPC-E results for Intel Xeon and Itanium. One major vendor says they have done TPC-E for Opteron but has elected not to publish. Let me just say benchmarking is a vicious world where second place is for losers, even if it's by just 1%. Oracle and DB2 refuse to publish. Anyways:
4 x X7460 Six-core (24 cores) 2.66GHz, 128GB, 729.65 tpsE (IBM x3850 M2)
12 x X7460 Six-core (64 of 72 cores used) 2.66GHz, 384GB, 1,400 tpsE (NEC 5800)
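To put the two TPC-E results on a common footing, here is a rough per-core calculation. It uses only the figures listed above. The systems differ in memory and configuration, so this is an illustration of scaling, not a controlled comparison.

```python
# (system, tpsE, cores used) copied from the two results above.
results = [
    ("IBM x3850 M2", 729.65, 24),
    ("NEC 5800", 1400.0, 64),
]

per_core = {name: tps / cores for name, tps, cores in results}

for name, value in per_core.items():
    print(f"{name}: {value:.2f} tpsE per core")

# Per-core throughput of the larger system relative to the smaller one.
relative = per_core["NEC 5800"] / per_core["IBM x3850 M2"]
print(f"per-core throughput at 64 cores vs 24 cores: {relative:.0%}")
```

On these numbers the 64-core system delivers about 72% of the per-core throughput of the 24-core system.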
I am glad NEC decided to enter a Xeon for big iron. There is a role for big boxes even though they require special tuning skills. They had an Itanium, but Intel is far behind in getting the Itanium architecture and manufacturing process current. There is no way a 90nm processor (Montvale) can compete with 45nm x86-64. Even the upcoming 65nm Tukwila will probably fall short, more so once the big Nehalem 8-core server variant arrives. I am at PASS now, but have not had a chance to bug NEC on details of their 5800 architecture.
The SPEC CPU results mentioned in the comment below are SPEC int_rate, which is a throughput test. I want the SPEC CPU Integer base (no special compile flags, and no substituting hand-coded assembly for compiler-generated code). For reference, see the list below (I will check whether the Xeon 5460 result is on a server or workstation platform).
Dell has now published Shanghai SPEC CPU integer results:
X5460 3.16GHz, 2x6M, 25.3
X7460 2.66GHz, 16M L3, 21.7
Opteron 8360 2.5GHz, 14.4
Opteron 8384 2.7GHz, 16.9
Core i7 965 3.2GHz, 30.2
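One way to read the list above is to normalize each score by clock speed, which separates per-clock differences from frequency. A minimal sketch using only the listed figures (the "Xeon" labels and variable names are my own):

```python
# (clock in GHz, SPEC CPU integer score) copied from the list above.
results = {
    "Xeon X5460": (3.16, 25.3),
    "Xeon X7460": (2.66, 21.7),
    "Opteron 8360": (2.5, 14.4),
    "Opteron 8384": (2.7, 16.9),
    "Core i7 965": (3.2, 30.2),
}

per_ghz = {name: score / ghz for name, (ghz, score) in results.items()}

for name, value in per_ghz.items():
    print(f"{name:13s} {value:5.2f} per GHz")

# Shanghai (8384) over Barcelona (8360): score gain versus clock gain.
shanghai_gain = results["Opteron 8384"][1] / results["Opteron 8360"][1] - 1
clock_gain = results["Opteron 8384"][0] / results["Opteron 8360"][0] - 1
print(f"Shanghai vs Barcelona: {shanghai_gain:.1%} score on {clock_gain:.1%} clock")
```

On these numbers the Core 2 based Xeons land near 8 per GHz and the Opterons near 6, while Shanghai gains about 17% over Barcelona on an 8% clock increase.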
The reason I say SPEC CPU integer base is important is that it is a reasonable predictor of standard non-parallel execution plans for a wide range of SQL operations. Many people have a favorite query they like to run to evaluate new systems. So even though the Opteron can achieve very good results on the 8-way for well-tuned parallel plans (and SPEC int_rate), single-threaded operations fall far short of the Core 2 based Xeon. So it is important to test single-thread, throughput, and parallel plan performance separately.