Intel launched the Xeon 5600 series (Westmere-EP, 32nm) six-core processors on 16 March 2010 without any TPC benchmark results. In the performance world, no results almost always means bad or not good results. Yet there is every reason to believe that the six-core Xeon 5600 series (X models only) will perform exactly as expected for a 50% increase in the number of cores at the same frequency as the 5500, with no system-level bottlenecks. The expectation is that a six-core Xeon 5600 should provide a 30%+ improvement over the comparable quad-core Xeon 5500 in throughput-oriented tests, particularly OLTP-type workloads. Single-stream parallel execution plans will probably show less gain, as scaling via parallelism is not a simple matter.
Then two weeks later, on 30 March 2010, Intel launched the Xeon 7500 series 8-core processors for 4-way+ systems (and the Xeon 6500 for high-end 2-way systems) with TPC-E results on 4-way and 8-way systems but no TPC-H results. The TPC-E results were exactly what Intel said they were going to be last September at IDF: 2.5X over the previous generation Xeon 7400 series and 2.5X over the contemporary 2-way Xeon 5500 series.
My guess is that Intel wanted it to be clear that the 4-way Xeon 7500 achieved the stated performance objective of 2.5X over the 2-way Xeon 5500, just in case some slide decks did not mention which 2-way system the 2.5X claim referred to. Of course, the Intel statement of 2.5X for Xeon 7500 was most probably made with performance measurements already run on prototype systems. It was probably also felt that the Xeon 5600 series is such a natural choice to supersede the 5500 series that TPC benchmarks were not essential, as there were sufficient other benchmarks to support the claims.
Benchmark Omissions
Earlier, I had commented about benchmark omissions from the quad-core generation on. Below is a summary of processors and systems for which TPC results are published. The Intel Xeon 7500 Processor Product Brief shows 3.03X relative to 7400 for OLTP Brokerage Database, which is TPC-E, but 2022 over 729 is 2.77X. (One of us is on medication.)
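The discrepancy noted above can be checked directly from the two published 4-way TPC-E scores. This is only a minimal sketch; the variable names are mine, and the 3.03X value is simply the figure quoted from the product brief.

```python
# 4-way TPC-E scores (tpsE) as published
tpce_xeon_7500 = 2022.64
tpce_xeon_7400 = 729.65

measured_ratio = tpce_xeon_7500 / tpce_xeon_7400
brief_ratio = 3.03  # figure quoted from the Intel product brief

print(f"published results: {measured_ratio:.2f}X")
print(f"product brief:     {brief_ratio:.2f}X")
print(f"gap between them:  {brief_ratio / measured_ratio - 1:.1%}")
```

The first line of output is the 2.77X mentioned above; the product brief may of course be based on a different pair of configurations.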
Updated 2010-05-12
| Processor Architecture Process | TPC | 2-way | 4-way | 8-way | 16-way |
|---|---|---|---|---|---|
| Core2 65nm Xeon 5300 QC 7300 QC | TPC-C | 251,300 | 407,079 | 841,809 | - |
| | TPC-E | 5160 only | 479.51 | 804.0 | 1,250.0 |
| | TPC-H | 17,686@100 | 34,990@100 | 46,034@300 | - |
| Barcelona 65nm QC | TPC-C | - | 471,883 | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 52,860@300 | - |
| Core2 45nm Xeon 5400 QC 7400 SC | TPC-C | 275,149 | 634,825 | Linux DB2 | - |
| | TPC-E | 317.45 | 729.65 | 1,165.56 | 2,012.8 (R2) |
| | TPC-H | - | - | - | 102,778@3T |
| Shanghai 45nm QC | TPC-C | - | 579,814 | - | - |
| | TPC-E | - | 635.4 | - | - |
| | TPC-H | - | - | 57,685@300 | - |
| Istanbul 45nm SC | TPC-C | - | - | - | - |
| | TPC-E | - | - | - | - |
| | TPC-H | - | - | 91,558@300* | - |
| Nehalem 45nm Xeon 5500 QC 7500 8C | TPC-C | 661,475† | 1,807,347 | - | - |
| | TPC-E | 850.0 | 2,022.64 | 3,141.76 | - |
| | TPC-H | 51,086@100 | - | 162,601@3TB | - |
| Westmere 32nm Xeon 5600 SC 7600 ?C | TPC-C | 803,068 | future | future | future |
| | TPC-E | 1,110.1 | future | future | future |
| | TPC-H | | future | future | future |
| Magny-Cours 45nm 12C | TPC-C | 705,652 | 1,193,472 | future | future |
| | TPC-E | 887.4 | 1,464 | future | future |
| | TPC-H | 71,438.3@100G | 107,561@300G | future | future |
* and SF 1TB report
† Xeon W5580 3.2GHz, versus X5570 2.93GHz for other Xeon 5500 results
In brief, the Intel Core 2 architecture processors were avoiding comparisons against AMD Opteron in TPC-H, except for the 16-way Unisys system, for which there is no comparable Opteron system.
Opteron, on the other hand, avoided comparison with the Core2 architecture in 2-way systems and in the TPC-C/E OLTP benchmarks across the board. In the 2-way systems, the old Intel FSB technology was still adequate, and the powerful Core2 architecture core was enough to beat a 2-way Opteron. There were respectable 4-way TPC-C and TPC-E results for Shanghai. When AMD announced the HT-Assist feature in Istanbul, one might have thought AMD was finally going to be able to compete in 4-way OLTP. But zero benchmarks have been published as of this writing.
When the 2-way Intel Xeon 5500 processor, based on the Nehalem architecture, came out in early 2009, outstanding results were published for both the OLTP-oriented TPC-E and the DW/DSS-oriented TPC-H. In February 2010, a TPC-C result was published as well, even though Microsoft had previously said all new OLTP benchmarks were going to be TPC-E. This result was with SQL Server 2005 for some reason.
There was every expectation with the Xeon 7500 Nehalem-EX, that there would be both OLTP and DW/DSS benchmark results, as Xeon 7500 should produce world-class (and world-record) results in both. It is possible that performance problems were encountered in trying to achieve good scaling over 32-cores and 64-threads in a 4-way Xeon 7500 system. If this is identified as something that can be fixed in the Windows operating system or SQL Server engine, then a change request would be made. I seriously doubt that another processor stepping would be done for this, as Xeon 7500 is already D-step at release.
TPC-H Scaling
It is also quite possible Intel will have to face the fact that 2.5X over the 2-way Xeon 5500 TPC-H SF100 result of 51,000 QphH is not going to be achieved no matter how good Xeon 7500 is at DW. This is because the TPC-H score is a geometric mean of the 22 queries. There are several small queries in TPC-H, two of which already run in under 1 second on the 2-way 8-core Xeon 5570 for SF100, and several that run near or under 2 seconds. There is limited opportunity to continue to improve the performance of small queries with an increasing degree of parallelism, as the overhead to set up each thread becomes larger compared to the actual work done by each thread, especially if one also has to give up frequency, dropping from 2.93 to 2.26GHz. It would be helpful to know what the actual frequency is during a performance run with the turbo-boost feature.
It is possible that some marketing putz does not understand this and denied permission to publish perfectly good Xeon 7500 TPC-H results because they did not meet the 2.5X goal. (Along with making a negative ranking and review entry for the person responsible for TPC-H benchmarking due to failing to achieve the 2.5X goal. But let's not grind axes here. Besides, who said life was fair? It takes exceptional talent to accomplish the impossible. A clever person anticipates impossible problems, and transfers to another group to avoid a sticky wicket.)
Achieving 2.5X in the big queries is a more meaningful goal. Achieving 50% better than the 8-way Opteron 6-core TPC-H SF300 or SF1TB would also be a worthwhile accomplishment, if Xeon 7500 were up to the task.
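To illustrate how a geometric mean blunts a gain that applies only to the large queries, here is a rough sketch. The 22 query times are hypothetical, not taken from any published report, and the assumption that small queries gain nothing at all is deliberately pessimistic. The real QphH metric also folds in refresh functions and a throughput test, which this ignores.

```python
import math

def geometric_mean(values):
    """Geometric mean computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-query elapsed times in seconds for the 22 queries:
# six small queries at or under 2 s, sixteen large ones at 10 s each.
baseline = [0.8, 0.9, 1.5, 1.8, 2.0, 2.0] + [10.0] * 16

LARGE_QUERY_SPEEDUP = 2.5   # assumed gain on the big queries
SMALL_QUERY_LIMIT = 2.0     # assumed cutoff; small queries gain nothing

scaled = [t if t <= SMALL_QUERY_LIMIT else t / LARGE_QUERY_SPEEDUP
          for t in baseline]

overall_gain = geometric_mean(baseline) / geometric_mean(scaled)
print(f"gain on large queries: {LARGE_QUERY_SPEEDUP:.2f}X")
print(f"geometric-mean gain:   {overall_gain:.2f}X")  # about 1.95X
```

With these made-up numbers, six stalled queries out of 22 are enough to pull a 2.5X improvement on everything else down to roughly 1.95X overall.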
TPC-E Scaling
Finally, a quick comment on Xeon 7500 scaling from 4-way (32 cores, 64 threads) to 8-way (64 cores, 128 threads). In the past, achieving 1.5X scaling with this number of cores would have been a triumph. Given the announcement Microsoft made for Windows Server 2008 R2 about removing thread scheduler and other impediments to high-end scaling, we were expecting 1.7X scaling. It could be that scaling beyond 64 threads is tricky because of the limit of 64 logical processors per processor group. Hopefully the 4-way to 8-way to 16-way scaling will improve over time as problems are solved one at a time, while the task master whips his/her draft horses (again, I digress).
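For reference, the scaling actually achieved can be worked out from the 4-way and 8-way Xeon 7500 TPC-E entries in the summary table. A small sketch, with variable names of my own choosing:

```python
tpce_4way = 2022.64   # 4-way Xeon 7500, 32 cores
tpce_8way = 3141.76   # 8-way Xeon 7500, 64 cores

actual_scaling = tpce_8way / tpce_4way
hoped_scaling = 1.7   # the expectation stated above
score_needed = tpce_4way * hoped_scaling

print(f"actual 4-way to 8-way scaling: {actual_scaling:.2f}X")
print(f"8-way score needed for {hoped_scaling}X: {score_needed:,.0f} tpsE")
```

The published pair works out to about 1.55X, which would have been a triumph by the older standard but is short of the 1.7X hoped for.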
Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) SKUs
Let's take a look at the Xeon 5600, 7500 and 6500 SKUs. The low-voltage, low-power SKUs are omitted. These are fine products for high-density environments, web servers, and utility databases. The line-of-business and DW databases should be on the X models.
Xeon 5600 SKUs
| Model | Cores | Threads | GHz | L3 | QPI GT/s | Memory | Price* |
|---|---|---|---|---|---|---|---|
| X5680 | 6 | 12 | 3.33 | 12M | 6.4 | 1333 | $1,663 |
| X5670 | 6 | 12 | 2.93 | 12M | 6.4 | 1333 | $1,440 |
| X5660 | 6 | 12 | 2.80 | 12M | 6.4 | 1333 | $1,219 |
| X5650 | 6 | 12 | 2.66 | 12M | 6.4 | 1333 | $996 |
| E5640 | 4 | 8 | 2.66 | 12M | 5.86 | 1066 | $774 |
| E5630 | 4 | 8 | 2.53 | 12M | 5.86 | 1066 | $551 |
| E5620 | 4 | 8 | 2.40 | 12M | 5.86 | 1066 | $387 |
| X5677 | 4 | 8 | 3.46 | 12M | 6.4 | 1333 | $1,693 |
| X5667 | 4 | 8 | 3.06 | 12M | 6.4 | 1333 | $1,440 |
* Intel 1k pricing
Xeon 7500 SKUs
| Model | Cores | Threads | GHz | L3 | QPI GT/s | Memory | Price* |
|---|---|---|---|---|---|---|---|
| X7560 | 8 | 16 | 2.26 | 24M | 6.4 | 1066? | $3,692 |
| X7550 | 8 | 16 | 2.00 | 18M | 6.4 | ? | $2,729 |
| E7540 | 6 | 12 | 2.00 | 18M | 6.4 | ? | $1,980 |
| E7530 | 6 | 12 | 1.86 | 18M | 5.86 | ? | $1,391 |
| E7520 | 4 | 8 | 1.86 | 18M | 4.8 | ? | $856 |
Xeon 6500 SKUs
| Model | Cores | Threads | GHz | L3 | QPI GT/s | Memory | Price* |
|---|---|---|---|---|---|---|---|
| X6550 | 8 | 16 | 2.00 | 18M | 6.4 | ? | $2,461 |
| E6540 | 6 | 12 | 2.00 | 18M | 6.4 | ? | $1,712 |
| E6510 | 4 | 8 | 1.73 | 12M | 4.8 | ? | $744 |
Before commenting, recall the main differences between the Xeon 5600 and Xeon 7500/6500 series. The Xeon 5600 series (32nm process) has 2 QPI links and 3 memory channels. The Xeon 7500 series (45nm process) has 4 QPI links, 4 memory channels, a larger cache per core (for the 24M version, 3M vs 2M) plus extensive reliability features. The 2 QPI links on the 5600 series allow a 2-way (socket) system. The 4 QPI links on the 7500 series allow glueless 4-way and 8-way systems. My understanding is that the 6500 series is the 7500 with only 2 QPI links enabled, for 2-way systems with 16 cores and 8 memory channels total, at lower frequency than the 5600 with 12 cores and 6 memory channels total, plus the 7500 RAS features.
Intel Xeon 5600 (Westmere-EP) and 7500 (Nehalem-EX) Systems
Now let's look at system pricing for the 2-way Dell PowerEdge T710 (Xeon 5600), R810 (either 7500 or 6500) and the 4-way R910 (7500). All systems are configured with redundant power supplies and 2x73GB 15K 2.5in drives on 6Gb/s SAS, with 4 power supplies in the 4-way.
Dell PowerEdge T710 Systems with 2 Xeon 5600 processors
| System | Processor | GHz | Cores | L3 | QPI | Mem speed | Memory | Price |
|---|---|---|---|---|---|---|---|---|
| T710 | X5680 | 3.33 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $9,974 |
| T710 | X5660 | 2.80 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $8,634 |
| T710 | X5650 | 2.66 | 6 | 12M | 6.4 | 1333 | 72GB 18x4G | $8,154 |
| T710 | E5640 | 2.66 | 4 | 12M | 5.86 | 1066 | 72GB 18x4G | $7,474 |
| T710 | E5630 | 2.53 | 4 | 12M | 5.86 | 1066 | 72GB 18x4G | $6,934 |
For some reason, Dell does not offer the T710 with the second from top X5670 2.93GHz.
Dell PowerEdge R810 Systems with 2 Xeon 7500 or 6500 processors
| System | Processor | GHz | Cores | L3 | QPI | Mem speed | Memory | Price |
|---|---|---|---|---|---|---|---|---|
| R810 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 64GB 16x4G | $17,866 |
| R810 | X7542 | 2.66 | 6 | 12M | 5.86 | ? | 64GB 16x4G | $13,366 |
| R810 | X6550 | 2.00 | 8 | 18M | 6.4 | 1066 | 64GB 16x4G | $13,066 |
| R810 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $12,166 |
| R810 | E6540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $11,496 |
Dell PowerEdge R910 Systems with 2 out of 4 sockets populated, Xeon 7500
| System | Processor | GHz | Cores | L3 | QPI | Mem speed | Memory | Price |
|---|---|---|---|---|---|---|---|---|
| R910 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 64GB 16x4G | $19,246 |
| R910 | X7550 | 2.00 | 8 | 18M | 6.4 | 1066 | 64GB 16x4G | $16,446 |
| R910 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 64GB 16x4G | $13,546 |
| R910 | E7530 | 1.86 | 6 | 18M | 5.86 | 980 | 64GB 16x4G | $12,446 |
Dell PowerEdge R910 Systems with 4 Xeon 7500 processors
| System | Processor | GHz | Cores | L3 | QPI | Mem speed | Memory | Price |
|---|---|---|---|---|---|---|---|---|
| R910 | X7560 | 2.26 | 8 | 24M | 6.4 | 1066 | 128GB 32x4G | $34,040 |
| R910 | X7550 | 2.00 | 8 | 18M | 6.4 | 1066 | 128GB 32x4G | $28,440 |
| R910 | E7540 | 2.00 | 6 | 18M | 6.4 | 1066 | 128GB 32x4G | $22,640 |
| R910 | E7530 | 1.86 | 6 | 18M | 5.86 | 980 | 128GB 32x4G | $20,440 |
Previously, I had argued that processors and systems today are so powerful that the standard practice of buying 4-way systems for critical database servers by default should be changed to 2-way. What I mean by default is in lieu of proper system sizing analysis.
It may seem strange that I suggest not doing a proper sizing analysis (one of my services as a consultant). But from the sizing analysis I have seen done by other people, the quality of the work was poor and the effort cost more than a pair of 4-way systems.
What this means is that the practical solution used to be to buy a 4-way system. Try it out. If it is not sufficient, then hire someone (there are many people who can do this) to make it work on a 4-way. If that does not work, consider pruning features until it does work.
So why not just move up to an 8-way or larger system? Because 8-way and larger are mostly NUMA systems. Technically, all Opteron 2-way and up are NUMA. But by NUMA, I really mean systems where there is a large discrepancy between local and remote node memory access. There are very very few people who can do performance analysis on a NUMA system (not those who claim to be able to). Do a search on SQL NUMA to see who has published meaningful material on this matter.
Default System Choice: Intel Xeon 5600
Anyways, the default choice today should be a 2-way system. However, since this is a critical system, perhaps there are features from the high-end that we want. I believe this is the rationale for the Xeon 6500 from Intel, and the PowerEdge R810 from Dell.
In looking over the T710, R810 and R910, I am inclined to say the effort was not entirely successful, as with many first iterations. The effort definitely deserves merit, and is the proper direction for the future. But it just needs further refinement. Of course, the true measure is whether people actually buy the R810 in volume, not just one person's opinion.
The R810 with either X7560 or X6550 just gives up too much frequency for the extra 2 cores per socket and the fourth memory channel. Some environments might want the Xeon 7500/6500 RAS features despite this. And there is only about a $1,400 price difference between the R810 and the R910 with 2 sockets populated.
The amount of $1,400 is very small for having two extra sockets available, even though most people never populate sockets after system purchase. It would be nice if you could buy the R910 with 4 sockets populated, but not have to pay the per-socket software licensing until they are turned on, as in the RISC world. (In the RISC world, you don't 'pay' for the $25K+ processors until they are activated either. I do not think this is necessary for the Intel Xeon 7500 at $5K each.)
True, the R810 is a 2U form factor compared with 4U for the R910, allowing much higher density. But the assumption was that this is a critical database server, for which an extra 2U is not a show stopper. (There are people who get hung up on the latest industry jargon/fads, and forget that job one is making sure your business is running.)
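The price gap can be read off the two Dell tables above for the processors that appear in both the R810 list and the 2-socket R910 list. A quick sketch; the dictionary names are mine:

```python
# List prices from the tables above, 2 processors, 64GB (16x4G)
r810 = {"X7560": 17866, "E7540": 12166}
r910_two_sockets = {"X7560": 19246, "E7540": 13546}

# Extra cost of the R910 chassis with the same two processors
premium = {cpu: r910_two_sockets[cpu] - r810[cpu] for cpu in r810}

for cpu, dollars in premium.items():
    print(f"{cpu}: R910 costs ${dollars:,} more than R810")
```

Both processors give the same $1,380 difference, which suggests the gap is the chassis rather than anything processor-specific.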
Late Addition - AMD Magny-Cours
AMD Opteron 6176 (Magny-Cours) 2-way 12-core results have just been published, with the HP ProLiant DL385G7. I will add more detail later. The 2-way TPC-E result is 887.38 and the TPC-C result is 705,652. Interestingly, both the HP ProLiant DL370G6 with the Xeon W5580 and the DL385G7 Opteron TPC-C results are on SQL Server 2005. Perhaps the Microsoft mandate to use TPC-E is for SQL Server 2008, hence the C on 2005 was allowed? Also of interest is that the Opteron 6176 TPC-C result uses 125 SSDs instead of hard disks (1300 HDs in the Xeon W5580 result).
Before comparing the Opteron 12-core with Xeon 5500, let us first compare against the previous generation Xeon 5400 quad-core. The 2-way 12-core Opteron 6176 achieved OLTP results higher than the Xeon 5460 by 2.5X on TPC-C and 2.8X on TPC-E. These are very good results for a 3X increase in the number of cores. Now in comparing against the quad-core Xeon 5500 series, the 12-core Opteron is just marginally higher. I am inclined to think much of this is due to the Hyper-Threading capability in the Xeon 5500 series. HT was much maligned in the NetBurst architecture generation. Some people today still blindly regurgitate the advice to disable HT, not realizing this advice was applicable to the old NetBurst and not the new Nehalem architecture processors. At some point AMD may have to admit that implementing HT will be a necessity.
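The ratios quoted above follow from the 2-way entries in the summary table. A minimal sketch, with a small helper function of my own; note the Xeon 5500 TPC-C entry is the W5580 3.2GHz result per the table footnote.

```python
# 2-way results: TPC-C in tpmC, TPC-E in tpsE
results = {
    "Opteron 6176": {"cores": 24, "tpmC": 705652, "tpsE": 887.38},
    "Xeon 5400":    {"cores": 8,  "tpmC": 275149, "tpsE": 317.45},
    "Xeon 5500":    {"cores": 8,  "tpmC": 661475, "tpsE": 850.0},
}

def ratio(system, reference, metric):
    """How many times higher `system` scores than `reference`."""
    return results[system][metric] / results[reference][metric]

for reference in ("Xeon 5400", "Xeon 5500"):
    cores = results["Opteron 6176"]["cores"] / results[reference]["cores"]
    print(f"Opteron 6176 vs {reference} ({cores:.0f}X the cores): "
          f"TPC-C {ratio('Opteron 6176', reference, 'tpmC'):.2f}X, "
          f"TPC-E {ratio('Opteron 6176', reference, 'tpsE'):.2f}X")
```

Against the Xeon 5400 this gives roughly 2.56X and 2.80X; against the Xeon 5500 it is only about 1.07X and 1.04X, which is the "just marginally higher" described above.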
The price for the DL385G7 with 2x6176 processors from the TPC-H report is $1,511 for the system chassis, $1,799 for each processor, $990 for each 8GB kit, and perhaps another $1K for a configuration comparable to the above. This is very reasonable, except for the memory, which seems high. Each 8GB kit should be around $500.
Magny-Cours is composed of two six-core Istanbul die(?), each with 6x0.5M L2 cache and 6M L3. The Istanbul die size is 346mm2, versus 684mm2 for Nehalem-EX with 8 cores and 24M L3. The images below were adjusted to match the die size closely, but there is no assurance that the aspect ratios are correct.
Note: For some reason, I thought I saw the Nehalem-EX die size listed at 540mm2. The Intel press release actually says 684mm2, so the scaling below is more appropriate:
Late Addition - Dell PowerEdge R910 TPC-E result
Dell published a 4-way Xeon 7560 TPC-E result on their PowerEdge R910. In comparison with the result for the IBM x3850 X5:
IBM: 2,022.64 tps-E, 1024GB Memory, 1014 300GB 15K HDD, $999K tot cost
Dell: 1,933.96 tps-E, 512GB Memory, 576x146GB+480x73GB 15K HDD, $635K tot cost
Both systems are on Windows Server 2008 R2 EE. The IBM system is also on SQL Server 2008 R2 EE, while the Dell is on SQL Server 2008 EE. R2 is about $9.5K more than RTM per processor socket, for a total difference of $38K. The IBM system uses 64x16GB DIMMs at $2K each compared with 64x8GB on the Dell at $500 each, for a total difference of $96K.
Both systems have over 1000 disk drives. IBM prices 300GB 15K 3.5in drives ($709ea), while Dell employs a mix of 73GB ($329ea) and 146GB ($479ea) 15K 2.5in drives. Also, the IBM storage enclosures are $268K vs $120K for Dell. Technically, the 300GB drives are not necessary to meet TPC-E requirements, so this difference should not be considered in comparing the results.
It is unclear whether SQL Server 2008 R2 has any advantages over 2008 SP1, and whether the cost increase is justified. The performance difference of 5% between the IBM and Dell systems could be explained by the 2X difference in memory. The 16GB DIMMs are $125 per GB versus the 8GB at $62.50 per GB.
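The cost and performance deltas in this comparison are simple arithmetic on the figures quoted above. A short sketch that reproduces them; the unit prices are the rounded values from the text, not exact line items from the reports.

```python
SOCKETS = 4

# SQL Server 2008 R2 EE vs 2008 EE, per-socket price difference
license_delta = 9_500 * SOCKETS

# Memory: IBM 64 x 16GB at $2,000 each, Dell 64 x 8GB at $500 each
ibm_memory_cost = 64 * 2_000
dell_memory_cost = 64 * 500
memory_delta = ibm_memory_cost - dell_memory_cost

ibm_dollars_per_gb = 2_000 / 16
dell_dollars_per_gb = 500 / 8

# TPC-E throughput difference, IBM over Dell
perf_gap = 2022.64 / 1933.96 - 1

print(f"license difference: ${license_delta:,}")
print(f"memory difference:  ${memory_delta:,}")
print(f"memory $/GB:        IBM {ibm_dollars_per_gb}, Dell {dell_dollars_per_gb}")
print(f"performance gap:    {perf_gap:.1%}")
```

The performance gap comes out slightly under the 5% quoted, at about 4.6%, bought with twice the memory at twice the price per GB.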
Update 2010-05-12
HP has published TPC-C and TPC-E results for the 2-way AMD Opteron 6176 12-core 2.3GHz and for the Intel Xeon X5680 6-core 3.3GHz. The Xeon X5680 scores 13.8% higher in TPC-C and 25.1% higher in TPC-E. The individual physical cores in Westmere are faster than the Opteron cores based on SPEC CPU 2006 Integer base (adjusted to exclude parallel components). It is not meaningful to compare frequency between completely different processor architectures.
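The percentages above can be reproduced from the 2-way results in the summary table. The sketch below also divides by physical core count, which is my own crude addition: it ignores Hyper-Threading and everything else that differs between the two platforms, so treat the per-core figure as a rough indication only.

```python
# 2-way results: physical cores, TPC-C (tpmC), TPC-E (tpsE)
xeon_x5680 = {"cores": 12, "tpmC": 803068, "tpsE": 1110.1}
opteron_6176 = {"cores": 24, "tpmC": 705652, "tpsE": 887.38}

system_gain = {}      # Xeon advantage at the system level
per_core_ratio = {}   # Xeon throughput per physical core vs Opteron

for metric in ("tpmC", "tpsE"):
    system_gain[metric] = xeon_x5680[metric] / opteron_6176[metric] - 1
    per_core_ratio[metric] = (
        (xeon_x5680[metric] / xeon_x5680["cores"])
        / (opteron_6176[metric] / opteron_6176["cores"])
    )
    print(f"{metric}: Xeon {system_gain[metric]:.1%} higher overall, "
          f"{per_core_ratio[metric]:.2f}X per physical core")
```

With half the physical cores, the Xeon result implies well over 2X the throughput per core in both benchmarks.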