Fujitsu just published an astounding TPC-E benchmark result of 3,800 tpsE for their 8-way Xeon 7560 system, the Primergy RX900 S1. Fujitsu had previously published a TPC-E result of 2,046.96 tpsE for their 4-way Xeon 7560 system, the Primergy RX600 S5. The new result shows an 85.6% gain (1.86X) from 4-socket to 8-socket.
Microsoft Windows Server 2008 R2 introduced core OS improvements that not only increased the number of logical processors supported from 64 to ???, but also removed many locks, including the dispatch scheduler lock. This improved high-end scaling (64 to 128 cores?) from 1.5X to 1.7X, based on tests with the HP Superdome and Itanium processors. At the time of this announcement, the Xeon 7500 processors were not yet available.
When the Xeon 7500 did become available in early 2010, the first TPC-E results were 2,022.64 and 3,141.76 tpsE for the 4-way and 8-way Xeon 7560 systems, respectively. The scaling from 4S to 8S was 1.55X, well below the 1.7X expectation set by Microsoft's 2008 R2 announcement. This was understandable, as the 8-way result was probably rushed to align with the product launch. Perfect benchmark results are ready on their own schedule, which is not always in time for marketing blitzes. (Of course, considering that the marketing budget may be paying for the benchmarks, it would be advisable to try really, really hard to have a good result for product launch.)
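The scaling figures above are simple ratios of published tpsE values. The sketch below recomputes them for both the launch-era pair and the Fujitsu pair, and also expresses each as an efficiency relative to the ideal 2.0X for a doubling of sockets. Following the text, it uses Microsoft's 1.7X figure as the bar, even though that figure came from 64-to-128-core Itanium tests rather than a Xeon socket doubling; the constant names are illustrative only.

```python
# TPC-E results (tpsE) quoted in the text, all on Xeon 7560 systems.
FUJITSU_4S = 2046.96  # Fujitsu Primergy RX600 S5 (4-way)
FUJITSU_8S = 3800.00  # Fujitsu Primergy RX900 S1 (8-way)
LAUNCH_4S = 2022.64   # first 4-way result at launch
LAUNCH_8S = 3141.76   # first 8-way result at launch (NEC)

EXPECTED = 1.7  # high-end scaling figure from Microsoft's 2008 R2 announcement


def scaling(four_socket, eight_socket):
    """Return (speedup, efficiency) for going from 4 to 8 sockets.

    Efficiency is the speedup divided by the ideal 2.0 for a socket doubling.
    """
    speedup = eight_socket / four_socket
    return speedup, speedup / 2.0


fujitsu = scaling(FUJITSU_4S, FUJITSU_8S)
launch = scaling(LAUNCH_4S, LAUNCH_8S)

for label, (speedup, eff) in [("launch", launch), ("Fujitsu", fujitsu)]:
    print(f"{label:8s} {speedup:.2f}X (gain {speedup - 1:.1%}, "
          f"efficiency {eff:.1%}) vs expected {EXPECTED}X")
```

The launch pair comes out near 1.55X (about 78% efficiency) and the Fujitsu pair near 1.86X (about 93% efficiency), which brackets the 1.7X expectation from below and above.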
There are two apparent differences between the new Fujitsu and the original NEC 8-way Xeon 7560 TPC-E reports. One is that the Fujitsu system uses SSD storage while the NEC system used HDD. The SSD configuration yields much better average response times, mostly in the Trade-Lookup and Trade-Update transactions, with reductions from 500/560 ms to 130/140 ms, respectively. In the 4-way Xeon 7560 TPC-E reports, the use of SSD over HDD yields a 1% improvement.
(My mistake: I should have compared the Fujitsu RX600 S5 result of 2,046.96 tpsE with the Dell result of 1,933.96 tpsE, both systems with 512GB memory, for a 5.8% performance gain attributed to SSD over HDD. The 1% gain came from comparing against the IBM result of 2,022.64 tpsE with 1TB memory and HDD storage.)
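The correction above is about holding memory constant so that the comparison isolates storage. A minimal sketch of the two comparisons, using only the tpsE values and memory sizes given in the text (the Dell and IBM model names are not stated, so the variable names are generic):

```python
# 4-way Xeon 7560 TPC-E results cited in the text (tpsE).
fujitsu_ssd_512gb = 2046.96  # Fujitsu RX600 S5: SSD storage, 512GB memory
dell_hdd_512gb = 1933.96     # Dell: HDD storage, 512GB memory
ibm_hdd_1tb = 2022.64        # IBM: HDD storage, 1TB memory


def pct_gain(new, base):
    """Relative improvement of `new` over `base`, as a fraction."""
    return new / base - 1.0


# Same memory on both sides, so the difference is attributed to SSD vs HDD.
like_for_like = pct_gain(fujitsu_ssd_512gb, dell_hdd_512gb)
# The HDD system has twice the memory, which masks part of the SSD advantage.
mixed = pct_gain(fujitsu_ssd_512gb, ibm_hdd_1tb)

print(f"SSD vs HDD, both 512GB:  {like_for_like:.1%}")
print(f"SSD 512GB vs HDD 1TB:    {mixed:.1%}")
```

This prints 5.8% for the like-for-like pair and 1.2% for the mixed pair, matching the figures in the parenthetical.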
The other difference is that the Fujitsu system distributes network traffic over 6 GbE ports, compared with 2 for the NEC system. There are 24 or so(?) RPC calls per TPC-E transaction, so the extra network ports might provide another minor improvement.
Nothing apparent can explain the improvement in 4S-to-8S scaling from 1.55X to 1.86X. This is certainly not impossible, as IBM figured out how to do this and better with their POWER4 line some years ago. At the time, I thought this was mostly due to the massive inter-processor bandwidth of the POWER4. Now it is clearer that the OS and database engine also contribute to nearly perfect scaling.
My thinking is that someone at Microsoft has been watching the performance traces and finally figured out the most critical points of contention. (Such persons are always nameless to the outside world, as this would upstage more established egos.) So I believe this is a new build of Windows and SQL Server, but build numbers do not seem to be obvious in the TPC reports, even though full disclosure is required. It is never some magic registry entry like Turbo Mode: ON.
PS: The more serious impact of SSD may be evident in the maximum response times, which ran as high as 68.7 seconds for Trade-Result on the NEC system with HDD, but topped out at 7.1 seconds on the Fujitsu system with SSD. I am thinking that having an open transaction for 68 seconds can have serious repercussions on an OLTP system. Curiously, though, the 4-way Fujitsu with SSD could not keep its maximum response similarly low (18.53 sec on Trade-Lookup), while the 4-way IBM with HDD kept its maximum response to 17 sec.
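To see how broad the maximum-response improvement is, the sketch below takes the HDD-to-SSD ratio of maximum response times for every transaction, using the 8-way NEC and Fujitsu figures from the Transaction Response Times table. The 5x cutoff for calling a reduction "dramatic" is a hypothetical threshold chosen for illustration, not something defined by TPC-E.

```python
# Maximum response times in seconds: (NEC A1080 with HDD, Fujitsu RX900 with SSD).
max_response = {
    "Broker-Volume": (2.88, 6.72),
    "Customer-Position": (43.55, 3.49),
    "Market-Feed": (48.81, 3.48),
    "Market-Watch": (2.77, 2.83),
    "Security-Detail": (2.89, 3.79),
    "Trade-Lookup": (49.09, 3.30),
    "Trade-Order": (45.96, 3.74),
    "Trade-Result": (68.73, 7.10),
    "Trade-Status": (60.23, 6.51),
    "Trade-Update": (3.46, 3.79),
}

THRESHOLD = 5.0  # hypothetical cutoff: HDD max / SSD max at or above this is "dramatic"

# Ratio > 1 means the SSD system had the lower maximum response time.
ratios = {name: hdd / ssd for name, (hdd, ssd) in max_response.items()}
dramatic = sorted(name for name, r in ratios.items() if r >= THRESHOLD)

for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {r:5.1f}x")
print(len(dramatic), "of", len(ratios), "transactions improved by at least 5x")
```

Six transactions fall between roughly 9x and 15x, while the other four stay near or slightly below 1x, so the improvement is concentrated rather than uniform.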
Transaction Response Times
The table below shows average and maximum transaction response times for the 8-way NEC with HDD and the 8-way Fujitsu with SSD storage. The SSD system has the better weighted average response time, with the biggest impact in Trade-Lookup and Trade-Update. The reduction in maximum response time is even more dramatic: roughly 9X to 15X in 6 of the 10 transactions.
Transaction Response Times, Average and Maximum (seconds)

| Transaction | Avg, NEC A1080 (HDD) | Avg, Fujitsu RX900 (SSD) | Max, NEC A1080 (HDD) | Max, Fujitsu RX900 (SSD) | Weight | Frames |
|---|---|---|---|---|---|---|
| Broker-Volume | 0.05 | 0.06 | 2.88 | 6.72 | 4.9% | 1 |
| Customer-Position | 0.02 | 0.05 | 43.55 | 3.49 | 13% | 2 |
| Market-Feed | 0.03 | 0.03 | 48.81 | 3.48 | 1% | 1 |
| Market-Watch | 0.03 | 0.05 | 2.77 | 2.83 | 18% | 1 |
| Security-Detail | 0.01 | 0.02 | 2.89 | 3.79 | 14% | 1 |
| Trade-Lookup | 0.50 | 0.13 | 49.09 | 3.30 | 8% | 4 |
| Trade-Order | 0.07 | 0.10 | 45.96 | 3.74 | 10.1% | 4 |
| Trade-Result | 0.07 | 0.13 | 68.73 | 7.10 | 10% | 6 |
| Trade-Status | 0.02 | 0.03 | 60.23 | 6.51 | 19% | 1 |
| Trade-Update | 0.56 | 0.14 | 3.46 | 3.79 | 2% | 3 |
| Data-Maintenance | 0.11 | 0.07 | | | | |
| Weighted avg response | 0.0812 | 0.0635 | | | | |
| Average tx in flight | 2526.5 | 2389.1 | | | | |
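The last two rows of the table are derived values. The sketch below recomputes the weighted average response from the per-transaction averages and mix weights (Data-Maintenance is left out, since the table gives it no weight), then estimates transactions in flight with Little's law, L = λW. It assumes λ is tpsE divided by Trade-Result's 10% share of the mix, because tpsE counts only completed Trade-Result transactions; that is my assumption about how the row was derived, not something stated in the reports.

```python
# (transaction, NEC HDD avg seconds, Fujitsu SSD avg seconds, mix weight)
rows = [
    ("Broker-Volume",     0.05, 0.06, 0.049),
    ("Customer-Position", 0.02, 0.05, 0.13),
    ("Market-Feed",       0.03, 0.03, 0.01),
    ("Market-Watch",      0.03, 0.05, 0.18),
    ("Security-Detail",   0.01, 0.02, 0.14),
    ("Trade-Lookup",      0.50, 0.13, 0.08),
    ("Trade-Order",       0.07, 0.10, 0.101),
    ("Trade-Result",      0.07, 0.13, 0.10),
    ("Trade-Status",      0.02, 0.03, 0.19),
    ("Trade-Update",      0.56, 0.14, 0.02),
]


def weighted_avg(col):
    """Mix-weighted mean of the response-time column at index `col`."""
    return sum(r[col] * r[3] for r in rows) / sum(r[3] for r in rows)


TRADE_RESULT_SHARE = 0.10  # assumption: tpsE counts Trade-Result, about 10% of the mix


def in_flight(tpse, avg_response):
    """Little's law L = lambda * W, with lambda = all transactions per second."""
    return (tpse / TRADE_RESULT_SHARE) * avg_response


nec_avg, fuj_avg = weighted_avg(1), weighted_avg(2)
print(f"weighted avg: NEC {nec_avg:.4f}s, Fujitsu {fuj_avg:.4f}s")
print(f"in flight:    NEC {in_flight(3141.76, nec_avg):.0f}, "
      f"Fujitsu {in_flight(3800.00, fuj_avg):.0f}")
```

The weighted averages reproduce 0.0812 and 0.0635. The in-flight estimates come out about 1% above the table's 2526.5 and 2389.1, presumably because the table used a slightly different mix fraction or less rounding, so treat this as an approximation of how those figures arise. Either way, the Fujitsu system carries slightly fewer transactions in flight while completing about 21% more tpsE.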
System configuration
| System | NEC A1080 | Fujitsu RX900 |
|---|---|---|
| Processor | Xeon 7560 | Xeon 7560 |
| Sockets x cores | 8 x 8 = 64 | 8 x 8 = 64 |
| Hyper-Threading | yes | yes |
| Frequency | 2.26GHz | 2.26GHz |
| Memory | 1024GB | 1024GB |
| IO | 7 FC | 14 SAS |
| Storage | 1872 HDD | 336 SSD |
| OS | Windows Server 2008 R2 DC | Windows Server 2008 R2 DC |
| Database | SQL Server 2008 R2 DC | SQL Server 2008 R2 DC |
| tpsE | 3,141.76 | 3,800.00 |
Benchmark Summary 2010-09-28
Below is a summary of the best available TPC benchmark results for recent Intel Xeon and AMD Opteron server systems. Note that Westmere-EP 32nm and Nehalem-EX 45nm have been consolidated.
| Processor (architecture, process) | TPC | 2-way | 4-way | 8-way | 16-way |
|---|---|---|---|---|---|
| Westmere-EP 32nm Xeon 5600 6C / Nehalem-EX 45nm Xeon 7500 8C | TPC-C | 803,068 | 1,807,347 | - | future |
| | TPC-E | 1,110 | 2,022.64 | 3,800.00 | future |
| | TPC-H | 73,974.6@100GB | 121,346@300GB | 162,601@3TB | future |
| Magny-Cours 45nm 12C | TPC-C | 705,652 | 1,193,472 | n/a | n/a |
| | TPC-E | 887.4 | 1,464 | n/a | n/a |
| | TPC-H | 71,438.3@100GB | 107,561@300GB | n/a | n/a |
2010 Sep 19
A TPC-H result was finally published for the 4-way Xeon 7500 at 300GB on 14 Sep. A TPC-C result was also published for the 4-way Xeon 7500 on 27 Aug. There will probably not be a TPC-C result for the 8-way DL980, as SQL Server may be limited by its ability to write to a single log file. HP seems to be the only vendor active in TPC-H. This could be because other companies have cut staff. Benchmarking is a specialized skill, and it usually takes a dedicated person for each benchmark and environment. The benchmark result itself is not what matters. What matters is the investigation into the root cause of bottlenecks, which improves performance in the next iteration. So this means only HP will be making contributions in data warehousing (DW).