THE SQL Server Blog Spot on the Web

Joe Chang

NEC Express5800 A1080a

NEC announced their 8-way server with Intel's Xeon 7500 launch on 30 March 2010, along with a published TPC-E performance benchmark result. The listed availability date was 24 June 2010, so the 5800/A1080a should now be available, although I have not yet ordered a system for my own personal use.

Producing a respectable benchmark result for a high-end system on a new system architecture is a major effort because a great deal of back and forth exchanges on the entire software stack are necessary to resolve issues in previously unexplored territory.

Having the benchmark result ready to publish on Intel's processor launch date is important because it leverages the extensive publicity of a major Intel product launch, especially considering that the NEC Express was the only 8-way TPC benchmark result at the time. (The other TPC result was an IBM 4-way; there were also Fujitsu 8-way SAP SD 2-Tier results.)

In the past, I have frequently cited system examples with Dell or HP servers, as these are the systems with which I have frequent direct experience. IBM is also a major vendor in the US server market, but as an independent consultant, I only rarely encounter IBM servers. Not having extensive experience on IBM servers, I do not cite them as examples unless there are unique features. (IBM does publish Redbooks with a great deal of technical detail, which can be very helpful.)

When Intel announced the Xeon 7500, along with the IBM 4-way and NEC 8-way TPC-E results, I did not discuss the architecture of either system, but did comment on the performance relative to the previous generation Xeon 7400 architecture and the lower end concurrent generation Xeon 5500 architecture (see Intel Xeon 5600 and 7500). I also commented that I thought Intel deliberately held up publication of TPC results for the Xeon 5600 on 16 March in order to not detract from the then upcoming Xeon 7500 announcement.

Last week, I was at the HP Technology Forum, where HP announced the new ProLiant G7 series along with TPC-C, E and H results, including an 8-way TPC-H for the Xeon 7500 (new HP ProLiant servers). I also mentioned that there were new Dell and Fujitsu TPC-E results, but did not discuss them because there were already published results for 2-way Xeon 5600 and 4-way Xeon 7500.

Apparently NEC America Product Marketing saw that blog and was miffed that I did not mention their 8-way TPC-E report of 30 March (NEC Express 5800/A1080a TPC-E report).

Since I have frequently discussed high-end NUMA systems (Big-Iron Revival and NUMA Systems), I suggested doing a couple of papers: one on scaling SQL Server on big-iron for transaction processing, the other on storage performance for data warehouse. They have promised to provide access to their systems, so hopefully I can get these out in the near future.

NEC Express5800/A1080a TPC-E Benchmark Report

In the meantime, I will discuss some details of the NEC 8-way Xeon 7500 TPC-E report. The table below compares the NEC 8-way Xeon 7500 TPC-E result with the 16-way Xeon 7400 and 4-way Xeon 7500 systems.

System            Unisys ES7000    NEC 5800/A1080    IBM x3850 X5
Processor         Xeon 7460        Xeon 7560         Xeon 7560
Sockets-Cores     16 x 6 = 96      8 x 8 = 64        4 x 8 = 32
Hyper-Threading   no               yes               yes
Frequency         2.66GHz          2.26GHz           2.26GHz
Memory            1024GB           1024GB            1024GB
Storage IO        15x2 FC          7x2 FC            6x2 SAS
Storage           870+12 HDD       1872+20 HDD       1008 HDD
Windows Server    2008 R2 DC       2008 R2 DC        2008 R2 EE
SQL Server        2008 R2 DC       2008 R2 DC        2008 R2 EE
tps-E             2,012.77         3,141.76          2,022.64

The 8-way Xeon 7560 is 56% better on TPC-E than the 16-way Xeon 7460 (and 2.7X better than an 8-way 7460, not listed), and 55% better than the 4-way 7560. The Xeon 7500 performance relative to the previous generation Xeon 7400 is in line with information disclosed at Intel Developer Forum 2009. The scaling from 4-way to 8-way is below the expectation set by Microsoft for Windows Server 2008 R2. A recent Fujitsu report achieved a slightly better 4-way Xeon 7560 result with 512GB memory and SSD storage.

In previous generations, 1.5X scaling for each doubling of processor cores at this level was the best that could be achieved. Last year, Microsoft disclosed that a number of improvements in Windows Server 2008 R2, particularly the elimination of critical locks, allow for substantially improved high-end scaling. Scaling of 1.7X was reported, presumably from measurements on an Itanium system at 64 and 128 cores (128 and 256 logical processors). Still, much work is required to actually achieve the maximum possible scaling, especially on a brand new system architecture. So the expectation is that the 8-way Xeon 7500 will gradually see slightly better scaling relative to the 4-way.
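The ratios quoted above can be checked directly from the tps-E figures in the table. The following is only a minimal sketch: the dictionary keys and the `speedup` helper are names made up for illustration, and the projected figure simply applies the 1.7X number to the 4-way result, which is an assumption and not a published result.

```python
# tps-E results copied from the comparison table above.
TPS_E = {
    "unisys_16way_7460": 2012.77,
    "nec_8way_7560": 3141.76,
    "ibm_4way_7560": 2022.64,
}

# Per-doubling scaling figures quoted in the text.
PREVIOUS_GENERATION_SCALING = 1.5
WS2008R2_REPORTED_SCALING = 1.7


def speedup(new_tps: float, old_tps: float) -> float:
    """Return the throughput ratio new/old (hypothetical helper)."""
    return new_tps / old_tps


nec = TPS_E["nec_8way_7560"]
vs_16way_7460 = speedup(nec, TPS_E["unisys_16way_7460"])
vs_4way_7560 = speedup(nec, TPS_E["ibm_4way_7560"])

# Assumption: what an 8-way result would be if the 4-way figure
# scaled by the full 1.7X. This is a projection, not a measurement.
projected_at_1_7x = TPS_E["ibm_4way_7560"] * WS2008R2_REPORTED_SCALING

print(f"8-way 7560 vs 16-way 7460: {vs_16way_7460:.2f}X")
print(f"8-way 7560 vs 4-way 7560:  {vs_4way_7560:.2f}X")
print(f"4-way result scaled by 1.7X: {projected_at_1_7x:.0f} tps-E")
```

On these numbers the 4-way to 8-way step lands between the older 1.5X figure and the 1.7X figure, which is consistent with the reading that there is still some scaling headroom.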

Below is a diagram of the NEC Express5800/A1080a configuration used in their 8-way TPC-E report.

NEC Express Overview

I am not familiar with the NEC D3-10 storage system. It appears to consist of an external controller in an enclosure for 12 3.5in disks, and can be daisy-chained to additional 12-disk enclosures. The controller front-end interface is 4Gbps FC (perhaps an 8Gbps FC front-end will also be available). The back-end is SAS. I think this is the correct choice for storage systems. There really is no value in incurring the expense of running FC to the disks.

The 7 dual-port FC adapters in the A1080a provide 14 FC ports. Of these, 13 connect to storage for the data files. The last FC port connects to storage for the log file.

Storage Architecture         NEC
HBA                          7 dual-port 4Gbps FC
FC ports (data)              13
RAID controllers per port    6 (78 total)
Additional disk enclosures   78 total
Disks per RAID controller    24 (1872 total)
Disks per FC port            6x24 = 144
LUNs per RAID controller     2 (12 disks per LUN)
LUNs                         2x78 = 156

Assuming that this is a cost-optimized design, we can estimate that the IO load on the data disks is close to 175 IOPS per 15K disk. The load per FC port is then in the range of 25,000 IOPS (144 disks x 175 IOPS), or 50K IOPS per dual-port HBA. For the 8KB SQL Server page size, this is about 200MB/sec per 4Gbps FC port and 400MB/sec per HBA, well under the practical capability of 4Gbps FC, which is about 330MB/sec per port.
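The estimate above is simple arithmetic on the storage table, and can be sketched as follows. The 175 IOPS per disk figure is the estimate from the text, not a measured value; the constant names are made up for illustration, and the per-HBA numbers assume an HBA with both ports serving data files.

```python
# Topology figures from the storage architecture table.
FC_DATA_PORTS = 13
RAID_CONTROLLERS_PER_PORT = 6
DISKS_PER_RAID_CONTROLLER = 24
DISKS_PER_LUN = 12
PORTS_PER_HBA = 2

# Estimates quoted in the text (assumptions, not measurements).
IOPS_PER_DISK = 175             # estimated load per 15K disk
PAGE_BYTES = 8 * 1024           # 8KB SQL Server page
FC_4G_PRACTICAL_MB_PER_S = 330  # practical limit of one 4Gbps FC port

raid_controllers = FC_DATA_PORTS * RAID_CONTROLLERS_PER_PORT
data_disks = raid_controllers * DISKS_PER_RAID_CONTROLLER
luns = data_disks // DISKS_PER_LUN
disks_per_port = RAID_CONTROLLERS_PER_PORT * DISKS_PER_RAID_CONTROLLER

iops_per_port = disks_per_port * IOPS_PER_DISK
iops_per_hba = iops_per_port * PORTS_PER_HBA

# Bandwidth in decimal MB/s, assuming every IO is one 8KB page.
mb_per_s_per_port = iops_per_port * PAGE_BYTES / 1_000_000
mb_per_s_per_hba = mb_per_s_per_port * PORTS_PER_HBA

print(f"RAID controllers: {raid_controllers}, data disks: {data_disks}, LUNs: {luns}")
print(f"Per FC port: {iops_per_port} IOPS, {mb_per_s_per_port:.0f} MB/s")
print(f"Per HBA:     {iops_per_hba} IOPS, {mb_per_s_per_hba:.0f} MB/s")
print(f"Port utilization vs practical limit: "
      f"{mb_per_s_per_port / FC_4G_PRACTICAL_MB_PER_S:.0%}")
```

Changing `IOPS_PER_DISK` shows how sensitive the per-port bandwidth is to the assumed disk load, which is the one figure here that does not come from the benchmark report.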

Another point concerns the number of data files. The total number of disks was determined by IO load, for which this benchmark result required in the range of 1800+ disks. The disks were distributed across multiple controllers so as to not create too much IO load on any given HBA or RAID controller (in terms of both IOPS and MB/s). The number of LUNs created from the 1872 disks was really determined by what is practical for disks per LUN, and 12 disks per LUN is practical.

Many people like to cite, without any supporting data, a rule for the ratio of data files to processor cores. One should really ask if all of this came about because, long ago, there was a benchmark configuration with 4 cores and 4 data files. The reason for the number of files had to do with the storage system configuration. In this world, there are far too many people with limited intelligence who must have rules to live by, without any interest in the reason behind the rule. It is as if blind adherence to the rule without any understanding will absolve them of the consequences of their actions. I do not believe there was ever any substance to the cores-file ratio rule. It was all some numbnut who misinterpreted two uncorrelated facts in a reference system, and inferred that this must be a rule. Other people of comparable intelligence then perpetuated this new rule.

The rear view shows the 7 dual-port FC adapters

5800/A1080a rear view

I did not find an architecture diagram of the NEC Express5800/A1080a. My understanding is that it is the standard glueless arrangement shown below, possibly with three IOH, not four.

5800/A1080a rear view

There were a number of tuning techniques mentioned in the NEC 5800/A1080a TPC-E report, very similar to the tuning methods employed in other TPC-C and TPC-E reports. Some are absolutely critical, some are for achieving the last 1-2%, and others are merely convenient for the choice of storage. All of this should be discussed in a detailed report.

Link for NEC America 5800/A1080a

Intel slidedeck for Xeon 7500 processor launch.

Published Monday, June 28, 2010 11:36 AM by jchang
About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
