THE SQL Server Blog Spot on the Web


Joe Chang

Nehalem, Shanghai, and Dunnington performance notes

Back in March 2008, I discussed the SAP SD 2 Tier results for some AMD Opteron and Intel Xeon systems. Here are some more recent results. All Opteron and Xeon processors below are on the 45nm process except for the Opteron 8360, which is a 65nm product. For some reason, HP has not posted a 4-socket result for the 45nm Opteron 8384, where the bigger 6M L3 cache is known to improve network round-trip performance. The Itanium is a dual-core 90nm product, which is at a serious disadvantage (to be replaced by Tukwila, a 65nm quad-core). Note that the Xeon X7460 is a six-core processor. My expectation is that the 4-way Opteron 8384 should be around 22-23K SAPS, which would be very competitive for a quad-core going against a six-core.


System     Processors                     Users     SAPS

DL785G5    8 x 2.7GHz   Opteron 8384      7,101     35,400 (O10,Lin)
DL580G5    4 x 2.66GHz  Xeon X7460        5,155     25,830 (S2K5)
DL585G5    4 x 2.5GHz   Opteron 8360      3,801     19,020 (S2K5)

DL380G6    2 x 2.93GHz  Xeon X5570        4,995     25,000 (S2K5)
DL385G5    2 x 2.7GHz   Opteron 2384      2,752     13,780 (S2K5)
DL380G5    2 x 3.33GHz  Xeon 5470         2,518     12,600 (S2K5)
BL860C     2 x 1.66GHz  Itanium 9140M       501      5,850
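
To make the quad-core versus six-core comparison easier to see, the results in the table above can be normalized per socket and per core. The following is only a rough sketch: the SAPS figures are copied from the table, but the cores-per-socket values are my assumption wherever the text does not state them (the text gives six cores for the X7460 and two for the Itanium; the rest are taken to be quad-core). The results also do not all use the same database and operating system, so this is not a like-for-like comparison.

```python
# Results transcribed from the table above:
# (system, processor, sockets, cores per socket, SAPS).
# Cores per socket are an assumption of this sketch where the post
# does not state them explicitly.
RESULTS = [
    ("DL785G5", "Opteron 8384", 8, 4, 35400),
    ("DL580G5", "Xeon X7460", 4, 6, 25830),
    ("DL585G5", "Opteron 8360", 4, 4, 19020),
    ("DL380G6", "Xeon X5570", 2, 4, 25000),
    ("DL385G5", "Opteron 2384", 2, 4, 13780),
    ("DL380G5", "Xeon 5470", 2, 4, 12600),
    ("BL860C", "Itanium 9140M", 2, 2, 5850),
]


def saps_per_socket(saps, sockets):
    """SAPS divided by the number of populated sockets."""
    return saps / sockets


def saps_per_core(saps, sockets, cores_per_socket):
    """SAPS divided by the total number of physical cores."""
    return saps / (sockets * cores_per_socket)


# Print one normalized row per system.
for system, cpu, sockets, cores, saps in RESULTS:
    print(
        f"{system:8s} {cpu:14s} "
        f"{saps_per_socket(saps, sockets):9.1f} SAPS/socket "
        f"{saps_per_core(saps, sockets, cores):8.1f} SAPS/core"
    )
```

Under these assumptions, the 2-socket X5570 comes out at 12,500 SAPS per socket against 6,457.5 for the 4-socket X7460.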


Of special note is the exceptional result for the 2-socket Xeon X5570, based on the new Nehalem processor, expected to be available some time in the first half of 2009. This is not entirely unexpected. First, the integrated memory controller probably helps in network round-trip intensive operations. Second is the return of Hyper-Threading, as Nehalem was designed in Oregon, while the Core 2 architecture is an Israeli design (each design team has its own opinion of various micro-architecture features).

On the last Oregon design, the Pentium 4 NetBurst core, I noted that Hyper-Threading improved network round-trip performance by 15-20%, but did not actually improve any SQL operation outside of the network round-trip. Later, I measured 40-50% performance gains for HT on LiteSpeed backup compression tests. I have heard that while the theory behind HT is sound, certain operations such as acquiring locks (at the C/C++ level) can cause problems. The compression algorithm has no such issues, and probably represents the upper bound on what could be achieved with HT. Supposedly the Itanium HT, which was introduced with Montecito after the last NetBurst, had some improvements over NetBurst. Now that the Oregon team has had 8+ years to investigate HT characteristics, we should expect a much improved HT with Nehalem.


My expectation is that HT has the biggest benefit in SAP-type environments, i.e. stored procedure calls that retrieve a single row (50-100 CPU-micro-sec), moderate benefit in TPC-C and TPC-E type environments (on the order of 1-3 CPU-milli-sec), and less or no benefit in large TPC-H type queries. When I get the Nehalem Xeon 55xx system, I will look into this.


Unisys to focus on Xeon over Itanium?

On a side note, see the article regarding Unisys.

Unisys just posted a 10TB TPC-H result for a 16-socket Xeon X7460. While the X7460 has six cores, Unisys enabled only four cores per socket, in part because the current versions of Windows and SQL Server support only up to 64 cores. The result is 26% higher than for a 32-socket, 64-core Itanium 2 system. The presumption is that later this year, Intel will release the quad-core Itanium, codename Tukwila, so this result might be representative of 16-socket systems in late 2009. Even if Tukwila can achieve 2.0GHz, it will probably just be comparable to the X7460. After Windows Server 2008 R2 releases, the 16-way X7460 will have all 72 cores available for the throughput portion of TPC-H.


64-core TPC-H 10TB

Xeon X7460 Six-core 2.66GHz (Dunnington, 4 cores used) 80,172.7 (ES7600R)

Itanium 9140N Dual-core 1.6GHz (Montecito) 63,650.9 (Superdome)
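
As a quick check of the 26% figure, here is a minimal sketch of the arithmetic. The two scores are the ones quoted above; the variable and function names are just illustrative.

```python
# TPC-H 10TB scores as quoted above.
XEON_X7460_SCORE = 80172.7     # 16 sockets, 4 of 6 cores enabled per socket
ITANIUM_9140N_SCORE = 63650.9  # 32 sockets, dual-core


def percent_higher(new, old):
    """How much higher `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Both systems end up with the same number of active cores.
xeon_cores = 16 * 4
itanium_cores = 32 * 2

advantage = percent_higher(XEON_X7460_SCORE, ITANIUM_9140N_SCORE)
print(f"Active cores: Xeon {xeon_cores}, Itanium {itanium_cores}")
print(f"Xeon result is {advantage:.1f}% higher")
```

Since both configurations run 64 active cores, the same percentage applies to the per-core comparison.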


Dell TPC-E results for Shanghai versus Dunnington

Both systems have 4 sockets and 64GB memory.

4 x six-core Dunnington 2.66GHz, 16M L3   671.35 tpsE

4 x quad-core Shanghai 2.7GHz, 6M L3      635.43 tpsE

So the quad-core Shanghai competes very well against the six-core Dunnington. The larger L3 cache relative to Barcelona (2M L3) really helped.
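
The per-core view of the two Dell results above can be sketched as follows. The tpsE values, socket counts and core counts are the ones listed above; the dictionary layout and function name are only illustrative.

```python
# Dell TPC-E results as listed above.
DUNNINGTON = {"sockets": 4, "cores_per_socket": 6, "tpsE": 671.35}
SHANGHAI = {"sockets": 4, "cores_per_socket": 4, "tpsE": 635.43}


def tpse_per_core(result):
    """tpsE divided by the total number of physical cores."""
    total_cores = result["sockets"] * result["cores_per_socket"]
    return result["tpsE"] / total_cores


dunnington_pc = tpse_per_core(DUNNINGTON)
shanghai_pc = tpse_per_core(SHANGHAI)

print(f"Dunnington: {dunnington_pc:.2f} tpsE per core")
print(f"Shanghai:   {shanghai_pc:.2f} tpsE per core")
print(f"Shanghai per-core advantage: {(shanghai_pc / dunnington_pc - 1) * 100:.0f}%")
```

In other words, Shanghai reaches roughly 95% of the Dunnington system throughput with two thirds of the cores.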


SSD Test Platform

Anyway, I am all set to buy the new 2-socket Xeon 5500 series as soon as one becomes available. I will look into Nehalem performance relative to Core 2, with and without HT. I will try to configure this system with 2 PCI-E SAS RAID controllers and 8 SSDs (2 per x4 SAS port, 4 devices per controller) initially, and then expand to 4 RAID controllers with 24-32 SSDs as budget allows (probably 3 SSDs and 1 HDD per x4 SAS port). I should get a Shanghai platform as well, as my last good numbers for Opteron are now very old, but this is my own money.

Business is down with the economy, and too many consultants are dropping their rates to get business. I am not inclined to do so. So I should have time to bring my past performance papers, many of which pertain to SQL Server 2000, up to date. I will also try to re-release some of my performance tools, like SQL Clone. I should also be able to release new tools, one for Profiler Trace analysis and another for performance tuning using dm_db_index_usage_stats and dm_exec_query_stats.

Published Monday, February 23, 2009 2:46 PM by jchang




Glenn Berry said:

Have you tried any benchmarks with the desktop Core i7 vs Core2 Quads yet?

Keep up the good work

February 27, 2009 2:51 PM

Glenn Berry said:

Wouldn't a 16-way X7460 Dunnington have 96 cores (16 x 6) running on Windows Server 2008 R2 ?

February 27, 2009 2:54 PM

jchang said:

Glenn: I have been in an extremely deep state of turmoil since November on whether to:

1. pull the trigger on buying a Core i7 desktop system, ie, single socket, merely 12GB memory (the new Dell XPS 435 can be bought with 24GB)


2. Wait one more month in agony so I can order a proper 2-socket Xeon 5500 line, preferably with 64-128GB memory (100GB should let me run TPC-H sf100 in memory, with compression)

Because I did not opt for 1) earlier, I would feel stupid buying the desktop now when I should be able to get the 2-way next month. I thought back in Nov it would be early 2009. Had I known it would be Q2-09, I should have pulled the trigger then.

Yes, but Unisys published results with the current release of W2K8,

jeez, give a guy 64 cores and it's still not enough for him.

just how many cores do you need?

February 27, 2009 3:33 PM

George Walkey said:


have you left

I miss your regular hardware/software platform reviews.

Especially since you are free of vendor propaganda.

Since my account was locked out over there, and they don't answer contact us emails, I assume the same happened to you.

Most specifically, the integrated memory controller in i7

should finally put ATI out of business. (AMD really is the gamers' company they always hoped to be)

March 7, 2009 1:01 PM

George Walkey said:


have you benched the new hardware on the various NT operating systems from MS?

I'd be curious to see how the various versions of SQL work on

2000 vs 2003 vs 2008 (32bit)

then the 64bit versions

There was talk way back in 2003 that OS2000 ran sql2000 faster than OS2003.

yes I know testing all the hardware/OS/SQL version combinations would take time, but it would be interesting to see if indeed OS2000 runs faster because of kernel changes.

a few of my clients don't see any reason to go to SQL/OS 2008 as OS2003/SQL2005 are just as fast on the same hardware

Ie: the MS design philosophy of "if we don't have fast code, just cache it" is in error because eventually, you have to save your results.

IOW: OS2008 should not be faster than OS2003 because MS found a trickier way to cache things, but because the kernel and disk code is better....

March 7, 2009 1:10 PM

jchang said:

when sql-server-performance went to a new layout, I had a hard time with the new style. I cannot pick up changes like young people.

Plus I like this format here where there is a URL for all my blogs, instead of being spread out over multiple sections. Plus I think Jude did not like some of my humor posts.

do you mean integrated memory controller or graphics? the integrated memory controller really should have been put in the second generation NetBurst processor. The high-revving procs benefit more than the lower revving Core2 and even Nehalem core. But Intel had just got people to change to the P4 bus, and there are economic benefits of riding one set of infrastructure for a long time, even if better technology is possible.

Anyways, as part of the process (90-65-45-32nm) continued integration is mandatory, it is not meant to put a specific industry or company out of business, even though that may happen.

Memory controller definitely should be integrated, of course, Intel says low-end systems should have the memory controller with graphics, possibly because the graphics makes more use of memory.

The large majority of systems may as well have graphics integrated either in the CPU die, or package, as common graphics is good enough for most uses, and no sense wasting a perfectly good x4 or x8 PCI-E slot on graphics. I am even thinking that a super high-end gamer should take the integrated graphics,

that is, use one monitor as a console,

Pretend you are in a fighter jet simulator. 3 very high-end monitors display what you see outside. Let's use the integrated graphics to an ordinary monitor to display the instrument panel.

It would also help troubleshoot the high graphics system.

I am not convinced Intel will be able to produce high-end graphics to put ATI out of business, even if Intel bought nVidia.

I am reasonably convinced the OS guys at MS are making decent improvements in the core code, especially with regard to multi-threading efficiency.

Of course, any individual decision may make a specific operation slower in the default setting.

This is where it is up to you to identify the specific cause and effect, and find the setting (usually undocumented or poorly documented) to restore the original characteristic.

Now I want to accept that W2008 is improved, but I am having a really difficult time just figuring out what settings need to be made to even connect to SQL. Sure, the default should be secure, but it sure could be a lot easier to know what needs to be turned on.

March 11, 2009 1:26 PM

George Walkey said:

im sorry

my humor was too abbreviated

I was talking about the memory controller.

that was the ONLY advantage AMD had.

Now its gone too.

I was saying AMD should just become ATI only...

I never think/talk about graphics on your blog....

March 13, 2009 3:45 PM

George Walkey said:

would you agree that the landy wang and gannapathy kernel and disk IO code has been progressively getting better from W2000 to W2003, W2008?

I have a client going from SQL 2000 on W2000 to W2003, possibly W2008

budget constraints prevent SQL upgrade for them...

March 13, 2009 3:51 PM

jchang said:

I am not convinced AMD will go out of business because of Nehalem, or that they will drop out of the server business in 2009. Of course, AMD could go into bankruptcy just on current losses, but would probably be reorganized. Even if Nehalem shows a good advantage over Shanghai, the server business shows some degree of persistence to already approved decisions. But if Intel opens the advantage in 2010 with 32nm, then I am not sure how AMD die-hards can hang on. AMD really needs micro-architecture enhancements to the current core to stay in the game.

March 25, 2009 11:50 AM



About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
