
Joe Chang

Recommended Server Systems, 2008 Q3 - Dunnington six-core

Intel Dunnington 

Yet another update, with the publication of TPC results for the Intel X7460 six-core (Dunnington).

X7460 2.67GHz 3x3M L2, 16M L3 

Dell and HP list availability as Sep 15, 2008; IBM lists availability on the x3950M2 as Dec 10.

Dell R900 with 4 x X7460, 2.67GHz, 6 core, 16M L3, $17,195

HP DL580G5 with 4 x X7460, 2.67GHz, 6 core, 16M L3, $19,151

I think the IBM x3950M2 with 4 x X7460 is around $41K (understanding that this system can be expanded to 16 sockets, and hence has a higher cost structure).

Tukwila Quad Core Itanium due in Q1 2009?

I am looking over the Intel IDF slides on Tukwila. It is a quad-core Itanium, with probably just minor improvements in the core (specifically mentioned is HT) but with an integrated memory controller and the QuickPath Interconnect (QPI) replacing the FSB. Tukwila will be 65nm when the first 45nm processors are a year old, meaning it is really 2 years late. (What I mean by this is that a 65nm quad-core Itanium could have been built in late 2006/early 2007, if the prep work had started early with clear objectives, and its performance would have been earth shattering relative to x86/64.) Frequency improvements are mentioned over the 90nm dual-core, which runs at about the same frequency as the 130nm single-core Madison. Still, Tukwila has a large cache (6M L3 per core), massive bandwidth via QPI for good scaling characteristics (4 full-width plus 2 half-width links, allowing glueless 8-way), 4 DDR memory channels per socket, and HT, which is good for high call volume database apps. Intel mentioned about 2X performance, so they are probably targeting 740K tpm-C.

This means we will have the choice of: 1) the six-core Dunnington, the most powerful CPU core on earth (prior to Nehalem) on a weak chipset, with 4 memory channels supporting 4 sockets totaling 24 cores; 2) the new quad-core Itanium, with outstanding scaling characteristics plus HT for added throughput, but a weak core; 3) AMD Barcelona, also with very good scaling (8-way glueless), no HT, and a slightly-better-than-weak core; 4) Nehalem, with what will be the most powerful core, the new QPI for good scaling, 3 memory channels per socket, and HT, but for the first year, only 2 sockets. Decisions, decisions.

Nehalem/Beckton

Due out in Q4 2008, the initial Nehalem core will support 2-way, with 3 memory channels and 2 QPI links. About a year later, Beckton, the MP server (4 sockets and up) version, comes out, with 4 memory channels per socket and 4 QPI links.

Performance

TPC-C (Windows Server 2003, SQL Server 2005SP2)

4 x Intel X7460 six core 2.67GHz, 634,825 tpm-C

4 x AMD 8360 quad-core 2.5GHz 471,883 tpm-C

4 x Intel X7350 quad-core 2.93GHz 407,079 tpm-C

 

TPC-E (W2K8, S2K8, Dell PowerEdge R900 results)

4 x Intel X7460 six core 2.67GHz, 671.35 tps-E

4 x Intel X7350 quad core 2.93GHz, 451.29 tps-E

4 x AMD 8360 quad core ??

 

TPC-H (W2K3, S2K5, SF 100) 

4 x Intel X7350 quad core 2.93GHz, 46,034 QphH

4 x Intel X7460 six core ??

4 x AMD 8360 quad core ?? 

TPC-H 300GB

8 x Intel X7350 QC 46,034 QphH (IBM x3950M2, W2K3, S2K5 sp2)

8 x AMD 8360 QC 52,860 QphH (HP DL785, W2K8, S2K8)

 

On TPC-C, the X7460 six-core generated a 34% edge over the quad-core AMD and a 56% advantage over the quad-core X7350. Even with the large cache, this is higher than expected. At the time, I suspected HP did not pursue optimization with the 407K result.

On TPC-E, the six core showed a 49% edge over the older quad core.
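As a quick sanity check on those percentages, here is the arithmetic on the published scores above (a minimal sketch, nothing more):

```python
# Ratios from the published TPC-C and TPC-E scores listed above.
tpcc = {"X7460": 634825, "Opteron 8360": 471883, "X7350": 407079}
tpce = {"X7460": 671.35, "X7350": 451.29}

print(round(tpcc["X7460"] / tpcc["Opteron 8360"] - 1, 3))  # ~0.345 -> 34% edge
print(round(tpcc["X7460"] / tpcc["X7350"] - 1, 3))         # ~0.560 -> 56% edge
print(round(tpce["X7460"] / tpce["X7350"] - 1, 3))         # ~0.488 -> 49% edge
```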

This could indicate that the 7300 chipset with 4 memory channels cannot properly scale the four quad-core 2x4M L2 processors, but can scale the new six-core 16M L3 processors.

What is missing are comparable TPC-H numbers, especially at 100GB. The big cache on the X7460 helps high call volume apps like TPC-C and TPC-E, but not TPC-H. Can the 7300 chipset drive the extra 8 cores (24 in the X7460 versus 16 in the X7350) in DW queries?

There are an 8 x Opteron QC result and an 8 x X7350 result at 300GB, but the Opteron is on S2K8 while the X7350 is on S2K5, which has different characteristics.

 

The X7460 (Dunnington) is a clear winner at 4-way for high call volume apps. There are not sufficient results in DW to make a call. AMD does have a small opening in the fairly low-priced 8-way (compared with hard-NUMA systems).

 

As much as I would like to buy one of these for my own use in researching SQL Server performance characteristics, I am holding my 2008 budget for a Nehalem system as soon as it comes out, and an SSD array. As soon as I can confirm an SSD can do 10K IOPS on random 8K reads (I see the IDF announcements that the new Intel SSD due early 2009 will do 30K IOPS at 4K), I will get a dozen to see what is involved in reaching 100K IOPS from SQL queries. A few years ago, a quick test on a TMS SSD SAN showed 45K, limited by the SQL Server side CPU. On Nehalem, the big question is whether the Hyper-Threading issues of NetBurst have been fixed.
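The drive-count math I have in mind is simple (a sketch; the 10K and 30K IOPS figures are the per-drive numbers mentioned above, and I am assuming they hold up under SQL Server 8K random reads):

```python
# Rough sizing: how many SSDs to reach a target random-read IOPS,
# assuming the per-drive figures above hold and the load spreads evenly.
def drives_needed(target_iops, per_drive_iops):
    return -(-target_iops // per_drive_iops)  # ceiling division

print(drives_needed(100000, 10000))  # 10 drives at the 10K IOPS I want to confirm
print(drives_needed(100000, 30000))  # 4 drives if the 30K IOPS (4K) spec held at 8K
```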

______________________________________________________________________

 

This is an update to the original post on Server Sizing for SQL Server to reflect the new quad-core Opteron systems. The recommended server systems, as of Q3 2008, for line-of-business database applications are:

2-way Intel Xeon: HP ProLiant ML370G5 and Dell PowerEdge 2900 III

4-way Intel: Dell PowerEdge R900 and HP ProLiant DL580G5

2-way AMD Opteron: Dell PowerEdge R805

4-way AMD Opteron: Dell PowerEdge R905 and HP ProLiant DL585G5

8-way AMD Opteron: HP ProLiant DL785G5

 

Processors

For 2-socket Xeon, the top processors include the X5460 3.16GHz and the E5440 2.83GHz, both with 2x6M cache.

For 4-socket Xeon, the X7350 2.93GHz 2x4M.

For 4 & 8 socket Opteron, the top processor is the 8360 SE at 2.5GHz.

 

Processor Notes

I really do not want to get heavily into Xeon versus Opteron. It is too emotional a subject for many people and too infested with FUD driven by marketing people. This frequently involves valid technical points taken out of context. What it comes down to is the Core 2 architecture has by far the highest SPEC CPU integer (not rate) scores, and will generate the best results in certain categories of performance tests. This is most evident in single large query tests.

At the 4 & 8-socket level, AMD Opteron has the better memory architecture, with 2 DDR2 memory channels per socket: 8 total in a 4-socket system and 16 memory channels in an 8-socket system, compared with 4 in a 4-socket Xeon with the 7300 chipset. This may yield an advantage in full saturation tests, which are more difficult to run. So at the 4-socket level, the difference is that Xeon has more compute power in the processor cores, while Opteron can move data to and from memory faster. Which is better: a 400 HP engine with a 70%-efficient transmission, or a 310 HP engine with a 90%-efficient transmission?
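To put rough numbers on the channel counts (a sketch only; I am assuming roughly 5.3GB/sec peak per channel for both DDR2-667 and the 7300 chipset's FB-DIMM channels, which glosses over real protocol differences):

```python
# Aggregate peak memory bandwidth from channel counts alone (very rough).
PER_CHANNEL_GBS = 5.3  # assumed ~DDR2-667-class peak per channel

configs = {
    "4-socket Opteron (2 channels/socket)": 4 * 2,
    "8-socket Opteron (2 channels/socket)": 8 * 2,
    "4-socket Xeon, 7300 chipset": 4,  # 4 channels total, shared by all sockets
}
for name, channels in configs.items():
    print(name, round(channels * PER_CHANNEL_GBS, 1), "GB/s peak")
```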

 

At the 8-socket level, Opteron is the best choice for most situations. The 8-socket Opteron (Barcelona) system has what is considered a soft NUMA architecture, meaning that the memory latency difference between local and remote nodes is low or inconsequential (i.e., do not set the NUMA flag). The IBM and Unisys big iron systems are considered hard NUMA, meaning that memory latency between local and remote nodes is high. Hard NUMA systems can scale, but would most likely require specialized performance analysis skills, which are not easily found.

 

Additional comments

I rate the HP ProLiant ML370G5 over the PowerEdge 2900 III on technical grounds: more memory sockets and more PCI-E slots. On the same grounds, I rate the Dell PowerEdge R805 over the ProLiant DL385G5 because the R805 has 16 DIMM sockets versus 8 for the DL385G5.

At the 4-socket level, for Intel platforms, the Dell and HP systems are sufficiently comparable.

 

Note that dual-core Opteron processors are an option in the 2 and 4-socket systems, but not the 8-socket DL785. The original and dual-core Opteron processors have up to 3 full-width (16-bit) HT links, of which 2 connect to other processors and 1 connects to an IO hub. In a 4-socket system, the processors are at the corners of a square, with each processor connected to the processors on the two adjacent corners. Hence there is a far processor two hops away.

The Barcelona quad-core has up to 4 full-width HT links, each of which can be split into two half-width (8-bit) HT links. In a 4-socket system, each processor can connect directly to all of the other three sockets with a full-width HT link, leaving one for IO. In an 8-socket system, each processor can connect directly to all seven other sockets with one half-width HT link, leaving one half-width link for IO. The HP 4-socket DL585G5 and 8-socket DL785 only support quad-core Opteron, not dual-core, which may indicate the use of three full-width HT links to processors. The Dell R905 supports both dual and quad-core Opteron, which may indicate the older topology with two hops to the far processor.
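A small illustration of the hop-count difference described above (the adjacency lists are my own sketch of the two topologies, not vendor documentation):

```python
# Worst-case socket-to-socket hop counts for the Opteron topologies above.
from collections import deque

def max_hops(adjacency):
    # Breadth-first search from each socket; return the worst-case hop count.
    worst = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        worst = max(worst, max(dist.values()))
    return worst

# 4-socket square (dual-core era): each socket links only to its two neighbors.
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
# 4-socket Barcelona: three full-width links, fully connected.
full4 = {i: [j for j in range(4) if j != i] for i in range(4)}
# 8-socket Barcelona: seven half-width links, fully connected.
full8 = {i: [j for j in range(8) if j != i] for i in range(8)}

print(max_hops(square))  # 2 hops to the far corner
print(max_hops(full4))   # 1 hop
print(max_hops(full8))   # 1 hop
```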

 

Finally, until quad-core Itanium on a current generation manufacturing process is available, Itanium holds the very high memory capacity (>256-512GB) and IO bandwidth (>10GB/sec) niche. It could be pointed out that whatever the criticisms of Itanium since its launch, at the time it was conceived in the 1990s it was a foregone conclusion that RISC would overwhelm x86, which would not be able to benefit from advanced design concepts. Intel was not content to be a johnny-come-lately to the RISC party, and with HP, came up with a better idea than RISC. And yet, what processor today has the best SPEC CPU int?

 

IBM Xeon Systems

I have said before that I do not have recent experience with IBM systems. Just from looking at the IBM Redbook on the x3850 M2, it looks very impressive. For this and the x3950 M2, IBM does its own chipset, which supports a NUMA architecture up to 4 nodes of 4 sockets. I do not know if 8 nodes are still supported. What I like about the x3850 M2 memory controller is the 8 DDR2 memory channels. I really think the Intel 7300 with 4 memory channels is too weak to support four quad-cores, and now four six-core processors. Intel was always afraid of the high-end chipset, obsessively looking at the entry point price, which drags down the high-end configuration. The IBM x3850 M2 did post a TPC-E of 479.51 tps-E over Dell's 451.29. The IBM system has 128GB memory compared with 64GB for Dell, so it is not clear whether the 8 memory channels contributed.
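For reference, the size of that gap is modest (simple arithmetic on the two published scores):

```python
# Ratio of the two published TPC-E results mentioned above.
print(round(479.51 / 451.29 - 1, 3))  # ~0.063 -> about a 6% edge for the x3850 M2
```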

Published Tuesday, August 19, 2008 12:36 PM by jchang

Comments

 

Raman (RNDTS) said:

Hi Joe,

For 2-socket Xeon, one could use the X5482 as well. Though Intel classifies that as a workstation CPU, it still is Socket 771 and can be used on boards that support the 5400 chipset. I am building a custom one using 2 x 5482 with 64GB RAM to run SQL Server 2008 on Windows Server 2008 Ent to host/process several terabytes of data on a 10GbE iSCSI SAN.

Raman

August 19, 2008 2:56 PM
 

jchang said:

I was seriously peeved when neither Dell nor HP rolled the 2-socket systems to the 5400 chipset. I really wanted the 40 PCI-E lanes. This would support 4 x8 and 2 x4 PCI-E slots. And it supports 128GB max memory. The 1600MHz FSB was a bonus over 1333. I used to use a lot of Supermicro motherboards because I really liked to match a specific motherboard and chassis. Take a motherboard with the 5400 chipset and the SC216A chassis for 24 2.5in SAS drives. I am a little puzzled that the X7DWN+ lists 3 x8 and 1 x4 slots. A dual GbE occupies one x4 PCI-E off the 5400 MCH; I would attach it off the ESB. This leaves 1 x4 unaccounted for.

But I had trouble using Supermicro chassis as external storage, especially hooking up SAS external storage.

And since Dell offers very low prices on the PE2900, that's what I switched to.

If you are doing a multi-TB DB, I would strongly suggest ditching the iSCSI SAN. At best, each 10GbE port will do 700-800MB/sec, and that assumes everything else in the chain is done correctly. 10GbE is also expensive. Do direct attach SAS. Figure 3 x8 PCI-E SAS will do 4GB/sec+, which is a lot more than you will get out of iSCSI and a SAN.
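Roughly, the bandwidth comparison I am making (a sketch; the per-link numbers are the practical figures quoted here, not vendor specs):

```python
# Practical throughput comparison behind the direct-attach SAS recommendation.
sas_x8_controller_mbps = 1600  # assumed ~1.6GB/sec practical per x8 PCI-E SAS controller
ten_gbe_port_mbps = 750        # ~700-800MB/sec practical per 10GbE port, as quoted above

print(3 * sas_x8_controller_mbps)  # 3 controllers -> ~4800 MB/sec, i.e. 4GB/sec+
print(2 * ten_gbe_port_mbps)       # even two 10GbE ports -> ~1500 MB/sec
```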

August 19, 2008 4:43 PM
 

jchang said:

My next server:

4 x8 and maybe 1 x4 PCI-E slots, all full bandwidth, not shared

4 SAS RAID controllers capable of driving 1.6GB/sec (800MB/sec per x4 SAS port)

16-24 SSDs per the Intel X25 specs. It does not have to be the X25-E; I can accept the X25-M sequential specs (below), assuming it still gets close to the E model's random IO.

The reason is that SQL is not going to generate writes >1GB/sec that I know of, but I can easily get reads >6GB/sec.

With 16 drives spread across 8 x4 SAS ports and 4 x8 PCI-E slots, I can then get 4GB/sec sequential and 560K 4K random(??), and 6GB/sec with 24 drives (see the quick check after the specs below).

I am hoping some OEM, be it Dell, HP or SuperMicro, would build a system that allows distributing 24 2.5in slots over 8 x4 SAS ports.

Intel X25-E specs:

Sequential Reads  250MB/sec

Sequential Writes 170MB/sec

Random 4KB Reads 35K IOPS

Random 4KB Writes 3.3K IOPS

Intel X25-M specs:

Sequential Reads  250MB/sec

Sequential Writes  70MB/sec
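A quick check of the 16- and 24-drive numbers above against the per-drive specs (straight multiplication, assuming perfectly linear scaling across controllers):

```python
# Array totals from the per-drive specs above, assuming linear scaling.
seq_read_mb = 250       # MB/sec sequential read per drive (X25-E and X25-M)
rand_read_iops = 35000  # 4KB random read IOPS per drive (X25-E)

for drives in (16, 24):
    print(drives, "drives:", drives * seq_read_mb, "MB/sec sequential,",
          drives * rand_read_iops, "random read IOPS")
```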

August 20, 2008 5:49 PM
 

Glenn Berry said:

Very detailed, informative post, Joe. I am almost drooling waiting for Dunnington to drop into an R900 for an OLTP workload. I think it will be similar to the bump I got going from Paxville to Tulsa in a Dell 6800.

August 26, 2008 1:46 PM
 

Raman said:

Joe,

I agree SAS RAID cards will get you 4GB/s. But when dealing with large databases, accessing data alone is not the issue. One has to consider importing/exporting data into databases, storing it efficiently, performing backups, etc. A DAS system is not suitable since I/O for these tasks will suffer. A SAN with dedicated ports for data and backups will isolate the data transfer in/out of the databases. A 10GbE port, when properly configured, can easily get 900-1000MB/sec (based on benchmark reports). If I used an Intel dual-port 10GbE card in an x8 slot, I could use one port each for data and backup. Depending on the motherboard (not HP or Dell), one can put in 2 x dual-port 10GbE (~$800 each), and each card can be used with fault tolerance. What app or user requires 4GB/s of data transfer constantly?

Raman

August 26, 2008 11:37 PM
 

Raman said:

Glenn,

I'd wait until CPUs with the Nehalem architecture are out. Dunnington, though it comes with 6 cores, is still based on the old architecture and a 1066MHz FSB. It will be worth waiting a few more months, or until the end of this year, to get servers or build one with Nehalem quad-core chips.

The Nehalem architecture will be very useful for SQL Server 2008 deployments since this version introduces compression, which requires CPU cycles to compress/uncompress. With 2 x quad-core chips, one will get 16 processing threads using simultaneous multi-threading (two threads per core).

Raman

August 26, 2008 11:42 PM
 

jchang said:

Dunnington is the right choice for OLTP; the large cache overcomes any issues with the 1066 FSB. Core 2 may be an "old" architecture, but it is still top dog. A 4-way six-core with big cache is 24 cores. The first Nehalem is 2-way only, 4 cores, for 8 cores total. Each core may be 20% faster, plus another 15% for HT. But you are still short. Beckton is a full year away.
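The core-count arithmetic behind that, using the 20% and 15% figures as rough assumptions:

```python
# Rough throughput comparison: 4-way Dunnington vs the first 2-way Nehalem.
dunnington_cores = 4 * 6                          # 24 cores
nehalem_cores = 2 * 4                             # 8 cores
nehalem_equivalent = nehalem_cores * 1.20 * 1.15  # ~20% per core, ~15% more from HT

print(dunnington_cores, round(nehalem_equivalent, 1))  # 24 vs ~11 core-equivalents
```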

I don't see why 4-6 SAS RAID controllers would have any problems with loading, efficiency or backup. What do you mean DAS is not suitable? It is the best there is. If an app does not need 4GB/s, why do you want to spend more money for a lower performance, higher overhead technology? Well, to each his own; live and learn.

August 27, 2008 12:39 AM
 

rtech said:

Each Dunnington CPU would cost at least $2800-$3000. And with 64GB of memory, RAID controllers (assuming DAS), disks, etc., it would cost a fortune to build a single server based on the Core 2 architecture. In a year, one will not be able to justify going to Nehalem, which will cost another fortune. If one has the budget and can afford it, yes, go for it. But my view is that one could go for the 5400 Xeon, even if it is only 8 cores, and go with the 5482 (3.2GHz). SQL utilizing all 8 cores at 100% constantly, 24x7, definitely may not happen, so why spend a fortune on an outdated architecture? Go with a mid-level system today and spend it on Nehalem moving forward instead. This is just my opinion.

As far as DAS, the disadvantage is that one will be installing several controllers in a single server, and having 30TB of DAS connected is a bit risky as one is solely dependent on this server. When the server retires, the data sitting on DAS has to be migrated to a different storage system, or the same controllers installed in the new server with the new OS and drivers, if available. Even if one chooses the DAS route, one has to have enough expansion slots, specifically x8 or x16, to connect so many spindles, while still having expansion slots left for additional network controllers, etc., and motherboards that support such a configuration are not available today. In my case there will be a constant import of data from external mobile storage over FireWire 800 or eSATA, processing of data within databases (at least 10 billion rows), allocating storage, and backing up databases, etc., all happening in parallel. Relying on a single server will not work for me. I have used DAS for the last 8 years for SQL 2000/2005 storage, and I ran into several issues from not having virtualized storage, expanding storage, RAID migration, etc., so I am going the SAN route, specifically iSCSI due to the cost compared to FC, so that I can get enterprise level features and not shell out a fortune. I do plan on having at least 8 drives on the server connected to on-board SAS ports for system files, logs, and indexes, and a separate slot connected to a hot-pluggable port with an independent 1TB SAS-SATA drive for local backups (no tapes). All other data will be on tier-I and tier-II on the SAN, using SAS 15K and SAS 7.2K respectively.

When designing server/storage, I wouldn't go with Dell or HP because they do not give enough details on the motherboard used, or they don't use the one you want but the one that they like. Shouldn't it be the other way around? Server vendors should be able to build what the customer wants, or at least use a motherboard that closely matches the features they are looking for, and this is not available today from Dell/HP. This is just my view.

Anyway, how much do you think an iSCSI SAN storage solution with 8.4TB raw SAS 15K storage, 21TB raw SAS-SATA 500GB 7200, and 2 x 10GbE SAN controllers (separate SAN) would cost with snapshot features, virtualized storage, etc., versus DAS with similar features?

August 27, 2008 11:51 AM

