
Joe Chang

Sizing for SQL Server 2000 and 2005

Server sizing strategies using requirement-driven approaches tend to go into inane detail, involving substantial effort with no clear conclusion at the end. By considering only the systems most suitable for line-of-business database servers, there are just a few meaningful choices, making for a much simpler decision process. The key performance criteria for database servers are processor, memory and IO capability. The relevant processor lines are the Intel Xeon, AMD Opteron, and Intel Itanium 2. For the current (and past) Intel processor lines, memory and IO capability are determined by the chipset. There are only a few chipset choices for any given processor arrangement (1, 2 and 4-way, etc.). The systems that make the most sense for database servers employ the chipset with the best memory and IO, and implement the full memory and IO capability. For the AMD Opteron line, memory and IO are integrated into the processor, so this capability is largely determined by the number of processors. Hence there are very few system choices that actually warrant consideration. Recommended system examples from Dell and HP are given below. There are comparable IBM systems, but they are not cited here.

 

Processors

Nearly all server systems today use Intel Xeon or AMD Opteron processors. In the 2003-2004 timeframe, the performance competition between the (single core) NetBurst architecture Xeon and Opteron processors was very close. With the introduction of dual-core processors in 2005, AMD pulled solidly ahead, in part because the NetBurst processors had to be throttled below electrical limits to keep within a 130W thermal envelope. In June 2006, the Intel Core 2 architecture established a clear performance lead, but was available only for 1-way and 2-way systems. In November 2006, the quad core Xeon 5300 series for 2-way systems closed much of the performance gap relative to 4-way dual core systems. At the end of 2007, the meaningful server system choices for most environments are a 2-way system based on the quad core Xeon 5400 series processors or a 4-way system based on the quad core Xeon 7300 series. Sometime in 2008, Barcelona quad core systems may be a viable choice if AMD can get the frequency into the 3GHz range to be competitive with 3GHz Core 2 architecture processors.

 

Itanium has not kept pace with performance progress on the x64 side. It is seriously disappointing that Itanium 2 processors in 2007 are still on the 90nm process, when Xeon processors are beginning to transition to 45nm. The 90nm dual core Itanium cannot compete against the 45nm quad core Xeon on performance per socket. However, chipsets with extraordinary capability, including the HP sx2000, have enabled Itanium systems to carve out a niche in the high end where either very large memory capacity or exceptionally high IO bandwidth is required.

 

2-way Systems

The recommended 2-way systems as of December 2007 are the Dell PowerEdge 2900 III and the HP ProLiant ML370G5. The preferred processors are the quad core Intel Xeon X5460 3.16GHz with 2x6M L2 cache or the E5450 2.83GHz. The Xeon 5400 series is manufactured on the new 45nm process, with some minor architectural enhancements, slightly higher frequency and a larger cache than the 65nm Xeon 5300 series with 2x4M cache. Note that these two specific systems are named, rather than the comparable high-density 2U systems, because of their greater memory capacity and IO capability in terms of the number of available PCI-E x4/x8 slots. The large number of internal disk bays also makes these systems suitable for entry-level non-clustered environments.

 

Technically, the ProLiant ML370G5 is the better system because it implements the full capability of the Intel 5000P chipset, with 64GB maximum memory over 16 DIMM sockets and six x4 PCI-E slots, compared with 48GB maximum memory and 1x8 + 3x4 slots for the PowerEdge 2900 (both have one x4 slot occupied by the included SAS/RAID controller). None of the current generation storage IO adapters (SAS RAID controllers or FC HBAs) can use more bandwidth than provided by a x4 PCI-E port, so there is no value in having one x8 PCI-E slot over two x4 slots in current generation database servers. The PowerEdge 2900 is more suitable for situations that do not require the maximum memory or IO configuration, especially considering that Dell list prices are very attractive.
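For reference on where that x4 ceiling comes from, a rough calculation: a first generation x4 PCI-E port runs 4 lanes at 2.5Gbps each with 8b/10b encoding, so approximately 4 x 2.5Gbps x 0.8 = 8Gbps, or about 1GB/sec in each direction, which is already more than a single current SAS RAID controller or 4Gb FC HBA will sustain.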

 

4-way Systems

The recommended 4-way systems as of December 2007 are the Dell PowerEdge R900 and the HP ProLiant DL580G5. The preferred processor is the quad core Xeon X7350 2.93GHz with 2x4M L2 cache. There is no point dropping down to the lower priced E7330 2.4GHz, as the complete solution price differential does not justify a 20% lower frequency. Both the Dell and HP 4-way quad core systems are built around the Intel 7300 chipset. The 7300 supports 32 DIMMs, for a maximum memory capacity of 128GB with 4GB DIMMs and 256GB with 8GB DIMMs. The IO configuration is four x8 and three x4 PCI-E slots. The 7300 chipset actually has seven x4 PCI-E ports plus an ESI port (equivalent to one x4 PCI-E port); PCI-E expander chips are used to create the additional slots with shared bandwidth. This is not an issue, except that load should be properly distributed across the dedicated PCI-E ports.

 

Itanium Systems

Itanium systems to consider for the high end include the HP Integrity 8-way rx7640 and 16-way rx8640, with maximum memory of 256GB and 512GB respectively. Both systems use the sx2000 chipset and have extraordinary memory and IO bandwidth, far beyond what can be achieved in Xeon or Opteron systems; a table scan has been observed at 15GB/sec. There is some question as to whether the 4-way rx6600 with the zx2 chipset should be considered over the 4-way Xeon or Opteron systems. The Itanium has the best 4-way dual core TPC-C result for SQL Server, probably due to the very large cache (2x12MB) and hyper-threading, but cannot match the 4-way quad core Xeon. One reason for considering Itanium is to realize the benefits of a full 64-bit operating system and application when the SQL Server version must be 2000 rather than 2005 or later.

 

Summary

Microsoft did (still does?) recommend 4-way systems (and even 8-way systems during the Pentium III/ProFusion era) as the standard for SQL Server. From what has been observed in recent years, the default recommendation, with no effort to analyze requirements, should now be the 2-way quad core system. If there are indications that the workload is too heavy, tuning alone may be sufficient to resolve the issue. Even if it becomes necessary to replace the 2-way with a 4-way quad core or larger system, the original 2-way did not cost very much and can be used for other purposes.

 

Only in special situations with known exceptionally heavy load and big budgets is it actually beneficial to analyze performance requirements to determine the correct solution. The reason is that a proper sizing analysis costs much more than a 2-way or even a 4-way system. It makes more sense just to buy a 2-way quad core system that can handle most tasks. Another reason for defaulting to the 2-way system is that Intel introduces new technology in 2-way systems with about a one year lead over 4-way systems.

 

For people considering the 4-way quad core system (16 cores in all), SQL Server 2005 or later, and the 64-bit version at that, is strongly suggested. SQL Server 2000 is too old and has too many issues in working with 16 schedulers, especially parallel execution plans in data warehouse environments.
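As a rough illustration only: if SQL Server 2000 (or 2005) must run on a 16-core box, one common mitigation for parallel plan problems is to cap the degree of parallelism rather than letting every plan fan out across all 16 schedulers. A minimal sketch follows; the value 4 and the table name dbo.BigFactTable are placeholders, not recommendations for any particular workload.

-- cap parallelism instance-wide (max degree of parallelism is an advanced option)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- or cap it per query, for the problem reports only
SELECT COUNT(*)
FROM dbo.BigFactTable   -- hypothetical table, for illustration
OPTION (MAXDOP 4);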

 

Additional Notes

AMD Opteron 8200 and Intel Xeon 7100 series

Even though the dual core Opterons have been outclassed by the quad core Xeons at the socket level, for most of 2007 the 4-way Opteron system still had leading edge performance, specifically in the TPC-H benchmark. The 4-way Opteron system had excellent memory performance, with a total of 8 DDR2 memory channels when all four processor sockets are populated, compared with 4 for the Intel systems. Both the Dell and HP 4-way Opteron systems featured 3x8 and 4x4 PCI-E ports, providing the best nominal IO bandwidth.

 

The 4-way Intel Xeon 7140 system, based on the old NetBurst processor architecture, did achieve the best 8-core x64 TPC-C score of 318K, versus 263K for a 4-way dual core Opteron 8220 and 251K for a 2-way quad core Xeon X5365. This can probably be attributed to the very large 16M on-die L3 cache (compared with 1M per core for the Opteron and 4M per two cores on the X5365) and to the NetBurst Hyper-Threading feature, which helps high call volume applications like TPC-C but not much else. So the 4-way dual core systems do have a modest edge over the 2-way quad core. The 2-way quad core system is still recommended on cost grounds for most situations.

 

AMD Barcelona

As noted earlier, the original single core Opteron and both dual core Opteron generations were highly competitive, if not industry leading, in their day. Barcelona was supposed to keep AMD competitive by integrating four cores on a single die, in combination with the transition from the 90nm to the 65nm manufacturing process. Barcelona appears to have moderate improvements over the previous generation Opteron in core architecture. The expectation is that single core performance between the 90nm Opteron and 65nm Barcelona should be comparable at the same frequency. Since the Opteron reached 2.6-2.8GHz relatively quickly on 90nm, the expectation is that Barcelona on 65nm should reach 3.5GHz at a comparable stage. There are currently very few published Barcelona benchmark results (of course, Intel has not been profuse with 5400 or 7300 benchmark results either). The indications are that Barcelona will need to be around 3GHz to be competitive with the Xeon 5400 and 7300 series at around 3GHz. AMD is trying to reach 2.5GHz in early 2008, with higher frequencies later in 2008. The window of opportunity for 65nm Barcelona to compete with the current Xeon 5400 and 7300 processors will probably close in late 2008, when the next generation Intel 45nm processors are released.

 

Intel Seaburg Chipset (5400)

The Intel 5400 chipset arrived at about the same time as the Xeon 5400 series processors. The 5400 chipset supports both 65nm and 45nm Xeon 5100-5400 processors. Improvements in the 5400 over the 5000P/X chipset include an increase in the number of PCI-E lanes from 24 to 36, for nine x4 PCI-E ports (in addition to the ESI port). Maximum memory capacity is increased to 128GB over 16 DIMM sockets, meaning 8GB DIMMs are required for maximum memory capacity. The maximum practical memory remains 64GB until 8GB DIMM prices become reasonable, probably in the 2009 time frame. The PCI-E lanes configured as x16 ports support the new PCI-E generation 2 specification's 5Gbps signaling, compared with 2.5Gbps in gen 1. For now, this feature is only for workstation graphics and specialty HPC adapters. Sometime in the future we may see x4 PCI-E gen 2 slots at 5Gbps.

 

Both the PowerEdge 2900 and ProLiant ML370G5 retain the older 5000P/X chipset. Neither vendor felt it necessary to transition its workhorse server platform to the new 5400 chipset, even though the increased number of PCI-E ports would be a highly welcome improvement.

 

Core 2 Processors

As mentioned earlier, the first Core 2 processors were manufactured on a 65nm process and featured a 4M L2 cache shared between two cores. The second generation Core 2 processors are 45nm with a 6M L2 cache. The 45nm processors are only available in the Xeon 5400 and 5200 series; the Xeon 7300 series for 4-way systems will use the 65nm processors until late 2008. Right now this is not much of an issue, as the top Xeon 5400 series frequency (with 1333MHz FSB) is 3.16GHz compared with 2.93GHz for the top Xeon 7300 series. It is possible that the top 45nm 5400 frequency could be pushed higher, while the top 65nm 7300 will most likely stay put. The 50% larger cache on the 45nm version would be helpful in high call volume applications on 4-way systems, but this difference is not sufficient to be an absolute requirement. In late 2008, Intel will launch a 45nm processor based on the Core 2 architecture, codename Dunnington, for 4-way systems. Details released show a single die, six core processor (the current Intel quad core processors comprise two dual core dies in a single package, or processor socket). Each pair of cores will have a 3M L2 cache, and the entire die will have a large 12-16M L3 cache shared between all cores. The expectation is that this strategy should benefit throughput-oriented applications. The late 2008 time frame should also see the next generation Intel processor architecture, codename Nehalem. It is unclear how Nehalem systems available in late 2008 will compare to a 4-way Dunnington server with a total of 24 cores.

 

SAS RAID Controllers

Many of the first generation SAS RAID controllers are based on the Intel 80333 IO processor. There is now a next generation 8134x family of IO processors with improved processing and memory bandwidth. None of the 80333 generation SAS RAID controllers could drive more bandwidth than is available on a x4 PCI-E port. The major system vendors seem to be slow in adopting the new generation 8134x controllers, and it is unclear whether the new generation IO controllers can drive more bandwidth than provided by a single x4 PCI-E port.

 

Memory

For SQL Server 2000, there are many considerations, such as Enterprise Edition versus Standard Edition and the /3GB and PAE boot options. For SQL Server 2005, especially the full 64-bit version, my recommendation is to fill the DIMM sockets with 2GB DIMMs. When 4GB DIMMs come down to roughly price parity per GB with 2GB DIMMs, the recommendation changes to 4GB DIMMs. Some systems accommodate additional memory cards; be sure to order the system with all memory cards installed. Trying to add them afterwards can be a real pain, especially when the cost is only about $100.
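For the 32-bit SQL Server 2000 Enterprise Edition case, a minimal sketch of the usual large-memory setup is below. The /PAE switch (and /3GB, generally only on servers with 16GB or less) goes in boot.ini, the service account needs the Lock Pages in Memory privilege, the 28672 value is only an illustration for a 32GB server, and a service restart is required for AWE to take effect.

-- enable AWE so 32-bit SQL Server 2000 EE can use memory above the normal limit
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'awe enabled', 1;
EXEC sp_configure 'max server memory', 28672;  -- example only: leave headroom for the OS on a 32GB box
RECONFIGURE;
-- restart the SQL Server service for 'awe enabled' to take effect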

 

IO Bandwidth

I will discuss storage performance in more detail later. In brief, the game is brute force: many physical hard drives distributed over several IO controllers. Consider 4-8 PCI-E SAS controllers or FC HBAs, as the system allows. Start with one rack of (10-25) external disks per controller, adding a second rack per controller only when all PCI-E slots have been filled with disk controllers.
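As a rough way to verify that IO load is actually spread across the controllers rather than piled onto one, the sketch below uses the SQL Server 2005 virtual file stats function. The counters are cumulative since startup, so take a sample before and after a large table scan and difference the numbers to estimate MB/sec and latency per file.

-- per-file cumulative read volume and average read latency (SQL Server 2005 and later)
SELECT  DB_NAME(vfs.database_id) AS database_name,
        vfs.file_id,
        vfs.num_of_bytes_read / 1048576 AS mb_read,
        vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY mb_read DESC;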

 

4-way TPC-C Performance History

Below is a collection of TPC-C results for SQL Server on 4-way systems from 1998 to 2007.

Pentium Pro 200MHz, 1M L2          12,106    Mar 1998
Pentium II Xeon 400MHz, 1M L2      20,434    Nov 1998
Pentium III Xeon 900MHz, 2M L2     39,158    Sep 2001
Xeon MP 1.6GHz, 1M L3              61,564    Aug 2002
Xeon MP 3.0GHz, 4M L3             102,667    Mar 2004
Xeon MP 3.66GHz, 1M L2            141,504    Apr 2005
Xeon 7140 3.4GHz, 16M L3          318,407    Oct 2006
Xeon 7350 2.93GHz, 2x4M           407,079    Nov 2007

 

Note: the 4-way Xeon 7350 performance is somewhat disappointing. There is a result for DB2 of 516K, so the expectation is that SQL Server 2005 should be reasonably close.

 

The main point of revisiting history is to understand the rapid pace of performance progress. It simply does not make sense to buy an expensive system with a great deal of headroom for growth. It is much more cost effective to buy sufficient performance for the next two years and have a program for periodic system replacement. One strategy is to replace the main line-of-business system with the latest technology every year or two, then rotate the replaced system to a secondary task.
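To put a rough number on that pace: going from about 12,106 tpmC in March 1998 to about 407,079 in November 2007 is roughly a 34x improvement in under ten years, or approximately 40-45% compound annual growth, so headroom bought today at a premium becomes cheap commodity capacity within a couple of years.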

  

Published Thursday, December 20, 2007 11:47 AM by jchang


Comments

 

Linchi Shea said:

Great to see you blogging here Joe!

These are interesting times for selecting database servers. In my personal opinion, the current Harpertown servers from the tier 1 system suppliers are mismatched with 1066MT/s FSBs. There is great (not sure if justified) expectation that Harpertown and Seaburg would make a much better package.

Any reason you didn't mention any of the 16-core blades (e.g. BL680c G5)?

Despite the recently reported trouble with AMD, my money is on Barcelona when it comes to the current and near-future crop of 16-core servers for running compute-constrained database applications.

December 20, 2007 1:29 PM
 

jchang said:

just to be clear,

Clovertown is the codename for Xeon 5300, quad core, 65nm, 2-way

Harpertown is the codename for Xeon 5400, quad core, 45nm, 2-way

Tigerton is the codename for Xeon 7300, quad core 65nm, 4-way

Clovertown & Harpertown are currently at 1066 or 1333MT/s FSB with the 5000P/X chipset, Harpertown has 1600MT/s FSB SKUs for the 5400 chipset,

as soon as system vendors rev their 2-way systems

4-way Tigerton & the 7300 chipset is 1066MT/s with a dedicated bus per socket, compared with 2 sockets per bus on the 8501 chipset

I do not think this is a huge issue,

my bigger issue is that the 7300 has the same 4 FB-DDR2 channels as 5000P & 5400, I really think the 4-way needs 8 memory channels, same as Opteron

but it was probably already very difficult for Intel to squeeze 4 FSB, 4 FB-DIMM, + 28+4 PCI-E lanes into a single MCH

this is really the end of the road for MCH, Nehalem will finally get Intel on an integrated memory controller

I really do not expect 65nm Barcelona to be competitive, even if AMD get frequency to 3GHz, all the indications are 45nm Harpertown can easily do 3.6-4.0 GHz

When AMD get 45nm Barcelona fully up to speed, it will be competing against Nehalem, which is a big unknown

realistically, the Opteron/Barcelona core is old technology (2003),

4 years is the absolute max a given core can be flogged,

a completely new core architecture is due or overdue

December 20, 2007 2:14 PM
 

Linchi Shea said:

We can talk about Barcelona in private. But there is no need for a 65nm Barcelona to be at 3GHz to be competitive with the Xeon 5400 and 7300 series at around 3GHz.

> There is no point dropping down to the lower priced E7330 2.4GHz as the

> complete solution price differential does not justify a 20% lower frequency.

I don't agree with this at all. The X7350's TDP is 130 watts, and the E7330's is 80 watts. That's a huge, huge difference -- a big deal. In large corporate data centers, that difference translates into a huge cost difference. If I recommended the X7350, I'd be kicked out the door.

December 20, 2007 2:51 PM
 

jchang said:

regarding power,

I am assuming for the main database server, there are several web/app servers connected to it, and the server has significant storage, ie, 60-120 disks

in which case the delta between 130 and 80 watts, times 4 procs = 200W, is negligible in the total environment

now there are a number of big companies that will buy a whole bunch of identical servers, and use them for all tasks, now this is stupid,

for the low intensity app, use VM and/or use the lower power procs,

also note when Intel classes a proc at 80 or 130W, it doesn't really mean that's what it consumes, even at full load; sometimes they are just messing with you

that being said, I do not think the blades have sufficient IO capability

if you are doing a 4-way quad core, I really think you ought to configure 2-3GB/sec sequential disk IO capability, and you will not get this from a blade

if your db is essentially in memory, then no problem

December 20, 2007 10:11 PM
 

Linchi Shea said:

> the delta between 130 and 80 watts, times 4 for each proc = 200 is

> negligible in the total environment

Well, you have to consider the power supply configuration in a data center. There is a limit in terms of watts per square foot. The difference can mean not being able to fully populate a rack, wasting expensive data center real estate. It may not be negligible when many companies are running out of data center space.

> now there are a number of big companies that will buy a whole

> bunch of identical servers, and use them for all tasks, now this is stupid

Well, there are good economics in standardizing on a small selection of servers. Administrative cost can far outweigh hardware cost when one has to manage a disparate array of servers. The trend in large automated data centers is towards homogeneity.

> I do not think the blades have sufficient IO capability

It depends on the app requirements. Take HP BL680c G5 as an example. It has three PCI-Express slots (two 8x and one 4x). So using the two 8x slots, you can theoretically get 2GB/sec (which can be balanced with 4Gb HBA ports). Is that enough? In my experience, 90% or more of the database apps would be okay with that. And for the rest of the I/O intensive database apps, of course, you wouldn't choose a blade (or at least not a BL680c type of blade).

December 21, 2007 12:36 AM
 

James Luetkehoelter said:

Hi Joe,

Nice to see you here!

Question though (and a slight disagreement) - for a pure database server, how often is it processor bound? Whenever I run into a performance issue on a database-only server, it is I/O bound (whether that be a lack of RAM or disk speed).

Thoughts?

December 21, 2007 6:48 AM
 

Linchi Shea said:

James;

I guess it depends on the app and your experience. I have seen processor-bound database apps from time to time. Maybe not as often as I/O bound ones. These apps typically have business logic in their database code (e.g. stored procedures).

December 21, 2007 7:15 AM
 

James Luetkehoelter said:

Linchi:

Good point - and this will probably increase with SQLCLR. I have a lot of customers that are sort of at that middle tier - if they do have a SAN, it's a small one (or misconfigured). So, I see a lot of I/O bound performance problems. Those are the organizations that seem to need the most help, thus that is the bulk of my client base (although I have a few large ones and have seen exactly what you describe - rare though).

December 21, 2007 8:14 AM
 

jchang said:

People are frequently disk IO bound.

This is because they configured storage for capacity and not performance,

people frequently have a SAN with 1 or 2 FC connections, and 1 rack of 300GB disks.

This is why I strongly advise 4-8 dual port 4Gb/s FC HBAs or 4-8 SAS RAID controllers.

Between 120-240 physical disks total is usually sufficient for brute force capability.

It's also why I think the default system should be a 2-way quad core.

It is not easy to build an app and configure storage to saturate more than 8 cores.

Standardization of systems is good, but the main line of business database server is the exception, it gets whatever it needs

December 21, 2007 10:07 AM
 

MonkeyBoy said:

jchang: "...but the main line of business database server is the exception, it gets whatever it needs"

QFT.  "That's no moon.  It's a space station!"

December 23, 2007 5:55 PM
 

Dick Brodine said:

You might be interested in "Mathematical Server Sizing" software, freely available on SourceForge. Here's a link.

http://sourceforge.net/search/?type_of_search=soft&type_of_search=soft&words=mathematical+server+sizing

Be sure and check out the documentation with this project. It describes some sample sizings. For a formal description of the math involved, see IEEE "Computer", July 2006 issue.

Good luck!

May 21, 2008 10:09 AM
