Server sizing strategies using requirement driven approaches tend to go in to inane details involving substantial effort with no clear conclusions at the end. By considering only the systems most suitable for line-of-business database servers, there are just a few meaningful choices, making for a much simpler decision process. The key performance criteria for database servers are processor, memory and IO capability. The relevant processor lines are the Intel Xeon, AMD Opteron, and the Intel Itanium 2. For the current (and past) Intel processor lines, memory and IO capability are determined by the chipset. There are only a few chipset choices for any given processor arrangement (1, 2 and 4-way, etc). The systems that make the most sense for database servers employ the chipset with the best memory and IO, and implement the full memory and IO capability. For the AMD Opteron line, memory and IO are integrated into the processor, so this capability is largely determined by the number of processors. Hence there are very few system choices that actually warrant consideration. Recommended system examples for Dell and HP are given. There are comparable IBM systems, but are not cited here.
Nearly all server systems today use Intel Xeon or AMD Opteron processors. In the 2003-2004 timeframe, the performance competition between (single core) NetBurst architecture Xeon and Opteron processors was very close. With the introduction of dual-core processors in 2005, AMD pulled solidly ahead, in part because the NetBurst processors had to be throttled below electrical limits to keep within a 130W thermal envelope. In June 2006, the Intel Core 2 architecture established a clear performance lead, but was available only for 1-way and 2-way systems. In November 2006, the quad core Xeon 5300 series for 2-way systems closed much of the performance gap relative to 4-way dual core systems. At end of 2007, the meaningful server system choices for most environments are a 2-way based on the quad core Xeon 5400 series processor or a 4-way system based on the quad core Xeon 7300 series. Sometime in 2008, Barcelona quad core systems may be a viable choice if AMD can get frequency into the 3GHz range to be competitive with 3GHz Core 2 architecture processors.
Itanium has not kept pace with performance progress in the X64 side. It is seriously disappointing that Itanium 2 processors in 2007 are still on the 90nm process, when Xeon processors are beginning to transition to 45nm. The 90nm dual core Itanium cannot compete against 45nm quad core Xeon on performance per socket. However, chipsets with extraordinary capability, including the HP sx2000, have enabled Itanium systems to carve out a niche in the high-end where either very large memory capacity or exceptionally high IO bandwidth are required.
The recommended 2-way systems as of December 2007 are the Dell PowerEdge 2900 IIII, and the HP ProLiant ML370G5. The preferred processors are the quad core Intel Xeon X5460 3.16GHz with 2x6M L2 cache or the E5450 2.83GHz. The Xeon 5400 series are manufactured on the new 45nm process, with some minor architectural enhancements, slightly higher frequency and larger cache over the 65nm Xeon 5300 series with 2x4M cache. Note these two specific systems are named and not the comparable high density 2U systems because of greater memory capacity and IO capability in terms of the number of available PCI-E x4/8 slots. The large number of internal disk bays also make these systems suitable for entry non-clustered environments.
Technically, the ProLiant ML370G5 is the best system because it implements the full capability of the Intel 5000P chipset, with 64GB maximum memory over 16 DIMM sockets and 6 x4 PCI-E slots, compared with 48GB max memory and 1x8 + 3x4 for the PowerEdge 2900 (both have 1 x4 slot occupied by an included SAS/RAID controller). None of the current generation storage IO adapters (SAS RAID controllers or FC HBAs) can use more bandwidth than provide by a x4 PCI-E port, so there is no value in having 1x8 PCI-E over 2x4 slots in current generation database servers. The PowerEdge 2900 is more suitable for situations that do not require the maximum memory or IO configuration, especially considering Dell list prices are very attractive.
The recommended 4-way systems as of December 2007 are the Dell PowerEdge R900 and the HP ProLiant DL580G5. The preferred processor is the quad-core Xeon 7350 2.93GHz with 2x4M L2 cache. There is no point dropping down to the lower priced E7330 2.4GHz as the complete solution price differential does not justify a 20% lower frequency. Both of the Dell and HP 4-way quad core systems are built around the Intel 7300 chipset. The 7300 supports 32 DIMMs for a maximum memory capacity of 128GB with 4GB DIMMs and 256GB with 8GB DIMMs. The IO configuration is 4 x8 and 3 x4 PCI-E slots. The 7300 chipset actually has 7 x4 PCI-E lanes and an ESI port (equivalent to one x4 PCI-E). PCI-E expander chips are used to create PCI-E slots with shared bandwidth. This not an issue except that load should be properly distributed across the dedicated PCI-E ports.
Itanium systems to consider for the high-end include the HP Integrity 8-way rx7640 and 16-way rx8640, with maximum memory of 256GB and 512GB respectively. Both systems use the sx2000 chipset and have extraordinary memory and IO bandwidth, far beyond what can be achieved in Xeon or Opteron systems. A table scan was observed at 15GB/sec. There is some question as to whether the 4-way rx6600 with the zx2 chipset should be considered over the 4-way Xeon or Opteron systems. The Itanium has the best 4-way dual core TPC-C result for SQL Server, probably due to the very large cache (2x12MB) and hyper-threading, but cannot match the 4-way quad core Xeon. One reason for considering Itanium is to realize the benefits of full 64-bit operating system and application when SQL Server version must be 2000 and not 2005 or later.
Microsoft did (still does?) recommend 4-way (and even 8-way systems during the Pentium III/ProFusion era) as the standard for SQL Server. From what has been observed in recent years, the default recommendation with no effort to analyze requirements should now be the 2-way quad core system. If there are indications the workload load is too heavy, tuning alone may be sufficient to resolve the issue. Even if it is necessary to replace the 2-way with a 4-way quad core or larger system, the original 2-way did not cost very much and can be used for other purposes.
Only in special situations with known exceptionally heavy load and big budgets is it actually beneficial to analyze performance requirements to determine the correct solution. The reason for this is that the cost of doing a proper sizing analysis will cost much more than a 2-way or even 4-way system. It makes more sense just to buy a 2-way quad core system that can handle most tasks. Another reason for defaulting to the 2-way system is that Intel introduces new technology in 2-way systems with about a one year lead over 4-way systems.
For people considering the 4-way quad core system (16 cores in all), SQL Server 2005 or later and the 64-bit version at that is strongly suggested. SQL Server version 2000 is too old and has too many issues in working with 16 schedulers, especially parallel execution plans for data warehouse environments.
AMD Opteron 8200 and Intel Xeon 7100 series
Even though the dual core Opteron have been out classed by quad-core Xeons at the socket level, for most of 2007, the 4-way Opteron system still had leading edge performance, specifically in the TPC-H benchmark. The 4-way Opteron system had excellent memory performance with a total of 8 DDR2 memory channels when all four processor sockets are populated compared with 4 for the Intel systems. Both the Dell and HP 4-way Opteron systems featured 3x8 and 4x4 PCI-E ports, providing the best nominal IO bandwidth.
The 4-way Intel Xeon 7140 system based on the old NetBurst processor architecture did achieve the best 8-core X64 TPC-C score of 318K versus 263K for a 4-way dual core Opteron 8220 and 251K for a 2-way quad core Xeon X5365. This can probably be attributed to the very large 16M L3 on-die cache compared with 1M per core for the Opteron and 4M per 2 cores on the X5365 and the NetBurst Hyper-Threading feature which helps high call volume applications (like TPC-C) but not in other areas. So the 4-way dual core systems do have a modest edge over the 2-way quad core. The 2-way quad core system is recommended on cost grounds for most situations.
As noted earlier, the original Opteron single core and both Opteron dual core processors were highly competitive if not industry leading in its day. Barcelona was supposed to keep AMD competitive by integrating four cores on a single die in combination with the transition from 90nm to 65nm manufacturing process. Barcelona appear to have moderate improvements over the previous generation Opteron in core architecture. The expectation is that single core performance between the 90nm Opteron and 65nm Barcelona should be comparable at the same frequency. Since Opteron reached 2.6-2.8GHz relatively quickly on 90nm, the expectation is that Barcelona on 65nm should reach 3.5GHz at a comparable stage. There are currently very few published Barcelona benchmark results (of course, Intel has not been profuse with 5400 or 7300 benchmark results either). The indications are that Barcelona will need to be around 3GHz to be competitive with the Xeon 5400 and 7300 series at around 3GHz. AMD is trying to achieve 2.5GHz in early 2008, with higher frequencies later in 2008. The window of opportunity for 65nm Barcelona to be competitive with the current Xeon 5400 and 7300 processors would probably close in late 2008 when the next generation Intel 45nm processors are released.
Intel Seaburg Chipset (5400)
The Intel 5400 chipset arrived about the same time as the Xeon 5400 series processors. The 5400 chipset supports both 65nm and 45nm Xeon 5100-5400 processors. Improvements in the 5400 over the 5000P/X chipset include an increase in the number of PCI-E lanes from 24 to 36 for 9 x4 PCI-E ports (in addition to the ESI port). Maximum memory capacity is increased to 128GB over 16 DIMM sockets, meaning 8GB DIMMs are required for max memory capacity. The maximum practical memory remains 64GB until the 8GB DIMM prices becomes reasonable, probably in the 2009 time frame. The PCI-E lanes configured as x16 ports supports the new PCI-E generation 2 specification 5Gbps signaling, compared with 2.5Gbps in gen 1. For now, this feature is only for workstation graphics and specialty HPC adapters. Some time in the future we may see x4 PCI-E gen 2 slots at 5Gbps.
Both the PowerEdge 2900 and ProLiant ML370G5 retain the older 5000P or X chipset. Neither vendor felt it was necessary to transition the workhorse server platform to the new 5400 chipset, even though the increased number of PCI-E ports would be a highly welcome improvement.
Core 2 Processors
As mentioned earlier, the first Core 2 processors were manufactured on a 65nm process and featured 4M L2 cache shared between two cores. The second generation Core 2 processors are 45nm with 6M L2 cache. The 45nm processors are only available in the Xeon 5400 and 5200 series. The Xeon 7300 series for 4-way systems will use the 65nm processors until late 2008. Right now, this is not much of an issue as the top Xeon 5400 series frequency (with 1333MHz FSB) is 3.16GHz compared with 2.93GHz for the top Xeon 7300 series. It is possible that the top 45nm 5400 frequency could be pushed higher, while the top 65nm 7300 will mostly likely stay put. The 50% larger cache on the 45nm version would be helpful in high call volume applications on 4-way systems, but this difference is not sufficient to be an absolute requirement. In late 2008, Intel will launch a 45nm processor based on the Core 2 architecture, codename Dunnington, for 4-way systems. Details released show a single die six core processor. The current Intel quad core processors are comprised of 2 dual core die in a single package, or processor socket. Each pair of cores will have 3M L2 cache. The entire die will have a large 12-16M L3 cache shared between all cores. The expectation is this strategy should benefit throughput oriented applications. The late 2008 time frame should also see the next generation Intel processor architecture, code name Nehalem. It is unclear as to how Nehalem systems available in late 2008 will compare to a 4-way Dunnington server with a total of 24 cores.
SAS RAID Controllers
Many of the first generation SAS RAID controllers are based on the Intel 80333 IO processor. There is now a next generation 8134x family of IO processors with improved processing and memory bandwidth. None of 80333 generation SAS RAID controllers could drive more bandwidth than available on a x4 PCI-E port. The major system vendors seem to be slow in adopting the new generation 8134x controllers. It is unclear whether the new generation IO controllers can drive more bandwidth than provided by a single x4 PCI-E port.
For SQL Server 2000, there are many considerations, like Enterprise Edition versus Standard Edition, 3GB and PAE. For SQL Server 2005, especially full 64-bit, my recommendation is to fill the DIMM sockets with 2GB DIMMs. When 4GB DIMMs come down in price to around parity in cost per GB with 2GB DIMMs, the recommendation changes to 4GB. Some systems accommodate additional memory cards. Be sure to order the system with all memory cards installed. Trying to get it afterwards can be a real pain, especially if the cost is about $100.
I will discuss storage performance in more detail later. In brief, the game is brute force performance. This is achieved with many physical hard drives distributed over several IO controllers. Consider 4-8 PCI-E SAS controllers or FC HBAs as the system allows. Start with 1 rack of (10-25) external disks per controller, adding a second rack per controller only when all PCI-E slots have been filled with disk controllers.
4-way TPC-C Performance History
Below are a collection of TPC-C results for SQL Server on 4-way systems from 1998 to 2007.
Pentium Pro 200MHz, 1M L2 12,106, Mar 1998
Pentium II Xeon 400MHz, 1M L2 20,434, Nov 1998
Pentium III Xeon 900MHz, 2M L2 39,158, Sep 2001
Xeon MP 1.6GHz, 1M L3 61,564, Aug 2002
Xeon MP 3.0GHz, 4M L3 102,667 Mar 2004
Xeon MP 3.66GHz, 1M L2 141,504 Apr 2005
Xeon 7140 3.4GHz, 16M L3 318,407 Oct 2006
Xeon 7350 2.93GHz, 2x4M 407.079 Nov 2007
Note: the 4-way Xeon 7350 performance is somewhat disappointing. There is a result for DB2 of 516K, so the expectation is that SQL Server 2005 should be reasonably close.
The main point of revisiting history is to understand the rapid pace of performance progress. It simply does not make sense to buy an expensive system with a great deal of headroom for growth. It is much more cost effective to buy sufficient performance for the next 2 years, and have program for periodic system replacement. One strategy is to replace the main line business system with the latest technology every year or two. Then rotate the replaced system to a secondary task.