Every now and then, someone asks, “When will servers be powerful enough that performance tuning is no longer necessary?” Sometimes I get the impression the question is posed not on technical grounds, but because ISVs and CIOs don’t like performance specialists. Fortunately (for me) it does not appear that this will ever happen, for two principal reasons: 1) hardware and tuning contribute in such completely different ways that neither can be ignored, and 2) multi-core processors actually introduce new tuning requirements and strategies.
On the hardware side, performance improves over time with 1) processor compute capability, in terms of both single-core and system aggregate performance, 2) memory capacity, and 3) storage IO performance. Tuning largely falls into three categories: 1) improving the execution plans of the top resource-consuming queries, 2) balancing the avoidance of compiles and recompiles of high-volume queries against having the right execution plan, and 3) managing deadlocks, blocking, and various resource bottlenecks.
In the past, processor performance increased roughly 40% per year at the single-core level. Since the 2005 time frame, processor performance has increased mostly through a growing number of cores per socket rather than at the single-core level. Over the last six years, single-core performance has only roughly doubled. However, the total aggregate performance of recent-generation microprocessors is truly impressive. In the past, mainstay database servers had 2-4 processor cores. Today, processors with six or eight cores are standard, and the server system itself might have between 12 and 32 cores. The current AMD Opteron processors have 12 cores on two dies in one socket, but the aggregate performance at the socket level seems to be comparable to the Intel Xeon with 6 cores.
Database performance will depend on whether an application can effectively use all of the available processor cores in a scalable manner. If so, then it might be possible to handle performance issues with brute force. More than likely, only a few elements of a complete application will naturally scale exceptionally well. Many elements may have moderate or marginal scaling. Key operations, particularly writes (Insert, Update and Delete), may be restricted to a single core. A complete database application with a rich functionality set cannot be handled entirely with brute-force compute power comprised of many processor cores.
Memory can dramatically reduce disk IO to manageable levels. This usually requires proper tuning to bring the active portion of the database into a limited set of pages that fit in the memory available for the buffer cache. Once upon a time, memory was extremely critical. When system memory was below 1GB, a major effort was required to bring disk IO to manageable levels by ruthlessly restricting queries to basic transactions only. A report run concurrently with transactions could seriously overload the disk system. By the year 2000 or so, system memory capacity with reasonably priced memory modules (usually half the maximum supported memory module capacity) crossed the 3-4GB range. Around this time frame, memory capacity was sufficient to support both transactions and some reporting activity.
This was followed by the tortured path to full 64-bit computing for unrestricted access to all system memory. Itanium, the first Intel microprocessor designed for 64-bit architecture, was repeatedly delayed and seriously disappointed on performance when it finally arrived in 2001. The Intel Pentium Pro and later processors did have a band-aid capability in the Physical Address Extension (PAE) feature, which let a 32-bit OS manage more than 4GB of physical memory. Microsoft reluctantly (perhaps it was a cold day in hell) added Address Windowing Extensions so that applications like SQL Server could map memory outside the normal address range, with functionality sufficient for the buffer cache.
The Intel Itanium 2 finally provided good performance on a full 64-bit operating system supporting a 64-bit version of SQL Server 2000. But acceptance of the Itanium platform was poor. A 64-bit OS did not become pervasive until after the AMD Opteron, and later the Intel 64-bit Xeons (based on the Prescott core) in 2004. SQL Server with 64-bit support on the more popular x86-64 architecture did not arrive until late 2005.
By this time, system memory capacity was in the 32-64GB range, with 16-32GB being very affordable. This seemed to be sufficient to handle properly tuned databases, even those with rich functionality. Today, server systems can reach 1TB of memory at a reasonable price. A 16GB memory module with ECC is around $800, so the price for 64 modules (1TB) is just over $50,000. This is relatively low compared with the overall cost of an important database application. Few applications should actually require so much memory. The reality is that most environments have already reached the memory brute-force realm.
Interestingly, IO performance, while weak at the single-disk level, was the one component that could be scaled with massive numbers of disk drives. The key to implementing this strategy is low cost per disk amortized over the complete storage system. However, storage vendors were more interested in selling extremely high value-add features, i.e., extraordinarily expensive disks, which directly conflicts with this performance strategy.
Now, with solid-state storage technology sufficiently mature and competitive in price, IO performance has also reached the brute-force realm. Of course, this came only after system memory configurations became so massive that a well designed transaction processing database should not generate heavy IO anymore. Still, it is good to have the capability to handle IO surges.
Tuning for Execution Plan Efficiency
On the tuning side, a combination of rewriting the SQL (or using plan hints when this is not possible) and building the right set of indexes allows the query optimizer to find efficient execution plans for the most important queries. This effort targets the top SQL statements, and it may be possible to achieve a very large performance gain on a handful of queries, with essentially no change on the majority of queries.
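As a minimal sketch of both techniques (the table, column, and parameter names here are hypothetical, not from any particular application), a covering index can turn a scan into a seek, and a query hint can steer the optimizer when rewriting the SQL is not an option:

```sql
-- Hypothetical schema: dbo.Orders (OrderID, CustomerID, OrderDate, TotalDue).
-- A covering index lets the optimizer satisfy the query below with an
-- index seek instead of scanning the clustered index.
CREATE INDEX IX_Orders_Customer_Date
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue);

-- If parameter sniffing produces a plan tuned to an unrepresentative value
-- and the statement cannot be rewritten, a hint can request a plan built
-- for average column statistics instead of the sniffed parameter value.
SELECT OrderID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = @CustomerID
OPTION (OPTIMIZE FOR (@CustomerID UNKNOWN));
```

The point is that a change of a few lines, applied to one of the top statements, can shift that statement's cost by orders of magnitude while leaving the rest of the workload untouched.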
Note that the range of performance on the hardware side between contemporary mid-range (2 sockets, 12 cores total) and high-end systems (4-8 sockets, 32-64 cores total) is not more than 2-4X. In the old days there were systems with 32-64 processor sockets, but scaling was limited and could only be achieved with a complete re-architecture of the application. The difference between a good execution plan and a poor one might be one, two or even three orders of magnitude, which cannot be spanned with hardware. Only tuning can close this gap, even if it is feasible for just a handful of SQL statements.
Tuning for Parallelism
As discussed earlier, server systems today can have 12, 32 or even 64 processor cores. Achieving good scaling across all processor cores is not a simple matter. The Microsoft SQL CAT team has published strategies for reducing contention on key metadata structures and index root-level access. Examples are creating multiple identical indexes on small tables that are frequently updated, to avoid contention for the index root-level page, or partitioning to achieve the same effect.
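A rough sketch of the first technique (the table and index names are hypothetical): several equivalent nonclustered indexes are created on a small, hot table, and different groups of sessions are directed at different copies with an index hint, spreading access across separate B-tree root pages:

```sql
-- Hypothetical small, frequently accessed table: dbo.RateLookup.
-- SQL Server permits multiple indexes with identical key definitions
-- as long as the index names differ.
CREATE INDEX IX_RateLookup_1 ON dbo.RateLookup (RateCode) INCLUDE (RateValue);
CREATE INDEX IX_RateLookup_2 ON dbo.RateLookup (RateCode) INCLUDE (RateValue);

-- One group of sessions reads through the first copy...
SELECT RateValue
FROM dbo.RateLookup WITH (INDEX(IX_RateLookup_1))
WHERE RateCode = @RateCode;

-- ...while another reads through the second, so no single index root
-- page becomes the point of contention.
SELECT RateValue
FROM dbo.RateLookup WITH (INDEX(IX_RateLookup_2))
WHERE RateCode = @RateCode;
```

The trade-off is extra index maintenance on every write to the table, which is why the technique is reserved for small, heavily contended tables.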
However powerful modern server systems are in terms of compute, memory and IO, brute-force capability is helpful, but it does not replace good database architecture and performance tuning. The impact characteristics of each are very different. Hardware alone cannot solve some performance issues, nor can endless performance tuning compensate for obsolete hardware.
It's always difficult to tell when something is an unbiased assessment versus a self-serving opinion. If only life were easy!