A frequently asked question is what counters should be included in a SQL Server baseline. The discussion then quickly proceeds to define a set of perfmon counters to be logged as the performance baseline. And often, people seem to have an urge to try to reduce the baseline to a small number of perfmon counters that may cover processors, memory, I/O, and some aspects of SQL Server such as page life expectancy and cache hit ratio.
Unfortunately, only in very rare cases will such a small set of perfmon counters give you a baseline that would be practically useful for troubleshooting or performance analysis. If your baseline consists of a small set of perfmon counters, sooner or later you'll be in a situation where you wish you had included this other counter. And every time, it may be a different counter whose values you want to use as part of the baseline.
So instead of playing this hit-and-miss game, I'd suggest you include a comprehensive set of perfmon counters in your baseline. In addition, you may also want to include some SQL traces, and snapshots of some DMVs including wait stats.
Is that enough? In other words, is the following a good foundation for capturing a useful SQL Server performance baseline?
•
A comprehensive set of perfmon counters
• Some SQL traces
• Snapshots of DMVs including wait stats
Well, not really! There is still a gaping hole. The main character is still missing from the show, which is the performance information about the application itself. It is fine to collect information on how much CPU is being used, whether there is any memory pressure, whether the I/O latency appears to be within the generally recommended range, whether SQL Server spends a lot of time waiting on log writes, and so on. But without being able to correlate these SQL Server or hardware resource related data points with the application performance behavior, you have not established a baseline that is relevant to the application.
It is critical to realize that when it comes to performance problems, it is not CPU, memory, disks, or SQL Server query processor that will complain. Rather, it is the application folks that can add great misery to your life. That is, your job is not there to solve SQL Server or hardware resource problems. You are there to help resolve application performance issues that these SQL Server or hardware resource problems may be causing or contributing to.
To that end, your performance baseline must include the application performance data points. Exactly what these application performance data points should include and how they should be collected and reported is a decision that must be made jointly between you and your application folks.
A straightforward example is that the application folks select from many OLTP queries some of the key queries as the benchmark queries, record the server processing times of these queries as part of the application,and expose the recorded times to you for inclusion into the baseline.
As another example, assuming that the applicaiton is a long-running database batch job, you can work with the application folks to identify the key steps or key processing units in the batch, and log their processing times--among other useful stats such as resultset size--for inclusion as part of the SQL Server baseline.
Or if the number of application transactions processed per second is a critical performance measure, that number must be exposed and included in the performance baseline. Blindly picking the SQL Server counter transaction/sec or batch requests/sec as a performance indicator of the application transaction processing may help derail your troubleshooting efforts with red herrings, to speak politely, and often a total waste, to put it more harshly.
The key point is that an application should be instrumented to expose performance metrics that can be used to characterize database performance as seen by the application. SQL Server performance baseline must then include these application metrics so that the SQL Server and hardware performance metrics can be precisely correlated with the application metrics.
Unfortunately, I have not seen this being practiced as often as I think I should. And I see valuable troubleshooting time, especially at the initial stages, repetitively wasted on the database folks trying to ascertain exactly what's the problem from the application perspective.