SQL Server waits enable you to identify where SQL Server is spending a lot of time doing nothing but waiting. This allows you to quickly zoom in and focus on the problem areas that are most relevant to the root cause, and will most likely help cut down time you may otherwise waste on, for instance, irrelevant perfmon counters or SQL trace events. No doubt these waitstats are extremely useful, and I’ve had good successes using them in a number of projects.
To find whether there is any processor pressure with SQL Server waitstats, Microsoft suggests that you look at signal waits as a percentage of the total waits. High percentage of signal waits suggests processor pressure (e.g. http://blogs.msdn.com/sqlcat/archive/2005/09/05/461199.aspx).
So I was rather surprised to find literally no waits (using track_waitstats_2005 for instance) when I applied a known CPU-bound workload to a four-processor server with 16GB of physical memory. Not even a blip on the SQL Server waitstats radar screen while all the processors were being driven to 100% utilization and there was a sustained long processor queue (>= 20), and SQL Server was processing more than 5000 transactions per second! The behavior was the same for both SQL Server 2000 and SQL Server 2005. In the case of SQL Server 2005, sys.dm_os_schedulers showed that the number of runnable tasks per scheduler was constantly greater than 15, yet another strong indicator of significant processor pressure.
So where was the wait that I was expecting to point me to processor pressure?
Before trying to answer this question, let me briefly describe the workload used in the tests. To help study pure host performance without any bother by difficult-to-control differences in any non-host system components such as disk storage and network, I often run tests--among other kinds of tests--with a processor-bound workload generated as follows:
-
The test database is created to be smaller than the amount of memory allocated to the SQL instance. This is essentially a TPC-C database scaled to fit entirely in memory.
-
Each client executes fairly simple read-only queries that randomly touch all the database pages. The queries are essentially the TPC-C OrderStatus and StockLevel transactions.
-
The level of load is controlled with the number of clients (i.e. threads running on a remote machine). The more clients, the heavier the load up to a point.
In a nutshell, this is an in-memory read-only SQL Server workload to exercise the processors, a subset of physical memory, and their interconnects.
In the actual tests, the number of clients (i.e. threads) was set to 100, each with its own SQL Server connection. The database was about 8GB and 12GB out of the 16GB physical memory was allocated to the SQL instance. The workload kept all the processors very busy without exhausting all the SQL Server workers, and after the initial ramp up, all the database pages were cached in memory; there was then absolutely no storage I/O.
Microsoft’s published literature on SQL Server waitstats describes a simplified execution model in which SQL Server sessions rotate through the following statuses: Running, Runnable (waiting for CPU), and Suspended (waiting for resources such as locks and I/O). For more details, see Tom Davidson’s excellent article SQL Server 2005 Waits and Queues. Conceptually, sessions in suspension are on so-called Wait List, and SQL Server waitstats are the statistics with respect to Wait List. SQL Server provides a long list of different wait types describing different resources for which a session may be waiting while it's on Wait List.
Note that in this execution model, waiting for CPU is accounted for by the Runnable status, separate from Wait List, and waiting for CPU does not appear to be one of the wait types you can find in DBCC SQLPERF(waitstats) with SQL Server 2000 or in sys.dm_os_wait_stats with SQL Server 2005. In addition, signal waits are always obtained on a per wait type basis, or via some wait type(s).
So if a session waits very little, or does not wait at all, on any of these wait types, SQL Server would report little to no signal wait on that session no matter how much time the session may have been waiting in the Runnable status. And that seems to be the case with the SQL Server sessions in my in-memory read-only workload. Checking the sessions during the tests confirmed that they were either running, runnable, or sleeping.
What does this tell us? Well, it’s clear that you can’t rely on SQL Server waitstats alone to tell you whether you have CPU pressure. There is a reason Microsoft calls its performance tuning methodology SQL Server Waits and Queues instead of just SQL Server Waits, and Microsoft is very specific about using waitstats together with perfmon counters, DMVs, and other data. Personally, I’d prefer to see some wait types to directly account for this situation. After all, CPU is a type of resource, and waiting for CPU is a type of wait.
Now a word of caution. What I have described here is not from any internal SQL Server knowledge. It's what I observed empirically in my tests. I don't know why SQL Server waitstats don't directly capture or represent waits for CPU in this particular scenario, and I'd be very happy if anyone can shed some light on that.