THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

CPU Pressure is Huge, But Where is the Wait?

SQL Server waits enable you to identify where SQL Server is spending a lot of time doing nothing but waiting. This allows you to quickly zoom in and focus on the problem areas that are most relevant to the root cause, and will most likely help cut down time you may otherwise waste on, for instance, irrelevant perfmon counters or SQL trace events. No doubt these waitstats are extremely useful, and I’ve had good successes using them in a number of projects.

 

To find whether there is any processor pressure with SQL Server waitstats, Microsoft suggests that you look at signal waits as a percentage of the total waits. High percentage of signal waits suggests processor pressure (e.g. http://blogs.msdn.com/sqlcat/archive/2005/09/05/461199.aspx).

 

So I was rather surprised to find literally no waits (using track_waitstats_2005 for instance) when I applied a known CPU-bound workload to a four-processor server with 16GB of physical memory. Not even a blip on the SQL Server waitstats radar screen while all the processors were being driven to 100% utilization and there was a sustained long processor queue (>= 20), and SQL Server was processing more than 5000 transactions per second! The behavior was the same for both SQL Server 2000 and SQL Server 2005. In the case of SQL Server 2005, sys.dm_os_schedulers showed that the number of runnable tasks per scheduler was constantly greater than 15, yet another strong indicator of significant processor pressure.

 

So where was the wait that I was expecting to point me to processor pressure?

 

Before trying to answer this question, let me briefly describe the workload used in the tests. To help study pure host performance without any bother by difficult-to-control differences in any non-host system components such as disk storage and network, I often run tests--among other kinds of tests--with a processor-bound workload generated as follows: 

  • The test database is created to be smaller than the amount of memory allocated to the SQL instance. This is essentially a TPC-C database scaled to fit entirely in memory. 
  • Each client executes fairly simple read-only queries that randomly touch all the database pages. The queries are essentially the TPC-C OrderStatus and StockLevel transactions.
  • The level of load is controlled with the number of clients (i.e. threads running on a remote machine). The more clients, the heavier the load up to a point.

In a nutshell, this is an in-memory read-only SQL Server workload to exercise the processors, a subset of physical memory, and their interconnects.

 

In the actual tests, the number of clients (i.e. threads) was set to 100, each with its own SQL Server connection. The database was about 8GB and 12GB out of the 16GB physical memory was allocated to the SQL instance. The workload kept all the processors very busy without exhausting all the SQL Server workers, and after the initial ramp up, all the database pages were cached in memory; there was then absolutely no storage I/O.

 

Microsoft’s published literature on SQL Server waitstats describes a simplified execution model in which SQL Server sessions rotate through the following statuses: Running, Runnable (waiting for CPU), and Suspended (waiting for resources such as locks and I/O). For more details, see Tom Davidson’s excellent article SQL Server 2005 Waits and Queues. Conceptually, sessions in suspension are on so-called Wait List, and SQL Server waitstats are the statistics with respect to Wait List. SQL Server provides a long list of different wait types describing different resources for which a session may be waiting while it's on Wait List.

 

Note that in this execution model, waiting for CPU is accounted for by the Runnable status, separate from Wait List, and waiting for CPU does not appear to be one of the wait types you can find in DBCC SQLPERF(waitstats) with SQL Server 2000 or in sys.dm_os_wait_stats with SQL Server 2005. In addition, signal waits are always obtained on a per wait type basis, or via some wait type(s).

 

So if a session waits very little, or does not wait at all, on any of these wait types, SQL Server would report little to no signal wait on that session no matter how much time the session may have been waiting in the Runnable status. And that seems to be the case with the SQL Server sessions in my in-memory read-only workload. Checking the sessions during the tests confirmed that they were either running, runnable, or sleeping.

 

What does this tell us? Well, it’s clear that you can’t rely on SQL Server waitstats alone to tell you whether you have CPU pressure. There is a reason Microsoft calls its performance tuning methodology SQL Server Waits and Queues instead of just SQL Server Waits, and Microsoft is very specific about using waitstats together with perfmon counters, DMVs, and other data. Personally, I’d prefer to see some wait types to directly account for this situation. After all, CPU is a type of resource, and waiting for CPU is a type of wait.

Now a word of caution. What I have described here is not from any internal SQL Server knowledge. It's what I observed empirically in my tests. I don't know why SQL Server waitstats don't directly capture or represent waits for CPU in this particular scenario, and I'd be very happy if anyone can shed some light on that.

 

Published Sunday, December 31, 2006 1:20 AM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

JasonM said:

I have see similar behavior. I guessed that the cpu queue was from AWE and privledged CPU time. It was taking 15-20% of the total.

January 2, 2007 12:42 PM
 

Mario said:

I'm quite surprised that you want to use wait statistics to actually say something about non-wait situations..As you found out, using CPU is something else than waiting. In many cases it just adds up: response time=cpu time+ wait time. If you have a system that's 100% cpu bound, there isn't too much sense in trying to find out what the box is waiting for. Except for CPU: and that's exactly what your processor queue is telling you.

What Davidson is telling you with high 'signal time', is that after a wait event end, it takes 'x millieseconds' to get scheduled on a SQLServer worker thread. That is scheduling pressure within Sqlserver. What you are observing with 100% cpu usage and a high processor queue is external CPU pressure. SO if you don't wait much on Sqlserver events, like you do when you are doing only logical I/O's, looking at wait events doesn't say anything. And that's exactly why you should always see wait time in context of cpu time/usage.

I think.

March 18, 2007 2:16 PM
 

Mario said:

..I just read your last remark a bit better, and yes, it would be nice to have one view with both CPU and wait data, so see a nice breakdown immediately, and know where to start looking for bottlenecks. But since CPU (OS scheduler) waits are not (SQLOS) scheduler waits, sys.dm_os_wait_stats is not the place to put them. ANd to make things worse: Windows doesn't measure time spend on the CPU scheduler queue, only the scheduler queue size (depth). It takes one of the Unixes to get that unfortunately..

One question for you too: didn't you see: SOS_SCHEDULER_YIELD waits ?

Mario

March 18, 2007 3:08 PM
 

Linchi Shea said:

When it comes to SOS_SCHEDULER_YIELD waits, I'm getting inconsistent results (well inconsistent to me anyway and I could be missing something that may explain it). Basically, on 32-bit SQL Server 2005, I don't see any SOS_SCHEDULER_YIELDS waits at all even though there is are substantial queues on the processors. On x64 SQL Server 2005, however, I do see significant SOS_SCHEDULER_YIELD waits most of the time, and because of SOS_SCHEDULER_YIELD waits, I see significant signal waits, thus the wait indicator of substantial processor pressure.

I don't know to what extent the 32-bit vs. x64 was signifcant or whether there is an issue with reporting signal waits on 32-bit SQL Server 2005.

March 26, 2007 10:37 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement