THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Signal Waits and SQLOS Scheduler Pressure

What does it mean if you see a high percentage of signal waits? Thanks to Microsoft whitepapers, presentations, and blogs, everybody would say it implies CPU pressure.

 

Well, almost everyone except Mario Broodbakker, whose excellent blog “Taking the guesswork out of SQL Server performance profiling Part 2” shows us that SQL Server signal waits are in fact an indication of the SQLOS scheduler pressure, which may or may not be an indication of the CPU pressure as seen by the OS. I don’t know whether Mario Broodbakker was the first one who came up with the term ‘SQLOS scheduler pressure’, but I have not seen it used elsewhere or by anybody else.

 

I read this excellent blog a while ago. But I don’t think I appreciated it significance at the time, or else I would not have totally forgotten everything about it until Stuart Ozer of Microsoft brought it to my attention a few days ago. Well, there was a scenario where the OS CPU perfmon counters suggested that none of the processors was fully used, while at the same time SQLOS showed very high signal wait time with no apparent bottleneck on other resources.

 

The reason this appears to be contradictory is that if signal waits are truly an indicator of the CPU pressure, then SQL Server should be using more CPU time if it’s not waiting on any other resources. Clearly, something doesn’t add up.

 

Well, the key to this puzzle is Mario Broodbakker’s observation that signal waits are an indicator of the SQLOS scheduler pressure. Although the SQLOS scheduler pressure often translates into the CPU pressure, it’s not always the case. Mario Broodbakker showed a case where a CPU intensive query was run on two connections, they were both scheduled on the same SQLOS scheduler, but they were run on two different processor cores. Because the queries had to wait for each for their turns to run on the same scheduler, there was very high signal wait time. But at the same time, each of the two processors was only half used.

 

In addition to understanding the behavior conceptually, the best way to really appreciate the behavior is to see it in action. But if you are using a multi-socket multi-core machine such as an HP DL585 with four dual-core sockets, it may be hard to reproduce the behavior with Mario’s CPU intensive query (given in his blog referenced above).

 

To help you see the behavior, I’ve modified Mario Broodbakker’s query to the following (table t1 has a single column with 10,000 rows):

 

use mytest

go

exec begin_schedstats

go

exec begin_waitstats

go

select * from sys.dm_os_tasks where session_id = @@spid

go

waitfor delay '00:05:00'

SELECT COUNT(*)

  FROM t1 a   INNER LOOP JOIN t1 b

    ON b.c1 = a.c1

option (maxdop 1)

select * from sys.dm_os_tasks where session_id = @@spid

go

exec end_schedstats

go

exec end_waitstats

go

 

Follow this procedure to see the behavior:

 

  • First, fire up perfmon to display the % Processor Time counters for all the processors
  • Start a Query Analyzer instance, and open several QA connection to the same SQL instance to run the above query
  • For each connection, from the QA output, check to see what SQLOS scheduler the query is running. Note that sys.dm_os_tasks has a scheduler_id column.
  • If you don’t see any two connections sharing the same scheduler, open more QA connections to run the query and eventually SQL Server will wrap around to use the same scheduler.
  • If you see two connections running the query on the same scheduler, you are done. Abort all the other connections, and wait to see how these two connections drive the % Processor Time counter values on the perfmon display. At steady state, you should see two processors being driven to ~50% each, when this query should in fact drive each processor to 100% if different SQLOS scheduler is used.

 

I don’t know whether there is a guarantee that you’ll see the queries running on two different processor cores. But from my tests, I saw it in each and every of the test runs.

 

Mario Broodbakker’s message on the SQLOS scheduler pressure is not well known. It’s a very important message that should be promoted.

Published Saturday, February 09, 2008 5:59 PM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Thomas Kejser said:

Hi Linchi

Great article. As you can see in my previous TechNote other workload may observe similar behavior

http://sqlcat.com/technicalnotes/archive/2008/04/09/resolving-scheduler-contention-for-concurrent-bulk-insert.aspx

April 10, 2008 7:41 AM
 

Viking said:

I am not sure if you still would read my comment on this old entry. Anyways, I still wanted to ask a question and maybe the question is from 20,000 feet height.

How do we size CPU’s for a DB server for an application which is being coded anew?

May 6, 2008 5:21 AM
 

Linchi Shea said:

Hi Thomas;

Excellent information in your SQLCAT blog post. Thanks for pointing it out.

May 26, 2008 9:49 PM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement