THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Seeing the Wait That Matters Most (A Month of Activity Monitoring, Part 15 of 30)

This post is part 15 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.


The posts over the past two days delved into the internal workings of the query processor, specifically around tasks and the work-wait lifecycle. Understanding how tasks work is a key prerequisite for understanding some of the data most useful that Who is Active can return—so if you haven’t read those posts, go do that before reading this one.

The “waits and queues” performance tuning methodology involves tuning a system by looking at what processes are waiting on, rather than by focusing solely on performance counters and similar metrics. Waits and queues is generally applied in the SQL Server world on an instance-global level, using the sys.dm_os_wait_stats view. And while this can give you some insight into what your instance is spending its time doing, I’m personally much more interested in the real-time wait stats provided by some of the other DMVs—and that is, of course, the data that Who is Active returns.

Yesterday’s post was all about blocking, and the fact that a given request can have multiple waits reported by the DMVs. In the example I used for the post, most of these waits were CXPACKET. But in real-world queries there can be numerous different waits, all happening on different tasks that are busy doing various things on behalf of the query.

Initial versions of Who is Active returned information about all of the waits reported in the waiting tasks view, but I found that this was a bit too heavy in most cases. Not only did it slow down the stored procedure, but it also delivered a lot more information than many people were equipped to deal with. So at some point I introduced the current default model, the “lightweight” waits collection mode. This mode returns, at most, information about one wait per request; the most important wait, per the logic I’ve implemented. Here’s how it works:

  • All of the waits are evaluated and prioritized
  • CXPACKET waits—the parallelism waits—are discarded
  • Blocking waits get top priority
  • After that, priority is given to waits that have been outstanding for the longest amount of time
  • Any remaining ties are broken by ordering by the blocking session ID

The most controversial part of this system, based on the number of questions I’ve received, seems to be the removal of the CXPACKET waits. But for me it’s a relatively straightforward move. CXPACKET waits mean parallelism. Parallelism means CXPACKET waits. The coordinator thread will always wait on CXPACKET while the worker threads are busy. That’s just how the system is designed. Therefore, the longest waits for any given plan that is being processed in parallel will be waits on CXPACKET. Does that mean there’s a problem? No—not unless you feel that parallelism itself is a problem (and I most certainly do not).

If you have a parallel query that’s taking a long time, it's probably not due to excessive waiting on CXPACKET. While there is such a thing as a query that probably shouldn’t be using parallelism, there are no simple metrics or formulas that will help you figure that out based on the CXPACKET wait time. It’s really something that needs to be addressed on a case-by-case basis. A much better idea is to look at the other threads—those that aren’t waiting on CXPACKET—to see what they’re waiting on. The threads waiting on CXPACKET are waiting for data from those other threads, so it’s by looking at the other threads that you’ll find the actual performance culprits.

For these reasons, CXPACKET is discarded in the default waits collection mode. After getting rid of those waits, blocking waits are sent to the top of the list. Heavy blocking is certainly something that you want to be aware of—it absolutely will slow down your queries (and maybe even cause them to grind to a halt), and information about blocking is one of the core things that Who is Active is designed to show.

Assuming that there is no blocking, or even if there is blocking on two different threads and a priority is needed, the longer waits get pushed to the top. The reasoning is simple: longer waits slow down your query more than shorter waits. And you want to see the worst culprits when looking at wait information.

All of this sorting results in a single wait (or, on very rare occasions, multiple waits, in which case the tie is broken using a row number). Unlike the wait information in sys.dm_exec_requests, this wait does not arbitrarily correspond to the coordinator thread. It is instead the one wait that is most important for you to focus on at the time the data was collected. The wait that actually matters. Here is the blocking situation from yesterday as viewed through the eyes of Who is Active’s lightweight waits collection mode:

F15_53_is_blocked

 

Homework

Yesterday I asked you to come up with a query that would cause the same execution context ID to be reported as “blocked” multiple times in sys.dm_os_waiting_tasks. The key to solving this challenge starts with to looking at the plan for the query I showed yesterday. It contains only one exchange iterator, so not much is going on. As you the complexity of the query increases and multiple exchanges are needed, there may be cases where blocking iterators mean that a given thread cannot continue to run until all downstream tasks complete. This can cause a given execution context—usually 0, the coordinator—to be reported as waiting on all of the other execution contexts. The following query exhibits this behavior:

SELECT TOP(1000)
    *,
    ROW_NUMBER() OVER
    (
        PARTITION BY OrderQty
        ORDER BY ProductId
    ) AS r
FROM Sales.SalesOrderDetail
ORDER BY
    SalesOrderDetailId DESC

How well do you know your wait statistics? One CPU-related wait type that is quite commonly seen as a high wait in the sys.dm_os_wait_stats view will never show up in sys.dm_os_waiting_tasks. Can you name the wait type and explain why you’ll never see it?

Published Friday, April 15, 2011 11:00 AM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Amit Banerjee said:

IMHO CXPACKET waits are always a victim of issues like outdated statistics, poor indexing or no indexes available, lack of statistics on the table etc. which leads to a poor query plan. SQL Server determines on parallel execution primarily based on "cost threshold for parallelism". To optimize a parallel query, you need to start with inspecting the query plan and looking for means to remove the huge number of rows fetched by the inner-most operators in the plan. This could be done either by updating statistics or by adding/modifying indexes or adding query hints. Changing MAXDOP value to a lower value is just a workaround to combat parallelism waits.

April 15, 2011 10:18 AM
 

Adam Machanic said:

Amit, Saying that CXPACKET waits are caused by "poor query plans" is equivalent to saying that parallelism is a bad thing. It is very rare these days to find "pure" OLTP systems of the type where parallelism isn't an appropriate choice for various queries, so this mindset is, in my opinion, a bit outdated. Personally I specialize in larger systems where parallelism is quite welcome. If I'm *not* seeing CXPACKET waits, that's a great indication that there is a major problem!

April 15, 2011 10:26 AM
 

Earnest Grass said:

At what point does CXPACKET sound the alarm for being too high?  Do you use this wait in tuning the MAXDOP settings?

April 15, 2011 11:51 AM
 

Adam Machanic said:

Earnest: 99% of the time, no. I don't bother looking at CXPACKET waits at all. I shared some thoughts on tuning MAXDOP here:

http://sqlblog.com/blogs/adam_machanic/archive/2010/05/28/sql-university-parallelism-week-part-3-settings-and-options.aspx

More in depth thoughts are available if you attend one of my sessions on parallelism at a conference or training event--and some more will be available in print later this year but I can't share specifics quite yet :-)

April 15, 2011 11:58 AM
 

TheSmilingDBA said:

Good explanation. Maybe you can break sp_WhoIsActive into multiple SPs. SP_WhoIsActiveBeginner, SP_WhoIsActiveIntermiediate and Advanced.

Thanks again - Thomas

April 15, 2011 2:33 PM
 

Adam Machanic said:

This post is part 16 of a 30-part series about the Who is Active stored procedure. A new post will run

April 16, 2011 10:06 AM
 

Amit Banerjee said:

My comment was not to indicate that all CXPACKET waits are a problem. There are situations where a query can run more efficiently without a parallel plan or with a parallel plan. It depends completely on the type of query being executed along with the type of indexes/statistis being used.

All I am trying to say here is that CXPACKET waits *cannot always* be ignored. There are situations where parallelism helps query execution and there are situations where parallelism is introduced due to the plan choice but a serial plan with query hints works out better. I have seen environments both ways.

April 18, 2011 6:28 AM
 

Adam Machanic said:

Hi Amit,

Indeed, it cannot always be ignored. Just usually ;-)

See post #16 for how to see CXPACKET waits with Who is Active.

April 18, 2011 10:32 AM
 

Adam Machanic said:

This post is part 30 of a 30-part series about the Who is Active stored procedure. A new post will run

April 30, 2011 11:45 AM
 

Reiner said:

Hi Adam and all,

I'm a bit desperate about my CXPacket Situation. I have a 24 Core HP machine (4x6 NUMA AMD) and incredibly high CXPacket waits. In addition to that, during peak hours I'm seeing the System\Context Switches\s count go to about 250000/s. MAXDOP is set to 6. I'd be really grateful for some hints as to how that situation could be changed.

WaitType Wait_S Resource_S Signal_S WaitCount Percentage

CXPACKET 5826475.42 5404313.02 422162.40 2866188281 89.39

BACKUPBUFFER 228008.21 227000.54 1007.67 4871571 3.50

SOS_SCHEDULER_YIELD 166674.22 158.94 166515.28 199137201 2.56

July 12, 2011 6:10 AM
 

Adam Machanic said:

Hi Reiner,

Looking at that data I'd say you have nothing at all to worry about. You have a lot of parallel queries--nothing wrong with that--and parallel queries mean CXPACKET. The waits themselves are miniscule on average:

5404313 seconds / 2866188281 waits == 0.00188 seconds

High context switches may or may not be a problem. How's performance? That's really all that matters at the end of the day.

--Adam

July 12, 2011 10:46 AM
 

Reiner said:

Hi Adam,

Well, I guess you and all the others saying CXPacket waits are just part of life finally have me convinced. Performance isn't bad considering all databases have several hundred million rows combined. (It's a DWH server). Runtimes are up to 14 hours but that is to be expected. I think it's now all about good SQL statistics, indexes and execution plans. We are going to test 'Cost Threshold for Parallelism' though and push it up to 35 to see whether that'll make a final difference.

Do you think the high signal time per wait makes a difference here?

wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms resource_wait_time_ms signal_wait_time_per_wait resource_wait_time_per_wait

REQUEST_FOR_DEADLOCK_SEARCH 120310 597703791 5190 597703791 0 4968 0

XE_TIMER_EVENT 19924 597702917 30194 597702682 235 29999 0

July 13, 2011 9:34 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement