THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Analyzing Tempdb Contention (A Month of Activity Monitoring, Part 21 of 30)

This post is part 21 of a 30-part series about the Who is Active stored procedure. A new post will run each day during the month of April, 2011. After April all of these posts will be edited and combined into a single document to become the basis of the Who is Active documentation.


Tempdb. Everyone’s favorite shared bottleneck.

The funny thing about tempdb is that it’s not used by every query. It’s only really used by the biggest queries. The queries where performance really matters. And of course, that makes the situation all the worse. When tempdb is a problem, it’s a major problem.

A common cause of tempdb issues is latch contention. This occurs due to three “special” page types: GAM (Global Allocation Map), SGAM (Shared Global Allocation Map), and PFS (Page Free Space). Each of these pages (sometimes many of each) exist in all database files, and they are responsible for helping to identify where incoming data can be written in the physical file. For complete background, read this post by Paul Randal.

Whenever a process needs to update one of these special pages, a latch is taken. A latch can be thought of as a kind of very lightweight lock; it’s designed for synchronization and is intended to be quickly turned on, and then just as quickly turned off after it’s no longer needed. The problem occurs when lots of processes happen to hit the SQL Server instance at the exact same time, and they all need to find a place to store some data. Suddenly, there are numerous outstanding latch requests. They start queuing up, and before you know it the last one in line has been waiting for seconds—and not doing any work at all in the meantime.

Seeing this in action is quite simple. Create a bunch of temporary tables, in a very short period of time. I like to demo this using SQLQueryStress. Ideally you should do this kind of test on a server with only a single tempdb data file, to really highlight the issue. Here’s the code to run:

SELECT TOP(10000)
    a.*
INTO #x
FROM
    master..spt_values a,
    master..spt_values b

20 or so concurrent threads should do it. Hit the start button in your load tool, wait a few seconds, run Who is Active, and you’ll see something similar to:

F21_01_latch_waits

See those PAGELATCH waits? They’re all on the same resource: PFS pages, in tempdb file ID 1. The format for PAGELATCH and PAGEIOLATCH waits is: [wait_type]:[database_name]:[file_id](page_type). Who is Active can decode the page types for GAM, SGAM, PFS, DCM (Differential Change Map), and BCM (Bulk Change Map) pages. For any other page, the page type will be an asterisk (*).

These waits are all on update (UP) latches, but it's also quite common to see exclusive (EX) latches when this problem occurs

Fixing this problem is amazingly simple: just create more tempdb files! When you create additional tempdb files—as long as they’re equally sized—SQL Server will automatically balance incoming requests across the multiple files. Since each file maintains its own PFS, GAM, and SGAM information, the contention will be eliminated.

How many files should you create? Well, that depends on your workload. I take a simple approach: start with a number of files equal to one quarter to one half of the number of logical schedulers. If the contention goes away (i.e. Who is Active is no longer showing PAGELATCH waits on these special pages in tempdb), stop there. Otherwise, keep increasing the number of files until the contention does go away.

Again: make sure to keep the files equally sized! SQL Server’s algorithm is based on a proportional fill model, which means that if one file is bigger than the others it will be chosen more often. This will cause it to grow more quickly, which will cause it to be chosen more often, which will cause it to grow more quickly, which will keep compounding until your disk is full and your performance is back in the red zone. Not a good situation to get yourself into. Grow the files bigger than you need them at create time, and monitor to make sure things don’t get out of hand.

 

Homework

Use Who is Active to check your production servers for tempdb contention! Tell me in the comments below whether you found any. It’s amazing how common this issue is, yet how simple the fix turns out to be. A very satisfying task for even the most harried of DBAs.

Published Thursday, April 21, 2011 11:00 AM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Yih Lee said:

I think there is a bug in Sp_whoisActive where the "program_Name" is incorrectly displayed.

For some reason, When I run this

EXEC Perf..sp_WhoIsActive

   @get_additional_info = 1,

   @filter_type = 'Session',

   @filter = '435',

   @get_plans = 1,

   @get_task_info = 1

It is displaying program_name as 'Service Broke queue_id: 2075870462' but the query is running from SSMS. this is repeatable on other servers as well.

April 21, 2011 3:37 PM
 

Adam Machanic said:

Hi Yih,

Can you share some more information? Is Service Broker running in your environment? And can you try again with the most recent beta:

http://sqlblog.com/files/folders/35046/download.aspx

 

Thanks!

April 21, 2011 7:35 PM
 

Adam Machanic said:

This post is part 30 of a 30-part series about the Who is Active stored procedure. A new post will run

April 30, 2011 11:45 AM
 

mjswart said:

Hi Adam,

I've noticed that sp_whoisactive uses tempdb itself and when there's tempdb contention I've seen sp_whoisactive take a while. So I've learned to do quick and dirty ad hoc queries from dm_exec_requests...

But does this make sense? That sp_whoisactive can struggle to report on  tempdb contention because it needs to use tempdb itself?

September 25, 2012 9:45 AM
 

Adam Machanic said:

Hi Michael,

First off, which version are you using? v11.11 was specifically tuned to use less tempdb than prior versions.

Second, unfortunately I had to make a choice between [workspace] memory and tempdb [spills]. I chose the latter, as I consider tempdb to be a much cheaper and more expendable resource than memory. Admittedly, I tuned this using a pretty heavy knob and I didn't do a lot of analysis on whether or not I could balance things out a bit more. I hope to do that for a future version.

--Adam

September 25, 2012 6:49 PM
 

Centurian said:

Hello Adam,

Just used your SQLQueryStress utility which helped verify my query detected contention straight away. Thanks a bunch.

:-)

May 6, 2013 8:47 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement