THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

SQL Server challenge: show me trace flag 1118 is significant

Here is a SQL Server challenge as I have failed so far:

 

Describe a reproducible workload that would see significant throughput improvement when trace flag 1118 is enabled.

 

I have seen a lot of chatters, anecdotes, and stories on the impact of trace flag 1118 on reducing tempdb contentions. But I have not seen any concrete evidence or test cases that actually show its impact on SQL Server 2005 or SQL Server 2008. The empiricist in me just itches for some solid data points.

 

I have also talked about the impact of trace flag 1118 in two posts here (1, 2). In either case, I was not able to find very strong evidence that T1118 was significantly useful in alleviating the tempdb contention in SQL Server 2005. Recently, there have been some more discussions on T1118. Many questions on T1118 were asked and answered in various SQL Server forums, newsgroups, websites, and lists. Two blog posts, in particular, caught my attention: Bob Dorr’s and Paul Randal’s. They approach the topic from different angles, but both are worth reading.

 

The consensus—not just from Dorr's and Randal's posts, but from the SQL Server community in general—seems to be that while in SQL Server 2005 and SQL Server 2008 trace flag T1118 is no longer that useful because of the improvement made to the tempdb storage allocation internals, but it may still be useful in some cases.

 

The trouble is that I have not seen any specific cases that show the trace flag to make a huge difference. Since I can’t seem to be able to generate a test workload that shows T1118 to make a significant difference, I have been on a vigilant lookout for any cases others may wish to share.

 

To that end, what Bob Dorr mentioned in his post especially piqued my interest. He said:

 

For example, I am working with a customer on a TEMPDB PFS page contention.   The stored procedures heavily use "select into ... #tmp where 1=2"   This syntax is not moving rows because of the where clause limitation but is creating tables.   This will place pressure on the allocation mechanisms.  Use of trace flag 1118 may still be required to force uniform allocation behavior. ”

 

So I decided to give it another shot on SQL Server 2008 with two test workloads described below.

 

Test workloads

Two versions of a single stored procedure were used to create tempdb contentions:

 

-- Version 1

CREATE PROC dbo.spT1118_v1

AS

SET NOCOUNT ON

  select * into #tmp1

    from sysobjects

   where 1=2

 

  select * into #tmp2

    from sysobjects

   where 1=2

 

  drop table #tmp1

  drop table #tmp2

go

 

This proc does nothing but to create two temp tables and then drop them. In particular, no data is inserted into either of the two temp tables.

 

The second version of the proc included inserting data:

 

-- Version 2

CREATE PROC dbo.spT1118_v2

AS

SET NOCOUNT ON

  select * into #tmp1

    from sysobjects

   where 1=2

 

  select * into #tmp2

    from sysobjects

   where 1=2

 

  insert #tmp1

  select * from sysobjects

 

  insert #tmp2

  select * from sysobjects

 

  drop table #tmp1

  drop table #tmp2

go

 

Tests

 

Two tests were conducted: Test I with spT1118_v1 and Test II with spT1118_v2.

 

In each test, I fired the respective proc against a SQL Server 2008 instance (10.0.1600) from a multi-threaded C# program with each thread simulating a user, making calls to proc spT1118_v1 in Test I and making calls to spT1118_v2 in test II.. No wait time was included between two consecutive calls. Throughout was measured in terms of the number of calls to proc spT1118_v1 (or proc spT1118_v2) that could be processed per second on the SQL Server 2008 instance.

 

The test server was a DB365 with four 2.8GHz cores (two dual-core AMD Opteron 2220 sockets) and with 4GB of RAM running 32-bit Windows Server 2003 SP2 and 32-bit SQL Server 2008 (10.0.1600).

 

Now for the tests to be informative, the workload levels should range from light to heavy so that we don’t end up drawing any conclusion that may be applicable to, for instance, lighter load only. In this particular test environment, through experiments, I found that 6 users (i.e. 6 client threads) proved to be too light, 8 ~ 10 users produced the best transaction throughput (i.e. the highest number of calls to the proc could be made), and the throughout started to drop as the number of users continued to increase.

 

Consequently, the number of users tested in both Test I and Test II includes 4, 8, 10, 12, 14, 16, 18, and 20.

 

In addition, looking at various wait stats and wait types on the spids from the C# test client program that were calling the spT1118_v1 proc or the spT1118_v2 proc, I found that with around 8 or fewer users, waits on tempdb internal resources--especially PFS (Page Free Space) pages--were light. With around 10 users, waits on PFS pages were frequently observed, but not severe enough to reduce the throughout. Beyond 14 users, massive waits on PFS pages were consistently observed, users were blocking each other on these tempdb internal structures, and the throughput dropped significantly.

 

In other words, whether or not actual data was being inserted into the temp tables did not change the contention on the PFS pages.

 

Test results

The charts below summarize the test results. Chart I is for test I and Chart II is for Test II.

 

 

From these two charts, you may be able to argue that there was a few percentage of a difference in throughout whether or not trace flag 1118 was used. At least, in Test I where no data was inserted into the temp tables, we see that the maximum throughput difference was ~7%.

 

But taking into consideration margin of error, you can’t really conclude the difference was significant, especially not across all the load levels. With the users heavily waiting on the tempdb PFS pages, you could argue that the two charts actually suggest that enabling T1118 produced lower throughput than disabling T1118 in some cases, though again the difference was not significant at all.

 

Just want to mention that in addition to the tests already described and whose results are presented in the above two charts, I checked many other cases including more temp tables and/or large amount of data inserted into the temp tables in the test procedures. In all the cases I saw significant tempdb contentions, but in no case did I see any significant improvement when T1118 was enabled.

 

If T1118 is indeed useful in some cases as suggested by Microsoft, it would be nice that Microsoft publishes a test workload that clearly shows T1118 to make a significant difference. And given the specific and narrow focus of this trace flag, such a test workload should not be difficult to come by. Without seeing any such concrete proof, enabling T1118 on SQL Server 2008 remains to be one of those fuzzy feel-good and just-in-case best practices. Since there does not seem to be any downside risk to enable it and anyone else seems to believe there is potential benefit, why not just enable it?

 

I’m not going to argue against the desire to be on the safe side. But until I see specific evidence, it doesn’t make me feel any better about this fuzzy feel-good just-in-case best practice. So far, I have not been able to find any specific evidence, and that takes us back to the SQL Server challenge at the beginning at this post.

 

 

Published Thursday, October 01, 2009 12:53 PM by Linchi Shea

Attachment(s): T1118_SQL2008.GIF

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Konstantin Korobkov said:

How many file did you have in the tempdb? Can you re-run these test w/o T1118 and jsut one tempdb file and w/ T1118 and 4 (eq. to number of cores) <sic>equally sized<sic> tempdb files?

October 1, 2009 3:11 PM
 

Linchi Shea said:

Konstantin;

Tempdb had 8 data files, 500MB each. I believe I did run the tests with one data file, but will double check.

October 1, 2009 7:28 PM
 

Linchi Shea said:

I posted the following SQL Server challenge yesterday: Describe a reproducible workload that would see

October 2, 2009 11:52 AM
 

Jimmy May said:

I, too, would like to see the evidence you seek.  I've done experiments in which multiple tempdb data files enhanced throughput, but have never had the opportunity to test T1118 such as you have.  Nicely done.

October 3, 2009 9:29 AM
 

Patrick LeBlanc said:

I too was non-believer until I came to work one morning and my 24 core, 96 gig RAM server was completely bogged down.  This is a server who’s CPU and Memory is extremely underutilized.  I share the details in my blog posting here:http://www.sqlservercentral.com/blogs/sqldownsouth/archive/2009/09/25/trace-flag-1118.aspx.  As I stated, Trace Flag 1118 did not completely solve the problem, but it did relieve some of the contention.  Unfortunately, I could not reproduce the behavior on a machine with fewer resources.

October 3, 2009 2:52 PM
 

Linchi Shea said:

Patrick;

Did you remove T1118 to see if the problem recur? I know this may not be possible in a prod environment. But just in case you did. The difficulty with the prod environment is often that the ultimate goal is to get the problem resolved first with the root cause analysis taking a secondary priority. As such, multiple changes may be made and we may not know exactly what cured the problem. In your case, maybe it was T1118. It could also be possible that something else also changed. Maybe, restarting the clients helped.

So the question is, can you reproduce it on a 24-core machine? If so, can you describe the workload so that I can try to reproduce it?

October 3, 2009 9:43 PM
 

Patrick LeBlanc said:

I did, and the problem recurred.

October 4, 2009 12:10 AM
 

Patrick LeBlanc said:

Every two minutes we issued about 1000 calls to the server which all included a variable table declaration.  Unfortunately I cannot give you more.

October 4, 2009 12:12 AM
 

Linchi Shea said:

Patrick;

I'm not asking for any app-specific details. For the type of issues tha T1118 is supposed to help, you should be able to create a genetic workload that shares sufficient characteristics in terms of the tempdb behavior with your app to reproduce the beheavior and the impact of T1118 without devulging anything about your app itself.

I'm not challenging the validity of your case at all. But I hope to see a case that can be independently reproduced.

October 4, 2009 12:33 AM
 

Patrick LeBlanc said:

Will work on something for you.  

October 4, 2009 10:34 PM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement