Here is a SQL Server challenge as I have failed so far:
Describe a reproducible workload that would see significant throughput improvement when trace flag 1118 is enabled.
I have seen a lot of chatters, anecdotes, and stories on the impact of trace flag 1118 on reducing tempdb contentions. But I have not seen any concrete evidence or test cases that actually show its impact on SQL Server 2005 or SQL Server 2008. The empiricist in me just itches for some solid data points.
I have also talked about the impact of trace flag 1118 in two posts here (1, 2). In either case, I was not able to find very strong evidence that T1118 was significantly useful in alleviating the tempdb contention in SQL Server 2005. Recently, there have been some more discussions on T1118. Many questions on T1118 were asked and answered in various SQL Server forums, newsgroups, websites, and lists. Two blog posts, in particular, caught my attention: Bob Dorr’s and Paul Randal’s. They approach the topic from different angles, but both are worth reading.
The consensus—not just from Dorr's and Randal's posts, but from the SQL Server community in general—seems to be that while in SQL Server 2005 and SQL Server 2008 trace flag T1118 is no longer that useful because of the improvement made to the tempdb storage allocation internals, but it may still be useful in some cases.
The trouble is that I have not seen any specific cases that show the trace flag to make a huge difference. Since I can’t seem to be able to generate a test workload that shows T1118 to make a significant difference, I have been on a vigilant lookout for any cases others may wish to share.
To that end, what Bob Dorr mentioned in his post especially piqued my interest. He said:
“For example, I am working with a customer on a TEMPDB PFS page contention. The stored procedures heavily use "select into ... #tmp where 1=2" This syntax is not moving rows because of the where clause limitation but is creating tables. This will place pressure on the allocation mechanisms. Use of trace flag 1118 may still be required to force uniform allocation behavior. ”
So I decided to give it another shot on SQL Server 2008 with two test workloads described below.
Test workloads
Two versions of a single stored procedure were used to create tempdb contentions:
-- Version 1
CREATE PROC dbo.spT1118_v1
AS
SET NOCOUNT ON
select * into #tmp1
from sysobjects
where 1=2
select * into #tmp2
from sysobjects
where 1=2
drop table #tmp1
drop table #tmp2
go
This proc does nothing but to create two temp tables and then drop them. In particular, no data is inserted into either of the two temp tables.
The second version of the proc included inserting data:
-- Version 2
CREATE PROC dbo.spT1118_v2
AS
SET NOCOUNT ON
select * into #tmp1
from sysobjects
where 1=2
select * into #tmp2
from sysobjects
where 1=2
insert #tmp1
select * from sysobjects
insert #tmp2
select * from sysobjects
drop table #tmp1
drop table #tmp2
go
Tests
Two tests were conducted: Test I with spT1118_v1 and Test II with spT1118_v2.
In each test, I fired the respective proc against a SQL Server 2008 instance (10.0.1600) from a multi-threaded C# program with each thread simulating a user, making calls to proc spT1118_v1 in Test I and making calls to spT1118_v2 in test II.. No wait time was included between two consecutive calls. Throughout was measured in terms of the number of calls to proc spT1118_v1 (or proc spT1118_v2) that could be processed per second on the SQL Server 2008 instance.
The test server was a DB365 with four 2.8GHz cores (two dual-core AMD Opteron 2220 sockets) and with 4GB of RAM running 32-bit Windows Server 2003 SP2 and 32-bit SQL Server 2008 (10.0.1600).
Now for the tests to be informative, the workload levels should range from light to heavy so that we don’t end up drawing any conclusion that may be applicable to, for instance, lighter load only. In this particular test environment, through experiments, I found that 6 users (i.e. 6 client threads) proved to be too light, 8 ~ 10 users produced the best transaction throughput (i.e. the highest number of calls to the proc could be made), and the throughout started to drop as the number of users continued to increase.
Consequently, the number of users tested in both Test I and Test II includes 4, 8, 10, 12, 14, 16, 18, and 20.
In addition, looking at various wait stats and wait types on the spids from the C# test client program that were calling the spT1118_v1 proc or the spT1118_v2 proc, I found that with around 8 or fewer users, waits on tempdb internal resources--especially PFS (Page Free Space) pages--were light. With around 10 users, waits on PFS pages were frequently observed, but not severe enough to reduce the throughout. Beyond 14 users, massive waits on PFS pages were consistently observed, users were blocking each other on these tempdb internal structures, and the throughput dropped significantly.
In other words, whether or not actual data was being inserted into the temp tables did not change the contention on the PFS pages.
Test results
The charts below summarize the test results. Chart I is for test I and Chart II is for Test II.
From these two charts, you may be able to argue that there was a few percentage of a difference in throughout whether or not trace flag 1118 was used. At least, in Test I where no data was inserted into the temp tables, we see that the maximum throughput difference was ~7%.
But taking into consideration margin of error, you can’t really conclude the difference was significant, especially not across all the load levels. With the users heavily waiting on the tempdb PFS pages, you could argue that the two charts actually suggest that enabling T1118 produced lower throughput than disabling T1118 in some cases, though again the difference was not significant at all.
Just want to mention that in addition to the tests already described and whose results are presented in the above two charts, I checked many other cases including more temp tables and/or large amount of data inserted into the temp tables in the test procedures. In all the cases I saw significant tempdb contentions, but in no case did I see any significant improvement when T1118 was enabled.
If T1118 is indeed useful in some cases as suggested by Microsoft, it would be nice that Microsoft publishes a test workload that clearly shows T1118 to make a significant difference. And given the specific and narrow focus of this trace flag, such a test workload should not be difficult to come by. Without seeing any such concrete proof, enabling T1118 on SQL Server 2008 remains to be one of those fuzzy feel-good and just-in-case best practices. Since there does not seem to be any downside risk to enable it and anyone else seems to believe there is potential benefit, why not just enable it?
I’m not going to argue against the desire to be on the safe side. But until I see specific evidence, it doesn’t make me feel any better about this fuzzy feel-good just-in-case best practice. So far, I have not been able to find any specific evidence, and that takes us back to the SQL Server challenge at the beginning at this post.