|
|
|
|
Checking out SQL Server via empirical data points
-
As organizations start to deploy various data grids into their application layers, they may start to experience what I’d call a SQL storm—a massive number of simple SQL requests that are issued over a large number of concurrent database connections at a high frequency.
This is not about SQL viruses or denial of service attacks. But the end result is similar. On the SQL Server machine, the CPU usage is being pegged to 100% for a sustained time period, and the applications that share the SQL Server instance complain about slow performance or query/connection timeouts. Upon further examination of the SQL Server instance, you typically find that a large number of connections coming from a group of application servers, and on each connection you see the application server trying to issue simple SQL statements in either ad hoc or proc calls against the SQL Server instance at a high frequency.
This is not an inherent problem of a data grid (such as those based on Oracle Coherence). But if not managed properly, a data grid has the capability and potential to cripple a SQL Server instance (or for that matter any DBMS).
Typically, these data grids cache frequently used data on the application servers, close to where the data is being used. A grid enables the app to automatically and dynamically partition the data in memory across the app servers, and help manage their transactional consistency and data integrity.
One of the issues any data grid must grapple with is how to ensure changes to data in memory across the app servers are persisted in the database. This is where things may not be managed properly and SQL storms are seen against the database.
Now, these changes can be reflected on the database backend in a number of different ways. The simplest approach is to leverage the normal grid calls to the database to perform your typical OLTP inserts, updates and deletes. Often, a grid performs these inserts, updates, and deletes in a rather simple and uniform fashion over all its connections, and this works well when changes to the cached data only trickle in.
Sometimes, business rules may dictate that a large amount of grid-cached data be updated at a given point in time. The grid can usually handle it without any issue because all the updates are memory only operations and are happening on multiple app servers. But as you can imagine, if all the app servers in the grid start to update the database backend all at once with the changes they see in their own memory, it can cause a significant load on the database server. Furthermore, note that these grids are often architected in such a way that each app server maintains multiple connections to the database server. If the grid keeps all the connections busy with tight loops in trying to sync’ing up the database with its cache as quickly as possible, it may cause the database server to be so busy that it ends up creating a significant CPU bottleneck on the database server, prolonging rather than shortening the synchronization.
If you have data grids running in your environments, watch out for SQL storms from these grids, and ask your application development folks to channel these massive updates through a more efficient batch process against your database backend. Avoid performing batch updates in an OLTP fashion.
|
-
In the SQL Server communities, it's common to hear people talking about HP SAN, EMC SAN, 3Par SAN, and so on as if there were such things as HP SAN, EMC SAN, etc.
Technically, SAN stands for Storage Area Network, but can be, and has been, used in two different ways. First, outside the storage communities, people often view everything beyond the drive at the OS level as the SAN with no regard to how that SAN is architected or configured as long as that drive is presented from some kind of SAN infrastructure. Typically, this is the way SQL Server professionals talk about SAN.
Within the storage communities, the interpretation is often different. To a storage engineer, there is a distinction between SAN and disk arrays: SAN is the network fabric that is made up of switches that provide your host a point-to-point path/link to the disks on some disk arrays. Disk arrays are the storage devices where physical disks are pooled together and managed by sophisticated software and additional controller hardware.
I personally have tried consciously to stick to the second interpretation, and refrained from using terms such as HP SAN or EMC SAN, primarily because this type of speaking adds no value, but introduces inaccuracies, and can be potentially misleading.
First of all, even if you interpret SAN broadly, it’s still probably wrong, or inaccurate, to speak of EMC SAN or IBM SAN, for instance, because it’s highly unlikely that the SAN environment is entirely made up of the EMC or IBM devices (I have not seen one anyway). The disk arrays may be from EMC or IBM, but the switches are often from Broacade or Cisco, or a mix of switches from different vendors. SAN is not a monolithic piece.
Although SAN in terms of the switch fabric can be a limiting factor, especially when it comes to throughput, disk arrays are often where the disk I/O performance is determined, especially when it comes to disk I/O latency. Identifying from what disk array a LUN is carved automatically puts us in a better position in our performance analysis.
Even if we stop being ‘picky’ and interpret EMC SAN to mean an EMC disk array behind a SAN fabric, the phrase is not specific enough to add as much value as identifying it to be an EMC DMX-3 disk array, for instance. Note that there can be many different models, makes, and versions of disk arrays behind a SAN, and their performance characteristics can be vastly different. It does not help to say that your drive is presented from an EMC SAN because two drives presented from the same SAN can perform very differently even with the same number of drives, same type of drives, and same RAID configuration. Many factors of the disk array heavily influence the performance of LUNs carved from the disk array.
Although different LUNs can be carved from the same disk array and these LUNs can be configured to have different performance characteristics, identifying the disk array allows one to understand why two LUNs of the same configuration have different performance characteristics, or at least gives one an opportunity to look for more information, and make the conversations easier with the storage folks.
Saying that a drive is from an EMC SAN is simply too generic to be useful, even if we are willing to ignore the fact that there is no such thing as an EMC SAN.
|
-
In this previous post I asked whether 100% logical scan fragmentation is always worse than 85% local scan fragmentation for table/index scans. (To be precise, I was talking about a B-tree table, i.e. a table with a clustered index).
The answer is no. 100% logical scan fragmentation is not always worse than 85% logical scan fragmentation in terms of table or clustered index scans. And the following is an example.
Two T-SQL scripts were used to populate a test table in separate tests. The key differences between these two scripts are highlighted in the code segments below in bold red.
Script I – populating data with 85% logical scan fragmentation
Script I below populates the test table with 2,000,000 rows. The clustered index key column is assigned values that alternate between small and large with the intention to fragment the clustered index.
create table test(c1 int, c2 int, c3 char(1000) default ' ')
go
create clustered index cix_test on test(c2)
go
set nocount on
go
declare @i int
set @i = 1
begin tran
while @i <= 2000000
begin
insert test(c1, c2)
select @i,
case when @i % 2 = 0 then @i else 2000000 - @i end
if @i % 100000 = 0
begin
commit tran
begin tran
end
set @i = @i + 1
end
commit tran
go
At the completion of Script I, the test table can be summarized as follows:
|
Property |
Value |
|
Logical scan fragmentation |
~85% |
|
Average page density |
~54% |
|
Pages scanned |
466,668 |
|
Extents scanned |
300,131 |
|
Rows |
2,000,000 |
|
Reserved space |
~3.6GB |
Script II – populating data with 100% logical scan fragmentation
Script II below populates the test table with 3,000,000 rows. That’s right, 3,000,000 instead of 2,000,000. The clustered index key column is assigned values in the order exactly opposite of that for the clustered index.
create table test(c1 int, c2 int, c3 char(1000) default ' ')
go
create clustered index cix_test on test(c2)
go
set nocount on
go
declare @i int
set @i = 1
begin tran
while @i <= 3000000
begin
insert test(c1, c2)
select @i,
3000000 - @i; -- from high to low
if @i % 100000 = 0
begin
commit tran
begin tran
end
set @i = @i + 1
end
commit tran
go
At the completion of Script II, the test table can be summarized as follows:
|
Property |
Value |
|
Logical scan fragmentation |
100% |
|
Average page density |
~88% |
|
Pages scanned |
428,572 |
|
Extents scanned |
53,772 |
|
Rows |
3,000,000 |
|
Reserved space |
~3.3GB |
Script III – the test query
At the completion of Script I and Script II, the following test script is run:
dbcc dropcleanbuffers
go
select COUNT(*) from test;
go
The test results
And the elapsed time of the SELECT statement is timed. The following table shows the performance of the test Script III:
|
Data |
SELECT COUNT(*) Performance (sec) |
|
Script I (85% fragmentation) |
100 |
|
Script II (100% fragmentation) |
27 |
The following chart shows the same data in the above table:
Observations
Note that since Script II populates the test in a more compact fashion (avg. page density = ~88% as opposed ~54% for Script I) and there would be fewer pages if the same number of rows are entered by both scripts. So, to make the comparison meaningful, I increased the number of rows for Script II to 3,000,000, and this resulted in the test table to have similar number of pages and similar sizes.
Why does the table scan performed significantly faster with the data that was 100% fragmented than with the data that was 85% fragmented? In fact, the query performed four times faster with the data that was 100% fragmented.
The key reason is that even though the data was 100% fragmented in terms of DBCC SHOWCONTIG or sys.dm_db_index_physical_stats, the pages were still considered ‘contiguous’, and SQL Server was able to post large reads and drove the I/O subsystem with significantly higher read-ahead reads.
More specifically, with the data populated by Script II (logical scan fragmentation = 100%), SQL Server was able to process the test query with significantly larger reads (~360KB/read vs. ~14KB ~37KB/read for the Script I data) and with significantly higher readahead reads (11,000~12,000 readahead pages/sec vs. 3,600~6,000 readahead pages/sec for the Script I data).
So, higher logical scan fragmentation does not necessarily mean worse table/index scan performance.
|
-
Everyone else probably knew, but I didn't know that you could do the following:
SELECT 'abc' WHERE <some where clause conditions>;
I probably didn't really miss anything, but was surprised that this worked! I mean, a SELECT statement that has a WHERE clause but without also having a FROM clause.
|
-
Certain things just don’t go well together! No, not talking about liberals and Republicans. Rather, I’m talking about user defined scalar functions and SQL traces.
If you run SQL Trace or SQL Profiler often, you’d find user defined scalar functions to be rather annoying in that 90% of your trace file can be easily taken up by functional calls if you trace for SP:Starting and/or SP:Completed. And it can be much worse if you trace for SP:StmtStarting and/or SP:StmtCompleted.
Burying useful trace data in an avalanche of functional calls is but a minor issue compared to the adverse performance impact you may experience when combining a SQL trace and user defined scalar functions, perhaps inadvertently.
First, an anecdote. Some time ago, I was at a site where a stored procedure consistently performed much worse on the production server than it did on the QA server with the same database and with the production server supposedly being much better equipped than the QA server. The proc executed with the same query plan, and a trace confirmed that it was going through the same steps in both prod and QA. The proc ran for less than one second in QA, but took five seconds to complete in prod.
Just when we started to suspect whether the production server was really better than the QA server, it was found that there was a trace running on the production server. Turning off the trace on the production led to immediate improvement of the stored procedure on the production server. In fact, it now performed faster in prod than it did in QA. Furthermore, if we took out the scalar function calls in the stored procedure, it would run faster in prod than in QA regardless of whether that trace was running or not.
Now, let’s look at some controlled test results on the performance impact of a SQL trace on a stored procedure that calls a user defined scalar function.
The test setup
The scalar function is a dummy function in that it doesn’t really do anything. It just returns the input parameter:
create function dbo.fn_Dummy (@c varchar(20))
returns varchar(20)
as
begin
declare @a varchar(20);
select @a = @c;
return (@a);
end;
go
The test table is named customer and its definition is as follows:
CREATE TABLE customer
(
c_id int,
c_first char(20),
c_middle char(20),
c_last char(20),
c_data char(500)
)
The customer table is populated with 50,000 rows as follows:
declare @i int
set @i = 1
begin tran
while @i <= 50000
begin
insert customer(c_id, c_first, c_middle, c_last, c_data)
select @i, CAST(@i as CHAR(20)), CAST(@i as CHAR(20)),
CAST(@i as CHAR(20)),
CAST(@i as CHAR(500))
set @i = @i + 1
end
commit tran
go
create clustered index ci_customer on customer(c_id)
Two stored procedures are used in the tests: p_test and p_testUDF. They are identical except that p_test does NOT call the scalar function.
create proc p_test
as
set nocount on -- does not call fn_Dummy()
select max(DATALENGTH(
c.c_first +
c.c_middle +
c.c_last )),
COUNT(*)
from customer c
where c_id <= 50000;
go
create proc p_testUDF
as
set nocount on -- does call fn_Dummy()
select max(DATALENGTH(
dbo.fn_Dummy(c.c_first) +
dbo.fn_Dummy(c.c_middle) +
dbo.fn_Dummy(c.c_last) )),
COUNT(*)
from customer c
where c_id <= 50000;
go
The tests
The key test parameters include:
· With scalar function calls vs. without scalar function calls. This is controlled by calling either p_test or calling p_testUDF. The former makes no scalar function calls.
· SQL trace configurations (i.e. level of tracing details). Four different SQL traces are configured with different levels of details being traced:
o No Trace. This is the test baseline when no trace is configured.
o Default Trace. This comes off the default trace template defined by SQL Server 2005. It basically traces for RPC:Completed and SQL:BatchCompleted. This is the least detailed trace of the tested trace configurations.
o SP Trace. This trace configuration includes the events covered by the Default Trace plus SP:Completed.
o Stmt Trace. This trace configuration include the events covered by the SP trace plus SP:StmtCompleted. This is the most detailed trace of the tested trace configurations.
So these four trace configurations are progressively more detailed.
The test results
All the test results are recorded only after multiple runs of the same tests. And since the customer table is about 2GB in size and the SQL Server buffer pool has 6GB allocated, all the test results are taken when the table is fully cached in the buffer pool.
The key metric is the response time of the test proc: either p_test or p_testUDF as defined above.
As always, it’s better to present the test results in a chart format.
A number of things stand out rather strikingly from these two charts.
First, when a proc calls a scalar UDF, the level of the details you trace can have a huge negative impact on the proc performance. In particular, if you do statement level tracing (e.g. tracing for SP:StmtCompleted or SP:StmtStarting), you could see your proc performing several times worse. In my tests, the proc takes 10 times longer to complete when the statement level trace is enabled.
Secondly, the exact degree of the impact depends, not surprisingly, on the number of calls to the scalar function, which is in turn dependent on the number of rows and the number of columns the scalar function is applied to. In my tests, the number of rows is set to 50,000. This may sound like a lot of rows to apply a scalar function to. And it is actually a lot of rows in light of the impact we are seeing. However and unfortunately, this is not uncommon in the real world applications.
Thirdly, and from the second chart, it is clear that the proc that does not call the scalar function (i.e. p_test) is insensitive to the trace configuration. This does not mean that any proc that does not call a scalar function is completely unaffected by how you configure a trace. Obviously, for a complex proc, a statement level trace would have to step into its inner steps and may have an impact. What the second chart does show is that it is a solid baseline in that the impact seen in the first chart is caused only by the change in the trace configuration.
Another point to highlight is that when you trace at the SQL batch or RPC level (e.g. RPC:Completed), the impact is rather small even with scalar UDF calls. In my tests, when the default trace is enabled, the elapsed time of p_testUDF increases from 576ms to 750ms consistently. That is a ~30% increase. But it is arguably negligible when you compare that with an increase from 576ms to 6436ms when the statement level trace is applied.
The key takeaway here is that you must be extra careful when you trace at the stored procedure level (e.g. tracing for SP:Starting), and you must be extraordinarily careful when you trace at the statement level.
|
-
All the previously posted results (May 25th and May 29th) on this exercise were obtained with query parallelism disabled (i.e. the sp_configure ‘max degree of parallelism’ option was set to 1).
Since the following test query is sensitive to query parallelism, we need to see what impact query parallelism may have.
DBCC DROPCLEANBUFFERS
go
SELECT COUNT(*) FROM dbo.test;
It turns out that with parallelism the test query exhibited significantly different performance characteristics with the three data sets we have been using as test vehicles than without parallelism. And again, the nature of the storage system proved to be a significant confounding factor. The following table shows the results with query parallelism on both the internal drive (the C: drive) and the drive presented from a departmental level disk array (the E: drive), the same two drives used in the May 25th update and the May 29th update.
|
Drive |
Test Run |
Elapsed Time (second) |
|
Internal (C:) |
Adam’s script – test run 1 |
90 |
|
Adam’s script – test run 2 |
88 |
|
Adam’s script – test run 3 |
87 |
|
Tibor’s script – test run 1 |
217 |
|
Tibor’s script – test run 2 |
219 |
|
Tibor’s script – test run 3 |
226 |
|
Linchi’s script – test run 1 |
325 |
|
Linchi’s script – test run 2 |
324 |
|
Linchi’s script – test run 3 |
324 |
|
Disk array (E:) |
Adam’s script – test run 1 |
74 |
|
Adam’s script – test run 2 |
75 |
|
Adam’s script – test run 3 |
72 |
|
Tibor’s script – test run 1 |
93 |
|
Tibor’s script – test run 2 |
97 |
|
Tibor’s script – test run 3 |
89 |
|
Linchi’s script – test run 1 |
51 |
|
Linchi’s script – test run 2 |
51 |
|
Linchi’s script – test run 3 |
51 |
Note the dramatic change in the relative impact of the data sets on the test query when the storage performance and query parallelism were added to the mix. In particular, note how the test query was able to perform much faster on the faster drive (drive E:) with some data sets.
I just want to present the data points in this post, and will dive in a bit in the next follow up.
|
-
If you use transactional replication, I have no doubt that from time to time you are asked to explain why there is an increased latency between the publisher and the subscriber. More often than not, this end-to-end latency is caused by latency in distributing the commands from the distribution database to the subscriber. So I’ll focus on the distribution latency in this post.
If you are monitoring the latency of your transaction replication using SQL Server native alert system, you may get an alert such as the following:
SQL Server Alert System: 'Replication Warning: Transactional replication latency (PUBLISHER-publication-SUBSCRIBER-780)' occurred on \\DISTRIBUTOR
DATE/TIME: 6/4/2009 5:03:30 PM
DESCRIPTION: The SQL Server performance counter 'Dist:Delivery Latency' (instance 'PUBLISHER-publication-SUBSCRIBER-780') of object 'SQLServer:Replication Dist.' is now above the threshold of 10000.00 (the current value is 40160.00).
This tells you that the distribution agent, named PUBLISHER-publication-SUBSCRIBER-780, has crossed the latency alert threshold.
Now what?
Well, there can be more root causes for an increase in the distribution latency. A transaction volume increase is one of the most common causes. Hey, it justs longer to pump more commands through the system. So you want to quickly determine if that’s the cause or to rule it out quickly so that you can focus your attention on some other potential causes.
More specifically, you want to determine how many transactions, and most usefully, how many commands are being pumped into the distribution database for the distribution agent to apply to the subscriber. Note that a single update that modifies 100,000 rows—that is, one transaction with 100,000 commands—can easily send the distribution latency through the roof, at least, temporarily.
In order for you to determine if the transaction volume is the culprit and if the problem still persists, you need to be able to answer the following questions concerning the current state of replication and the state of replication just prior to the reported latency:
· What are the replication perfmon counter values related to the distribution agent?
· What are the undistributed vs. distributed command counts?
· What is the pending command count?
· What is the recent traffic pattern of the incoming transactions (e.g. transaction count by minute and transaction count to command count ratio over the past 20 minutes)?
· What is the recent traffic pattern of the incoming commands from the log reader (e.g. command count by minute by article over the past 20 minutes)?
· What are the to-be-distributed command counts by article?
· Is the distribution agent being blocked on the subscriber? Okay, this is not a volume issue, but nevertheless is an important piece of information to review.
You need a script to collect the stats if you want to get to the answers quickly, for instance, while your client is pestering you on the phone for an explanation.
The attached script accepts a distribution agent id—which you can get from the alert—as one of the parameters, and gives you the stats to determine if the latency is caused by a transaction/command volume surge.
A number of things to note about the script.
First, it is a Perl script. But that’s just because I wanted to streamline things a bit, and to be able to have better control on the output and the parameter passing. You can easily extract the SQL statements and put them into a T-SQL script with not much additional work.
Secondly, the script does accept parameters in addition to a distribution agent id. For instance, you can specify how far back in minutes you want the script to collect the stats on the incoming transaction and command traffic patterns. But other than a distribution agent id, all the parameters have default values that you can set in the script to match your environment.
Thirdly, it can be expensive to run some queries against the MSrepl_transactions and MSrepl_commands tables. So for instance, by default the script does not print the sample commands. You can control which query to run or not to run through the script parameters. Look up the comments in the script for details.
Finally, the script currently gets the stats primarily from the distributor. Although I have found this script to be very handy in many occasions, it can be significantly improved. One area I want to improve is to collect more stats from the publisher and the subscriber. For that, I’ll probably convert the script into a C# program because Perl is not good for multi threading and I do want to collect stats from the publisher, the distributor, and the subscriber in parallel from different threads.
If you use the script, I appreciate any feedback you may have.
|
-
This is another follow-up on the T-SQL exercise.
So the test query below is rather simple:
DBCC DROPCLEANBUFFERS
go
SELECT COUNT(*) FROM dbo.test;
But beneath its simple appearance, many factors are at play and interact in a complex way to influence the query performance. In other words, trying to predict its performance is not trivial. You may do better than predicting the stock market performance. But still it can be a rather unreliable business.
So in the May 25th update, I reported on the results of running the test queries with the data file on a RAID-1 set with two internal disks. Yesterday, I triple checked the test result and the result could be re-produced at will.
To see whether or what impact a different storage system may have on the performance, I placed the data file on a departmental disk array behind a SAN. Unlike a typical enterprise class disk array, this departmental disk array does not have a huge amount of cache. In addition, the data block placement is effectively virtualized in that it uses a mapping table to place data across all the drives, making it effectively impossible to know or control where the data is placed at the OS level.
With the data file on this disk array, I repeated the same tests. Each time before the data was loaded, the test table was dropped and the database was empty with no user data.
Again, after the data was loaded by each script, the test query (including DBCC DROPCLEANBUFFERS) was run three times, and each time the query elapsed time was logged. Additional validation test runs were performed to ensure that the test results were re-producible, and they are re-producible.
The following table shows the recorded elapsed times.
|
Test run |
Elapsed time (second) |
|
Adam’s script – test run 1 |
74 |
|
Adam’s script – test run 2 |
82 |
|
Adam’s script – test run 3 |
81 |
|
Tibor’s script – test run 1 |
248
|
|
Tibor’s script – test run 2 |
235 |
|
Tibor’s script – test run 3 |
239 |
|
Linchi’s script – test run 1 |
155
|
|
Linchi’s script – test run 2 |
146 |
|
Linchi’s script – test run 3 |
148 |
First thing to note from the above table is that the ranking of the test query performance changed significantly. In this setup, Tibor’s data set caused the test query to have the worst performance.
But what is most striking is that while the test query performed significantly better with Adam’s and my data sets on the disk array than they did on the internal disks, the test query performed noticeably worse with Tibor’s data set on the disk array than it did on the internal disks.
This is getting more interesting. And I might add that this is yet another piece of evidence on the conspiracy of the every-table-must-have-a-clustered-index crowd :-)
I’ll come back to explore what might be the cause later.
|
-
This is a quick update on the T-SQL exercise I posted a few days ago. The goal was to write a simple T-SQL script to generate and load 4,000,000 rows into a table so that the following query would produce the worst performance, i.e. take longest time to finish:
DBCC DROPCLEANBUFFERS
go
SELECT COUNT(*) FROM dbo.test;
The original intent was to highlight some pitfalls in data loading that may lead to bad query performance. But then I thought why take all the fun away by having too many constraints, and why not just let it loose and see how bad it can get if one is to do it intentionally.
I must say that just looking at the scripts from Denis, Adam, and Tibor, I came to realize that things were not quite what I had thought they were.
For instance, I did not anticipate that a heap table could have performed the table scan so terribly in Tibor’s script. I thought that SQL Server would apply an allocation order scan and would optimize on the physical order regardless of forward pointers. Adam’s script surprised me with a freshly created clustered index. I thought that even if you could create a clustered index to push the limit on the fillfactor, the clustered index would defragment the data enabling SQL Server to scan the table efficiently, easily outweighing whatever gains (actually performance loss) you might get from lowering the page density.
So predicting the query performance turned out to be not very precise at all. It’s better to actually test these scripts out. I ran Adam’s script, Tibor’s script, and my script in a database whose data file was placed on an internal RAID-1 set (i.e. two mirrored physical drives). The internal drive (which was also the system drive C) was used to avoid dealing with the confounding factors because of SAN cache, storage virtualization, and so on. In addition, the test database was given 45GB for data and 20GB for log, significantly larger than what would be required during the tests.
All tests were done with SQL Server 2008 Enterpries x64 Edition (build 10.0.1600) on Windows Server 2003 Enterprise x64 Edition SP2 on an old HP ProLiant DL365 G1 with four cores and 4GB of physical memory.
Adam Machanic’s script
if exists (select * from sysobjects where name = 'test')
drop table test
go create table test ( x int not null, y char(896) not null default (''), z char(120) not null default('') )
go
insert test (x)
select r
from
(
select row_number() over (order by (select 1)) r
from master..spt_values a, master..spt_values b
) p
where r <= 4000000
go
create clustered index ix_x on test (x, y) with fillfactor=51
go
Tibor Karaszi’s script
if exists (select * from sysobjects where name = 'test')
drop table test
go CREATE TABLE test (
x int NOT NULL
,x2 int NOT NULL
,y char(10) NOT NULL DEFAULT ('')
,z char(10) NOT NULL DEFAULT('')
)
DECLARE @rows int = 6666666, @toKeep int = 4000000, @diff int
INSERT test (x, x2)
SELECT TOP(@rows)
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS r
,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) % 10 AS s
FROM master..spt_values a CROSS JOIN master..spt_values b
go
ALTER TABLE test ALTER COLUMN y char(892)
ALTER TABLE test ALTER COLUMN z char(100)
go
CHECKPOINT
go
DECLARE @rows int = 6666666, @toKeep int = 4000000, @diff int
DELETE TOP(@rows - @toKeep) FROM test WHERE x2 IN(2, 4, 6, 8)
go
Note that Tibor’s script generated 3768021 rows instead of 4,000,000 rows in my tests.
Linchi Shea’s script
if exists (select * from sysobjects where name = 'test')
drop table test
go create table test( c1 int, c2 int, c3 char(256) default ' ', c4 char(740) default ' ')
go
create clustered index cix_test on test(c2, c3)
go
set nocount on
go
declare @i int
set @i = 1
begin tran
while @i <= 4000000
begin
insert test(c1, c2)
select @i,
case when @i % 2 = 0 then @i else 4000000 - @i end
if @i % 100000 = 0
begin
commit tran
begin tran
end
set @i = @i + 1
end
commit tran
go
To ensure that when the test table was created, the database was in the same state. The test table was dropped at the beginning of each script. And at that point, the database was empty with no user data.
After the data was loaded by each script, the test query (including DBCC DROPCLEANBUFFERS) was run three times, and each time the query elapsed time was logged. The following table shows the recorded elapsed times.
|
Test run |
Elapsed time (second) |
|
Adam’s script – test run 1 |
123 |
|
Adam’s script – test run 2 |
121 |
|
Adam’s script – test run 3 |
120 |
|
Tibor’s script – test run 1 |
205 |
|
Tibor’s script – test run 2 |
207 |
|
Tibor’s script – test run 3 |
177 |
|
Linchi’s script – test run 1 |
408 |
|
Linchi’s script – test run 2 |
354 |
|
Linchi’s script – test run 3 |
358 |
Afterwards, for validation, the above entire process was repeated one more time. In other words, the test query was run six times for each data-loading script. The test query elapsed times were consistent with those recorded in the above table.
Now, in no way I’d claim that my script has managed to produce the worst query performance. I’m pretty sure that you can find a way to load data to produce much worse query performance.
In addition, as Adam and Tibor observed, the results may be dependent on many factors including the test system configurations. Although on a given test system the test results should be consistent, it could be misleading to compare the results across different test systems.
A case in point is that if you disable read-ahead reads entirely, the query performance would be much worse than any of the elapsed times posted above. So perhaps, your dataset can be loaded so that it induces SQL Server not to make effective use of read-ahead reads, and therefore get worse query performance.
I’m looking forward to seeing more scripts with worse query performance than the three scripts referenced in this post.
|
-
Logical scan fragmentation as reported by DBCC SHOWCONTIG is bad for queries that perform large scans such as table/index scans. Often, we hear that the higher the logical scan fragmentation percentage is, the worse it is for the scans.
So the question is: is 100% logical scan fragmentation always worse than 85% (or 80% or 90%) logical scan fragmentation for table/index scans?
|
-
Here is a T-SQL scripting exercise in case you have a few minutes to spare or are bored with whatever else you are doing.
Objective
The task is to write a simple T-SQL script to generate and load 4,000,000 rows into a test table. The objective is to make the following simple test query to have the worst performance in terms of elapsed time:
DBCC DROPCLEANBUFFERS
go
SELECT COUNT(*) FROM dbo.test;
The longer the elapsed time, the worst it is.
Constraints
Of course, there must be a number of constraints, and they are as follows:
· The test table can have any number of columns of any fixed-length data types. However, the first column should be an integer column with values from 1 through 4,000,000, inclusive.
· The sum of the all the column widths are between 1000 bytes plus or minus 20 (i.e. between 980 bytes and 1020 bytes).
· No NULL is allowed for any column.
· No undocumented features are allowed in the script.
· When the test query is run, the test table must meet the following condition:
§ The test table has 4,000,000 rows.
§ The total size of data and index must be less than 7.5GB, as measured by sp_spaceused.
§ Avg. page density is greater than 50% as reported by DBCC SHOWCONTIG().
· The test query shown above must be run as is without any change.
Why?
Well, it’ll reveal a lot of about how data is stored and how the simple test query is processed.
If you would like to share your solution, and I hope you do, please post it here in the comment. Or if you want it better formatted, I can append it to the main text of this post.
|
-
In an earlier post, I showed that you could run out the log space or force the log file to autogrow in a user database even if you are not running any large transaction, and the user database is in the simple recovery mode.
Whether or not you see that behavior depends on which one gets there first. If the log can be truncated first via a checkpoint that may be woken up when the threshold of a number of criteria is crossed--including the log being 70% full, your log may not grow or become full. However, if the small transactions are issued fast enough, they can fill up the log before the log can be truncated. It’s a kind of race condition.
A question was then raised on whether this is true with tempdb because tempdb does transaction logging quite differently than a user database.
The answer is that you may still see the race condition with tempdb.
I ran the same scripts used in the earlier post. Initially, I could not fill up the tempdb log with small transactions. Its log always got truncated before it became full. By adjusting the parameters in the script, I was able to fill up the tempdb log of 1GB in size with a series of individually committed 8000-byte updates. Changing the update size to 7000 bytes, all the transactions went through fine without blowing the log.
The script to create the test table is as follows:
set nocount on
go
drop table #test
go
create table #test(i int, c char(8000))
go
declare @i int
set @i = 1
while @i < 40000
begin
insert #test values(@i, 'abc')
set @i = @i + 1
end
The following script is run to see if the tempdb log is filled up or forced to autogrow. You may have to adjust the value for parameter @change_size several times before you can find the threshold, above which the script will fills up the tempdb log and below which the script will run to completion with no issue.
set nocount on
go
declare @change_size int
set @change_size = 7000 -- adjust this value
declare @i int
set @i = 1
while @i < 200
begin
if @i % 2 = 0
update #test set c = replicate('a', @change_size)
else
update #test set c = replicate('b', @change_size)
select @i = @i + 1
end
Compared with a regular user database, you indeed have to try harder to fill up the transaction log of tempdb with small transactions.
Regardless whether it’s a user database or tempdb, when its transaction log is filled up or forced to autogrow, large long-running transactions should not be the only potential culprit for you to track down.
|
-
One of the things I like about SQL Server (and Sybase for that matter) is that it comes with tons of system stored procedures. Who doesn't find procs like sp_who, sp_help, and and sp_helprotect indispenable?
But unfortunately it seems to me that Microsoft is on the path to abandon these system stored procedures. Judging from the fact that many of these procs are not being updated to keep up with the new versions of SQL Server and the responses from Microsoft to query the system catalogs and DMVs to make up the difference when this is pointed out and reported, I'd say that updating these system stored procedures are at least not on their priority list.
True, one can argue that you could write scripts to query system catalogs and DMVs yourself, and Microsoft has provided quite a few scripts through their blogs and codeplex projects. But that is not the same as shipping these procs in the product and therefore making them ubiquitous. Yes, you probably can't expose everything--that the new system catalogs and DMVs have to offer--through these system stored procedures. I'd be happy if Microsoft can stick to the 80/20 rules to have the system stored procedures cover 80% of the common admin tasks and leave the other 20% for people to write their own scripts, or at least just keep the existing system procedures updated, whereever applicable.
It just seems to me to be such a huge collective waste of time for the users to keep writing the same or similar admin scripts when they could be written once by Microsoft.
|
-
By server, I mean a computer that is fit for an enterprise data center.
Different people will no doubt look at this question from different angles. If you are into computer technical specs, you probably would be quick to point out the difference in the number and type of processors they support, the number and type of expansion slots they support, the difference in fault tolerance, and so on and so forth.
I spent a few years doing testing and stress testing for a living, and have been conditioned to look at the question from that perspective. As such, my answer is that a server is not supposed to crash under stress test, whereas you probably don’t want to subject your desktop under severe stress workload (unless of course you are a manufacturer or work for a test lab).
|
-
SQL Server can run in one of two modes: thread mode or fiber mode. By default, SQL Server runs in thread mode in which a SQL Server worker is associated with a Windows thread throughout all phases of its execution. This can be changed with the sp_configure option ‘Lightweight Pooling’. When Lightweight Pooling is turned on, SQL Server runs in fiber mode in which a SQL Server worker is associated with a user-mode Windows construct called fiber. Switching among fibers is handled in the user mode with the objective of reducing the cost of calling into the kernel for context switches.
Whenever fiber mode is discussed, Microsoft has always piled a huge load of ominous warnings, generally related to the potential stability issues that may come up with fiber mode. And because of these warnings, fiber mode is rarely, if ever, used in real world production environments. Staying away from fiber mode is of course the right thing to do because many of the ‘external SQL Server components’ that may not be fiber-mode friendly are usually essential in a real production environment.
If you run a bare minimum SQL Server instance (e.g. no XML, no mail, no linked servers), can you expect performance gain from fiber mode? Generally speaking, you may not see any performance gain. But there are cases you can see performance gains.
I was curious whether I could see any performance gain at all on a low-end server such as a four-core HP ProLiant DL365 G1 with 4GB of RAM. Note that this is a rather old and obsolete server model.
It turns out that under certain workloads, fiber mode can give you a huge performance boost. For my tests, I populated a TPC-C like database with about 9GB of data, and ran read-only workloads against the database (with the two read-only transactions in TPC-C in a 50/50 transaction mix). I configured each client to wait for 20 milliseconds before submitting the next query. The following chart shows the difference in transaction throughput at various load levels (in terms of the number of simulated concurrent users running queries against the server) between thread mode and fiber mode. The data points were obtained on a SQL Server 2008 instance (Enterprise x64 Edition and the build level was 10.0.1600).
If we look at the cases for 400 and 600 users for example, the performance gain from thread mode to fiber mode was ~ 40% (with the thread-mode transaction throughput as the baseline). This was a huge gain. I ran the tests in various random orders and repeated the tests at various random times. The results were consistent and reproducible.
I was surprised to see such a difference myself for two reasons. One is that I didn’t see much of a performance difference between thread mode and fiber mode when I ran the same workload in the same test environment without the 20ms wait time. I observed only about 5% consistent but marginal gain, switching from thread mode to fiber mode when the next query was submitted immediately after the previous one was finished. I can’t explain why the wait time made the difference.
The other reason I was surprised is that I had never seen any performance gain from enabling lightweight pooling or fiber mode before, though I had never tried this specific workload and had never tried this on SQL Server 2008.
I’m not sure if there is any practical value in this post because stability concerns will and should always trump any potential performance gain, especially when that performance gain is rather elusive. But at least it’s good to report that I have seen some real performance gain with fiber mode instead of just hearing somebody else talking about it.
|
|
|
|
|
|