THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

T-SQL Exercise: Simplest data-loading script to produce worst query performance – May 29th Update

This is another follow-up on the T-SQL exercise.


So the test query below is rather simple:




SELECT COUNT(*) FROM dbo.test;


But beneath its simple appearance, many factors are at play and interact in a complex way to influence the query performance. In other words, trying to predict its performance is not trivial. You may do better than predicting the stock market performance. But still it can be a rather unreliable business.


So in the May 25th update, I reported on the results of running the test queries with the data file on a RAID-1 set with two internal disks. Yesterday, I triple checked the test result and the result could be re-produced at will.


To see whether or what impact a different storage system may have on the performance, I placed the data file on a departmental disk array behind a SAN. Unlike a typical enterprise class disk array, this departmental disk array does not have a huge amount of cache. In addition, the data block placement is effectively virtualized in that it uses a mapping table to place data across all the drives, making it effectively impossible to know or control where the data is placed at the OS level.


With the data file on this disk array, I repeated the same tests. Each time before the data was loaded, the test table was dropped and the database was empty with no user data.


Again, after the data was loaded by each script, the test query (including DBCC DROPCLEANBUFFERS) was run three times, and each time the query elapsed time was logged. Additional validation test runs were performed to ensure that the test results were re-producible, and they are re-producible.


The following table shows the recorded elapsed times.


Test run

Elapsed time (second)

Adam’s script – test run 1


Adam’s script – test run 2


Adam’s script – test run 3



Tibor’s script – test run 1


Tibor’s script – test run 2


Tibor’s script – test run 3



Linchi’s script – test run 1


Linchi’s script – test run 2


Linchi’s script – test run 3



First thing to note from the above table is that the ranking of the test query performance changed significantly. In this setup, Tibor’s data set caused the test query to have the worst performance.


But what is most striking is that while the test query performed significantly better with Adam’s and my data sets on the disk array than they did on the internal disks, the test query performed noticeably worse with Tibor’s data set on the disk array than it did on the internal disks.


This is getting more interesting. And I might add that this is yet another piece of evidence on the conspiracy of the every-table-must-have-a-clustered-index crowd :-)


I’ll come back to explore what might be the cause later.

Published Friday, May 29, 2009 1:23 PM by Linchi Shea



Mike Walsh said:

Well it has been awhile since we've seen the CIX vs Heap crowd come out fighting...

.:Grabbing some popcorn and a coke:.

May 29, 2009 12:49 PM

Linchi Shea said:

All the previously posted results ( May 25 th and May 29 th ) were obtained with query parallelism disabled

June 7, 2009 9:32 PM
New Comments to this post are disabled

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog


Privacy Statement