In this previous post I asked whether 100% logical scan fragmentation is always worse than 85% local scan fragmentation for table/index scans. (To be precise, I was talking about a B-tree table, i.e. a table with a clustered index).
The answer is no. 100% logical scan fragmentation is not always worse than 85% logical scan fragmentation in terms of table or clustered index scans. And the following is an example.
Two T-SQL scripts were used to populate a test table in separate tests. The key differences between these two scripts are highlighted in the code segments below in bold red.
Script I – populating data with 85% logical scan fragmentation
Script I below populates the test table with 2,000,000 rows. The clustered index key column is assigned values that alternate between small and large with the intention to fragment the clustered index.
create table test(c1 int, c2 int, c3 char(1000) default ' ')
go
create clustered index cix_test on test(c2)
go
set nocount on
go
declare @i int
set @i = 1
begin tran
while @i <= 2000000
begin
insert test(c1, c2)
select @i,
case when @i % 2 = 0 then @i else 2000000 - @i end
if @i % 100000 = 0
begin
commit tran
begin tran
end
set @i = @i + 1
end
commit tran
go
At the completion of Script I, the test table can be summarized as follows:
|
Property |
Value |
|
Logical scan fragmentation |
~85% |
|
Average page density |
~54% |
|
Pages scanned |
466,668 |
|
Extents scanned |
300,131 |
|
Rows |
2,000,000 |
|
Reserved space |
~3.6GB |
Script II – populating data with 100% logical scan fragmentation
Script II below populates the test table with 3,000,000 rows. That’s right, 3,000,000 instead of 2,000,000. The clustered index key column is assigned values in the order exactly opposite of that for the clustered index.
create table test(c1 int, c2 int, c3 char(1000) default ' ')
go
create clustered index cix_test on test(c2)
go
set nocount on
go
declare @i int
set @i = 1
begin tran
while @i <= 3000000
begin
insert test(c1, c2)
select @i,
3000000 - @i; -- from high to low
if @i % 100000 = 0
begin
commit tran
begin tran
end
set @i = @i + 1
end
commit tran
go
At the completion of Script II, the test table can be summarized as follows:
|
Property |
Value |
|
Logical scan fragmentation |
100% |
|
Average page density |
~88% |
|
Pages scanned |
428,572 |
|
Extents scanned |
53,772 |
|
Rows |
3,000,000 |
|
Reserved space |
~3.3GB |
Script III – the test query
At the completion of Script I and Script II, the following test script is run:
dbcc dropcleanbuffers
go
select COUNT(*) from test;
go
The test results
And the elapsed time of the SELECT statement is timed. The following table shows the performance of the test Script III:
|
Data |
SELECT COUNT(*) Performance (sec) |
|
Script I (85% fragmentation) |
100 |
|
Script II (100% fragmentation) |
27 |
The following chart shows the same data in the above table:
Observations
Note that since Script II populates the test in a more compact fashion (avg. page density = ~88% as opposed ~54% for Script I) and there would be fewer pages if the same number of rows are entered by both scripts. So, to make the comparison meaningful, I increased the number of rows for Script II to 3,000,000, and this resulted in the test table to have similar number of pages and similar sizes.
Why does the table scan performed significantly faster with the data that was 100% fragmented than with the data that was 85% fragmented? In fact, the query performed four times faster with the data that was 100% fragmented.
The key reason is that even though the data was 100% fragmented in terms of DBCC SHOWCONTIG or sys.dm_db_index_physical_stats, the pages were still considered ‘contiguous’, and SQL Server was able to post large reads and drove the I/O subsystem with significantly higher read-ahead reads.
More specifically, with the data populated by Script II (logical scan fragmentation = 100%), SQL Server was able to process the test query with significantly larger reads (~360KB/read vs. ~14KB ~37KB/read for the Script I data) and with significantly higher readahead reads (11,000~12,000 readahead pages/sec vs. 3,600~6,000 readahead pages/sec for the Script I data).
So, higher logical scan fragmentation does not necessarily mean worse table/index scan performance.