THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Performance impact: Logical scan fragmentation -- 100% may not be as bad as 85%

In this previous post I asked whether 100% logical scan fragmentation is always worse than 85% local scan fragmentation for table/index scans. (To be precise, I was talking about a B-tree table, i.e. a table with a clustered index).

 

The answer is no. 100% logical scan fragmentation is not always worse than 85% logical scan fragmentation in terms of table or clustered index scans. And the following is an example.

 

Two T-SQL scripts were used to populate a test table in separate tests. The key differences between these two scripts are highlighted in the code segments below in bold red.

 

Script I – populating data with 85% logical scan fragmentation

 

Script I below populates the test table with 2,000,000 rows. The clustered index key column is assigned values that alternate between small and large with the intention to fragment the clustered index.

 

create table test(c1 int, c2 int, c3 char(1000) default ' ')

go

create clustered index cix_test on test(c2)

go

 

set nocount on

go

declare  @i int

set @i = 1

begin tran

while @i <= 2000000

begin

      insert test(c1, c2)

      select @i,

             case when @i % 2 = 0 then @i else 2000000 - @i end

            

      if @i % 100000 = 0

      begin

         commit tran

         begin tran

      end

      set @i = @i + 1

end

commit tran

go

 

At the completion of Script I, the test table can be summarized as follows:

 

Property

Value

Logical scan fragmentation

~85%

Average page density

~54%

Pages scanned

466,668

Extents scanned

300,131

Rows

2,000,000

Reserved space

~3.6GB

 

Script II – populating data with 100% logical scan fragmentation

 

Script II below populates the test table with 3,000,000 rows. That’s right, 3,000,000 instead of 2,000,000. The clustered index key column is assigned values in the order exactly opposite of that for the clustered index.

 

create table test(c1 int, c2 int, c3 char(1000) default ' ')

go

create clustered index cix_test on test(c2)

go

 

set nocount on

go

declare  @i int

set @i = 1

begin tran

while @i <= 3000000

begin

      insert test(c1, c2)

      select @i,

             3000000 - @i;  -- from high to low

            

      if @i % 100000 = 0

      begin

         commit tran

         begin tran

      end

      set @i = @i + 1

end

commit tran

go

 

At the completion of Script II, the test table can be summarized as follows:

 

Property

Value

Logical scan fragmentation

100%

Average page density

~88%

Pages scanned

428,572

Extents scanned

53,772

Rows

3,000,000

Reserved space

~3.3GB

 

Script III – the test query

 

At the completion of Script I and Script II, the following test script is run:

 

dbcc dropcleanbuffers

go

select COUNT(*) from test;

go

 

The test results

 

And the elapsed time of the SELECT statement is timed. The following table shows the performance of the test Script III:

 

Data

SELECT COUNT(*) Performance (sec)

Script I (85% fragmentation)

100

Script II (100% fragmentation)

27

 

The following chart shows the same data in the above table:

 

Observations

Note that since Script II populates the test in a more compact fashion (avg. page density = ~88% as opposed ~54% for Script I) and there would be fewer pages if the same number of rows are entered by both scripts. So, to make the comparison meaningful, I increased the number of rows for Script II to 3,000,000, and this resulted in the test table to have similar number of pages and similar sizes.

 

Why does the table scan performed significantly faster with the data that was 100% fragmented than with the data that was 85% fragmented? In fact, the query performed four times faster with the data that was 100% fragmented.

 

The key reason is that even though the data was 100% fragmented in terms of DBCC SHOWCONTIG or sys.dm_db_index_physical_stats, the pages were still considered ‘contiguous’, and SQL Server was able to post large reads and drove the I/O subsystem with significantly higher read-ahead reads.

 

More specifically, with the data populated by Script II (logical scan fragmentation = 100%),  SQL Server was able to process the test query with significantly larger reads (~360KB/read vs. ~14KB ~37KB/read for the Script I data) and with significantly higher readahead reads (11,000~12,000 readahead pages/sec vs. 3,600~6,000 readahead pages/sec for the Script I data).

 

So, higher logical scan fragmentation does not necessarily mean worse table/index scan performance.

 

Published Monday, June 22, 2009 1:02 PM by Linchi Shea

Attachment(s): lsf.GIF

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ranga Narasimhan said:

Linchi,

Interesting and Informative as always!

Ranga

June 22, 2009 1:43 PM
 

Scott R. said:

Linchi,

Thanks for sharing these test results.  Good info to know.

As a point of reference (best case), could you run a load of a similar table like your script II but setting c2 and c1 to the same loop value @i (hopefully resulting in a fully unfragmented table) with a similar Select Count(*) as a test?

I hope to reproduce your tests when I get some free time.

Thanks,

Scott R.

June 22, 2009 4:40 PM
 

Linchi Shea said:

Scott;

With 2,000,000 rows, the the c2 values are sorted in teh same order as expected by the clustered index, the test query completes in 28 seconds -- the same as wiht the Script II data. Note that the Script II data has 3,000,000 rows.

After I have changed the number of rows to 3,000,000, the test query completes in ~38 seconds. Note that in this case the logical scan fragmentation is 0%, but yet the tets query performs much better with the data whose logical scan fragmentation is 100%.

To make sure that this is not an accident, I have re-run the tests several times with both 0% logical scan fragmentation and 100% logical scan fragmentation with the same number of rows (3,000,000). The test query performance is consistently better with 100% logical scan fragmentation (~38 seconds vs. ~ 28 seconds).

I'm puzzled. I mean, the same amount of data, same number of pages, same page density.

BTW, all the tests are run on an internal disk. So there is no interfere that may comes with a SAN.

June 22, 2009 5:36 PM
 

Brian Tkatch said:

Linchi, excellent. i hope to see similar tests so i can get the point a little better.

BTW, there is a typo in the graph. It says "Performane" instead of "Performance".

June 23, 2009 8:13 AM
 

Linchi Shea said:

Thanks Brian! I have corrected the typo in the graph.

June 23, 2009 8:51 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement