THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact: Some Data Points on Read-Ahead

In the next series of posts, I'll focus on SQL Server I/O, revisiting some common issues and taking a closer look at some others. In each post and as always, I'll make the case with specific data points from my tests. For the first two posts in this series, let me check out the read-ahead technique used by SQL server.

Read-ahead is an important I/O optimization technique used by SQL Server.  Intuitively, if SQL Server can correctly forecast the need for more pages and read these pages ahead of time when they are needed for query processing in memory, your query is expected to perform better. Bob Dorr in his classic SQL Server 2000 I/O Basics whitepaper has an excellent description on how SQL Server read-ahead works. Every SQL Server professional with any interest at all in the storage engine should read that paper.

But just how important is the read-ahead technique in query processing? Let's look at some results from an extremely simple test.

First of all, a single table was used in the test, and here is the definition:

CREATE TABLE test (
      i           int NOT NULL,
      j           int NOT NULL,   
      dt          datetime,
      filler      char(5000) NOT NULL
)

I then created a clustered index on the i column (for this test column j was not used and you can ignore it):

CREATE CLUSTERED INDEX cix_test ON test(i);

And the table was populated with 2,000,0000 rows with the following INSERT statement in a loop with @i going from 1 to 2,000,000:

INSERT test(i, j, dt, filler)
SELECT @i, CASE WEHN @i % 2 = 0 THEN @i ELSE 2000000 - @i END, getdate(), 'abc'

With the statistics updated, I then ran the following SELECT query in two test scenarios:

DBCC DROPCLEANBUFFERS;
go
SELECT max(dt) FROM test;

The two test scenarios are as follows:

Test Scenario Description
Read-ahead enabled This is the default SQL Server behavior.
Read-ahead disabled Read-ahead was disabled with trace flag 652: DBCC TRACEON(-1, 652)

First, let's look at the elapsed time of the above SELECT query in the two scenarios (the numbers reported in the following table are averages over several test runs):

Test Scenario Elapsed Time of SELECT (second)
Read-ahead enabled 80
Read-ahead disabled 210

In both scenarios, the SELECT query was correctly processed with a clustered index scan. When read-ahead was disabled, the clustered index scan took almost three times as long as did the clustered index scan when read-ahead was not disabled. For this query, the performance difference was astounding.

I don't have access to the source code that controls read-ahead so I can't tell exactly why and how it made such a huge difference from the code logic perspective. However, from Bob Dorr's description in the SQL Server 2000 I/O Basics whitepaper, it's rather clear that SQL Server read-ahead is quite aggressive in exploiting the performance capacity of the storage I/O subsystem.  So we can turn to observing the storage I/O behavior for explanation. The following table summarizes the values of the key I/O counters observed during the test runs:

I/O Counter Value When Read-ahead is Enabled Value When Read-ahead is Disabled
Bytes/Read > 350KB/read ~ 8KB/read
Megabytes/sec 180 ~ 200MB/sec ~ 80MB/sec
SQL Server Readahead ~20,000 readahead pages/sec 0

Clearly, with read-ahead, SQL Server was able to take advantage of large sized I/Os (e.g. ~350KB per read). Large-sized I/Os are generally much more efficient than smaller-sized I/Os, especially when you actually need all the data read from the storage as was the case with the test query. From the table above, it's evident that the read throughput was significantly higher when read-ahead was enabled than it was when read-ahead was disabled. In other words, without read-ahead, SQL Server was not pushing the storage I/O subsystem hard enough, contributing to a significantly longer query elapsed time.

That is, the table was about 16,000MB in size. At ~200MB/sec, it would take about 80 seconds to read 16,000MB, and at ~80MB/sec, it would take about 200 seconds to read the same amount of data. And these numbers (i.e. 80 seconds and 200 seconds) match nicely the recorded query elaped times which are reported in the second table above.

In the next post, I'll check out the impact of disabling read-ahead on bookmark or key lookups when a nonclustered index is used.

Published Friday, July 04, 2008 4:24 PM by Linchi Shea

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Linchi Shea said:

I don't know how to get the table borders displayed in this blog post, although all the HTML border tags are in place and the borders are visible in FrontPage, Windows Live Writer, and even in the edit window of this site. If you know the trick, please let me know.

July 4, 2008 4:28 PM
 

Linchi Shea said:

It’s well known that bookmark lookup (aka key lookup in case of a clustered index) is not cheap. So I’m

July 6, 2008 10:06 PM
 

Demystifying SharePoint Performance Management Part 6 – The unholy trinity of Latency, IOPS and MBPS | Clever Workarounds said:

November 9, 2014 2:41 PM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement