In the next series of posts, I'll focus on SQL Server I/O, revisiting some common issues and taking a closer look at some others. As always, I'll make the case in each post with specific data points from my tests. For the first two posts in this series, let's check out the read-ahead technique used by SQL Server.
Read-ahead is an important I/O optimization technique used by SQL Server. Intuitively, if SQL Server can correctly forecast the need for more pages and read them into memory ahead of when they are needed for query processing, your query is expected to perform better. Bob Dorr's classic SQL Server 2000 I/O Basics whitepaper has an excellent description of how SQL Server read-ahead works. Every SQL Server professional with any interest at all in the storage engine should read that paper.
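Conceptually (and this is purely my own illustrative sketch, not SQL Server's actual algorithm, which is internal), read-ahead amounts to issuing page reads some distance ahead of where the scan is currently processing, so that the I/O overlaps with query processing instead of the scan stalling on each page:

```python
# Conceptual sketch of read-ahead during a sequential scan.
# This is an illustration only, not SQL Server's actual implementation.

def scan_with_read_ahead(total_pages: int, depth: int):
    """Yield (current_page, pages_requested_ahead) as the scan advances.

    depth = how many pages beyond the current one to keep requested;
    depth = 0 models a scan with read-ahead disabled.
    """
    requested = 0  # number of pages whose reads have been issued so far
    for current in range(total_pages):
        # issue reads for the current page plus up to `depth` pages ahead
        while requested < min(current + 1 + depth, total_pages):
            requested += 1
        yield current, requested - (current + 1)

# With read-ahead (depth=4), the scan keeps 4 page reads in flight until
# it nears the end of the table; with depth=0, every page read stalls the scan.
with_ra = list(scan_with_read_ahead(total_pages=10, depth=4))
without_ra = list(scan_with_read_ahead(total_pages=10, depth=0))
```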
But just how important is the read-ahead technique in query processing? Let's look at some results from an extremely simple test.
First of all, a single table was used in the test, and here is the definition:
CREATE TABLE test (
    i int NOT NULL,
    j int NOT NULL,
    dt datetime NOT NULL,
    filler char(5000) NOT NULL
);
I then created a clustered index on the i column (for this test column j was not used and you can ignore it):
CREATE CLUSTERED INDEX cix_test ON test(i);
And the table was populated with 2,000,000 rows with the following INSERT statement in a loop, with @i going from 1 to 2,000,000:
INSERT test(i, j, dt, filler)
SELECT @i, CASE WHEN @i % 2 = 0 THEN @i ELSE 2000000 - @i END, getdate(), 'abc';
With the statistics updated, I then ran the following SELECT query in two test scenarios:
SELECT max(dt) FROM test;
The two test scenarios are as follows:
|Scenario|Description|
|---|---|
|Read-ahead enabled|This is the default SQL Server behavior.|
|Read-ahead disabled|Read-ahead was disabled with trace flag 652: DBCC TRACEON(652, -1)|
First, let's look at the elapsed time of the above SELECT query in the two scenarios (the numbers reported in the following table are averages over several test runs):
|Scenario|Elapsed Time of SELECT (seconds)|
|---|---|
|Read-ahead enabled|~80|
|Read-ahead disabled|~200|
In both scenarios, the SELECT query was correctly processed with a clustered index scan. When read-ahead was disabled, the clustered index scan took almost three times as long as it did when read-ahead was enabled. For this query, the performance difference was astounding.
I don't have access to the source code that controls read-ahead, so I can't tell exactly why and how it made such a huge difference at the code level. However, from Bob Dorr's description in the SQL Server 2000 I/O Basics whitepaper, it's rather clear that SQL Server read-ahead is quite aggressive in exploiting the performance capacity of the storage I/O subsystem. So we can turn to the observed storage I/O behavior for an explanation. The following table summarizes the values of the key I/O counters observed during the test runs:
|I/O Counter|Value When Read-ahead is Enabled|Value When Read-ahead is Disabled|
|---|---|---|
|Disk Read Bytes/sec|180 ~ 200MB/sec|~80MB/sec|
|SQL Server Readahead pages/sec|~20,000|0|
Clearly, with read-ahead, SQL Server was able to take advantage of large-sized I/Os (~350KB per read). Large-sized I/Os are generally much more efficient than smaller-sized I/Os, especially when you actually need all the data read from the storage, as was the case with the test query. From the table above, it's evident that the read throughput was significantly higher when read-ahead was enabled than it was when read-ahead was disabled. In other words, without read-ahead, SQL Server was not pushing the storage I/O subsystem hard enough, contributing to a significantly longer query elapsed time.
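To see why the read size matters, here is a back-of-the-envelope calculation of my own (the ~350KB figure is the observed average read size from the test; the 8KB single-page read without read-ahead is my assumption of the worst case, not a measured value):

```python
# Back-of-the-envelope comparison of read request counts for scanning
# a ~16,000MB table: ~350KB reads (observed with read-ahead) vs. 8KB
# single-page reads (an assumed worst case without read-ahead).
TABLE_KB = 16_000 * 1000   # ~16,000MB expressed in KB (decimal, as in the text)
LARGE_READ_KB = 350        # observed average read size with read-ahead
PAGE_KB = 8                # SQL Server page size

large_reads = TABLE_KB // LARGE_READ_KB
small_reads = TABLE_KB // PAGE_KB

print(large_reads)   # 45714  -> roughly 46,000 large read requests
print(small_reads)   # 2000000 -> 2,000,000 single-page read requests
```

Roughly 46,000 large reads versus 2,000,000 small ones: even before considering throughput, the per-request overhead alone heavily favors the large read-ahead I/Os.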
With one 5,000-byte row per 8KB page, the 2,000,000 rows occupied about 2,000,000 pages; that is, the table was about 16,000MB in size. At ~200MB/sec, it would take about 80 seconds to read 16,000MB, and at ~80MB/sec, it would take about 200 seconds to read the same amount of data. And these numbers (i.e. 80 seconds and 200 seconds) match nicely the recorded query elapsed times reported in the first table above.
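The sizing and timing arithmetic is easy to verify with a few lines of Python (my own sanity check; it uses decimal megabytes, matching the round 16,000MB figure used above):

```python
# Sanity check of the sizing and timing arithmetic.
# One 5,000-byte row fits per 8KB page, so the table needs one page per row.
ROWS = 2_000_000
PAGE_KB = 8

table_mb = ROWS * PAGE_KB / 1000          # decimal MB, as in the text
secs_with_read_ahead = table_mb / 200     # at ~200MB/sec
secs_without_read_ahead = table_mb / 80   # at ~80MB/sec

print(table_mb)                 # 16000.0
print(secs_with_read_ahead)     # 80.0
print(secs_without_read_ahead)  # 200.0
```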
In the next post, I'll check out the impact of disabling read-ahead on bookmark or key lookups when a nonclustered index is used.