Recently, I was asked why the following script took a very long time to insert 24GB of data into a single table in SQL Server 2005 (the database was in the simple recovery mode):
-- Name: Script 1
SET NOCOUNT ON
SET IMPLICIT_TRANSACTIONS ON
DECLARE @i int
SET @i = 1
WHILE @i <= 3145728 -- (1024*1024*3)
BEGIN
INSERT tbTest(i, filler)
VALUES (@i, ' ')
IF (@i % 16 = 0)
BEGIN
COMMIT TRAN
CHECKPOINT
END
SET @i = @i + 1
END
The table used in the script (referred to as Script 1 in the rest of this post) is defined as follows:
CREATE TABLE tbTest(
i int NOT NULL,
filler char(8000) NOT NULL
)
My response was to comment out the CHECKPOINT statement in Script 1, i.e. to rely on automatic checkpoints instead. The result was a dramatic reduction in the script run time--from 211 minutes (3.5 hours) with CHECKPOINT to 14 minutes without CHECKPOINT--to insert the same 24GB of data (the recovery interval was the default value of 0). The following chart illustrates the impact:
Why such a huge difference in performance whether or not to checkpoint after inserting every 16 pages? Among other things, one significant factor is that issuing checkpoints at a high frequency with a small number of dirty pages doesn't give SQL Server a sufficient opportunity to perform large disk writes. Disk writes in small I/O sizes cannot take full advantage of the throughput potential that the storage on which the data files reside may offer. For the details of how SQL Server flushes its data pages, I highly recommend reading Bob Dorr "SQL Server 2000 I/O Basics".
So, be careful out there with frequent manual checkpoints.
Note that relying on automatic checkpoints is not the only way to speed up the script. Actually, it's not even the most optimal way. I'll look at this issue in more detail in two follow-up posts here.