
Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact: Finding the Optimal Batch Size

In my previous posts (1, 2, 3), I focused on the performance behavior of committing and checkpointing once every 16 inserts and once every 100,000 inserts. A question remains: what is the optimal batch size?

In other words, in the following script, what value should we give to variable @batch_size so that the script will complete in the least amount of time?

CREATE TABLE tbTest(
    i      int NOT NULL, 
    filler char(8000) NOT NULL
)
go

EXEC sp_configure 'show advanced options', 1  -- 'recovery interval' is an advanced option
go
reconfigure with override
go
EXEC sp_configure 'recovery interval', 32767  -- or 0 for auto checkpoints
go
reconfigure with override
go

SET NOCOUNT ON
SET IMPLICIT_TRANSACTIONS ON

DECLARE @batch_size int
SET @batch_size = 100000   -- to be set to different values

DECLARE @i int
SET @i = 1
WHILE @i <= 3145728  -- (1024*1024*3)
BEGIN
    INSERT tbTest(i, filler)
    VALUES (@i, ' ')

    IF (@i % @batch_size = 0)
    BEGIN
       COMMIT TRAN 
       CHECKPOINT
    END

    SET @i = @i + 1
END

-- Commit and flush whatever remains from the final partial batch
IF @@TRANCOUNT > 0
BEGIN
    COMMIT TRAN
    CHECKPOINT
END

As mentioned previously, the optimal value for @batch_size (the number of inserts between transaction commits and checkpoints) is the one that gets SQL Server to take full advantage of the storage performance capacity. The performance of this test script is first and foremost a storage issue. You can try to calculate the size of each checkpoint and transaction for a given value of @batch_size, and from that get some idea of what kind of I/Os SQL Server will do, but that is not all that reliable. It's easier to just run a series of tests and see the performance impact of different @batch_size values.
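That said, a back-of-the-envelope estimate is straightforward. Since the filler column is char(8000), each inserted row occupies its own 8KB data page (the later note that 2,000,000 inserts translate into 2,000,000 pages confirms this), so each checkpoint flushes roughly @batch_size x 8KB of dirty data pages. The snippet below is my own illustration rather than anything from the original test, and it ignores transaction log writes and page overhead:

DECLARE @batch_size bigint
DECLARE @page_bytes bigint

SET @batch_size = 100000    -- candidate batch size
SET @page_bytes = 8192      -- one 8KB data page per row, since filler is char(8000)

-- Approximate amount of dirty data flushed by each CHECKPOINT
SELECT (@batch_size * @page_bytes) / 1048576.0 AS approx_mb_per_checkpoint
-- For 100,000 inserts this comes to roughly 780 MB per checkpoint.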

For the tests, I ran the script with @batch_size set to each of the following values (a sketch for automating these runs follows the list):

  • 3,000,000
  • 2,000,000
  • 1,000,000
  • 100,000
  • 10,000
  • 32
  • 16
  • 8
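Here is a rough sketch of a driver that loops over these batch sizes, re-runs the insert loop for each one, and records the elapsed time. It is my own scaffolding, not the harness used for the published numbers: the #timings table, the @sizes table variable, and the cursor are assumptions, and the explicit BEGIN TRAN stands in for the SET IMPLICIT_TRANSACTIONS ON used in the original script.

SET NOCOUNT ON

CREATE TABLE #timings (batch_size int, duration_sec int)

DECLARE @sizes TABLE (batch_size int)
INSERT @sizes (batch_size) VALUES (3000000)
INSERT @sizes (batch_size) VALUES (2000000)
INSERT @sizes (batch_size) VALUES (1000000)
INSERT @sizes (batch_size) VALUES (100000)
INSERT @sizes (batch_size) VALUES (10000)
INSERT @sizes (batch_size) VALUES (32)
INSERT @sizes (batch_size) VALUES (16)
INSERT @sizes (batch_size) VALUES (8)

DECLARE @batch_size int
DECLARE @i int
DECLARE @start datetime

DECLARE size_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT batch_size FROM @sizes
OPEN size_cursor
FETCH NEXT FROM size_cursor INTO @batch_size

WHILE @@FETCH_STATUS = 0
BEGIN
    TRUNCATE TABLE tbTest    -- start each run from an empty table
    CHECKPOINT               -- flush leftover dirty pages so they don't skew the next run
    SET @start = GETDATE()

    SET @i = 1
    BEGIN TRAN
    WHILE @i <= 3145728      -- (1024*1024*3), as in the script above
    BEGIN
        INSERT tbTest(i, filler)
        VALUES (@i, ' ')

        IF (@i % @batch_size = 0)
        BEGIN
            COMMIT TRAN
            CHECKPOINT
            BEGIN TRAN       -- open the next batch
        END

        SET @i = @i + 1
    END
    COMMIT TRAN              -- commit the trailing (possibly empty) batch
    CHECKPOINT

    INSERT #timings (batch_size, duration_sec)
    VALUES (@batch_size, DATEDIFF(second, @start, GETDATE()))

    FETCH NEXT FROM size_cursor INTO @batch_size
END

CLOSE size_cursor
DEALLOCATE size_cursor

SELECT batch_size, duration_sec FROM #timings ORDER BY batch_size DESC

Keep in mind that each full pass inserts 3,145,728 8KB pages, about 24GB of data, so the whole series takes hours to run.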

The following two charts show the script run duration for each of the @batch_size values when (1) manual checkpoints were used, and (2) automatic checkpoints were used. For manual checkpoints, the 'recovery interval' sp_configure option was set to 32,767, effectively disabling automatic checkpoints; for automatic checkpoints, the option was left at its default value of 0.
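As a quick sanity check between runs (my addition, not something the post describes doing), you can confirm which setting is actually in effect:

-- value_in_use shows the running value of the 'recovery interval' option
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'recovery interval (min)'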

Fig. 2 is a subset of Fig. 1, showing only the @batch_size range from 10,000 to 3,000,000, because in Fig. 1 the duration differences between manual and automatic checkpoints in this range are completely obscured by the huge values for the range from 8 to 10,000.

Three observations are immediately obvious from these two charts.

First of all, you definitely don't want to commit and checkpoint too often. The adverse impact is rather dramatic (e.g. the duration could increase from several minutes to several hundred minutes).

Secondly, for practical purposes, rather than going through all the trouble of finding an optimal value for @batch_size, it's easier to just rely on automatic checkpoints. In that case, it doesn't make much of a difference what value you set @batch_size to; in other words, the performance of the script is not sensitive to the size of the transaction commits. That's true at least in this particular test setup. No doubt one can find a test setup in which the transaction commit size makes a huge difference and should be carefully considered, so don't over-generalize from this observation.

Finally, if you do decide to control the checkpoints yourself, issuing checkpoints too infrequently can also degrade performance, as shown in Fig. 2, though one can argue whether it really matters that much whether the script completes in 7 minutes or in 15 minutes. But the trend of the impact is unmistakable.

My take on the performance degradation when @batch_size was set to 2,000,000 or 3,000,000 is that dirty pages eventually filled up the buffer pool and forced SQL Server to rely on lazy writes to flush pages to disk and make room in memory. Note that in this test 2,000,000 inserts translate into 2,000,000 data pages, roughly 16GB, which is more than the 12GB allocated to the buffer pool of the SQL Server instance. If you think there is a better explanation, let me know.
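One way to check this explanation (again my addition; the post does not describe doing this) is to watch the buffer pool while the script runs and count the modified pages for the test database. On SQL Server 2005 and later:

-- Dirty (modified) pages currently held in the buffer pool for this database.
-- Each page is 8KB. Scanning this DMV is itself not free on a large buffer
-- pool, so don't poll it in a tight loop.
SELECT COUNT(*)            AS dirty_pages,
       COUNT(*) * 8 / 1024 AS dirty_mb
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()    -- run this from the test database
  AND is_modified = 1

If dirty_mb climbs toward the size of the buffer pool as @batch_size moves into the millions, that would support the lazy-writer explanation.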

Published Thursday, August 23, 2007 4:44 PM by Linchi Shea

Attachment(s): checkpoint4.gif


Comments

 

Linchi Shea said:

With the insert script and the test configurations in my previous posts, the best data load throughput

August 27, 2007 11:05 AM

