
Linchi Shea

Checking out SQL Server via empirical data points

Performance Impact: the Most Optimal Insert Script can't Beat BulkCopy

With the insert script and the test configurations in my previous posts, the best data load throughput was 24GB in ~7 minutes when the checkpoint (and/or transaction commit) batch size was set to 100,000 ~ 1,000,000. That was the best result I could get when trying to squeeze the most out of the insert script. But if we stop tinkering with the insert script and do something completely different, we can do considerably better on the same hardware.

We can do much better with BulkCopy. No, I'm not talking about the bcp.exe utility or the BULK INSERT statement in T-SQL. Both of those approaches require a source data file to be present, and once you add the time to generate that data file to the time to load from it, neither bcp.exe nor BULK INSERT ends up being the most performant solution.
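For comparison, here is roughly what the file-based route looks like. This is only an illustrative sketch, not part of the test: the file path, the delimiters, and the row-generation loop are all assumptions. The point is that the data file has to be written out and then read back in, and both steps count against the total load time.

// Hypothetical sketch of the file-based alternative, for comparison only.
// The file path and delimiters are assumptions, not part of the actual test.
using System.Data.SqlClient;
using System.IO;

public class FileBasedLoad
{
    static void Main()
    {
        const string dataFile = @"C:\temp\tbTest.dat";   // assumed path

        // Step 1: generate the source data file (this time counts, too).
        using (StreamWriter w = new StreamWriter(dataFile))
        {
            for (int i = 1; i <= 1024 * 1024 * 3; i++)
                w.WriteLine("{0},{1}", i, " ");          // i, filler
        }

        // Step 2: load the file with BULK INSERT.
        using (SqlConnection cx = new SqlConnection(
                   "server=testServer;database=testDB;trusted_connection=yes"))
        {
            cx.Open();
            using (SqlCommand cmd = new SqlCommand(
                       @"BULK INSERT dbo.tbTest FROM 'C:\temp\tbTest.dat' " +
                       @"WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)", cx))
            {
                cmd.CommandTimeout = 0;   // no timeout for a long-running load
                cmd.ExecuteNonQuery();
            }
        }
    }
}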

A better approach is to use the SQL Server BulkCopy programming interface. At the end of this post is a C# program that uses the SqlBulkCopy class from .NET 2.0. For the test, I compiled this program into an executable bulkCopy.exe. Note that the program takes two command-line parameters:

  1. Connection string: identifies the target server, the target database, and the connection credentials.
  2. Batch_size: the number of rows to accumulate in the DataTable object before SqlBulkCopy copies them to the target table. Note that this is not the same batch size discussed in the previous posts.

Then, I ran bulkCopy.exe locally on the test server as follows (note that the test table was truncated before each call to bulkCopy.exe, and the database was pre-sized so that no additional storage extents had to be allocated to it during the load):

bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 1000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 5000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 10000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 50000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 100000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 200000

The best result was obtained when batch_size was set to 100,000. Again, note that the SQL Server 2005 instance used in the test was given 12GB for its buffer pool (max server memory and min server memory were both set to 12GB), and the row width of the test table was 8K. The best result was 24GB in ~2.3 minutes, which was about three times as fast as the best the insert script could do (~7 minutes).

The following two charts show the bulkCopy.exe run duration for each of the batch_size values (see Fig. 1) and compare the best bulkCopy.exe result with the best insert-script result (see Fig. 2); both charts are in the attached bulkcopy.gif:

In my previous posts, I stated that it all came down to whether you could take full advantage of the storage throughput potential. That's certainly true when you are focused on tuning the insert script. But SqlBulkCopy gets the same job done with much less I/O traffic, so don't get carried away pushing your storage.
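If you want to check the I/O claim on your own system, one rough way is to snapshot sys.dm_io_virtual_file_stats around each load and compare the deltas for the insert script versus bulkCopy.exe. The sketch below is just an illustration; the server and database names are the same assumed ones used earlier in this post.

// Hypothetical sketch: snapshot the cumulative bytes written for the test
// database before and after a load and report the difference.
using System;
using System.Data.SqlClient;

public class IoSnapshot
{
    static long BytesWritten(SqlConnection cx)
    {
        // sys.dm_io_virtual_file_stats reports cumulative counters since the
        // instance last started; summing over all files covers data and log.
        using (SqlCommand cmd = new SqlCommand(
                   @"SELECT SUM(num_of_bytes_written) " +
                   @"FROM sys.dm_io_virtual_file_stats(DB_ID(N'testDB'), NULL)", cx))
        {
            return Convert.ToInt64(cmd.ExecuteScalar());
        }
    }

    static void Main()
    {
        using (SqlConnection cx = new SqlConnection(
                   "server=testServer;database=testDB;trusted_connection=yes"))
        {
            cx.Open();
            long before = BytesWritten(cx);
            Console.WriteLine("Run the load now, then press Enter...");
            Console.ReadLine();
            long after = BytesWritten(cx);
            Console.WriteLine("Bytes written during the load: {0:N0}", after - before);
        }
    }
}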

Here's the source code of bulkCopy.exe:

// -- Target table
// CREATE TABLE tbTest(
//     i      int NOT NULL, 
//     filler char(8000) NOT NULL
// )
using System;
using System.IO;
using System.Data;
using System.Data.SqlClient;

public class BulkCopy
{

    static void Main(string[] args)
    {
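        // 3 * 1024 * 1024 rows at ~8KB per row works out to the 24GB load
        // discussed in the post.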
        const int number_of_rows_to_insert = 1024*1024*3;

        string conn = args[0];
        int batch_size;
        DateTime bcp_begin_time = DateTime.Now;        
        
        try
        {
            batch_size = int.Parse(args[1]);
        }
        catch (FormatException)
        {
            throw new ArgumentException("**2nd parameter must be an integer.");
        }

        using (SqlConnection cx = new SqlConnection(conn))
        {
            try
            {
                cx.Open();
            }
            catch(Exception e)
            {
                Console.WriteLine(e.ToString());
                Environment.Exit(-1);
            }

            try
            {
                DataTable dt = new DataTable();
                dt.MinimumCapacity = batch_size;

                DataColumn i = new DataColumn();
                i.DataType = System.Type.GetType("System.Int32");
                i.ColumnName = "i";
                dt.Columns.Add(i);

                DataColumn filler = new DataColumn();
                filler.DataType = System.Type.GetType("System.String");
                filler.ColumnName = "filler";
                dt.Columns.Add(filler);

                using (SqlBulkCopy bcp = 
                                new SqlBulkCopy(cx, SqlBulkCopyOptions.TableLock, null))
                {
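                    // TableLock takes a bulk update (BU) lock on the target
                    // table, which can allow the load to be minimally logged,
                    // depending on the database recovery model.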
                    bcp.DestinationTableName = "dbo.tbTest";
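                    // BatchSize = 0 means each WriteToServer() call is sent as a
                    // single batch; batching here is driven by how many rows are
                    // put into the DataTable (the batch_size parameter).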
                    bcp.BatchSize = 0;
                    bcp.BulkCopyTimeout = 100000;  // effectively no timeout
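                    // The loop below accumulates batch_size rows in the DataTable,
                    // flushes them with WriteToServer(), then clears and repeats.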

                    for (int bcp_id = 1; bcp_id <= number_of_rows_to_insert; bcp_id++)
                    {
                        DataRow row = dt.NewRow();

                        row["i"]     = bcp_id;
                        row["filler"]   = " ";

                        dt.Rows.Add(row);

                        if (bcp_id % batch_size == 0)
                        {
                            bcp.WriteToServer(dt);
                            dt.Rows.Clear();
                        }
                    }
                    if (dt.Rows.Count > 0)
                    {
                        bcp.WriteToServer(dt);
                        dt.Rows.Clear();
                    }
                }  // using

                Console.WriteLine("Rows inserted = {0}, Batch_size = {1}, Duration = {2}",
                      number_of_rows_to_insert,
                      batch_size,
                      Convert.ToInt32(DateTime.Now.Subtract(bcp_begin_time).TotalSeconds));
            }
            catch(Exception e)
            {
                Console.WriteLine(e.ToString());
                Environment.Exit(-1);
            }
        }  // using cx


    }  // Main()

}  // class BulkCopy
Published Monday, August 27, 2007 11:46 AM by Linchi Shea

Attachment(s): bulkcopy.gif


Comments

 

AlbertoFerrari said:

Nice test Jason.

We too have used the SqlBulkCopy class, along with partitioning, to perform very fast table copies between different servers (we use it for mirroring DBs); you might be interested in taking a look at it.

You'll find sources at www.sqlbi.eu/SqlBulkTool.aspx.

Alberto

September 1, 2007 5:34 AM
 

Chuck Lathrope said:

What if you want to log data from 10 different servers sending a total of 10 million plus commands a day, with some of the data rows going past the 8K limit into varchar(max) or xml data types, since the data is most likely in XML form from most web applications? Should you use the .NET bulk copy class and stage the data in small batches before sending, or possibly use Service Broker, and can it handle that kind of load?

April 3, 2008 4:31 PM
 

s said:

no

July 3, 2008 7:59 AM
 

Naveen Prabhu said:

Thanks for writing such a good article. I was searching how to improve my application speed.

August 12, 2008 9:42 AM
 

Steve said:

Hi Linchi,

I am having an issue with the MS SQL 2005 service. It is filling up memory very fast. I have a file with 24 million records, and it is a serious problem doing the import using the SqlBulkCopy class. How do you release the memory from the SQL service, say, every 100,000 records of the import?

If I restart the SQL service, the memory is released back to the OS. I need to import multiple files that could be either CSV or DBF files. I found that removing the indexes greatly reduces the memory footprint.

Thanks,

Steve

November 12, 2008 6:54 PM
 

Engin said:

Solid empirical study, really helped me figure out my own problem.

October 15, 2009 3:32 PM

