With the insert script and the test configurations in my previous posts, the best data load throughput was 24GB in ~7 minutes when the checkpoint (and/or transaction commit) batch size was set to 100,000 ~ 1,000,000. That was the best result when I was trying to get the most out of the insert script. But if we forget about tinkering the insert script and do something completely different, it's not the best we can do on the same hardware.
We can do much better with BulkCopy. No, I'm not talking about the bcp.exe utility or the BULK INSERT statement in T-SQL. These two approaches require that the source data file be present. If we add the time to generate the data file and the time to load from the data file, neither bcp.exe nor BULK INSERT would be the most performant solution.
A better approach is to use the SQL Server BulkCopy programming interface. At the end of this post is a C# program that uses the SqlBulkCopy class from .NET 2.0. For the test, I compiled this program into an executable bulkCopy.exe. Note that the program takes two command-line parameters:
- Connection string: a connection string which identifies the target server and database and the connection credential,
- Batch_size: the number of rows SqlBulkCopy will copy from the DataTable object to the target table. Note that this is not the same batch size discussed in the previous posts.
Then, I ran bulkCpoy.exe locally on the test server as follows (note that the test table was truncated before each call to bulkCopy.exe and the database was sized big enough so that no additional storage extents were allocated to the database during the load):
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 1000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 5000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 10000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 50000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 100000
bulkCopy "server=testServer;database=testDB;trusted_connnection=yes" 200000
The best result was obtained when batch_size was set to 100,000. Again note that the SQL Server 2005 instance used in the test was given 12GB for its buffer pool (max server memory and min server memory both were set to 12GB and the row width of the test table was 8K). And the best result was 24GB in ~2.3 minutes, which was about three times as fast as what the insert script could do (~7 minutes).
The following two charts show the bulkCopy.exe run duration for each of the batch_size values (see Fig. 1) and compare the best of the bulkCopy.exe and the best of the insert script (see Fig. 2):
In my previous posts, I stated that it all came down to whether you could take full advantage of the storage throughput potential. That's certainly true when you are focused on tuning the insert script. But with SqlBulkCopy, it gets the same job done with much less I/O traffic. So don't get carried away in pushing your storage.
Here's the source code of bulkCopy.exe:
// -- Target table
// CREATE TABLE tbTest(
// i int NOT NULL,
// filler char(8000) NOT NULL
// )
using System;
using System.IO;
using System.Data;
using System.Data.SqlClient;
public class BulkCopy
{
static void Main(string[] args)
{
const int number_of_rows_to_insert = 1024*1024*3;
string conn = args[0];
int batch_size;
DateTime bcp_begin_time = DateTime.Now;
try
{
batch_size = int.Parse(args[1]);
}
catch (FormatException)
{
throw new ArgumentException("**2nd parameter must be an integer.");
}
using (SqlConnection cx = new SqlConnection(conn))
{
try
{
cx.Open();
}
catch(Exception e)
{
Console.WriteLine(e.ToString());
Environment.Exit(-1);
}
try
{
DataTable dt = new DataTable();
dt.MinimumCapacity = batch_size;
DataColumn i = new DataColumn();
i.DataType = System.Type.GetType("System.Int32");
i.ColumnName = "i";
dt.Columns.Add(i);
DataColumn filler = new DataColumn();
filler.DataType = System.Type.GetType("System.String");
filler.ColumnName = "filler";
dt.Columns.Add(filler);
using (SqlBulkCopy bcp =
new SqlBulkCopy(cx, SqlBulkCopyOptions.TableLock, null))
{
bcp.DestinationTableName = "dbo.tbTest";
bcp.BatchSize = 0;
bcp.BulkCopyTimeout = 100000; // effectively no timeout
for (int bcp_id = 1; bcp_id <= number_of_rows_to_insert; bcp_id++)
{
DataRow row = dt.NewRow();
row["i"] = bcp_id;
row["filler"] = " ";
dt.Rows.Add(row);
if (bcp_id % batch_size == 0)
{
bcp.WriteToServer(dt);
dt.Rows.Clear();
}
}
if (dt.Rows.Count > 0)
{
bcp.WriteToServer(dt);
dt.Rows.Clear();
}
} // using
Console.WriteLine("Rows inserted = {0}, Batch_size = {1}, Duration = {2}",
number_of_rows_to_insert,
batch_size,
Convert.ToInt32(DateTime.Now.Subtract(bcp_begin_time).TotalSeconds));
}
catch(Exception e)
{
Console.WriteLine(e.ToString());
Environment.Exit(-1);
}
} // using cx
} // Main()
} // class BulkCopy