
Linchi Shea

Checking out SQL Server via empirical data points

Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

For many years, I have been using a C# program to generate TPC-C-compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated per the TPC-C specifications into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately, however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do, and the data needed to be loaded within a confined time frame. So I was driven to look into the code more carefully to see if it could be significantly sped up.

 

Among other things, I became curious about the performance impact of the amount of data copied by the WriteToServer(DataTable) method. I’ll call this amount of data the WriteToServer payload in this post, or simply the payload.

 

I was a bit lazy when I first coded the C# data loader, and not knowing the full implications, I set the payload to the same value I assigned to the BatchSize property. That proved to be terribly inconvenient when it came to tuning the loader's performance.

 

By the way, in the Microsoft Windows SDK documentation, the descriptions for WriteToServer(DataTable)  and BatchSize are as follows:

 

WriteToServer

Copies all rows in the supplied DataTable to a destination table specified by the DestinationTableName property of the SqlBulkCopy object.

BatchSize

Number of rows in each batch. At the end of each batch, the rows in the batch are sent to the server. If UseExternalTransaction is not in effect, each batch of rows is inserted as a separate transaction.

 

Going off topic a bit: the BatchSize description says that "At the end of each batch, the rows in the batch are sent to the server." This is confusing at best. I could be wrong, but setting BatchSize alone will not cause the SqlBulkCopy object to initiate any action of sending rows over the wire to the server.

 

The question is how many rows should we have in the DataTable object before we send them over to the server with WriteToServer? In other words, how should we size the WriteToServer payload?

 

The optimal setting no doubt depends on the configuration of the environment. Intuitively, however, there is overhead in sending data across the wire, and you would not want the payload-to-overhead ratio to be too low. On the other hand, you probably don't want to wait an excessive amount of time just for the DataTable object to be populated with a very large payload before sending its contents. So there should be a happy medium somewhere, we hope.
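To make the payload tunable independently of BatchSize, the loader can simply accumulate rows in a DataTable up to a configurable count, flush them with WriteToServer, and clear the table before the next round. The sketch below only illustrates the idea; the connection string, the row-filling delegate, and the helper names are placeholders I am assuming, not the actual loader code.

using System;
using System.Data;
using System.Data.SqlClient;

class PayloadSketch
{
    // Illustrative only: payloadSize (rows accumulated per WriteToServer call)
    // is a separate knob from batchSize (rows per server-side batch).
    public static void Load(string connectionString, DataTable table, int totalRows,
                            int payloadSize, int batchSize, Action<DataRow, int> fillRow)
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "customer";
            bulkCopy.BatchSize = batchSize;
            bulkCopy.BulkCopyTimeout = 0;                // no command timeout for long loads

            for (int i = 0; i < totalRows; i++)
            {
                DataRow row = table.NewRow();
                fillRow(row, i);                         // populate one TPC-C customer row
                table.Rows.Add(row);

                if (table.Rows.Count >= payloadSize)
                {
                    bulkCopy.WriteToServer(table);       // send the payload over the wire
                    table.Clear();                       // start accumulating the next payload
                }
            }

            if (table.Rows.Count > 0)
                bulkCopy.WriteToServer(table);           // flush the remainder
        }
    }
}

With a structure along these lines, the payload can be varied freely while BatchSize stays fixed, which is how the tests below are parameterized.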

 

Some tests should help give us a more precise idea.

 

I used the TPC-C customer table for my tests. The customer table was defined as follows:

 

create table customer
(
    c_id                int,
    c_d_id              tinyint,
    c_w_id              int,
    c_first             char(16),
    c_middle            char(2),
    c_last              char(16),
    c_street_1          char(20),
    c_street_2          char(20),
    c_city              char(20),
    c_state             char(2),
    c_zip               char(9),
    c_phone             char(16),
    c_since             datetime,
    c_credit            char(2),
    c_credit_lim        numeric(12,2),
    c_discount          numeric(4,4),
    c_balance           numeric(12,2),
    c_ytd_payment       numeric(12,2),
    c_payment_cnt       smallint,
    c_delivery_cnt      smallint,
    c_data              char(500)
)
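For reference, a DataTable shaped to match this definition might be built along the following lines. The SQL-to-.NET type mapping is my assumption of a reasonable choice, not code taken from the original loader.

using System;
using System.Data;

class CustomerTableSketch
{
    // Sketch: a DataTable whose columns mirror the customer table above.
    public static DataTable BuildCustomerTable()
    {
        DataTable t = new DataTable("customer");
        t.Columns.Add("c_id",           typeof(int));
        t.Columns.Add("c_d_id",         typeof(byte));     // tinyint
        t.Columns.Add("c_w_id",         typeof(int));
        t.Columns.Add("c_first",        typeof(string));   // char(16)
        t.Columns.Add("c_middle",       typeof(string));   // char(2)
        t.Columns.Add("c_last",         typeof(string));   // char(16)
        t.Columns.Add("c_street_1",     typeof(string));   // char(20)
        t.Columns.Add("c_street_2",     typeof(string));   // char(20)
        t.Columns.Add("c_city",         typeof(string));   // char(20)
        t.Columns.Add("c_state",        typeof(string));   // char(2)
        t.Columns.Add("c_zip",          typeof(string));   // char(9)
        t.Columns.Add("c_phone",        typeof(string));   // char(16)
        t.Columns.Add("c_since",        typeof(DateTime)); // datetime
        t.Columns.Add("c_credit",       typeof(string));   // char(2)
        t.Columns.Add("c_credit_lim",   typeof(decimal));  // numeric(12,2)
        t.Columns.Add("c_discount",     typeof(decimal));  // numeric(4,4)
        t.Columns.Add("c_balance",      typeof(decimal));  // numeric(12,2)
        t.Columns.Add("c_ytd_payment",  typeof(decimal));  // numeric(12,2)
        t.Columns.Add("c_payment_cnt",  typeof(short));    // smallint
        t.Columns.Add("c_delivery_cnt", typeof(short));    // smallint
        t.Columns.Add("c_data",         typeof(string));   // char(500)
        return t;
    }
}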

 

For each of the tests, three million rows were loaded into the customer table without any index, and the following payload sizes (rows per WriteToServer call) were tested:

 

2, 20, 100, 200, 2,000, 20,000, and 3,000,000

 

Three million rows amounted to about 2.2GB of data after they were all loaded into the customer table. This amount of test data was small compared to the amount of data I needed to load, but it was sufficient for the test purpose and helped reduce the test time. In addition, for all the tests, SqlBulkCopy.BatchSize was set to 20,000.

 

The other test parameter was the number of threads loading the data concurrently into non-overlapping ranges. I compared loading the three million rows single-threaded versus with 10 threads.
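For the 10-thread runs, each thread can be given its own DataTable and SqlBulkCopy instance and a non-overlapping slice of the key range. Here is a rough sketch that reuses the PayloadSketch.Load and CustomerTableSketch.BuildCustomerTable helpers sketched earlier; the range arithmetic and the FillCustomerRow placeholder are my assumptions, not the original loader.

using System.Data;
using System.Threading.Tasks;

class ParallelLoadSketch
{
    public static void LoadConcurrently(string connectionString, int totalRows, int threads,
                                        int payloadSize, int batchSize)
    {
        int rowsPerThread = totalRows / threads;        // e.g. 3,000,000 / 10

        Parallel.For(0, threads, t =>
        {
            int firstRow = t * rowsPerThread;           // this thread's non-overlapping range
            DataTable table = CustomerTableSketch.BuildCustomerTable();

            PayloadSketch.Load(connectionString, table, rowsPerThread, payloadSize, batchSize,
                               (row, i) => FillCustomerRow(row, firstRow + i));
        });
    }

    static void FillCustomerRow(DataRow row, int id)
    {
        // Placeholder: populate the row's columns per the TPC-C spec for this customer id.
    }
}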

 

The results are in the following table:

 

Threads   Payload (rows per WriteToServer call)   Load Duration (seconds)
   1                    2                                 >1440
   1                   20                                 ~1440
   1                  100                                   397
   1                  200                                   300
   1                2,000                                   289
   1               20,000                                   278
   1            3,000,000                                   270
  10                    2                                  1998
  10                   20                                   238
  10                  100                                   105
  10                  200                                    90
  10                2,000                                    94
  10               20,000                                   100
  10            3,000,000                                   N/A

 

So, at least for my test environment, it does not look like a good idea for WriteToServer to send 20 or fewer rows to the server each time. We should stock up the payload a bit more before each trip.

 

The loader program was run on one of the nodes in a two-node cluster, and the SQL Server instance was on the other node. The network in between was 1Gbps, with more than sufficient bandwidth and low latency. With an extremely low payload (i.e. when the DataTable object had only 2~20 rows), the network bandwidth was barely used (around 4%).

 

A salient point from the test results is that once the payload crossed a certain threshold, around 200 rows in these tests, the exact payload size no longer mattered much; it made little difference whether it was 2,000 rows or 20,000 rows.

 

Loading the data concurrently did help mask the penalty of a very low payload. For instance, with a single thread and a payload of 20 rows, the loader took about 1,440 seconds to load the three million rows, whereas the same payload with 10 threads cut the load time to 238 seconds.

 

In the next post, I’ll look at a few other implications of the payload setting more closely.

Published Friday, July 01, 2011 12:54 PM by Linchi Shea


Comments

 

Steven Yampolsky said:

When dealing with large data transfers, it is worth taking a look at the network packet size configuration. There are two questions that need to be answered:

1) Does the SqlBulkCopy packet size default to SqlClient's 8000-byte packet size or to the database default of 4096? http://support.microsoft.com/kb/2008195

2) Does your network MTU force packet fragmentation on the way to the server? Try the following command on the client: ping <server> -l 4096 -f. If the packets are getting fragmented on the way to the server, this command will fail with a "Packet needs to be fragmented but DF set." error.

July 1, 2011 4:26 PM
 

Linchi Shea said:

The network packet size was kept at 4096 throughout all the tests. The network MTU between the two nodes is 1473.

For these tests, I don't think the network packet size setting was a significant factor. But I agree that it may be worth taking a systematic look just to be sure.
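For what it's worth, the client-side packet size can also be pinned in the connection string so that every test run uses the same value; the Packet Size keyword is standard SqlClient, while the server and database names below are placeholders.

using System.Data.SqlClient;

class PacketSizeSketch
{
    static SqlBulkCopy CreateBulkCopy()
    {
        // Placeholder server/database; Packet Size fixes the TDS packet size for this connection.
        string connectionString =
            "Data Source=MyServer;Initial Catalog=tpcc;Integrated Security=SSPI;Packet Size=4096";
        return new SqlBulkCopy(connectionString);
    }
}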

July 2, 2011 3:59 PM
 

Linchi Shea said:

If you have some time to kill, you could try to load a tiny amount of data, say ~3GB or 3,000,000 rows,

July 2, 2011 10:25 PM
 

Jack Vamvas said:

I've experienced the network packet size problem in the past, i.e. packet sizes over 8000, which required multi-page allocations from VAS.

I used

select s.session_id, s.host_name, s.host_process_id, s.program_name, c.net_packet_size
from sys.dm_exec_connections c
join sys.dm_exec_sessions s on c.session_id = s.session_id
where c.net_packet_size >= 8000

to identify the errant connections.

July 3, 2011 5:03 AM
 

Ivo Tops said:

I load from C# by setting BatchSize and supplying a DataReader to WriteToServer. This lets SqlBulkCopy read and write efficiently in its own chunks.
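A minimal sketch of that approach, streaming rows from a source query straight into the destination table so that SqlBulkCopy handles the chunking itself; the connection strings, the source query, and the staging table name are placeholders.

using System.Data.SqlClient;

class DataReaderLoadSketch
{
    static void Load()
    {
        // Placeholders: source/destination connection strings and the source query.
        using (SqlConnection source = new SqlConnection(
                   "Data Source=SourceServer;Initial Catalog=tpcc;Integrated Security=SSPI"))
        using (SqlCommand cmd = new SqlCommand("select * from customer_staging", source))
        {
            source.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
                       "Data Source=DestServer;Initial Catalog=tpcc;Integrated Security=SSPI"))
            {
                bulkCopy.DestinationTableName = "customer";
                bulkCopy.BatchSize = 20000;
                bulkCopy.BulkCopyTimeout = 0;
                bulkCopy.WriteToServer(reader);   // SqlBulkCopy pulls and batches rows itself
            }
        }
    }
}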

October 7, 2014 11:56 AM

