THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Linchi Shea

Checking out SQL Server via empirical data points

Which is Faster -- (1) Copy and Local Bulk Insert or (2) Bulk Insert across Network?

 

A while back, I was asked which of the following two bulk insert options would be faster:

  • Copy/Bulk Insert: First copy the data file from server B to server A, and then bulk insert the data into a table on server A
  • Bulk Insert across Network: Directly bulk insert the data from server B into the table on server A

Intuitively, the first option--Copy/Bulk Insert--would be slower than the second option--Bulk Insert across Network, because the Copy/Bulk Insert option would write the data to disk twice, first for the copy operation and the second time for the bulk insert operation, and read the data file from disk twice, first from server B and then from server A.

So the question is really not which one is faster, but how much faster the second option may be. To get a feel for the difference, I conducted several tests, and I thought some of you might be interested in the results.

In these tests, I used two data files--orders.txt and stock.txt, representing a relatively small data file and a relatively large data file, respectively. Their sizes and row counts are as follows:

  • orders.txt: size ~= 130MB and row count =  3,000,000
  • stock.txt:  size ~= 3GB   and row count = 10,000,000

The schema of the two tables are as follows:

CREATE TABLE orders
(
    o_id                int,
    o_d_id              tinyint,
    o_w_id              int,
    o_c_id              int,
    o_entry_d           datetime,
    o_carrier_id        tinyint,
    o_ol_cnt            tinyint,
    o_all_local         tinyint
)
CREATE TABLE stock
(
    s_i_id              int,
    s_w_id              int,
    s_quantity          smallint,
    s_dist_01           char(24),
    s_dist_02           char(24),
    s_dist_03           char(24),
    s_dist_04           char(24),
    s_dist_05           char(24),
    s_dist_06           char(24),
    s_dist_07           char(24),
    s_dist_08           char(24),
    s_dist_09           char(24),
    s_dist_10           char(24),
    s_ytd               int,
    s_order_cnt         smallint,
    s_remote_cnt        smallint,
    s_data              char(50)    
)

For the tests, I ran the attached batch files multiple times. One batch file was for loading orders data, and the other for loading stock data. Each batch file performed Copy/Bulk Insert once, followed by Bulk Insert across Network once. The table below shows the average results:

Option

orders
(smaller data file)

stock
 (larger data file)

Copy/Bulk Insert 52 sec 554 sec
Bulk Insert across Network 42 sec 411 sec

So in this particular test environment, for the orders data file, the Bulk Insert across Network option is ~24% faster than the Copy/Bulk Insert option, and for the stock data file, the Bulk Insert across Network option is ~35% faster than the Copy/Bulk Insert option.

The performance difference between the two options is obviously more pronounced for the larger data file. But I wouldn't want to over-generalize this observation, because a number of factors may influence the results, and these factors may vary from environment to environment. For instance, two key factors are (1) the performance of the drive to which the data file is copied, and (2) whether any database file resides on this same drive. Nevertheless, it is safe to say that the performance difference between the two bulk insert options can be quite significant.

One argument in favor of the Copy/Bulk Insert option that I sometimes hear is that it's safer or more robust to copy the data file to a local disk first as the exposure on the network is shorter. I don't think this is a compelling argument because network connectivity these days is rather reliable, and if it isn't, you've got a bigger problem.

The other argument in favor of the Copy/Bulk Insert option is a permission issue. This sometimes is more compelling because you may be restricted as to what security privileges you can assign to an account.

Published Tuesday, July 24, 2007 12:29 AM by Linchi Shea

Attachment(s): BULK_tests.zip

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

How did you do the bulk insert over the network?  Using a UNC path?

July 24, 2007 7:57 AM
 

Linchi Shea said:

Yes, Adam, using a UNC. The detail is in the attached zip file.

July 24, 2007 9:22 AM
 

Sean said:

Do you have a sample orders.txt and stock.txt?  Only need a few lines for each.

July 24, 2007 2:23 PM
 

Linchi Shea said:

Sean;

I've added two sample data files in the attached zip file.

July 27, 2007 12:18 PM
 

rg said:

Thanks

Here's a way to bulk insert from a dataset to an sql database with parameters

http://www.dataportweb.com/post/Sql-bulk-insert-and-paramaters.aspx

September 19, 2008 1:29 AM

Leave a Comment

(required) 
(required) 
Submit

About Linchi Shea

Checking out SQL Server via empirical data points

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement