A while back, I was asked which of the following two bulk insert options would be faster:
- Copy/Bulk Insert: First copy the data file from server B to server A, and then bulk insert the data into a table on server A
- Bulk Insert across Network: Directly bulk insert the data from server B into the table on server A
Intuitively, the first option--Copy/Bulk Insert--would be slower than the second option--Bulk Insert across Network, because the Copy/Bulk Insert option would write the data to disk twice, first for the copy operation and the second time for the bulk insert operation, and read the data file from disk twice, first from server B and then from server A.
So the question is really not which one is faster, but how much faster the second option may be. To get a feel for the difference, I conducted several tests, and I thought some of you might be interested in the results.
In these tests, I used two data files--orders.txt and stock.txt, representing a relatively small data file and a relatively large data file, respectively. Their sizes and row counts are as follows:
- orders.txt: size ~= 130MB and row count = 3,000,000
- stock.txt: size ~= 3GB and row count = 10,000,000
The schema of the two tables are as follows:
CREATE TABLE orders
(
o_id int,
o_d_id tinyint,
o_w_id int,
o_c_id int,
o_entry_d datetime,
o_carrier_id tinyint,
o_ol_cnt tinyint,
o_all_local tinyint
)CREATE TABLE stock
(
s_i_id int,
s_w_id int,
s_quantity smallint,
s_dist_01 char(24),
s_dist_02 char(24),
s_dist_03 char(24),
s_dist_04 char(24),
s_dist_05 char(24),
s_dist_06 char(24),
s_dist_07 char(24),
s_dist_08 char(24),
s_dist_09 char(24),
s_dist_10 char(24),
s_ytd int,
s_order_cnt smallint,
s_remote_cnt smallint,
s_data char(50)
)
For the tests, I ran the attached batch files multiple times. One batch file was for loading orders data, and the other for loading stock data. Each batch file performed Copy/Bulk Insert once, followed by Bulk Insert across Network once. The table below shows the average results:
| Option |
orders (smaller data file) |
stock (larger data file) |
| Copy/Bulk Insert |
52 sec |
554 sec |
| Bulk Insert across Network |
42 sec |
411 sec |
So in this particular test environment, for the orders data file, the Bulk Insert across Network option is ~24% faster than the Copy/Bulk Insert option, and for the stock data file, the Bulk Insert across Network option is ~35% faster than the Copy/Bulk Insert option.
The performance difference between the two options is obviously more pronounced for the larger data file. But I wouldn't want to over-generalize this observation, because a number of factors may influence the results, and these factors may vary from environment to environment. For instance, two key factors are (1) the performance of the drive to which the data file is copied, and (2) whether any database file resides on this same drive. Nevertheless, it is safe to say that the performance difference between the two bulk insert options can be quite significant.
One argument in favor of the Copy/Bulk Insert option that I sometimes hear is that it's safer or more robust to copy the data file to a local disk first as the exposure on the network is shorter. I don't think this is a compelling argument because network connectivity these days is rather reliable, and if it isn't, you've got a bigger problem.
The other argument in favor of the Copy/Bulk Insert option is a permission issue. This sometimes is more compelling because you may be restricted as to what security privileges you can assign to an account.