If you have some time to kill, you could try to load a tiny amount of data, say ~3GB or 3,000,000 rows, with SqlBulkCopy, but do remember to give the WriteToServer method a very small payload every time it is called. The test results reported in my previous post suggest that once you have started loading, you might as well go for a lunch break.
It turns out wasting space is just as easy. In fact, wasting a great amount of space is made easy with calling SqlBulkCopy.WriteToServer with a small payload. All you need to do is direct that payload towards a heap table.
How much space can you waste? Well, how about SQL Server allocating more than 90GB of the storage space in order to import about 3GB of data?
I ran a series of data loading tests with SqlBulkCopy against the same customer table used in my previous post. Each test loaded 3,000,000 rows into the customer table without any index. The database was pre-allocated 200GB of space, and the customer table was the only user table. The test data was generated in a C# program and added to a DataTable object, which was then passed to the WriteToServer method to be copied to the customer table on the SQL Server 2008 instance. This data loader program can be configured to run the bulk importing either single threaded or in any number of threads concurrently into multiple non-overlapping ranges.
The tests were first divided into two groups. The first group of tests was to load the data in a single thread, and the second group of tests in 10 threads. In each group, tests were run with different levels of payload. More specifically, I tested populating the DataTable object with the following number of rows before calling the WriteToServer method with that object:
2, 5, 10, 20, 50, and 100
Before presenting the results, let me establish the frame of reference with the results from calling WriteToServer with a payload of 2,000 rows:
|
Payload (# of rows) |
Threads |
Duration (second) |
Data (GB) |
Reserved (GB) |
Unused (GB) |
|
2,000 |
1 |
289 |
2.4 |
3.8 |
1.4 |
|
2,000 |
10 |
94 |
2.4 |
3.8 |
1.4 |
Okay, there was a huge performance difference between loading the data in a single thread versus loading the data in 10 threads. But I’d like to turn your attention to the space usage. At the completion of loading the data, I looked at how much space was actually used by the table data, how much space was reserved by SQL Server for the table, and how much space was allocated to the table but not used.
The above table shows that with payload = 2,000 rows, there was 1.4GB of unused space allocated to the table in both cases.
Now take a look at the following table and pay attention to the highlighted cells:
|
Payload (# of rows) |
Threads |
Duration (second) |
Data (GB) |
Reserved (GB) |
Unused (GB) |
|
2 |
1 |
10,228 |
12 |
96 |
84 |
|
5 |
1 |
4,074 |
4.8 |
38 |
33.2 |
|
10 |
1 |
2,206 |
2.4 |
19.2 |
16.8 |
|
20 |
1 |
1,267 |
2.4 |
9.6 |
7.2 |
|
50 |
1 |
574 |
2.4 |
3.8 |
1.4 |
|
100 |
1 |
385 |
2.4 |
3.8 |
1.4 |
|
2 |
10 |
1,998 |
10 |
87 |
77 |
|
5 |
10 |
872 |
4.8 |
38 |
33.2 |
|
10 |
10 |
441 |
2.4 |
19 |
16.8 |
|
20 |
10 |
240 |
2.4 |
9.6 |
7.2 |
|
50 |
10 |
135 |
2.4 |
3.8 |
1.4 |
|
100 |
10 |
105 |
2.4 |
3.8 |
1.4 |
The amount of space wasted was staggering when the payload was 2 rows or 5 rows.
For instance, SQL Server had to allocate a whopping 96GB in order to load the 3,000,000 million rows or about 3GB worth of data into the customer table when WriteToServer had to copy 2 rows at a time. Good news is that this appears to be an anomaly only when the payload was extremely low. As shown in the above two tables, as soon as the payload level was greater than 50, SQL Server did not need to allocate more than 3.8GB of storage space.
Number of threads made little difference in terms of how much space was wasted.
The extra space is quite easy to reclaim by creating a clustered index. The potential problem, however, is that you could run out of disk space during the bulk load if you didn't know about this behavior. In addition, you may not have planned to pre-allocate a lot of free space and bulk loading with a small payload can cause the database file to autogrow, thus further degrading the performance.
So why would SQL Server waste so much space in the case of a very small payload for WriteToServer? I don’t know enough about the internals of the SQL Server space allocation algorithm and/or SqlBulkCopy to offer an answer. But it’s revealing to check how the pages were allocated and used in some of these outrageous cases. For instance, when the payload was 2 rows, each database extent only had a single used page on average, and in each used page only 17% was occupied with the customer table rows.
The space usage picture was slightly better when the payload was 5 rows. While it remained the same that one page was used per extent, the average density of the page was higher at about 42%.
When the payload was bumped up to 50, the used page count per extent increased to 5 and the page density was respectable at about 85%.
No matter what may be the rationale for this behavior, SQL Server does seem to not handle space allocation efficiently when the WriteToServer payload is extremely small.
The solution? Well, just avoid small payload when using SqlBulkCopy. In my own test environment, that means I don’t want the WriteToServer method to keep copying fewer than several hundred rows.