THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Alberto Ferrari

Making Fast Load really fast on clustered indexed tables with SSIS

I have been so used to use int identity fields as primary key for any table and to believe it’s the fastest way to define a clustered primary key that I never thought this can cause serious performance problems to SSIS. Until I tested it and discovered that int identity primary keys are among the slowest way to insert huge amount of data with SSIS. In the post I’ll describe the technique that – from my tests – is the fastest way to insert data into tables with clustered index using SSIS.

As a test case I used a very simple table with only two fields:

CREATE TABLE [dbo].[FastLoadTests](
      [Id] [int] Identity NOT NULL,
      [TestString] [varchar](2048) NOT NULL
) ON [PRIMARY]

 

I filled it with a very simple task that generates one million of monotonically increasing Id and TestStrings of exactly 512 bytes each, throwing them into an OleDb destination adapter that uses FastLoad to fill the table.

If the table has no index at all the package runs in 18.5 seconds, pretty nice. Now, I created an index on the table like this:

CREATE UNIQUE CLUSTERED INDEX [FastLoadTests_ClusteredIndex] ON [dbo].[FastLoadTests] ([Id] ASC) ON [PRIMARY]

And run the package once more. The execution time is now 1.08.5 (one minute and eight seconds). It is more or less 4 times slower. Clearly, as SQL has to sort one million rows, I was expecting poor performances. The first trial has been that of reducing the number of rows to sort.

Looking at the various options in the OleDb destination adapter it is easy to find that it has a parameter (Maximun Insert Commit Size) that defaults to 0.

OleDB-MICS

Setting it to 10.000 reduced execution time to 44.5 seconds, a better result but still more than two times slower than the table with no clustered index. Is this the best result? Not at all! Let’s try something different.

Reading (carefully) the MS documentation about the OleDB destination adapter (http://msdn2.microsoft.com/en-us/library/ms141237.aspx) you’ll discover that you can gain performance if the input data is sorted accordingly to the clustered index on the table, specifying the ORDER option with the advanced editor.

After some trials I managed to set it with the advanced editor:

OleDB-ORDER

It has been surprising that, running the package, I got this error:

[Test Table [44]] Error: SSIS Error Code DTS_E_OLEDBERROR.  An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available.  Source: "Microsoft SQL Native Client" Hresult: 0x80004005 Description: "Could not bulk load. The sorted column 'Id' is not valid. The ORDER hint is ignored.".

So, it seems that setting this option on an INT IDENTITTY field causes FastLoad to fail. Really surprising because it seems that SQL Server does not know that the keys generated by itself will be sorted! Anyway, removing the Identity setting from the table and using the package generated ID resulted in execution time of 46.23 seconds. No performance gain at all. Anyway as it seemed an interesting way to search into, I went more on trials.

The last and resolving trial test has been that of REMOVING the Maximum Insert Commit Size parameter from the destination adapter. Running the package again I got a result of 19.07 seconds execution time, more or less the same time as inserting the data with no clustered index at all. Got it! Now I have a really fast way to insert data.

The results are summarized in this table

  No Index
0 MICS
Index
10.000 MICS
Index
0 MICS
Int Identity key 18.85 41.15 1.08.57
SSIS generated key 18.59 44.54 1.10.03
SSIS key and ORDER option 18.85 46.23 19.07

*MICS = Maximum Insert Commit Size

You can easily see that using the ORDER option gives a dramatic speed improvement when you have a clustered index but ONLY if you are able to generate the sorting key in the SSIS package and DO NOT USE the MICS parameter.

I think that the documentation for this behavior should have been made far more accessible and that ORDER parameter is so important that should have been shown in the first page of the standard editor for OleDb destination, hope it will be done in the next release of Sql Server.

Moreover, as using MICS parameter leads to a dramatic performance degradation, I think that the OleDb destination adapter should issue a warning if both ORDER and MICS parameter are set so to make the programmer think twice before using them together.

Published Sunday, April 08, 2007 9:48 AM by AlbertoFerrari
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alberto Ferrari said:

In a previous post I spoke about the advantages of having sorted flows in SSIS to greatly speed up data

July 23, 2007 3:26 AM
 

Alberto Ferrari said:

In a previous post I showed how to make bulk insert faster using SSIS. During Sql Server Conference 2007

July 23, 2007 3:31 AM
 

Bill said:

I plan on testing this, but I have one concern.  1 million rows really isn't that much.  It's fine for testing, but i have a real-world case of dealing with 350 million rows.  I will not leave the max commit size set to 0 becase 350 million rows is simply too much for the DB's transaciton log to handle.

April 25, 2008 4:47 PM
 

AlbertoFerrari said:

Bill,

AFAIK, setting MICS to 0 and using the sorted input, the whole operation is minimally logged, and the pressure on the log is really low.

I would try different values for MICS, including 0, giving it a chance. :)

Moreover, I'd like to know the results of your tests.

April 25, 2008 6:34 PM
 

Soci blog » Blog Archive » SSIS gyors adatbet??lt??s clustered indexes t??bl??ba said:

June 19, 2008 6:56 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement