THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Jorg Klein

Jorg Klein, Microsoft-only BI consultant from the Netherlands, blogging about BI on SQL Server with a focus on SSIS.

SSIS - Decrease your fact table loading time up to 40%

Replace the multiple "Lookup Error Output" Derived Columns and Union All's with a single Derived Column and get a performance boost...

Almost every BI developer needs to perform lookups while loading the Data Warehouse. When loading the fact tables for example, lookups are used to receive the (surrogate) primary keys of the dimension tables that are connected to the fact table.

I, and I think most developers, perform the DWH fact-dimension lookups in the following way:
1. Each fact record contains business keys to the different dimensions.
2. Each dimension record contains a business key and a surrogate key (integer).
3. Each dimension contains one unknown member, with a surrogate key of value 0.
4. Match the dimension business key in the fact record to the business key in the dimension to receive the dimension surrogate key, using a Lookup Component. If the lookup gained no match, point to the unknown member in the dimension (surrogate key 0).
5. Store the fact record in the DWH fact table with the gained surrogate keys of the dimensions.

Loading a fact table in the way described above would typically look like this:


I have tested this solution with 313.341 records. This took 1 minute and 23 seconds.
This way of loading the fact table contains a lot of semi-blocking components: the Union All's. Read more about semi-blocking components in
this blog, posted by me a little while ago. These components cause an unnecessary negative impact on the performance, as you can read in my other blog.

There is a much more efficient way to load a fact table in a datawarehouse:
1. Set the all the Lookup Error Outputs to "Ignore Failure".
2. Delete all the Union All and Derived Column components.
3. Add a Derived Column component to the end of the flow and add the unknown surrogate keys like this:

 der

The data flow now looks like this:

df2

The loading of exactly the same amount of records (313.341) now took just 45 seconds! This is a performance boost of almost 40%. Loading your fact table like this does not only decrease your loading time, it also takes less development time. So a win-win situation!

A minor downside might be that you can't easily see how much records have failed a particular lookup component, but it's not so hard to make a simple (SSRS) report which gives you this overview. It might be a good opportunity to include the information of all your fact tables in one single report, it will give you a much better overview of lookup-failures during the load of your fact tables then all the separate SSIS dataflow tasks.

Published Wednesday, October 22, 2008 11:47 PM by jorg
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

hari said:

i want what are the synchronous and asynchronous trans

August 25, 2010 7:38 AM
 

Dimitris said:

Hello,

nice article.

I want to make a question.

You described the easy-fast solution.In what way do we handle the load of fact in ssis,considering that our dimension has more than 1 skey for one key?(meaning that we have diferrent sk for the same id for diferent time periods).

tnks in advance!

September 2, 2010 5:42 AM
 

Daniel Barkman said:

Do you have a download of this sample?

March 1, 2011 12:31 PM
 

Emine said:

great idea Jorg !!! simple ! yet it saves a lot of time

October 8, 2011 2:55 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About jorg

Jorg Klein, Microsoft-only BI consultant from the Netherlands, blogging about BI on SQL Server with a focus on SSIS.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement