SQLblog.com - The SQL Server blog spot on the web

Jorg Klein

Microsoft Data Platform MVP from the Netherlands

SSIS - Decrease your fact table loading time up to 40%

You can read this blog post at this link: https://jorgklein.com/2008/10/02/ssis-decrease-your-fact-table-loading-time-up-to-40

This blog has moved to www.jorgklein.com. There will be no further posts on sqlblog.com. Please update your feeds accordingly.

You can follow me on Twitter: http://twitter.com/jorg__klein

Published Wednesday, October 22, 2008 11:47 PM by jorg


Comments

 

hari said:

I want to know what synchronous and asynchronous transformations are.

August 25, 2010 7:38 AM
 

Dimitris said:

Hello,

Nice article.

I would like to ask a question.

You described the easy, fast solution. How do we handle the fact load in SSIS when the dimension has more than one surrogate key per business key (i.e., different surrogate keys for the same ID over different time periods)?

Thanks in advance!

September 2, 2010 5:42 AM
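
A minimal sketch of the range lookup Dimitris's scenario calls for, assuming a Type 2 dimension dbo.DimCustomer with ValidFrom/ValidTo validity columns (all table and column names here are hypothetical): join on the business key and pick the surrogate key whose validity window covers the fact row's transaction date.

-- Resolve the Type 2 surrogate key by business key plus date range
-- (hypothetical names; ValidTo is treated as an exclusive upper bound).
SELECT d.CustomerSK, s.SalesAmount
FROM staging.Sales AS s
JOIN dbo.DimCustomer AS d
  ON d.CustomerBusinessKey = s.CustomerID
 AND s.TransactionDate >= d.ValidFrom
 AND s.TransactionDate <  d.ValidTo;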
 

Daniel Barkman said:

Do you have a download of this sample?

March 1, 2011 12:31 PM
 

Emine said:

Great idea, Jorg! Simple, yet it saves a lot of time.

October 8, 2011 2:55 PM
 

Ritz said:

How do you handle late-arriving dimensions? Do you insert inferred dimension members during the fact load?

August 17, 2012 1:35 AM
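
One common pattern for the inferred members Ritz asks about, sketched here under assumed names: when a fact row's business key has no dimension match, insert a placeholder member and capture the new surrogate key, for example from an OLE DB Command with @CustomerID mapped as a parameter.

-- Insert an inferred member for a missing business key and return its new
-- surrogate key (hypothetical table and column names; CustomerSK is assumed
-- to be an IDENTITY column).
INSERT INTO dbo.DimCustomer (CustomerBusinessKey, CustomerName, IsInferred, IsCurrent)
OUTPUT inserted.CustomerSK
VALUES (@CustomerID, N'Unknown (inferred member)', 1, 1);

A later dimension load can then overwrite the placeholder attributes once the real member arrives.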
 

Ritz said:

To get the most current surrogate key, you need to use a SQL query in the Lookup transformation that selects only the active/current record; otherwise you will start getting multiple surrogate keys per business key once the dimension holds historical records.

August 17, 2012 1:37 AM
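
A minimal sketch of such a Lookup query, assuming a Type 2 dimension dbo.DimCustomer with an IsCurrent flag (names hypothetical): feed the Lookup transformation only the current rows, so each business key maps to exactly one surrogate key.

-- Lookup source query: one row per business key (hypothetical names).
SELECT CustomerSK, CustomerBusinessKey
FROM dbo.DimCustomer
WHERE IsCurrent = 1;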
 

Gary said:

Just FYI... I haven't tested this exact number of rows, nor do I have your table layout, so I can't repeat this test, but I'd bet money that if you simply outer joined all three dimension tables to the fact query in the source of the data flow, using CASE WHEN statements to substitute the unknown dimension rows, the runtime would drop to something like one second. I'm guessing your rows and dimensions are very small to get that kind of throughput, since you're driving those rows through the memory of the ETL tool across multiple tasks. My test of that approach with my own data was about as fast as watching paint dry. When I pushed it to the database, it screamed.

Rule #1 I tell all new ETL developers is: never make the ETL tool do something that can be done exponentially faster in the database, if you need scalability. That applies to all ETL tools. Of course, if you're only loading 300K rows, you can probably do anything you want and who cares? If you're loading a billion rows like I do, the above scenario is going to take a mighty long time.

May 7, 2013 1:07 AM
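
A minimal sketch of the source query Gary describes, with hypothetical table and column names and -1 standing for each dimension's unknown member:

-- Resolve all surrogate keys in the database instead of in SSIS lookups;
-- CASE WHEN substitutes the unknown member where no dimension row matches.
SELECT
    CASE WHEN dc.CustomerSK IS NULL THEN -1 ELSE dc.CustomerSK END AS CustomerSK,
    CASE WHEN dp.ProductSK  IS NULL THEN -1 ELSE dp.ProductSK  END AS ProductSK,
    CASE WHEN dd.DateSK     IS NULL THEN -1 ELSE dd.DateSK     END AS DateSK,
    s.SalesAmount
FROM staging.Sales AS s
LEFT OUTER JOIN dbo.DimCustomer AS dc ON dc.CustomerBusinessKey = s.CustomerID
LEFT OUTER JOIN dbo.DimProduct  AS dp ON dp.ProductBusinessKey  = s.ProductID
LEFT OUTER JOIN dbo.DimDate     AS dd ON dd.DateValue           = s.OrderDate;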

