
Jorg Klein

Microsoft Data Platform MVP from the Netherlands

SSIS – Package design pattern for loading a data warehouse

You can read this blog post at this link: https://jorgklein.com/2010/01/02/ssis-package-design-pattern-for-loading-a-data-warehouse

This blog has moved to www.jorgklein.com. There will be no further posts on sqlblog.com; please update your feeds accordingly.

You can follow me on Twitter: http://twitter.com/jorg__klein

Published Monday, January 4, 2010 3:41 PM by jorg


Comments

 

Siddharth Mehta said:

Hi Jorg,

Nice article and examples. I agree with the master-child design and with using package variables to control the execution, but I have a slightly different opinion on implementing the T and L parts.

I feel we do not need a separate package altogether for T and L. With a proper design, for example using package variables to decide whether to stage data or load it directly after the transform, and using variable values from config to hint whether to load data from the staging area or from the stream, the same can be implemented in one package. More layering, by creating more packages for a single ETL cycle of an entity, also means that if different developers work on it they need a standard interface for passing data. Also, separating T and L for each and every package makes staging inevitable after the transform.
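A rough sketch of that idea (the variable name here is hypothetical): after the transform step, the two outgoing precedence constraints could be set to "Expression and Constraint" so that configuration decides which branch runs:

    Branch into the staging table:  @[User::StageAfterTransform] == TRUE
    Branch straight into the load:  @[User::StageAfterTransform] == FALSE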

Extract, Transform & Load, Cube processing definitely needs separate package and controlling the same thru a master package is definitely an effective design.

January 4, 2010 10:22 AM
 

jorg said:

Hi Siddharth,

Thanks for reading and commenting!

I agree with your comment and I believe we share the same ideas.

You say you have a slightly different opinion on implementing the T and L parts; can you clarify that some more?

You say we don't need a separate package for T and L, and I agree with that; it's why I want to use a single package with all phases in it. So I wonder on which specific part your opinion differs, as it looks like we are saying the same thing here.

Thanks,

Jorg

January 4, 2010 10:54 AM
 

Siddharth Mehta said:

Sorry, I missed that detail. I understand now that you are including all the phases in one package and executing that same package 4 times, running a different phase on each execution.

The tricky part of this design is that staging becomes inevitable if each phase (E/T/L) is to be executable at will, but that seems a valid trade-off for the flexibility achieved.

Yes, you are correct, we share the same ideas; I just thought there was a different package for each E/T/L/P part of a single dim/fact, i.e. 4 packages per entity. Now I feel it's even better than I thought it was :)

January 4, 2010 11:48 AM
 

Iman said:

Hi Jorg,

Excellent design. I tried to find a problem with it, but no luck yet!

Thanks for sharing.

Iman

January 4, 2010 11:47 PM
 

Pieter van Maasdam said:

Hi Jorg,

Nice article! I am only wondering about the Extract step. How do you implement the extract of a table that is used in multiple dimensions? Do you keep track of the tables you have already extracted, or do you simply extract them multiple times (which would mean doing more work than necessary)?

Regards,

Pieter

January 5, 2010 3:24 AM
 

jorg said:

Hi Pieter,

Nice to hear that you like the article and thanks for the useful comment!

I would say that's typically logic that should be handled in the master package. So I would still create the dimension packages as stand-alone "objects" that contain all their own logic, and handle things like this from the master package. Exactly how I would handle it depends on the specific situation; it can be as easy as running the extract for only one of the two dimensions. If it's more complicated you can create some more variables to handle it, just like Siddharth already suggested. I think there are many ways to solve challenges like that.

Thanks,

Jorg

January 5, 2010 4:38 AM
 

Jorg Klein said:

Since my last blog post about an SSIS package design pattern I've received quite some positive reactions …

February 28, 2010 11:36 AM
 

pandkothia said:

Jorg,

The ETL steps in all my child packages have an OLE DB source (E), a Data Conversion task (T), and an ADO.NET destination (L). I was trying to separate them into the ETLP blocks of the child package but couldn't figure it out. I looked into your SQL2008 example, but apart from the blocks in the control flow section I couldn't find any tasks in the data flow section. I am new to this and totally confused. Do you have a complete example that I can refer to?

March 10, 2010 11:03 AM
 

pandkothia said:

Please disregard my previous question. I figured it out after reading part 2.

The following statement should have been made in part 1 so it's very clear for starters like me. Overall I really appreciate your good work.

I personally always use four Sequence Containers in my SSIS packages (a gating sketch follows this list):

- SEQ Extract (extract the necessary source tables to a staging database)

- SEQ Transform (transform these source tables to a dimension or fact table)

- SEQ Load (load this table into the data warehouse)

- SEQ Process (process the data warehouse table to the cube)
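A minimal sketch of how the master package can then run one phase at a time (the variable name and values are illustrative, not taken from the article): pass a parent variable such as User::ExecutePhase down to the child package and gate the flow into each container with an "Expression and Constraint" precedence constraint:

    SEQ Extract:    @[User::ExecutePhase] == "E"
    SEQ Transform:  @[User::ExecutePhase] == "T"
    SEQ Load:       @[User::ExecutePhase] == "L"
    SEQ Process:    @[User::ExecutePhase] == "P"

(The first container has no incoming constraint, so it can be gated through an expression on its Disable property instead.)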

March 10, 2010 12:40 PM
 

jorg said:

Hi Pandkothia,

It's nice to hear that you, as a starter, benefit from my blog posts :-)

I have added a link to part-2 of the article in the first part so others won't be confused. Thanks for bringing this to my attention.

--Jorg

March 11, 2010 8:41 AM
 

Jesse said:

I like to use one package for all of my dimensions and facts.

April 26, 2010 6:29 PM
 

00 said:

Guys, could someone describe to me exactly what goes in the Transform part of this exercise? For me, I get the data out of the source system and store it in a cache table (Extract), then I have a Kimball-method component which loads the data into the dimension/fact table (Load). The query that the Kimball-method component takes is an SQL query which effectively does the "transform" part, but this is transitory.

I presume that to have a Transform layer you would store the output of the Kimball-method component somewhere and then, in the Load, simply merge the data in. Could someone give me an idea of the benefit of this extra storage step? For me, I can't see any.

November 12, 2010 10:08 AM
 

sandy said:

Hi Jorg,

Great article. I was wondering how transactions/checkpoints can be handled in this design to support re-runs when there is a failure in E/T/L/P?
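For reference, the built-in SSIS answer to re-runs is package checkpoints; a rough sketch of the relevant properties (the file path is made up, and how well this combines with a master package driving the phases is a design question in itself):

    Package:                   SaveCheckpoints = True
                               CheckpointUsage = IfExists
                               CheckpointFileName = C:\ETL\Checkpoints\Customer.xml
    Restart-point containers:  FailPackageOnFailure = True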

~Sandy.

November 24, 2010 9:40 AM
 

Daniel Barkman said:

The download links are broken! All I got was a corrupted zip file. BTW, I tried both 2005 and 2008.

March 1, 2011 12:25 PM
 

Daniel Barkman said:

Never mind! I was trying to "Save Link As". When I clicked the link it took me to a SkyDrive page, where the download was indeed available. Thanks Jorg!

March 1, 2011 12:33 PM
 

jeba said:

nice..

July 3, 2011 1:05 AM
 

Jorg Klein said:

Last week Microsoft released CTP3 of SQL Server Integration Services (SSIS), code name: Denali. In this …

July 22, 2011 12:44 PM
 

maddy said:

Hi, I am unable to download the files. I want to see those packages running. Can you please send me the 2008 version at maheshdare@gmail.com?

November 11, 2011 9:02 PM
 

prash said:

I am new to SSIS and developing the project on my own, so please send the source code to my mail id:

prashanthk@clearpractice.com

thanks

prashanth

November 17, 2011 1:25 AM
 

khaled mousa said:

It's really an interesting design, but how is it one package per dimension when I have to make the same package 4 times: Customer_Extract.dtsx, Customer_Transfer.dtsx, Customer_Load.dtsx, Customer_Process.dtsx? And if that's right, if I have 4 dimensions then I have to create 16 packages. Do you have any idea how we can reduce this number of packages?

November 30, 2011 9:55 AM
 

anarchipur said:

Hi, here is my question:

As far as I understand the SSIS sequences in the model above, within every sequence (for example TRANSFORM) the packages execute simultaneously. Here you have just a couple of dimensions and facts, but what if you have 70 dimensions and 10 facts? Does performance suffer when all dims and facts are processed simultaneously? And second, what about the readability of such a big SSIS project?

Thanks!

January 16, 2012 10:04 AM
 

Test User said:

Excellent

January 31, 2012 6:12 AM
 

Matt Warwick said:

I love the design with regard to separating out the phases of the ETL process, as well as the ability to control different aspects through configuration. My only concern would be the unwieldiness of deployments and configurations if the warehouse contains a large number of facts/dimensions.

February 1, 2012 11:10 PM
 

Gayu said:

Hi Jorg,

The post is awesome. This is the first time I have even heard of SSIS; I needed it for my project and I followed your method, and I was able to do it easily considering this is my first time. Everything works well for me, except that when I use a "Derived Column" component to see what is loaded into my fact table, the keys are "Missing Reference". My master package structure is exactly like yours, except that I have two different sets of Dim and Fact packages. I created the variables exactly like yours and connected the two child packages with the parent one. I am not sure where I went wrong. Will you be able to help me with this? Thanks a lot in advance.

April 16, 2013 4:00 AM
 

Gayu said:

Oh, I forgot to mention: this problem only occurs when I try to run individual sequences from the master package or run the entire sequence from the master package. When I run the child packages individually, it's perfect! Thanks.

April 16, 2013 4:14 AM
 

Steve said:

How do you justify your comment that having a smaller number of packages improves source code control? We have split up our packages to be very granular precisely because having larger packages makes it difficult for multiple developers to work on the same project at the same time (the dtsx file format makes it almost impossible to logically merge two changed dtsx files in a source control system).

May 9, 2013 6:00 PM
 

Pavel Janečka said:

Hi Jorg,

Thank you for sharing this design pattern. I have begun implementing it with the project deployment model and so far it is a very feasible solution!

June 11, 2013 9:46 AM
 

Darius said:

I learned a lot. Thank you for your time and effort :)

September 18, 2013 12:46 PM
 

eveready said:

I've been involved in developing both one monster package and granular packages. In my own experience, based on enterprise projects, the granular packages work best.

April 3, 2014 10:03 AM
 

Donald said:

The links are dead ... if anybody has 2008.zip, please forward it to forbes400@hotmail.com. Thanks.

April 16, 2014 2:16 PM
 

K4TIMINI said:

Hi,

I love your package design suggestion.

However, what "INSERT/UPDATE" pattern that best fit this design to load data efficiently ? Using a stage table ?

There are different ways to do that in regards to the following article : http://bisherryli.com/2012/09/26/ssis-114-how-many-ways-can-you-do-insert-update/

So what do you suggest us.

In my case of studies, there are 20+ DimTables and approximatly 10 FactTables.

TY.
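For what it's worth, the most natural fit for this design is a set-based upsert from the staging table that the Transform phase fills. A minimal T-SQL sketch (all table and column names are made up, and it handles type-1 changes only, with no history tracking):

    -- Upsert from the staging table into the dimension (illustrative names)
    MERGE dbo.DimCustomer AS tgt
    USING stg.Customer AS src
        ON tgt.CustomerBK = src.CustomerBK          -- match on the business key
    WHEN MATCHED AND (tgt.Name <> src.Name
                   OR tgt.City <> src.City) THEN    -- update changed attributes
        UPDATE SET tgt.Name = src.Name,
                   tgt.City = src.City
    WHEN NOT MATCHED BY TARGET THEN                 -- insert new members
        INSERT (CustomerBK, Name, City)
        VALUES (src.CustomerBK, src.Name, src.City);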

October 8, 2014 10:01 AM
 

JoeK said:

Jorg,

I have 180 packages in my solution. I have a master package to call them all, but I was concerned about having that many in one container, so when I generated the master package with BIML I assigned them, in sensible groups, to different containers and sequenced those by dependency.

I am still worried about having up to 20 packages all running together in one container. Is there guidance on how to configure that to run robustly? I have not seen any so far.
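One relevant knob, as far as I know, is the package property that caps parallelism inside a single package, including Execute Package Tasks inside a container:

    MaxConcurrentExecutables = -1   (default: number of logical processors + 2)
    MaxConcurrentExecutables = 4    (explicit cap: at most four children run at once)

So 20 tasks in one container queue behind that limit rather than all starting together; setting it explicitly on the master package is the usual way to make the behaviour predictable.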

Thanks for any help.

JK

June 4, 2015 11:49 AM
 

Chris said:

I'm in a similar situation to JoeK: we have 100+ packages that are called in each of the E-T-L-P phases of the master package. We're using SQL Server 2014 with basic logging in SSISDB. Even with basic logging, this architecture slows down significantly when calling (and re-calling) this many sub-packages; the bottleneck is writing all of the messages to the SSISDB tables.

In 2014, unfortunately, the only real work-around appears to be turning off the built-in SSIS logging feature and capturing errors by other means in your SSIS packages. SQL Server 2016 allows custom logging levels to be set up (e.g. only record errors and warnings), which should also alleviate the issue.
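For reference, a minimal T-SQL sketch of dialing the built-in logging down per execution (the catalog folder, project, and package names are placeholders); LOGGING_LEVEL 0 is "None", 1 is the "Basic" default:

    DECLARE @execution_id BIGINT;

    -- Create the execution (placeholder folder/project/package names)
    EXEC SSISDB.catalog.create_execution
        @folder_name      = N'DW',
        @project_name     = N'ETL',
        @package_name     = N'Master.dtsx',
        @reference_id     = NULL,
        @use32bitruntime  = 0,
        @execution_id     = @execution_id OUTPUT;

    -- Turn server-side logging down to None for this run
    EXEC SSISDB.catalog.set_execution_parameter_value
        @execution_id,
        @object_type     = 50,   -- execution-level parameter
        @parameter_name  = N'LOGGING_LEVEL',
        @parameter_value = 0;    -- 0 = None, 1 = Basic, 2 = Performance, 3 = Verbose

    EXEC SSISDB.catalog.start_execution @execution_id;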

April 11, 2016 12:59 AM
 

Darren said:

Hi,

Really liked the post. How would this be applied with the new project deployment model? I'm trying to integrate it into one.

I have tried converting the example package from 2008, but if I run the extract, everything in the child package runs, not just the extract phase.

July 18, 2017 11:22 AM
 

Terry Thillemann said:

Thanks for sharing!

During our ETL process I'm redirecting failed rows from Extract (truncation) and Transform (truncation/data conversion) to staging tables and flat files. I've not figured out how to manage cleansing/re-processing that data yet, but I'd like to re-use the existing packages/code if possible by using your method. However, I'd like to be able to execute the package from the command line with DTEXEC, passing parameters, rather than having to develop parent packages for all my interfaces.

I've not been able to figure out how to pass a parameter as a parent variable ... any idea if this is possible?
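As far as I know, dtexec's /SET option can assign a package variable directly from the command line, which avoids the parent package entirely; a sketch with made-up path and variable names:

    dtexec /File "C:\ETL\DimCustomer.dtsx" /Set \Package.Variables[User::ExecutePhase].Value;Extract

The parent-variable configuration only fires when the package is run by a parent package, so for command-line runs, /SET on the same variable is the usual substitute.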

August 3, 2017 7:28 PM

