THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Andy Leonard

Andy Leonard is an author and engineer who enjoys building and automating data integration solutions. Andy is co-host of the Data Driven podcast. Andy is no longer updating this blog. His current blog is

How Many SSIS Packages Should I Create?

I received the following questions (paraphrased) from a friend:
I'm building an ETL process with SSIS in which I pull from about 40 tables in DB2. I put the data into staging tables in a SQL Server database.
Is it better to have one package with 40 data flows, 40 packages with one data flow each or something in between? Or should there be one data flow with a lot of source->destination modules? What the advantages and disadvantages of each approach?
Excellent Questions!
   The answer is: "It depends." That's one of the reasons you're having trouble finding a definitive answer online. It's not a bad question, so let me walk through the factors that would drive my design decisions:
   First, are there dependencies? Are there foreign key relationships in the source database? Is referential integrity enforced (keep in mind there are non-database ways to enforce RI)? If there are dependencies or referential integrity exists (whether it's enforced or not), I load the parents first and then the children.
   Second, I think about Operations: How much visibility do I want into this process? If it's going to take three hours to load, I probably want to break that down into several packages so I can at least see steps in the process completing. This will also assist in troubleshooting: "Package1.dtsx failed" isn't a pleasant message to troubleshoot if it contains 40 Data Flow Tasks. "Load Orders and OrderDetails.dtsx failed" is a much better starting place.
   You've inspired another blog post - thank very much! Please keep the questions coming. And let me know what you decide.
:{> Andy
Published Monday, December 28, 2009 8:00 AM by andyleonard

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS



Steve said:

for DB2, this is what I do :

I have 2 packages, an incremental and full load. And then about 5-6 other packages that just execute the full/incremental and pass in the db2 table name and it pulls it down. I group the packages by db2 tables in JDE (that is the source system I pull from), so something like, Sales, Inventory, Purchasing, GL, etc

December 28, 2009 7:52 AM

ken ambrose said:

I would add that SSIS/ETL programming shares similiar design issues with conventional procedural language programming: Support reuse, architect for reliability, maintainability, testability.

For example, in the past I have had to create SSIS processing that I was certain could be reused with simple variable or config file changes. In those cases, I put that code in a seperate package so it can be reused easily in many ETL solutions.

December 28, 2009 4:42 PM

brucem said:


In the example above where you would have several packages, do you have a "main package" that runs all the other packages sequentially or do you schedule each package to run independently?



December 29, 2009 1:15 PM

andyleonard said:

Hi Bruce,

  That is an excellent question and the answer is "it depends." I definitely automate some way to execute the packages in a prescribed order. I sometimes use a Parent Package (main package) to execute the Child Packages, sometimes I use a scheduler like SQL Agent and call each Child Package in its own job step.

:{> Andy

December 29, 2009 2:35 PM

Bruce said:

Hi Andy .. I've almost similar question.

I need to extract data from 15 tables.Few tables are big tables (having millions of rows). There is no FK constraint between the tables in source DB. Do i need to add all the flows in one data flow task or should i break them for improved performance.


November 30, 2011 4:11 AM

Vikas said:

This is a very old page but still I want to understand what if my package with 8 data flow task(with some transformations) takes 10 mins to complete(as you have mentioned if it take 3 hrs). but still if I want to go ahead with parent child design then is it ok?. Personally I prefer modular and clean code approach so i would like to use single package per table approach.

What would you suggest?  

October 6, 2015 11:52 PM

andyleonard said:

Hi Vikas,

  Did operations and support, yep. Create a package for each data flow.


October 7, 2015 8:34 AM

Leave a Comment


This Blog



My Latest Book:

Community Awards

Friend of Red Gate

Contact Me


Privacy Statement