THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Andy Leonard

Andy Leonard is CSO of Linchpin People and SQLPeople, an SSIS Trainer, Consultant, and developer; a Business Intelligence Markup Language (Biml) developer; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server 2012 Integration Services Design Patterns. His background includes web application architecture and development, VB, and ASP. Andy loves the SQL Server Community!
Note: Comments are moderated. Spam shall not pass! </GandalfVoice>

How Many SSIS Packages Should I Create?

Introduction
 
I received the following questions (paraphrased) from a friend:
 
I'm building an ETL process with SSIS in which I pull from about 40 tables in DB2. I put the data into staging tables in a SQL Server database.
Is it better to have one package with 40 data flows, 40 packages with one data flow each or something in between? Or should there be one data flow with a lot of source->destination modules? What the advantages and disadvantages of each approach?
 
Excellent Questions!
 
   The answer is: "It depends." That's one of the reasons you're having trouble finding a definitive answer online. It's not a bad question, so let me walk through the factors that would drive my design decisions:
 
   First, are there dependencies? Are there foreign key relationships in the source database? Is referential integrity enforced (keep in mind there are non-database ways to enforce RI)? If there are dependencies or referential integrity exists (whether it's enforced or not), I load the parents first and then the children.
 
   Second, I think about Operations: How much visibility do I want into this process? If it's going to take three hours to load, I probably want to break that down into several packages so I can at least see steps in the process completing. This will also assist in troubleshooting: "Package1.dtsx failed" isn't a pleasant message to troubleshoot if it contains 40 Data Flow Tasks. "Load Orders and OrderDetails.dtsx failed" is a much better starting place.
 
   You've inspired another blog post - thank very much! Please keep the questions coming. And let me know what you decide.
 
:{> Andy
Published Monday, December 28, 2009 8:00 AM by andyleonard

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Steve said:

for DB2, this is what I do :

http://blog.stevienova.com/2009/05/20/etl-method-fastest-way-to-get-data-from-db2-to-microsoft-sql-server/

I have 2 packages, an incremental and full load. And then about 5-6 other packages that just execute the full/incremental and pass in the db2 table name and it pulls it down. I group the packages by db2 tables in JDE (that is the source system I pull from), so something like, Sales, Inventory, Purchasing, GL, etc

December 28, 2009 7:52 AM
 

ken ambrose said:

I would add that SSIS/ETL programming shares similiar design issues with conventional procedural language programming: Support reuse, architect for reliability, maintainability, testability.

For example, in the past I have had to create SSIS processing that I was certain could be reused with simple variable or config file changes. In those cases, I put that code in a seperate package so it can be reused easily in many ETL solutions.

December 28, 2009 4:42 PM
 

brucem said:

Andy,

In the example above where you would have several packages, do you have a "main package" that runs all the other packages sequentially or do you schedule each package to run independently?

Thanks,

Bruce

December 29, 2009 1:15 PM
 

andyleonard said:

Hi Bruce,

  That is an excellent question and the answer is "it depends." I definitely automate some way to execute the packages in a prescribed order. I sometimes use a Parent Package (main package) to execute the Child Packages, sometimes I use a scheduler like SQL Agent and call each Child Package in its own job step.

:{> Andy

December 29, 2009 2:35 PM
 

Bruce said:

Hi Andy .. I've almost similar question.

I need to extract data from 15 tables.Few tables are big tables (having millions of rows). There is no FK constraint between the tables in source DB. Do i need to add all the flows in one data flow task or should i break them for improved performance.

Thanks

November 30, 2011 4:11 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

My Company


Other Blog

Check out my personal blog...
http://andyleonard.me

Contact Me

Twitter: @AndyLeonard
Email: andy.leonard@gmail.com

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement