THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Why is there no CROSS JOIN component in SSIS?

I saw an interesting question on the SSIS MSDN forum yesterday where someone was asking how to do a cartesian product (i.e. a CROSS JOIN) in a SQL Server Integration Services (SSIS) dataflow. Links were provided to various articles that cover this (see Performing a Cross Join (Cartesian Product) in SSIS which explains how you can fool the Merge Join component into doing a CROSS JOIN) after which the original poster replied saying:

Such a long procedure to perform a simple cartesian product. SSIS must really improve from this point of view.

Well apparently others agree because there are two requests on Connect for a Cross Join component (161154 & 127215) - both have been closed with status "Won't Fix". I don't know why this is but I can speculate.

Think about what is being asked for here. A CROSS JOIN operation potentially requires masses of memory (and would even more so for SSIS given that we are typically processing large data volumes). CROSS JOINs work in SQL Server because we have this helpful little thing called [tempdb] and there is no analogous mechanism in SSIS (save for disk spooling). SSIS is deliberately not optimized for mass data spooling - it is optimized for processing small batches of data at a time before passing them off to some destination.

I suspect this is why there is no Cross Join component to go alongside the Merge Join component. The capabilities to achieve a CROSS JOIN operation exist (see link above) but they want you to be deliberate about it - they're not going to give you enough rope to hang yourself with by providing a component that could bring the machine of an unwitting developer to its knees.

As I said this is speculation so don't assume this is the real reason for there being no Cross Join component.

What are your thoughts? Assuming its true is this justification enough for there being no Cross Join component? Should there be one? Do you even need one? Let me know in the comments.

@jamiet 

 

Published Friday, July 08, 2011 11:02 AM by jamiet

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Tim Mitchell said:

I don't think a separate component is required.  I've rarely encountered a need for this functionality in SSIS, and as you mentioned, the Merge Join can be fooled into doing this if necessary.

July 8, 2011 11:42 AM
 

MQ said:

Do other ETL tools like Informatica provide such functionality? If they do, then I think SSIS should as well.

I also think that developers using Cross Join do know its implications and use it with a small number of rows. If they don't, then maybe SSIS could have a little warning dialog. In SQL Server, even though it has tempdb, developers could still cause havoc. I remember in an older job some developers using old style joins but then forgetting to include the equal predicates. SQL Server interpreted that as a cross join and a couple of times brought the server to its knees.

July 8, 2011 12:40 PM
 

Frank Szendzielarz said:

Thoughts:

a) Apparently it already exists, but as a 3rd party component:

http://ssisctc.codeplex.com/

b) A quick google showed that Informatica, like Todd McD, recommends to (full outer) join on a constant.

c) A script transform could be written in C#

d) I think I've really needed to do a cross join about twice ever, since 1992

Total aside though: tell you what I *would* like to see - Control Flow replaced by WF4. Including rehostable designer - Data Flows being standard WF4 activities.I think SSIS should be an extension to WF4

July 8, 2011 1:49 PM
 

Marco Russo (SQLBI) said:

I use a MERGE JOIN to do a CROSS JOIN, by I had to add a column in each flow and I had to define the IsSorted property to true on that dummy column... it works, but a CROSS JOIN would be nice also to better "declare" the original programmer intent.

Marco

July 8, 2011 1:56 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement