I saw an interesting question on the SSIS MSDN forum yesterday where someone was asking how to do a cartesian product (i.e. a CROSS JOIN) in a SQL Server Integration Services (SSIS) dataflow. Links were provided to various articles that cover this (see Performing a Cross Join (Cartesian Product) in SSIS which explains how you can fool the Merge Join component into doing a CROSS JOIN) after which the original poster replied saying:
Such a long procedure to perform a simple cartesian product. SSIS must really improve from this point of view.
Well apparently others agree because there are two requests on Connect for a Cross Join component (161154 & 127215) - both have been closed with status "Won't Fix". I don't know why this is but I can speculate.
Think about what is being asked for here. A CROSS JOIN operation potentially
requires masses of memory (and would even more so for SSIS given that we
are typically processing large data volumes). CROSS JOINs work in SQL Server because we have
this helpful little thing called [tempdb] and there is no analogous mechanism in SSIS (save for disk spooling). SSIS is deliberately not optimized for mass data spooling - it is optimized for processing small
batches of data at a time before passing them off to
some destination.
I suspect this is why there is no Cross Join component to go alongside the Merge Join component. The capabilities to
achieve a CROSS JOIN operation exist (see link above) but they want you to be
deliberate about it - they're not going to give you enough rope to hang
yourself with by providing a component
that could bring the machine of an unwitting developer to its knees.
As I said this is speculation so don't assume this is the real reason for there being no Cross Join component.
What are your thoughts? Assuming its true is this justification enough for there being no Cross Join component? Should there be one? Do you even need one? Let me know in the comments.
@jamiet