THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Merge Join component sorted outputs [SSIS]

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this:

Why isn’t the Merge Join output sorted (i.e.IsSorted=True)?

This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it.

Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component:

image

Let’s take a look inside the editor of the Merge Join:

image

We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look:

image

Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning:

Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed.

The warning proves that the output from the Merge Join is sorted!

It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output.

So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there!

@Jamiet 

P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

Published Monday, January 31, 2011 8:40 PM by jamiet

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

sajkkasjdjas said:

very bad

February 26, 2013 2:22 AM
 

ismailoismail said:

hi ,

my problem is about the incremental load with odbc source and sql server destination .when i update the source , no result is detected  with the merge join and conditionnal split have you please any idea about this task thx.

October 7, 2013 6:27 AM
 

JohnST said:

ismailoismail...

I think I can shed some light on that, it has caught me out more times than I am going to admit...

when the source columns to a merge join change the outputs are unaffected; you need to open the merge join editor again and the new column(s) will be listed under the appropriate side columns but with the checkbox unchecked. To get them to come through you simply check the checkbox and optionally define a new alias for the output column.

JST

btw, Thanks Jamie, very useful to know

March 18, 2014 3:56 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement