THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Dynamic Unpivot : SSIS Nugget

A question on the SSIS forum earlier today asked:

I need to dynamically unpivot some set of columns in my source file. Every month there is one new column and its set of Values. I want to unpivot it without editing my SSIS packages that is deployed

Let’s be clear about what we mean by Unpivot. It is a normalisation technique that basically converts columns into rows. By way of example it converts something like this:

AccountCode Jan Feb Mar
AC1 100.00 150.00 125.00
AC2 45.00 75.50 90.00

into something like this:

AccountCode Month Amount
AC1 Jan 100.00
AC1 Feb 150.00
AC1 Mar 125.00
AC2 Jan 45.00
AC2 Feb 75.50
AC2 Mar 90.00

The Unpivot transformation in SSIS is perfectly capable of carrying out the operation defined in this example however in the case outlined in the aforementioned forum thread the problem was a little bit different. I interpreted it to mean that the number of columns could change and in that scenario the Unpivot transformation (and indeed the SSIS dataflow in general) is rendered useless because it expects that the number of columns will not change from what is specified at design-time.

There is a workaround however. Assuming all of the columns that CAN exist will appear at the end of the rows, we can (1) import all of the columns in the file as just a single column, (2) use a script component to loop over all the values in that “column” and (3) output each one as a column all of its own. Let’s go over that in a bit more detail.

I’ve prepared a data file that shows some data that we want to unpivot which shows some customers and their mythical shopping lists (it has column names in the first row):

clip_image002

We use a Flat File Connection Manager to specify the format of our data file to SSIS:

clip_image004

and a Flat File Source Adapter to put it into the dataflow (no need a for a screenshot of that one – its very basic).

Notice that the values that we want to unpivot all exist in a column called [Groceries]. Now onto the script component where the real work goes on, although the code is pretty simple:

clip_image006

Here I show a screenshot of this executing along with some data viewers. As you can see we have successfully pulled out all of the values into a row all of their own thus accomplishing the Dynamic Unpivot that the forum poster was after.

clip_image008

If you want to run the demo for yourself then I have uploaded the demo package and source file up to my SkyDrive: http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100529/Dynamic%20Unpivot.zip The package was builton SSIS 2008.

Simply extract the two files into a folder, make sure the Connection Manager is pointing to the file, and execute!

Hope this is useful.

@Jamiet

Published Saturday, May 29, 2010 8:10 AM by jamiet

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Siddharth Mehta said:

Hey Jamie,

World is very small :)

Check the below link. I think Andy's solution (which is the same as yours) suits more to the problem your described.

http://sqlblog.com/blogs/andy_leonard/archive/2010/05/18/ssis-design-pattern-loading-variable-length-rows.aspx

May 29, 2010 12:45 PM
 

jamiet said:

Hi Siddharth,

Good to hear from you!

They're for two different scenarios aren't they? Andy's is for when there is some sort of row type identifier and you're splitting the rows into seperate data paths based on that identifier. That's not an Unpivot operation.

Sorry if I've misunderstood.

Thoughts?

-Jamie

May 29, 2010 12:55 PM
 

Siddharth Mehta said:

Hey Jamie,

Sorry for commenting a riddle :)

I meant to say that some data processing might require 2 field unpivoting and some might require more. A param can be included in the design, which can read number of cols required for unpivoting. Then based on the string split logic that you explained, and using the andy's soln considering param (the num of unpivot cols) as the identifier, we can route data to corresponding data path.

So when unpivot is done for 2 cols, data wud come out in that particular data path and when done for 3 cols the same wud come out in corresponding data path. This wud provide a dynamic nature in deciding num of cols to unpivot.

I hope I am more clear this time :)

May 29, 2010 7:27 PM
 

Curtis said:

The link to the demo package and source file is returning a bad file.

The zip file cannot be opened.

Can you investigate please?

June 10, 2010 1:05 PM
 

Curtis said:

I was able to get the file...

I had to go to your blog and navigate to the actual zip file in your 'blogShare/20100529' folder and then use the 'Download' button in Windows Live.

Any other attempt to get the file either thru the url you provided or even right-clicking the zip file on your Windows Live site and trying to save it resulted in a 'bad' zip file. Ugh!

June 10, 2010 1:13 PM
 

jamiet said:

Curtis,

I'm sorry but I can't reproduce the problem. The link works OK for me and once there, yes, hit the Download button.

-Jamie

June 13, 2010 8:04 AM
 

F1_Please said:

Hi Jamie,

How can this be achieved if I have an excel source?

November 18, 2010 1:27 PM
 

jamiet said:

The type of source is irrelevant. The technique will still work.

November 18, 2010 1:41 PM
 

Santiago said:

Hi Jamiet,

Please let me do a constructive critique, as per my understanding, your post doesn't actually solve the situation mentioned on top "Every month there is one new column and its set of Values".

According to it, one month the datasource will look like this:

Customer, Apples, Oranges

Henry,        12,      10

July,          5,       9

Michael,       6,       3

and the next month the datasource will look like this:

Customer, Apples, Oranges, Coffe

Henry,         2,      12,     6

July,          4,      19,     2

Dany,          6,      13,    10

As can be seen there, the columns dinamically change from month to month. In your example, the column is always 'Groceries' and there is no reference about what to do with its set of values.

I am actually trying to get a solution to a slightly different case. In my case, at one point I might have 2 columns (apples and oranges) and the following time I might have only one column (coffe).

As far as I have investigated, the only thing I can do is to use 2 different packages for each case (Given that I can know in advance which case I will need to handle every time).

Santiago

December 2, 2010 9:32 AM
 

Packdude said:

Thanks!  After banging my head on the SSIS unpivot component and then realizing it was too rigid to handle my situation, I stubmled across this post.  It works great and I think I'll be using more of the C# Script Component in the future!

April 4, 2011 7:52 PM
 

Levente said:

Hi Jamie,

Could you suggest something similar for an Excel file?

Thanks

April 18, 2012 9:15 AM
 

Levente said:

I meant how you put multiple source Excel columns into a single destination?

April 18, 2012 9:24 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement