THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Andy Leonard

Andy Leonard is CSO of Linchpin People and SQLPeople, an SSIS Trainer, Consultant, and developer; a Business Intelligence Markup Language (Biml) developer; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server 2012 Integration Services Design Patterns. His background includes web application architecture and development, VB, and ASP. Andy loves the SQL Server Community!
Note: Comments are moderated. Spam shall not pass! </GandalfVoice>

T-SQL Tuesday: Aggregations in SSIS

 

Introduction

Jes Borland (Blog | @grrl_geek) is hosting this month's T-SQL Tuesday - started by SQLBlog's own Adam Machanic (Blog | @AdamMachanic) - and it is about aggregation.

I thought I'd show a couple ways to do aggregation using SSIS.

The Aggregate Transformation in SSIS

The Aggregate transform in SSIS is fast. I built an SSIS package (AggregateScripts.dtsx) with two Data Flow Tasks (Using the Aggregate Transform and Using a Script Component).  Using the Aggregate Transform looks like this:

 

The OLE DB Source is pulling data from the AdventureWorks (2005) Sales.SalesOrderDetail table. The Aggregate transform is configured to Sum the LineTotal amounts:

When the package executes, the Sum is the only output from the Aggregate Transformation:

And the package executes in 0.421 seconds in BIDS. This rocks if all I'm after is the Sum but let's face it - I can get the Sum of the LineTotal column pretty quickly in T-SQL: SELECT Sum(LineTotal) FROM Sales.SalesOrderDetail.

Also, what if I want a running total? One way to achieve this is...

Using a Script Component

For the Using a Script Component data flow task, I select from the same source (AdventureWorks.Sales.SalesOrderDetail). I use a data flow path to pipe the rows into a Script Component. Accepting the default (synchronous) relationship between the input and output buffers, I select the LineTotal on the Inputs page:

On the Inputs and Outputs page, I create a new output column named RunningTotal (Decimal, 38, 6):

On to the script! I see folks struggle with scripting in SSIS. Variable scope can be tricky. Let's see what happens when we put all the code in the ProcessInputRow method:

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        Dim dTotal As Decimal

        dTotal = dTotal + Row.LineTotal

        Row.RunningTotal = dTotal

    End Sub

Let's see what we get when we execute in BIDS:

The LineTotal value is merely copied into the RunningTotal column each time. That's not a running total. What's the problem? Variable scope. The script variable dTotal is being created and populated for each row flowing through the Script Component.

We need for it to be created once and aggregated each time a row flows through the Script Component. To accomplish this, we need to move the Dim statement from inside the buffer's ProcessInputRows method - we need to change the scope from ProcessInputRows to the ScriptMain class:

Public Class ScriptMain
    Inherits UserComponent

    Dim dTotal As Decimal

...

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

        dTotal = dTotal + Row.LineTotal

        Row.RunningTotal = dTotal

    End Sub

Now when we execute it, the script variable dTotal is created once at the class-scope and poopulated (repeatedly) for each row that flows through the buffer's ProcessInputRow method. The results:

A running total!

Conclusion

As always, there are lots of cool ways to get at the data you seek. This isn't the only way and I'm sure there are better ways to achieve the same results.

Happy T-SQL Tuesday!

:{>

Published Tuesday, March 08, 2011 3:00 PM by andyleonard
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

jonmcrawford said:

"...is created once at the class-scope and poopulated (repeatedly) for each row that flows through ..."

That sounds...messy

March 8, 2011 2:20 PM
 

Naomi said:

Nice!!!!

March 8, 2011 10:08 PM
 

Peso said:

Can you guarantee the order of records from the OLD DB Source, or do we have to put a SORT in between?

March 9, 2011 7:33 AM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

My Company


Other Blog

Check out my personal blog...
http://andyleonard.me

Contact Me

Twitter: @AndyLeonard
Email: andy.leonard@gmail.com

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement