SQLblog.com - The SQL Server blog spot on the web

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

[SSIS] OnPipelineRowsSent event

Once upon a time I blogged at http://consultingblogs.emc.com/jamiethomson, but that ended in August 2009 when I left EMC. There is a lot of valuable content over there; however, certain events in the past leave me concerned that the content is not well cared for, and I don't have any confidence that it will still exist in the long term. Hence, I have taken the decision to re-publish some of that content here at SQLBlog, so over the coming weeks and months you may find re-published content popping up here from time to time.

This is the first such blog post in this series in which I talk about the little-known OnPipelineRowsSent event.


An interesting discussion today on the SSIS Forum made me realise that perhaps not all SSIS developers are aware of the OnPipelineRowsSent event and how valuable it can be, particularly when debugging data-flows. This isn't all that surprising, because Books Online is very sparse in this area; hence I thought it would make a useful blog topic.

OnPipelineRowsSent is an SSIS event, which means the information it provides can be logged by whatever log provider you happen to be using. In order to understand the information that OnPipelineRowsSent provides, it is critical to understand the buffer architecture of the SSIS data-flow.

To explain that VERY simply, a buffer is an area of memory that contains rows of data as they pass through the pipeline. Each data-path in the data-flow will contain one or more buffers, and each of those buffers (except possibly the last one) will contain the same number of rows. If you want to understand more about the buffer architecture of the data-flow, Kirk Haselden's book has a chapter devoted wholly to it.

Here is what Books Online DOES say about OnPipelineRowsSent:

OnPipelineRowsSent

Reports the number of rows provided to a component input by a call to the ProcessInput method. The log entry includes the component name.

In slightly plainer English, that means that for every buffer of data that is consumed by a component, you get a log entry. So if, for example, you have a Derived Column Component that receives 105,000 rows and the size of the buffer is 10,000 rows, you'll get 11 OnPipelineRowsSent log entries for that component (there will only be 5,000 rows in the last buffer). Basically, it enables you to know how many rows are passing through each data-path in the data-flow.
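
As a quick sanity check, that arithmetic can be sketched in Python (the function name is my own, not part of SSIS; it just models how many entries you would expect):

```python
def pipeline_rows_sent_entries(total_rows: int, buffer_rows: int) -> list:
    """Return the row count that each OnPipelineRowsSent entry would report
    for a component receiving total_rows in buffers of buffer_rows rows."""
    full, remainder = divmod(total_rows, buffer_rows)
    counts = [buffer_rows] * full
    if remainder:
        counts.append(remainder)  # the final, partially-filled buffer
    return counts

# 105,000 rows in 10,000-row buffers -> 11 log entries, the last one 5,000 rows
entries = pipeline_rows_sent_entries(105_000, 10_000)
print(len(entries), entries[-1])  # 11 5000
```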

The available information is the same as you get for any event handler, i.e.:

  • event
  • computer
  • operator
  • source
  • sourceid
  • executionid
  • starttime
  • endtime
  • datacode
  • databytes
  • message

and here is a portion of a logfile containing those records:

OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 2121 : Sort Output : 2113 : Merge Join 2 : 2114 : Merge Join Left Input : 9936

The important stuff is in the last field, the message, so let's break that down. Here is an example message:

Rows were provided to a data flow component as input. :  : 1030 : OLE DB Source Output : 1025 : Sort 2 : 1026 : Sort Input : 9972

And here is what it consists of:

  • Rows were provided to a data flow component as input. - That's consistent in each message, so to be honest they could have left it out
  • 1030 - ID of the data-path providing the buffer
  • OLE DB Source Output - Name of the data-path providing the buffer
  • 1025 - ID of the component receiving the buffer
  • Sort 2 - Name of the component receiving the buffer
  • 1026 - ID of the input receiving the buffer
  • Sort Input - Name of the input receiving the buffer
  • 9972 - Number of rows in the buffer
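
Given that fixed layout, the message can be pulled apart programmatically. Here is a minimal sketch in Python, assuming the colon-delimited format shown above (the function and field names are my own):

```python
def parse_rows_sent_message(message: str) -> dict:
    """Split an OnPipelineRowsSent message into its constituent fields,
    assuming the layout:
    <fixed text> :  : path_id : path_name : component_id : component_name
                 : input_id : input_name : row_count"""
    parts = [p.strip() for p in message.split(" : ")]
    # parts[0] is the fixed text; parts[1] is an empty placeholder field
    return {
        "path_id": int(parts[2]),
        "path_name": parts[3],
        "component_id": int(parts[4]),
        "component_name": parts[5],
        "input_id": int(parts[6]),
        "input_name": parts[7],
        "rows": int(parts[8]),
    }

msg = ("Rows were provided to a data flow component as input. :  : "
       "1030 : OLE DB Source Output : 1025 : Sort 2 : 1026 : Sort Input : 9972")
info = parse_rows_sent_message(msg)
print(info["path_name"], info["rows"])  # OLE DB Source Output 9972
```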

As you can see, this information will enable you to determine exactly how many rows are output from each component. Thus, if you are not receiving the expected number of rows at a destination, this event will help you to deduce where you are losing them.
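
To make that concrete: because a busy data-path produces one entry per buffer, totalling the row counts per data-path tells you how many rows flowed down each one. A minimal sketch, again assuming the message layout shown above (the function name is my own):

```python
from collections import defaultdict

def total_rows_per_path(messages) -> dict:
    """Sum the row counts reported for each named data-path across a
    sequence of OnPipelineRowsSent message strings."""
    totals = defaultdict(int)
    for message in messages:
        parts = [p.strip() for p in message.split(" : ")]
        path_name, rows = parts[3], int(parts[8])  # data-path name, buffer row count
        totals[path_name] += rows
    return dict(totals)

log = [
    "Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 9936",
    "Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 2085",
]
print(total_rows_per_path(log))  # {'Merge Join Output': 12021}
```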

You may not know this but you actually encounter OnPipelineRowsSent every time you run a data-flow within the SSIS Designer in BIDS. Notice how the designer shows the number of rows passing through the pipeline and through each component:

[Image: the SSIS Designer showing the number of rows alongside each data-path as the data-flow executes]

 

How does the designer know how many rows are passing through the data-flow? Simple! It's because the SSIS Designer consumes the OnPipelineRowsSent events that the executing package "throws" up, and then presents that information visually as the numbers that you see flashing in front of your eyes as the data-flow executes.

I think that just about covers the simple stuff around OnPipelineRowsSent. If you have any questions then post them here as a comment.

@Jamiet

 

 

Published Friday, August 19, 2011 1:48 PM by jamiet

Comments

 

@billinkc said:

James Nix has a handy query that rips the OnPipelineRowsSent message apart to show timings between components. It's built for the 2005 log, but a simple replace of sysdtslog90 with sysssislog makes it work with 2008+: http://weblogs.sqlteam.com/jamesn/archive/2008/02/11/60504.aspx

August 19, 2011 11:07 AM
 

jamiet said:

Awesome stuff, thanks Bill.

August 19, 2011 11:17 AM
 

Davide Mauri said:

Another option is to use the DataProfiling option with DTLoggedExec :)

August 20, 2011 4:57 AM
 

satish from india said:

fantastic information.....

October 4, 2012 7:07 AM
