Once upon a time I blogged at http://consultingblogs.emc.com/jamiethomson but that ended in August 2009 when I left EMC. There is a lot of valuable content over there however certain events in the past leave me concerned that that content is not well cared for and I don't have any confidence that it will still exist in the long term. Hence, I have taken the decision to re-publish some of that content here at SQLBlog so over the coming weeks and months you may find re-published content popping up here from time-to-time.
This is the first such blog post in this series in which I talk about the little-known OnPipelineRowsSent event.
An interesting discussion today on the SSIS Forum
made me realise that perhaps not all SSIS developers are aware of the
OnPipelineRowsSent event in SSIS and how valuable it can be, certainly
when debugging data-flows. This isn't all that surprising because Books Online is very sparse in this area. Hence I thought it would make a useful blog topic.
OnPipelineRowsSent
is a type of event in SSIS which means the information that it provides
can be logged by whatever log provider you happen to be using. In
order to understand the information that OnPipelineRowsSent provides it
is critical to understand the buffer architecture of the SSIS data-flow.
To
explain that VERY simply, a buffer is an area of memory that contains
rows of data as they pass through the pipeline. Each data-path in the
data-flow will contain one of more buffers and each of those buffers
(except for the last one) will contain the same number of rows. If you
want to understand more about the buffer architecture of the data-flow
then Kirk Haselden's book has a chapter devoted wholly to it.
Here is what Books Online DOES say about OnPipelineRowsSent:
|
OnPipelineRowsSent
|
Reports the number of rows provided to a component input by a call to the ProcessInput method. The log entry includes the component name.
|
In
slightly plainer english that means that for every buffer of data that
is consumed by a component, you get a log entry. So, if you have (e.g.) a
Derived Column Component that
receives 105000 rows and the size of the buffer is 10000 rows, you'll
get 11 OnPipelineRowsSent log entries for that component (there will
only be 5000 rows in the last buffer). Basically it enables you to know
how many rows are passing through each data-path in the data-flow.
The available information is the same as you get for any eventhandler. i.e.
- event
- computer
- operator
- source
- sourceid
- executionid
- starttime
- endtime
- datacode
- databytes
- message
and here is a portion of a logfile containing those records:
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 1228 : Merge Join Output : 1225 : Union All 1 :
1226 : Union All Input 1 : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 1475 : Union All Output 1 : 1470 : Sort 3 :
1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 1228 : Merge Join Output : 1225 : Union All 1 :
1226 : Union All Input 1 : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 1475 : Union All Output 1 : 1470 : Sort 3 :
1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 1475 : Union All Output 1 : 1470 : Sort 3 :
1471 : Sort Input : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
Flow
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow
component as input. : : 2121 : Sort Output : 2113 : Merge Join 2 : 2114
: Merge Join Left Input : 9936
The important stuff is what comes in the last field, the message so let's break that down. Here is an example message:
Rows
were provided to a data flow component as input. : : 1030 : OLE DB
Source Output : 1025 : Sort 2 : 1026 : Sort Input : 9972
And here's what it is constituted of:
- Rows
were provided to a data flow component as input. - That's consistent in
each message, so to be honest they could have left it out
1030 - ID of the data-path providing the buffer
OLE DB Source Output - Name of the data-path providing the buffer
1025 - ID of the component receiving the buffer
Sort 2 - Name of the component receiving the buffer
1026 - ID of the input receiving the buffer
Sort Input - Name of the input receiving the buffer
9972 - Number of rows in the buffer
As
you can see, this information will enable you to determine exactly how
many rows are output from each component. Thus, if you are not receiving
the expected number of rows at a destination this event will help you
to deduce where you are losing them from.
You may not know this
but you actually encounter OnPipelineRowsSent every time you run a
data-flow within the SSIS Designer in BIDS. Notice how the designer
shows the number of rows passing through the pipeline and through each
component:
How does the designer know how many rows are passing through the
data-flow? Simple! Its because the SSIS Designer consumes the
OnPipelineRowsSent event that the executing package "throws" up and
then presents that information visually as the numbers that you see
flashing in front of your eyes as the data-flow executes.
I think
that just about covers the simple stuff around OnPipelineRowsSent. If
you have any questions then post them here as a comment.
@Jamiet