<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'SQL Server' and 'dataflow'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=SQL+Server,dataflow&amp;orTags=0</link><description>Search results matching tags 'SQL Server' and 'dataflow'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>[SSIS] OnPipelineRowsSent event</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2011/08/19/ssis-onpipelinerowssent-event.aspx</link><pubDate>Fri, 19 Aug 2011 11:48:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:37927</guid><dc:creator>jamiet</dc:creator><description>&lt;blockquote&gt;
&lt;p&gt;&lt;i&gt;Once upon a time I blogged at &lt;a href="http://consultingblogs.emc.com/jamiethomson" target="_blank"&gt;http://consultingblogs.emc.com/jamiethomson&lt;/a&gt; but that ended in August 2009 when I left EMC. There is a lot of valuable content over there however certain events in the past leave me concerned that that content is not well cared for and I don't have any confidence that it will still exist in the long term. Hence, I have taken the decision to re-publish some of that content here at SQLBlog so over the coming weeks and months you may find re-published content popping up here from time-to-time.&lt;/i&gt;&lt;/p&gt;&lt;p&gt;&lt;i&gt;This is the first such blog post in &lt;a href="http://sqlblog.com/blogs/jamie_thomson/archive/tags/repost/default.aspx" target="_blank"&gt;this series&lt;/a&gt; in which I talk about the little-known OnPipelineRowsSent event.&lt;/i&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;/blockquote&gt;

&lt;p&gt;An &lt;a href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1317964&amp;amp;SiteID=1"&gt;interesting discussion&lt;/a&gt; today on the &lt;a href="http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=80&amp;amp;SiteID=1"&gt;SSIS Forum&lt;/a&gt;
 made me realise that perhaps not&amp;nbsp;all SSIS developers&amp;nbsp;are aware of the 
OnPipelineRowsSent event in SSIS and how valuable it can be, certainly 
when debugging data-flows. This isn't all that surprising because &lt;a href="http://msdn2.microsoft.com/en-us/library/ms141026.aspx"&gt;Books Online&lt;/a&gt;&amp;nbsp;is very sparse in this area. Hence I thought it would make a useful blog topic.&lt;/p&gt;

&lt;p&gt;OnPipelineRowsSent
 is a type of&amp;nbsp;event in SSIS which means the&amp;nbsp;information that it provides
 can be logged&amp;nbsp;by whatever log provider you happen to be using.&amp;nbsp;In 
order&amp;nbsp;to understand the information that OnPipelineRowsSent provides it 
is critical to understand the buffer architecture of the &lt;a href="http://msdn2.microsoft.com/en-us/library/ms141122.aspx"&gt;SSIS data-flow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To
 explain&amp;nbsp;that&amp;nbsp;VERY simply, a buffer is an area of memory that contains 
rows of data as they pass through the pipeline. Each data-path in the 
data-flow&amp;nbsp;will contain one of more buffers and each of those buffers 
(except for the last one) will contain the same number of rows. If you 
want to understand more about the buffer architecture of the data-flow 
then &lt;a href="http://www.amazon.com/Microsoft-Server-2005-Integration-Services/dp/0672327813"&gt;Kirk Haselden's book&lt;/a&gt; has a chapter devoted wholly to it.&lt;/p&gt;

&lt;p&gt;Here is what Books Online &lt;a href="http://msdn2.microsoft.com/en-us/library/ms141122.aspx"&gt;DOES say&lt;/a&gt; about OnPipelineRowsSent:&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;OnPipelineRowsSent&lt;/b&gt; &lt;/p&gt;
&lt;/td&gt;

&lt;td&gt;
&lt;p&gt;Reports the number of rows provided to a component input by a call to the &lt;b&gt;ProcessInput&lt;/b&gt; method. The log entry includes the component name. &lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;In
 slightly plainer english that means that for every buffer of data that 
is consumed by a component, you get a log entry. So, if you have&amp;nbsp;(e.g.) a
 &lt;a href="http://msdn2.microsoft.com/en-us/library/ms141069.aspx"&gt;Derived Column Component&lt;/a&gt;&amp;nbsp;that
 receives 105000&amp;nbsp;rows and the size of the buffer is 10000 rows, you'll 
get 11 OnPipelineRowsSent log entries for that&amp;nbsp;component&amp;nbsp;(there will 
only be 5000 rows in the last buffer).&amp;nbsp;Basically it enables you to know 
how many rows are passing through each data-path in the data-flow.&lt;/p&gt;

&lt;p&gt;The available information is the same as you get for any eventhandler. &amp;nbsp;i.e. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event&lt;/li&gt;

&lt;li&gt;computer&lt;/li&gt;

&lt;li&gt;operator&lt;/li&gt;

&lt;li&gt;source&lt;/li&gt;

&lt;li&gt;sourceid&lt;/li&gt;

&lt;li&gt;executionid&lt;/li&gt;

&lt;li&gt;starttime&lt;/li&gt;

&lt;li&gt;endtime&lt;/li&gt;

&lt;li&gt;datacode&lt;/li&gt;

&lt;li&gt;databytes&lt;/li&gt;

&lt;li&gt;message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and here is a portion of a logfile containing those records:&lt;/p&gt;

&lt;p&gt;&lt;font face="courier new"&gt;&lt;font color="#ff0000"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 1228 : Merge Join Output : 1225 : Union All 1 :
 1226 : Union All Input 1 : 9936&lt;br&gt;&lt;/font&gt;&lt;font color="#0000ff"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 1475 : Union All Output 1 : 1470 : Sort 3 : 
1471 : Sort Input : 9936&lt;br&gt;&lt;/font&gt;&lt;font color="#ff0000"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 1228 : Merge Join Output : 1225 : Union All 1 :
 1226 : Union All Input 1 : 2085&lt;br&gt;&lt;/font&gt;&lt;font color="#0000ff"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 1475 : Union All Output 1 : 1470 : Sort 3 : 
1471 : Sort Input : 9936&lt;br&gt;&lt;/font&gt;&lt;font color="#ff0000"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 1475 : Union All Output 1 : 1470 : Sort 3 : 
1471 : Sort Input : 2085&lt;br&gt;&lt;/font&gt;&lt;font color="#0000ff"&gt;OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data
 Flow 
Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007
 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow 
component as input. :&amp;nbsp; : 2121 : Sort Output : 2113 : Merge Join 2 : 2114
 : Merge Join Left Input : 9936 &lt;/font&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The important stuff is what comes in the last field, the message so&amp;nbsp;let's break that down. Here is an example message:&lt;/p&gt;

&lt;p&gt;&lt;font color="#0000ff" face="Courier New"&gt;Rows
 were provided to a data flow component as input. :&amp;nbsp; : 1030 : OLE DB 
Source Output : 1025 : Sort 2 : 1026 : Sort Input : 9972&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;And here's what it is constituted of:&amp;nbsp;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rows
 were provided to a data flow component as input. - That's consistent in
 each message, so to be honest they could have left it out&lt;/li&gt;

&lt;li&gt;&lt;div&gt;1030&amp;nbsp;- ID of the data-path providing the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;OLE DB Source Output - Name of the&amp;nbsp;data-path providing the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;1025 - ID of the component receiving the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;Sort 2 - Name of the component receiving the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;1026 - ID of the input receiving the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;Sort Input - Name of the input receiving the buffer&lt;/div&gt;&lt;/li&gt;

&lt;li&gt;&lt;div&gt;9972&amp;nbsp;- Number of rows in the buffer&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As
 you can see, this information will enable you to determine exactly how 
many rows are output from each component. Thus, if you are not receiving
 the expected number of rows at a destination this event will help you 
to deduce where you are losing them from.&lt;/p&gt;

&lt;p&gt;You may not know this 
but&amp;nbsp;you actually encounter OnPipelineRowsSent every time you run a 
data-flow within the SSIS Designer in&amp;nbsp;BIDS. Notice how the designer 
shows the number of rows passing through the pipeline and through each 
component:&lt;/p&gt;
&lt;img src="http://i1177.photobucket.com/albums/x349/jamiekth/df.jpg" alt="Photobucket" border="0"&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;How&amp;nbsp;does the designer know how many rows are passing through the 
data-flow? Simple! Its because the SSIS Designer consumes the 
OnPipelineRowsSent event that the executing package "throws" up and 
then&amp;nbsp;presents that information visually&amp;nbsp;as the numbers that you see 
flashing in front of your eyes as the data-flow executes.&lt;/p&gt;

&lt;p&gt;I think 
that just about covers the simple stuff around OnPipelineRowsSent. If 
you have any questions then post them here as a comment.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@Jamiet&lt;/a&gt; &lt;br&gt;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description></item><item><title>Always use dtexec.exe to test performance of your dataflows. No exceptions.</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2011/01/12/always-use-dtexec-exe-to-test-performance-of-your-dataflows-no-exceptions.aspx</link><pubDate>Wed, 12 Jan 2011 21:51:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:32572</guid><dc:creator>jamiet</dc:creator><description>&lt;P&gt;Earlier this evening I posted a blog post entitled &lt;A href="http://sqlblog.com/blogs/jamie_thomson/archive/2011/01/12/investigation-can-different-combinations-of-components-effect-dataflow-performance.aspx" target=_blank&gt;Investigation: Can different combinations of components effect Dataflow performance?&lt;/A&gt; where I compared the performance of three different dataflows all working to the same overall goal. I wanted to make one last point related to the results but I thought it warranted a blog post all of its own.&lt;/P&gt;
&lt;P&gt;Here is a screenshot of one of the dataflows that I was testing:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/image_4B0ECC80.png"&gt;&lt;IMG style="BACKGROUND-IMAGE:none;BORDER-BOTTOM:0px;BORDER-LEFT:0px;PADDING-LEFT:0px;PADDING-RIGHT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;PADDING-TOP:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_1DAD36B0.png" width=758 height=337&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Pretty complicated I’m sure you’ll agree. Now, when I executed this dataflow in the test it was executing in ~19seconds however in that case I was executing using the command-line tool &lt;A href="http://msdn.microsoft.com/en-us/library/ms162810.aspx" target=_blank&gt;dtexec&lt;/A&gt;. I also tried executing inside the BIDS development environment and in that case it took &lt;B&gt;much &lt;/B&gt;longer – 139seconds. That’s more than seven times as long.&lt;/P&gt;
&lt;P&gt;The point I want to make is very simple. If you are testing your dataflows for performance &lt;B&gt;please&lt;/B&gt; use dtexec. Nothing else will suffice.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://twitter.com/jamiet"&gt;@Jamiet&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;UPDATE: Matt Masson from the SSIS team has posted a great&amp;nbsp;blog post&amp;nbsp;explaining exactly &lt;EM&gt;why&lt;/EM&gt; dtexec is quicker. Go read it at &lt;A href="http://blogs.msdn.com/b/mattm/archive/2011/01/15/why-does-my-package-run-slower-in-bids-than-dtexec.aspx?wa=wsignin1.0"&gt;Why does my package run slower in BIDS than DTEXEC?&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Investigation: Can different combinations of components affect Dataflow performance?</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2011/01/12/investigation-can-different-combinations-of-components-effect-dataflow-performance.aspx</link><pubDate>Wed, 12 Jan 2011 20:28:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:32571</guid><dc:creator>jamiet</dc:creator><description>&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;The Dataflow task is one of the core components (if not &lt;i&gt;the&lt;/i&gt; core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_04FD06F1.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_1B6FA578.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="SSIS Dataflow" alt="Example Dataflow" border="0" width="641" height="200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Example dataflow&lt;/i&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components.&lt;/p&gt;

&lt;p&gt;I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the &lt;a href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/threads" target="_blank"&gt;SSIS forum&lt;/a&gt;. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;i&gt;I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!&lt;/i&gt;&amp;nbsp;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;The Setup&lt;/h3&gt;
&lt;p&gt;I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[YearlyIncome] &lt;/li&gt;

&lt;li&gt;[BirthDate] &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible.&lt;/p&gt;

&lt;p&gt;In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath.&lt;/p&gt;

&lt;p&gt;The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7.&lt;/p&gt;
&lt;h3&gt;The Variables&lt;/h3&gt;
&lt;p&gt;Here are the three combinations of components that I am going to test:&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;b&gt;One Conditional Split&lt;/b&gt; - A single &lt;a href="http://msdn.microsoft.com/en-us/library/ms137886.aspx" target="_blank"&gt;Conditional Split&lt;/a&gt; component &lt;i&gt;CSPL Split by Month of Birth and income category&lt;/i&gt; that will use expressions on [YearlyIncome] &amp;amp; [BirthDate] to send each row to one of 24 outputs. 
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_41AA7BA2.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_5593E86B.png" style="background-image:none;border:0px none;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="Dataflow 1 - Raw Source and Conditional Split" alt="Dataflow 1 - Raw Source and Conditional Split" border="0" width="244" height="131"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This next screenshot displays the expression logic in use:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_70D3C49A.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_36784EB9.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="578" height="222"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Derived Column &amp;amp; Conditional Split&lt;/b&gt; - A &lt;a href="http://msdn.microsoft.com/en-us/library/ms141069.aspx" target="_blank"&gt;Derived Column&lt;/a&gt; component &lt;i&gt;DER Income Category&lt;/i&gt; that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component &lt;i&gt;CSPL Split by Month of Birth and Income Category&lt;/i&gt; then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. 
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_7FBAF3B4.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_1A8719C1.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="Dataflow 2 - Raw Source, Derived Column and Conditional Split" alt="Dataflow 2 - Raw Source, Derived Column and Conditional Split" border="0" width="244" height="217"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The next screenshots display the expression logic in use:&lt;/p&gt;

&lt;table cellpadding="2" cellspacing="0"&gt;

&lt;tr&gt;
&lt;td&gt;&lt;i&gt;DER Income Category&lt;/i&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_323E5127.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_4A61BB82.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="625" height="103"&gt;&lt;/a&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;i&gt;CSPL Split by Month of Birth and Income Category&lt;/i&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_491D22A3.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_358FC302.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="570" height="183"&gt;&lt;/a&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Three Conditional Splits &lt;/b&gt;- A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. 
&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_580451BD.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_5DDEF556.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="Dataflow 3 - 3 Conditional Splits" alt="Dataflow 3 - 3 Conditional Splits" border="0" width="450" height="197"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The next screenshots display the expression logic in use:&lt;/p&gt;

&lt;table cellpadding="2" cellspacing="0"&gt;

&lt;tr&gt;
&lt;td&gt;&lt;i&gt;CSPL Split by Income Category&lt;/i&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;
&lt;blockquote style="margin-right:0px;" dir="ltr"&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_387501E8.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_77668283.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="556" height="139"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&amp;nbsp;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;i&gt;CSPL Split by Month of Birth 1&amp;amp; 2&lt;/i&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_4EE7A06F.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_148C2A8E.png" style="background-image:none;border-width:0px;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="607" height="205"&gt;&lt;/a&gt;&lt;/td&gt;

&lt;td&gt;&amp;nbsp;&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;br&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components:&lt;/p&gt;

&lt;blockquote&gt; 
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_4B0ECC80.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_1DAD36B0.png" style="background-image:none;border:0px none;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="746" height="331"&gt;&lt;/a&gt;&lt;/p&gt;
 &lt;/blockquote&gt;

&lt;p&gt;As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows.&lt;/p&gt;

&lt;p&gt;I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dataflow containing just one Conditional Split (i.e. #1) will be quicker &lt;/li&gt;

&lt;li&gt;There is no significant difference between any of them &lt;/li&gt;

&lt;li&gt;One of the two dataflows containing multiple transformation components will be quicker &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS.&lt;/p&gt;
&lt;h3&gt;The Results and Analysis&lt;/h3&gt;
&lt;p&gt;The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_32C6BBB7.png"&gt;&lt;img src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_17ADECA9.png" style="background-image:none;border:0px none;padding-left:0px;padding-right:0px;display:inline;padding-top:0px;" title="image" alt="image" border="0" width="663" height="273"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;i&gt;All durations are in seconds.&lt;/i&gt; &lt;br&gt;&lt;i&gt;I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table.&lt;/i&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;i&gt;Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too.&lt;/i&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;The Explanation&lt;/h3&gt;
&lt;p&gt;I refer you to the screenshot above that shows how &lt;i&gt;CSPL Split by Month of Birth and salary category&lt;/i&gt; in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is &lt;b&gt;12.5&lt;/b&gt; i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5.&lt;/p&gt;

&lt;p&gt;Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in &lt;i&gt;DER Income Category&lt;/i&gt; and we have the same 24 cases in &lt;i&gt;CSPL Split by Month of Birth and Income Category&lt;/i&gt; as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = &lt;b&gt;13.5&lt;/b&gt; expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow.&lt;/p&gt;

&lt;p&gt;Now onto the third dataflow, the quick one. &lt;i&gt;CSPL Split by Income Category&lt;/i&gt; does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. &lt;i&gt;CSPL Split by Month of Birth 1&lt;/i&gt; &amp;amp; &lt;i&gt;CSPL Split by Month of Birth 2&lt;/i&gt; both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = &lt;b&gt;14.5&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;14.5 is still more than 12.5 &amp;amp; 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;font face="Consolas"&gt;MONTH(BirthDate) == 1 &amp;amp;&amp;amp; &lt;/font&gt;&lt;b&gt;&lt;font face="Consolas"&gt;YearlyIncome &amp;lt;= 50000 &lt;br&gt;&lt;/font&gt;&lt;/b&gt;and &lt;br&gt;&lt;font face="Consolas"&gt;MONTH(BirthDate) == 1&lt;/font&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the first two dataflows &lt;b&gt;YearlyIncome &amp;lt;= 50000&lt;/b&gt; gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from.&lt;/p&gt;
&lt;h3&gt;The Wrap-up&lt;/h3&gt;
&lt;p&gt;The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here.&lt;/p&gt;

&lt;p&gt;Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to &lt;a href="http://cid-550f681dad532637.office.live.com/self.aspx/Public/BlogShare/20110112/20110111%20Chaining%20Expression%20components.zip" target="_blank"&gt;download the package&lt;/a&gt; and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the &lt;a href="http://sqlserversamples.codeplex.com/releases/view/45923" target="_blank"&gt;[AdventureWorksDW2008] sample database&lt;/a&gt; from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using &lt;a href="http://msdn.microsoft.com/en-us/library/ms162810.aspx" target="_blank"&gt;dtexec&lt;/a&gt;, not within BIDS.&lt;/p&gt;

&lt;p&gt;If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www2.sqlblog.com/blogs/jamie_thomson/archive/2010/05/14/inequality-joins-asynchronous-transformations-and-lookups-ssis.aspx" target="_blank"&gt;Inequality joins, Asynchronous transformations and Lookups&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://www2.sqlblog.com/blogs/jamie_thomson/archive/2010/03/02/destination-adapter-comparison-ssis-video-nugget.aspx" target="_blank"&gt;Destination Adapter Comparison&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://www2.sqlblog.com/blogs/jamie_thomson/archive/2010/08/31/don-t-turn-the-dataflow-into-a-cursor-ssis.aspx" target="_blank"&gt;Don’t turn the dataflow into a cursor&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href="http://pragmaticworks.com/Resources/webinars/Month.aspx/23" target="_blank"&gt;SSIS Dataflow – Designing for performance (webinar)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any comments? Let me know!&lt;/p&gt;

&lt;p&gt;&lt;a href="http://twitter.com/jamiet"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Don’t turn the dataflow into a cursor [SSIS]</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2010/08/31/don-t-turn-the-dataflow-into-a-cursor-ssis.aspx</link><pubDate>Tue, 31 Aug 2010 21:22:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:28477</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;I saw a thread on the SSIS forum today that went something like this:&lt;/p&gt;  &lt;hr&gt;  &lt;blockquote&gt;   &lt;p&gt;I have the following dataset:&lt;/p&gt;    &lt;table cellspacing="0" cellpadding="2"&gt;       &lt;tr&gt;         &lt;td&gt;AccountNo&lt;/td&gt;          &lt;td&gt;Date&lt;/td&gt;          &lt;td&gt;DailyMovement&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;28/08/2010&lt;/td&gt;          &lt;td&gt;10&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;29/08/2010&lt;/td&gt;          &lt;td&gt;5&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;30/08/2010&lt;/td&gt;          &lt;td&gt;7&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000002&lt;/td&gt;          &lt;td&gt;28/08/2010&lt;/td&gt;          &lt;td&gt;8&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000002&lt;/td&gt;          &lt;td&gt;29/08/2010&lt;/td&gt;          &lt;td&gt;6&lt;/td&gt;       &lt;/tr&gt;     &lt;/table&gt;    &lt;p&gt;for which I want to compute a running total per [AccountNo] &amp;amp; [Date] like so:&lt;/p&gt;    &lt;table cellspacing="0" cellpadding="2"&gt;       &lt;tr&gt;         &lt;td&gt;AccountNo&lt;/td&gt;          &lt;td&gt;Date&lt;/td&gt;          &lt;td&gt;DailyMovement&lt;/td&gt;          &lt;td&gt;RunningTotal&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;28/08/2010&lt;/td&gt;          &lt;td&gt;10&lt;/td&gt;          &lt;td&gt;10&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;29/08/2010&lt;/td&gt;          &lt;td&gt;5&lt;/td&gt;          &lt;td&gt;15&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000001&lt;/td&gt;          &lt;td&gt;30/08/2010&lt;/td&gt;          &lt;td&gt;7&lt;/td&gt;          &lt;td&gt;22&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000002&lt;/td&gt;          &lt;td&gt;28/08/2010&lt;/td&gt;          &lt;td&gt;8&lt;/td&gt;          &lt;td&gt;8&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td&gt;00000002&lt;/td&gt;          &lt;td&gt;29/08/2010&lt;/td&gt;          &lt;td&gt;6&lt;/td&gt;          &lt;td&gt;14&lt;/td&gt;       &lt;/tr&gt;     &lt;/table&gt;    &lt;p&gt;How do I do that using a script component?&lt;/p&gt; &lt;/blockquote&gt;  &lt;hr&gt;  &lt;p&gt;That last comment:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;i&gt;How do I do that using a script component?&lt;/i&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;is a fairly common question. People assume that if a calculated value is dependant on prior rows then a script component needs to be involved because that running total needs to be tracked somewhere, but that isn’t necessarily the case. Ask yourself, how would you do this if the data were residing in a database table, would you do this?:&lt;code style="font-size:12px;"&gt;&lt;span style="color:black;"&gt;&lt;/span&gt;      &lt;/code&gt;&lt;/p&gt;&lt;code style="font-size:12px;"&gt;&lt;blockquote&gt;&lt;font size="2"&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:red;"&gt;'00000001' &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;&lt;span style="color:magenta;"&gt;CONVERT&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;date&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:red;"&gt;'20080828'&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;10 &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;INTO&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:#434343;"&gt;#balances            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;UNION&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;ALL            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:red;"&gt;'00000001' &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;&lt;span style="color:magenta;"&gt;CONVERT&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;date&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:red;"&gt;'20080829'&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;5 &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;UNION&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;ALL            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:red;"&gt;'00000001' &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;&lt;span style="color:magenta;"&gt;CONVERT&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;date&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:red;"&gt;'20080830'&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;7 &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;UNION&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;ALL            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:red;"&gt;'00000002' &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;&lt;span style="color:magenta;"&gt;CONVERT&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;date&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:red;"&gt;'20080828'&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;8 &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;UNION&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;ALL            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:red;"&gt;'00000002' &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;, &lt;/span&gt;&lt;span style="color:magenta;"&gt;CONVERT&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;date&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:red;"&gt;'20080829'&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:black;"&gt;[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;6 &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[AccountNo]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[Date]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT&amp;nbsp; &lt;/span&gt;&lt;span style="color:magenta;"&gt;SUM&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;DailyMovement&lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;)            &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:blue;"&gt;FROM&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:#434343;"&gt;#balances &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b2            &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:blue;"&gt;WHERE&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:black;"&gt;b1.[AccountNo] &lt;/span&gt;&lt;span style="color:blue;"&gt;= &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b2.[AccountNo]            &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:gray;"&gt;AND&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:black;"&gt;b1.[Date] &lt;/span&gt;&lt;span style="color:gray;"&gt;&amp;gt;= &lt;/span&gt;&lt;span style="color:black;"&gt;b2.[Date]&lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:gray;"&gt;)            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:#434343;"&gt;#balances &lt;/span&gt;&lt;span style="color:black;"&gt;b1;&lt;/span&gt;&lt;/font&gt;&lt;/blockquote&gt;   &lt;/code&gt;&lt;p&gt;&amp;nbsp;&lt;/p&gt;  &lt;p&gt;OK that works, and here’s a screenshot to prove it:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_57F5D682.png"&gt;&lt;img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="image" border="0" alt="image" width="469" height="200" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_556CA4C4.png"&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;But really, would any of you actually &lt;i&gt;do&lt;/i&gt; this? Hopefully not, the use of the &lt;a target="_blank" href="http://en.wikipedia.org/wiki/Correlated_subquery"&gt;correlated subquery&lt;/a&gt; has simply turned what should be a nice set operation into a cursor-in-disguise (something I have &lt;a target="_blank" href="http://sqlblog.com/blogs/jamie_thomson/archive/2010/05/14/inequality-joins-asynchronous-transformations-and-lookups-ssis.aspx"&gt;talked about before&lt;/a&gt;) because that subquery will be getting executed for every row in the table. Instead you could run the following on that same dataset:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;code style="font-size:12px;"&gt;&lt;span style="color:blue;"&gt;&lt;font size="2"&gt;SELECT &lt;/font&gt;&lt;/span&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b1.[AccountNo]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b1.[Date]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b1.[DailyMovement]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:gray;"&gt;,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:magenta;"&gt;SUM&lt;/span&gt;&lt;span style="color:gray;"&gt;(&lt;/span&gt;&lt;span style="color:black;"&gt;b2.[DailyMovement]&lt;/span&gt;&lt;span style="color:gray;"&gt;) &lt;/span&gt;&lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;[RunningTotal]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:#434343;"&gt;#balances &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b1            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;INNER&amp;nbsp; JOIN &lt;/span&gt;&lt;span style="color:#434343;"&gt;#balances &lt;/span&gt;&lt;span style="color:black;"&gt;b2 &lt;/span&gt;&lt;span style="color:blue;"&gt;ON&amp;nbsp; &lt;/span&gt;&lt;span style="color:black;"&gt;b1.[AccountNo] &lt;/span&gt;&lt;span style="color:blue;"&gt;= &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b2.[AccountNo]            &lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="color:gray;"&gt;AND &lt;/span&gt;&lt;span style="color:black;"&gt;b1.[Date] &lt;/span&gt;&lt;span style="color:gray;"&gt;&amp;gt;= &lt;/span&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;span style="color:black;"&gt;b2.[Date]            &lt;br&gt;&lt;/span&gt;&lt;span style="color:blue;"&gt;GROUP&amp;nbsp; BY &lt;/span&gt;&lt;span style="color:black;"&gt;b1.[AccountNo]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;b1.[Date]&lt;/span&gt;&lt;span style="color:gray;"&gt;,&lt;/span&gt;&lt;span style="color:black;"&gt;b1.[DailyMovement]&lt;/span&gt;&lt;span style="color:gray;"&gt;;&lt;/span&gt;&lt;/font&gt;&lt;/code&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;and you get the same result but with a much more amenable execution plan (execute with &lt;code style="font-size:12px;"&gt;&lt;font size="2"&gt;&lt;span style="color:blue;"&gt;SET STATISTICS &lt;/span&gt;&lt;span style="color:black;"&gt;IO &lt;/span&gt;&lt;span style="color:blue;"&gt;ON&lt;/span&gt;&lt;/font&gt;&lt;/code&gt; if you don’t believe me)!&lt;/p&gt;  &lt;p&gt;The same principle applies in a SSIS dataflow. Often there is no need to resort to a script component, the same result can be achieved using some smart dataflow design. You can use, for example, the same approach as in the second SQL query (above) using a combination of SSIS’ Sort, Aggregate, Merge Join and Conditional Split components. That is not to say that one will be quicker than the other but at least you’ll have something that is more intuitive and arguably more maintainable. Of course if performance gain is your primary goal then the correct advice is, as always, “test and measure, test and measure, test and measure”!!!&lt;/p&gt;  &lt;p&gt;Ask yourself “If I could, how would I solve this using T-SQL?” and see if that influences your dataflow design at all. Invariably script components should be the &lt;i&gt;last&lt;/i&gt; choice, not the first!&lt;/p&gt;  &lt;p&gt;&lt;a target="_blank" href="http://twitter.com/jamiet"&gt;@jamiet&lt;/a&gt;&lt;/p&gt;</description></item></channel></rss>