<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Search results matching tags 'Video', 'dataflow', and 'ssis'</title><link>http://sqlblog.com/search/SearchResults.aspx?o=DateDescending&amp;tag=Video,dataflow,ssis&amp;orTags=0</link><description>Search results matching tags 'Video', 'dataflow', and 'ssis'</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>SSIS gotcha: Beware of multiple outputs from a synchronous script component</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2010/07/04/ssis-gotcha-beware-of-multiple-outputs-from-a-synchronous-script-component.aspx</link><pubDate>Sun, 04 Jul 2010 20:43:12 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:26727</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;A few days ago I was &lt;a href="http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/c0f2b376-66a3-49d8-8925-ada9a362ba53/?prof=required" target="_blank"&gt;alerted&lt;/a&gt; to a peculiarity of SSIS’s Script Component that I believe people need to be aware of. Its a peculiarity that can rear its head when your script component is &lt;strong&gt;synchronous and has multiple outputs&lt;/strong&gt;. Here’s an example of a dataflow that contains such a component:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_71055C17.png"&gt;&lt;img style="border-bottom:0px;border-left:0px;display:inline;border-top:0px;border-right:0px;" class="wlDisabledImage" title="ssis dataflow with a script component with multiple synchronous outputs" border="0" alt="ssis dataflow with a script component with multiple synchronous outputs" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_12A184E9.png" width="596" height="312" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Some things to note about this dataflow that you can’t tell from this screenshot:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The source component will create a 1-row,1-column dataset with the value “1” in it&lt;/li&gt;    &lt;li&gt;Even though it has three outputs “SCR Send data to multiple outputs” is still a synchronous component*&lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;*If you don’t understand how a synchronous component can have multiple outputs and/or have never heard of an Exclusion Group then you should go and read my blog post from 2005 &lt;/em&gt;&lt;a href="http://consultingblogs.emc.com/jamiethomson/archive/2005/09/05/SSIS-Nugget_3A00_-Multiple-outputs-from-a-synchronous-script-transform.aspx" target="_blank"&gt;&lt;em&gt;Multiple outputs from a synchronous script transform&lt;/em&gt;&lt;/a&gt;&lt;em&gt; – the blog post you’re reading right now won’t make much sense without it! If you have ever used the Multicast component then it shouldn’t surprise you that a synchronous component can have multiple outputs – because that’s a pretty good definition of what the Multicast component does!&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Here’s the important code inside “SCR Send data to multiple outputs”:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_055AFBD6.png"&gt;&lt;img style="border-bottom:0px;border-left:0px;display:inline;border-top:0px;border-right:0px;" class="wlDisabledImage" title="script component code multiple outputs" border="0" alt="script component code multiple outputs" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_5D484CB6.png" width="592" height="198" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The code:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;increments the incoming value and puts it into Output0&lt;/li&gt;    &lt;li&gt;increments the value again and puts it into Output1&lt;/li&gt;    &lt;li&gt;increments the value again and puts it into Output2&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Given that out starting value is “1” you might expect that our three outputs would contain the values “2”, “3” &amp;amp; “4” respectively but in fact that is not the case. Here is what we actually see in those three outputs when we execute the dataflow:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_0984D39B.png"&gt;&lt;img style="border-bottom:0px;border-left:0px;display:inline;border-top:0px;border-right:0px;" class="wlDisabledImage" title="SSIS data viewers, multiple outputs" border="0" alt="SSIS data viewers, multiple outputs" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_07D407C7.png" width="475" height="175" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;&lt;a href="http://sqlblog.com/blogs/jamie_thomson/image_054AD609.png"&gt;&lt;img style="border-bottom:0px;border-left:0px;display:inline;border-top:0px;border-right:0px;" class="wlDisabledImage" title="ssis executed dataflow" border="0" alt="ssis executed dataflow" src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_428B8AD0.png" width="595" height="302" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Each of our outputs has the same value “4”. Why is that? The trick is in understanding something fundamental about synchronous components, &lt;em&gt;they only ever output the same number of rows as are input&lt;/em&gt;.&amp;#160; The fact that in the data flow above it appears as though three rows have been output is simply an illusion that is best explained by former SSIS Development Manager Kirk Haselden who left the following comment on my &lt;a href="http://consultingblogs.emc.com/jamiethomson/archive/2005/09/05/SSIS-Nugget_3A00_-Multiple-outputs-from-a-synchronous-script-transform.aspx" target="_blank"&gt;blog post&lt;/a&gt; that I linked to earlier:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;The additional or duplicate rows are an illusion. The Dataflow Task actually tracks what buffer columns and rows are visible to the downstream transforms, but doesn't copy any buffers or rows. It simply &amp;quot;exposes&amp;quot; them with row and column views.        &lt;br /&gt;Truly, the synchronous outputs only send the same number of columns* as they receive on their inputs.         &lt;br /&gt;&lt;/em&gt;&lt;em&gt;- Kirk Haselden, 15th March 2006       &lt;br /&gt;&lt;font size="1"&gt;*This is a typo. Kirk meant to say “rows” not “columns”&lt;/font&gt;&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;In our case the script component has simply incremented the same value three times and we are looking at that same incremented value of “4” in each of our outputs.&lt;/p&gt;  &lt;p&gt;I have produced a short video that demonstrates this behaviour using the dataflow pictured above. Embedding videos here on SQLBlog is however a fiddly experience so for ease I’ll just direct you to view it on Vimeo at &lt;a href="http://vimeo.com/13081087" target="_blank"&gt;Multiple Outputs from a synchronous script component&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Hope this helps. Any questions please put them in the comments.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@JamieT&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Inequality joins, Asynchronous transformations and Lookups : SSIS</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2010/05/14/inequality-joins-asynchronous-transformations-and-lookups-ssis.aspx</link><pubDate>Fri, 14 May 2010 21:13:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:25178</guid><dc:creator>jamiet</dc:creator><description>&lt;P&gt;It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read &lt;A href="http://consultingblogs.emc.com/jamiethomson/archive/2005/02/11/SSIS_3A00_-Asynchronous-and-synchronous-data-flow-components.aspx" target=_blank&gt;Asynchronous and synchronous data flow components&lt;/A&gt;). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses.&lt;/P&gt;
&lt;P&gt;Imagine I have a &lt;FONT face="Courier New"&gt;[Customer]&lt;/FONT&gt; dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. &lt;/P&gt;
&lt;P&gt;Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/clip_image002_75E15D51.jpg"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH:0px;DISPLAY:inline;BORDER-TOP-WIDTH:0px;BORDER-BOTTOM-WIDTH:0px;BORDER-LEFT-WIDTH:0px;" title="customer dimension table" border=0 alt="customer dimension table" src="http://sqlblog.com/blogs/jamie_thomson/clip_image002_thumb_129AD267.jpg" width=564 height=303&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Notice that we have multiple records per customer and that the &lt;FONT face="Courier New"&gt;[SCDStartDate] &lt;/FONT&gt;of a record is equivalent to the &lt;FONT face="Courier New"&gt;[SCDEndDate]&lt;/FONT&gt; of the record that preceded it (if there was one). (Note that &lt;A href="http://sqlblog.com/blogs/jamie_thomson/archive/2009/11/28/debunking-kimball-effective-dates.aspx"&gt;I am on record as saying I am not a fan of this technique&lt;/A&gt; of storing an &lt;FONT face="Courier New"&gt;[SCDEndDate] &lt;/FONT&gt;but for the purposes of clarity I have included it here.)&lt;/P&gt;
&lt;P&gt;Anyway, the idea here is that we will have some incoming data containing &lt;FONT face="Courier New"&gt;[CustomerName]&lt;/FONT&gt; &amp;amp; &lt;FONT face="Courier New"&gt;[EffectiveDate]&lt;/FONT&gt; and we need to use those values to lookup &lt;FONT face="Courier New"&gt;[Customer].[CustomerId]&lt;/FONT&gt;. The logic will be:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;I&gt;Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] &amp;lt;= [EffectiveDate] AND [EffectiveDate] &amp;lt;= [SCDEndDate]&lt;/I&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/clip_image004_4FDB872E.jpg"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH:0px;DISPLAY:inline;BORDER-TOP-WIDTH:0px;BORDER-BOTTOM-WIDTH:0px;BORDER-LEFT-WIDTH:0px;" title="no cache lookup" border=0 alt="no cache lookup" src="http://sqlblog.com/blogs/jamie_thomson/clip_image004_thumb_532CF909.jpg" width=689 height=486&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Let’s take a look at the dataflow:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/clip_image006_50A3C74B.jpg"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH:0px;DISPLAY:inline;BORDER-TOP-WIDTH:0px;BORDER-BOTTOM-WIDTH:0px;BORDER-LEFT-WIDTH:0px;" title="dataflow lookup" border=0 alt="dataflow lookup" src="http://sqlblog.com/blogs/jamie_thomson/clip_image006_thumb_1A529F3C.jpg" width=264 height=404&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good.&lt;/P&gt;
&lt;P&gt;OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/clip_image008_1FC10FE0.jpg"&gt;&lt;IMG style="BORDER-RIGHT-WIDTH:0px;DISPLAY:inline;BORDER-TOP-WIDTH:0px;BORDER-BOTTOM-WIDTH:0px;BORDER-LEFT-WIDTH:0px;" title="dataflow merge join" border=0 alt="dataflow merge join" src="http://sqlblog.com/blogs/jamie_thomson/clip_image008_thumb_16F10794.jpg" width=508 height=443&gt;&lt;/A&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on &lt;FONT face="Courier New"&gt;[CustomerName] &lt;/FONT&gt;and, as we know, we have multiple records per &lt;FONT face="Courier New"&gt;[CustomerName] &lt;/FONT&gt;in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join).&lt;/P&gt;
&lt;P&gt;I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long.&lt;/P&gt;
&lt;OBJECT&gt;
&lt;embed style="WIDTH:600px;HEIGHT:450px;" src="http://vimeo.com/moogaloop.swf?clip_id=11750066&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/OBJECT&gt;
&lt;BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&lt;A href="http://vimeo.com/11750066" target=_blank&gt;View on Vimeo&lt;/A&gt;&lt;/EM&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took &lt;STRONG&gt;36 seconds &lt;/STRONG&gt;whereas the dataflow that used the Merge Join took &lt;STRONG&gt;less than two seconds&lt;/STRONG&gt;. An illustration in case it is needed:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="http://sqlblog.com/blogs/jamie_thomson/image_34EF1588.png"&gt;&lt;IMG style="BORDER-BOTTOM:0px;BORDER-LEFT:0px;DISPLAY:inline;BORDER-TOP:0px;BORDER-RIGHT:0px;" title=image border=0 alt=image src="http://sqlblog.com/blogs/jamie_thomson/image_thumb_13FBA2E1.png" width=392 height=158&gt;&lt;/A&gt; &lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary.&lt;/P&gt;
&lt;P&gt;The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow.&lt;/P&gt;
&lt;P&gt;I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at &lt;A title=http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip href="http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip"&gt;http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Comments are welcome as always.&lt;/P&gt;
&lt;P&gt;&lt;A href="http://twitter.com/jamiet" target=_blank&gt;@Jamiet&lt;/A&gt;&lt;/P&gt;</description></item><item><title>Destination Adapter Comparison : SSIS Video Nugget</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2010/03/02/destination-adapter-comparison-ssis-video-nugget.aspx</link><pubDate>Tue, 02 Mar 2010 12:11:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:22728</guid><dc:creator>jamiet</dc:creator><description>
&lt;p&gt;In this, my latest video nugget, I demonstrate and compare three ways that you can insert data into a SQL Server table from the SSIS DataFlow:&lt;/p&gt;
  
&lt;ul&gt;   
&lt;li&gt;OLE DB Destination without Fast Load &lt;/li&gt;
    
&lt;li&gt;OLE DB Destination with Fast Load &lt;/li&gt;
    
&lt;li&gt;SQL Server Destination &lt;/li&gt;
 &lt;/ul&gt;
  
&lt;p&gt;The demo peels back the covers to see how these three methods actually operate against the database. The video is embedded below however at the time of writing we are still having issues with getting videos to show up on SQLBlog meaning that the video may appear too small to view hence in the meantime I’ll refer you to view the video on Vimeo at &lt;a href="http://vimeo.com/9806121" title="http://vimeo.com/9806121"&gt;http://vimeo.com/9806121&lt;/a&gt; (it is 7m19s long).&lt;/p&gt;
  
&lt;p&gt;If after watching this you want to know any more then you may glean some useful information from my blog post &lt;a href="http://consultingblogs.emc.com/jamiethomson/archive/2006/08/14/SSIS_3A00_-Destination-Adapter-Comparison.aspx" target="_blank"&gt;SSIS: Destination Adapter Comparison&lt;/a&gt; from August 2006 or feel free to post any questions in the comments below and I’ll do my best to answer them! &lt;/p&gt;
  
&lt;p&gt;Hope this is useful!&lt;/p&gt;
  
&lt;p&gt;&lt;a href="http://twitter.com/jamiet" target="_blank"&gt;@Jamiet&lt;/a&gt;&lt;/p&gt;
 
&lt;embed src="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" allowfullscreen="true" style="width:600px;height:450px;"&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://sqlblog.com/controlpanel/blogs/" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://sqlblog.com/controlpanel/blogs/" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://vimeo.com/moogaloop.swf?clip_id=9806121&amp;amp;server=vimeo.com&amp;amp;show_title=1&amp;amp;show_byline=1&amp;amp;show_portrait=0&amp;amp;color=00ADEF&amp;amp;fullscreen=1" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://sqlblog.com/controlpanel/blogs/" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;&lt;a href="http://sqlblog.com/controlpanel/blogs/" class="onxldjuucmcshidmscok"&gt;&lt;/a&gt;</description></item></channel></rss>