<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Eric Johnson : Slowly Changing Dimension</title><link>http://sqlblog.com/blogs/eric_johnson/archive/tags/Slowly+Changing+Dimension/default.aspx</link><description>Tags: Slowly Changing Dimension</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Slowly Changing Dimensions - Duplicate Data Issues</title><link>http://sqlblog.com/blogs/eric_johnson/archive/2009/05/05/slowly-changing-dimensions-duplicate-data-issues.aspx</link><pubDate>Tue, 05 May 2009 21:14:23 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:13801</guid><dc:creator>ejohnson2010</dc:creator><slash:comments>1</slash:comments><comments>http://sqlblog.com/blogs/eric_johnson/comments/13801.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/eric_johnson/commentrss.aspx?PostID=13801</wfw:commentRss><description>&lt;p&gt;The Slowly Changing Dimension (SCD) transformations are pretty handy in SSIS packages. They let you compare a new set of data to an existing table and insert or update as needed. Alternatively, you can have them insert new rows and mark old rows as &amp;quot;Expired&amp;quot; instead of updating rows when a change occurs. This is great if you want to maintain history. Now these are meant for dimensions in a data warehouse, but you can use them against any table in your database. The reason for this post is to look at an issue that can occur with SCDs when you have duplicate data in your incoming data set. &lt;/p&gt;  &lt;p&gt;When an SCD analyzes a set of data it looks at each row and compares it to the destination. It decides whether the row will be inserted or updated (or inserted with the old row being marked inactive). Notice I said this comparison is with the incoming data and the destination. The SCD does check for duplicates that may exist in the incoming data. So if you have duplicates, that also happen to be new rows when compared to the destination, all the duplicate rows get &amp;quot;flagged&amp;quot; for insertion. You can probably guess what happens next. The first row gets inserted but the second and subsequent duplicates cause a Primary Key violation when they attempt to insert the now duplicate record. &lt;/p&gt;  &lt;p&gt;Now this might seem like a glitch to some, but this functionality is &amp;quot;by design &amp;quot;. The button line is that the data being inserted should be de-duplicated prior to being inserted into the destination. In a perfect world, the data would come from a source where it lived in a normalized, and therefore de-duplicated, state. If you don't have the luxury of living in a perfect world, you can build a manual data de-dup process or use a Sort transformation. One of the options of the Sort transformation is to &amp;quot;Remove Rows with Duplicate Sort Values&amp;quot;. Just drop one of these in, check the box, and you have magically de-dup your values.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=13801" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/eric_johnson/archive/tags/SSIS/default.aspx">SSIS</category><category domain="http://sqlblog.com/blogs/eric_johnson/archive/tags/Slowly+Changing+Dimension/default.aspx">Slowly Changing Dimension</category><category domain="http://sqlblog.com/blogs/eric_johnson/archive/tags/Sort+Transformation/default.aspx">Sort Transformation</category></item></channel></rss>