<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx</link><description>All data warehouses have the notion of an Unknown member somewhere in their make-up. That is, a dimension member to which we can map fact records when the member that we need to map to is, well, unknown. The canonical example of where the Unknown member</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17757</link><pubDate>Wed, 14 Oct 2009 21:45:18 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17757</guid><dc:creator>Mladen</dc:creator><description>&lt;p&gt;the only really great reason i was given for not using int min value for the unknown is that the data compresion in 2008 is useless on it.&lt;/p&gt;
&lt;p&gt;if you start at -1, 0 you get at least great compression for first &amp;quot;few&amp;quot; rows.&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17766</link><pubDate>Wed, 14 Oct 2009 22:48:56 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17766</guid><dc:creator>SkullKiller</dc:creator><description>&lt;p&gt;Hi.&lt;/p&gt;
&lt;p&gt;You posted a very valid issue. With the &amp;quot;unknown member SK&amp;quot; we are really giving a meaning to something meaningless. But is it really true? It can mean so many things. It's like the null value: null != null, null = null, null = &amp;quot;nothing else&amp;quot;, null != &amp;quot;nothing else&amp;quot;. It means so many things and means nothing at the same time. We are always trying to give it a meaning (absence of value, no value exists, ...). The key to this is definition: it means what we define.&lt;/p&gt;
&lt;p&gt;So I see nothing wrong with the &amp;quot;unknown member SK&amp;quot;. On the other hand, when we use the &amp;quot;unknown member SK&amp;quot; in hierarquies sometimes we are identifying it as the &amp;quot;unknown member&amp;quot; of something, let's say, &amp;quot;unknown member of laptop category&amp;quot;. This way we sometimes use &amp;quot;-[0,9]*&amp;quot; member.&lt;/p&gt;
&lt;p&gt;To ease things a bit when I use this kind of SK I rather call it &amp;quot;unindentified @atributemember&amp;quot; instead of &amp;quot;unknown @atributemember&amp;quot;, this way I don't even think of the meaning issue given that it has a meaning.&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17785</link><pubDate>Thu, 15 Oct 2009 09:39:47 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17785</guid><dc:creator>Ben E</dc:creator><description>&lt;p&gt;Hear!Hear! &amp;nbsp;There is a reasonable rationale for this in OLAP frameworks that don't natively support the concept of an unknown member, but SSAS doesn't have this limitation. &amp;nbsp;I accept that in 10+ years of working with relational databases I have never once seen a negative integer surrogate key; but then neither have I seen SQL injection attempted against any websites I've developed, cross-site scripting attacks, et al.&lt;/p&gt;
&lt;p&gt;It is about inferring meaning, which in modern era of not having business logic enforced in the database is even more important.&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17788</link><pubDate>Thu, 15 Oct 2009 13:02:44 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17788</guid><dc:creator>Eric Wisdahl</dc:creator><description>&lt;p&gt;Great question! &amp;nbsp;I have thought about the negative SKs before and must say that the main reason why I believe that people don't use them is that they don't expect to ever need a SK larger than the max int value. &amp;nbsp;&lt;/p&gt;
&lt;p&gt;I suspect as more an more data is generated and examined we will begin to see a trend towards using &amp;quot;the other half&amp;quot; of the int region for SKs as well.&lt;/p&gt;
&lt;p&gt;As to the actual -1 for an unknown member? &amp;nbsp;Well, I would say that there is little harm in the consistent use of this value. &amp;nbsp;The main risk is from lazy programming...&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17830</link><pubDate>Fri, 16 Oct 2009 13:52:30 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17830</guid><dc:creator>Dave</dc:creator><description>&lt;p&gt;I'll just be happy if I can get my team to stop hardcoding SK values generally in our production code. &amp;nbsp;For example in a report, rather than joining to the lookup table and filtering on the business descriptive fields all too often the code will filter on an SK.&lt;/p&gt;
&lt;p&gt;Where &lt;/p&gt;
&lt;p&gt; Store = 5&lt;/p&gt;
&lt;p&gt;Apparently it's fine, the SK will never change. &amp;nbsp;Well we better hope it doesn't as if it did we'd have to recode half our system. &amp;nbsp;&lt;/p&gt;
&lt;p&gt;I agree that there is no need to use -1, but there is no great need to not use it either. &amp;nbsp;Making use of -ve numbers seems to make sense though, especially as we recently hit the upper limit on a table with a few hundred million rows! &amp;nbsp;Big Int ftw \o/&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17832</link><pubDate>Fri, 16 Oct 2009 14:29:22 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17832</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;Dave,&lt;/p&gt;
&lt;p&gt;Ouch, hardcoding SK values in REPORTS? Jeez...that's a recipe for disaster. Does the report work in different environments? Surely the SKs aren't guaranteed?&lt;/p&gt;
&lt;p&gt;-Jamie&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#17985</link><pubDate>Tue, 20 Oct 2009 01:48:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:17985</guid><dc:creator>Everest</dc:creator><description>&lt;p&gt;I not only used negative SKs recently, but purposefully let the mainframe creating + surrogates so SQL Server could assign negative surrogates whenever the key was generated from the internet. In either case it's as easy to go forwards and backwards and avoided collision.&lt;/p&gt;
&lt;p&gt;Good stuff, Jamie! Thanks.&lt;/p&gt;
&lt;p&gt;Lee&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#18181</link><pubDate>Fri, 23 Oct 2009 15:03:35 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:18181</guid><dc:creator>Ralph Wilson</dc:creator><description>&lt;p&gt;I haven't thought about using the -1 (or other negative values) for surrogate keys before reading this post. &amp;nbsp;Now, though, I am wondering about using zero (0) instead of the -1. &amp;nbsp;It seems to me that there could be a psychological and &amp;quot;reasonable&amp;quot; basis for using zero . . . since you have zero information about the Unkown. ;-)&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#18613</link><pubDate>Sat, 07 Nov 2009 15:53:32 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:18613</guid><dc:creator>Kristian Wedberg</dc:creator><description>&lt;p&gt;Hey Jamie, on the meaninglessness: &lt;/p&gt;
&lt;p&gt;IMHO a so called key decision in a data warehouse with SKs is whether Surrogate Keys are static, or if Business Keys are static, i.e. the warehouse should guarantee that one or the other won't change over time (without also changing its meaning.) This way the warehouse users (e.g. reports) know whether to hard code Surrogate Keys or Business Keys in filters etc. The decision doesn't have to be the same across the whole warehouse, but consistency is obviously preferable.&lt;/p&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;p&gt;* If an organisation (or country, or person, or...) changes its name (assuming this is the Business Key), and I still want to treat it as the same organisation, I should make the Surrogate Keys static and hard code them in reports, but allow Business Keys to vary. This way the reports will still filter on the correct entry, even though the displayed name has changed.&lt;/p&gt;
&lt;p&gt;* If I want the Business Key change to imply this is a separate organisation, then reports should hard code business keys instead.&lt;/p&gt;
&lt;p&gt;In the first example we're adding even more meaning to SKs (beyond the -1 meaning), and as long as it's a conscious and communicated decision I believe it's a very good one.&lt;/p&gt;
&lt;p&gt;Tossing in even more meaning into the SK mix, instead of just using -1 = (Unknown), it's often good to use -1 = (Not Available) and -2 = (Not Applicable) to better show _why_ there is missing data.&lt;/p&gt;
&lt;p&gt;Thoughts?&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#18647</link><pubDate>Sun, 08 Nov 2009 10:19:13 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:18647</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;Hi willshak,&lt;/p&gt;
&lt;p&gt;Thanks for the comments.&lt;/p&gt;
&lt;p&gt;Regarding static SKs or business keys...yes, one of them should definitely stay static/stable/be immutable. Generally I use business keys in my filter predicates as it reinforces the point that SKs are supposed to be meaningless. I would expect this to be the norm though readily accept that there may be other circumstances (possible unforeseen) where the business key changes.&lt;/p&gt;
&lt;p&gt;Regarding breaking down the reason for something being Unknown - yes, I think this is a good idea. On my most recent engagement most dimensions had an 'Unknown' &amp;amp; a 'Not applicable' member.&lt;/p&gt;
&lt;p&gt;Of course, if you're using the &amp;quot;Not Applicable&amp;quot; member then it suggests that the data you're storing against it is being stored at the wrong fact grain - however there may be good justifications for doing this.&lt;/p&gt;
&lt;p&gt;Thanks again for the cvomments, good thought provoking stuff.&lt;/p&gt;
&lt;p&gt;-Jamie&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#18724</link><pubDate>Tue, 10 Nov 2009 18:18:16 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:18724</guid><dc:creator>JamesH</dc:creator><description>&lt;p&gt;I have been using 'Unknown' -1 : and 'Uncategorized' -2 for many years but only when the project owner (or profiling) states that it is necessary to balance or keep track of these types of data. &amp;nbsp;Unknown has primarily been used in the past (even before SSAS) to facilitate auditing and correcting the master data, if it's ignored in the onset of a DW/DM project, you can bet that over time a manager that is concerned about his bonus will eventually want to know what the heck 'Unknown' means and how it's impacting his performance rating (sales and margins). &amp;nbsp;Uncategorized is a bit different and is usually provided by the source system but can have the same effect (positive/negative) on a system. &amp;nbsp;The navigational &lt;/p&gt;
&lt;p&gt;aspect of these comes into play with balancing and without them can create the 'blackhole of death' for a BI project. &amp;nbsp;&lt;/p&gt;
&lt;p&gt;Concerning SK's, it is and always has been necessary for a warehouse to 'shield' itself from the inconsistencies and changes from an underlying system(s); moreover, carrying natural keys might be useful with a single source but when you start integrating multiple systems you will inevitably run into the same problems as a relational database unless you use a completely generic key. &amp;nbsp;I could care less if it is a string or numeric as long as it can be controlled. &amp;nbsp;A good ETL process is, IMHO, the proper place for assignment and control of SK's.&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#28591</link><pubDate>Sun, 05 Sep 2010 19:12:49 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:28591</guid><dc:creator>JustinS707</dc:creator><description>&lt;p&gt;Is it too much to ask SSIS to create an inferred member surrogate key in the dimensions when the fact arrives too early? &amp;nbsp;Use of the &amp;quot;-1&amp;quot; value for the Unknown member is inefficient:&lt;/p&gt;
&lt;p&gt;1. You must keep your natural keys in the fact table to be able to update the surrogate key at some point. &amp;nbsp;Effectively, you defeat the purpose of using the surrogate key because you must keep the natural key column for every dimension that has an unknown member.&lt;/p&gt;
&lt;p&gt;2. You must have a process to check the dimension for all Unknown member natural keys in the fact table and update the fact table with the correct surrogate key.&lt;/p&gt;
&lt;p&gt;I believe this convention is antiquated and a lazy coding practice.&lt;/p&gt;
</description></item><item><title>Data Warehouse modelling deliberations – foreign keys and unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#48739</link><pubDate>Tue, 16 Apr 2013 21:36:39 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48739</guid><dc:creator>SSIS Junkie</dc:creator><description>&lt;p&gt;Earlier today I posted the following question on Twitter: Foreign keys in a data warehouse. Yes or no?&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#48753</link><pubDate>Wed, 17 Apr 2013 10:01:52 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48753</guid><dc:creator>James Snape</dc:creator><description>&lt;p&gt;Hi Jamie,&lt;/p&gt;
&lt;p&gt;I read this blog post when you originally posted it and I used to be with the -1 crowd but in my last two projects I've decided to try &amp;quot;letting null be null&amp;quot;... We do apply some rules but overall it was most liberating.&lt;/p&gt;
&lt;p&gt;1. If Null or &amp;lt;empty string&amp;gt; have meaning in the source system then replace the null or empty string with the meaning (e.g. NULL &amp;gt; N/A, '' &amp;gt; Empty) BEFORE the surrogate key mapping stage. Then these values will be mapped using the same mapper.&lt;/p&gt;
&lt;p&gt;2. We consider null to be unknown once it reaches the data warehouse. We configure SSAS Unknown member handling properly.&lt;/p&gt;
&lt;p&gt;I was worried about doing this but in reality we have far fewer issues than we ever did with -1s and -2s. It even helps us maintain a ROLAP realtime partition because if rows get added but the dimension hasn't caught up then the cube still works - we just get some unknown data for a while which magically fixes itself when the dimension is processed.&lt;/p&gt;
&lt;p&gt;I do start keys at 1 because of a) data compression and b) until sequences came about it was really useful to be able to create an identity in the opposite direction when your modelling needs change without worrying about clashes - particularly around inheritance hierarchies.&lt;/p&gt;
&lt;p&gt;Oh yeah - FK's are a must in the DW too. Otherwise none of the tools work and new starters have no idea how your model fits together. I consider it unprofessional to not have them although if there is a proven performance hit I might keep them as metadata only FK's.&lt;/p&gt;
</description></item><item><title>re: Considering surrogate keys for Unknown members</title><link>http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx#48754</link><pubDate>Wed, 17 Apr 2013 10:06:38 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:48754</guid><dc:creator>jamiet</dc:creator><description>&lt;p&gt;Great comment, thank you James. Definitely some food for thought there.&lt;/p&gt;
&lt;p&gt;The &amp;quot;data compression&amp;quot; argument for starting keys at 1 is something I've heard before. I'm wrestling with whether that's a good enough justification or not. Everything's a trade-off.&lt;/p&gt;
&lt;p&gt;cheers&lt;/p&gt;
&lt;p&gt;Jamie&lt;/p&gt;
</description></item></channel></rss>