THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

What are the natural hierarchies and why they are a good thing

Analysis Services 2005 Service Pack 2 is just around the corner. Microsoft released the list of improvements made in SP2. While many of them are self-explanatory, others could benefit from more explanations. I already wrote about two very important improvements - interaction between WHERE and axes, and Visual Totals changes. Today, this thread raised a question about the following improvement:

  • A warning message now appears when a user-defined hierarchy is not defined as a natural hierarchy.

The user was wondering, what are the natural hierarchies, how can user-defined hierarchy be defined as natural or unnatural, and why is it so bad to have user-defined hierarchy unnatural that the warning message has to appear. So, I realized, that this probably was one of the vaguer items on the list and offered my explanation. This post is a short translation of the forum thread to English.

First, let's find out what natural hierarchies are. While there were many (justified!) complains about product documentation, the concept of natural hierarchies is in fact documented on MSDN. To find this information on MSDN, we can use Live Search from Microsoft restricting on msdn.microsoft.com domain (until recently, Live Search didn't index MSDN well, so use of another search engine worked better. But this issue is now resolved, and in my experience, Live Search now returns more, better and more relevant results while searching MSDN then any other search engine). The very first link is an Introduction to MDX Scripting whitepaper, which contains the following definition:

A natural hierarchy is composed of attributes where each attribute is a member property of the attribute below. For example, the Geography hierarchy Country, State, City and Name is a natural hierarchy if City is a member property of Name; State is for City; and Country is for State. The hierarchy Gender-Age is not a natural hierarchy because Gender is not a member property of Age.

So the concept of natural hierarchies is tightly related to the concept of related attributes. Obviously, all the attribute hierarchies are already natural. With respect to number of levels, attribute hierarchy can have

  • Single level - when attribute is marked as non-aggregatable. One level hierarchies are always natural, because there is nothing to relate to !
  • Two levels - when attribute is marked as aggregatable. But the first level is 'All' level, and it is related to any attribute
  • Variable number of levels for Parent attribute hierarchy in Parent-Child. This is an unusual case, when single attribute unfolds into multiple levels, but the way Parent-Child hierarchy is build, is by respecting relationships between members, therefore this hierarchy is natural as well

Now that we know what natural hierarchies are, how can we make user-defined hierarchy natural or not. Again, Live Search is our friend, and we get the following BOL article:  "Specifying Attribute Relationships Between Attributes in a User-Defined Hierarchy", which walks step by step through the process in UI.

Now, to the most important question of all - why natural hierarchies are so important. Why SP2 bothered to introduce a special warning if user-defined hierarchy turned out unnatural. Well, there is plenty of materials published on importance of attribute relationships. It simply cannot be underestimated both from the performance point of view and , in the presence of calculations, from correctness point of view. I will add two more reasons which are often missed, but which are more closely related specifically to natural hierarchies and less to the general subject of attribute relationships:

1. Due to the fact that natural hierarchies always maintain one-to-many relationship between levels, they can be used internally for decoding of attribute values. Therefore, natural hierarchies are always processed into materialized hierarchies in AS2005. I.e. the entire hierarchy is computed and optimized for querying. On the other hand, unnatural hierarchies are kept as non-materialized, which means that any query to unnatural hierarchy will involve SE query (it can be spotted in Profiler as Dimension Query Subcube event) which is much more costly then going to the prebuilt data structures of materialized natural hierarchies.

2. Unnatural hierarchies tend to produce infamous arbitrary shape subcubes. The problem with arbitrary shapes is that they cannot be used in all contexts (for example SCOPE's cannot deal with them and raise an error), and in the contexts where they are accepted, there is a huge performance penalty associated with dealing with arbitrary shapes. Other then the fact that the code which deals with arbitrary shape subcubes is much more complex (and therefore slower), there are more significant ramifications of them:

  • Bitmap indexes cannot be utilized as efficiently as with normally shaped subcubes
  • Most of the caching subsystem have difficulties finding arbitrary shaped subcubes in the cache. This point actually leads us to another improvement in SP2 which probably needs a little bit more explanation. The document says "MDX query performance has been improved with respect to ..., arbitrary shapes, ...". This refers to the fact, that the caching subsystem has been improved to detect the arbitrary shaped caches within the execution context of single MDX query - something that wasn't possible before SP2. However, such arbitrary shaped caches still cannot be easily detected and therefore reused between the queries.

It should be clear by now, that whenever possible, unnatural hierarchies should be avoided. But this doesn't mean that unnatural hierarchies are always bad ! Any guidance should be considered within its reasoning. For example, in my article about Time Calculations in UDM, I showed how using unnatural hierarchies in the form Year -> QuarterOfYear -> MonthOfQuarter -> DayOfMonth, actually simplifies a lot writing time relation calculations. I guess my rule of thumb is following:

If there are two attributes in the dimension, attribute A with cardinality N and attribute B with cardinality M, then

  • If cardinality of CrossJoin(A,B) is equal to M, then A is related to B, the relationship should be defined, and A -> B hierarchy is natural
  • If cardinality of CrossJoin(A,B) is equal to (or very close to) N*M, then A and B are fully (or very close to fully) independent, and A -> B hierarchy is a "good" unnatural hierarchy
  • If cardinality of CrossJoin(A,B) is only a little bit greater then M, then A is almost related to B, and perhaps the key of B can be modified (for example by making it composite key to include key of A as well) to make them really related. If this is not possible, then A -> B hierarchy is unnatural, and it is "bad" unnatural hierarchy

Again, this rule of thumb should not be taken literally, but always applied in the context of the actual scenario.

Published Thursday, November 09, 2006 9:12 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement