THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Slicer and axes interaction in MDX Part 1 - coordinate overwrites

OLAP is often described as 'slice-and-dice', which reflects the fact that slicing is the most fundamental OLAP operation. Traditionally, in MDX, the slicing was done by using the WHERE clause. While Analysis Services 2005 introduced the notion of subselects, there are still very few client tools which take advantage of them (Excel 2007 and ReportBuilder are the only ones known to me.) The majority of client tools (and the list grows almost weekly at http://www.mosha.com/msolap/util.htm) use the good ol' WHERE clause, and this will be the focus of this article. Prior to Analysis Services 2005, the behavior of WHERE clause was pretty straightforward - the tuple specified in it was set into the current coordinate, and then the query axes and cells were evaluated in that context. Analysis Services 2005 introduced two major changes: First, it became possible to use sets in the WHERE clause, i.e. what is commonly known as multiselect; Second, since AS2005 allows same attributes to be used across multiple hierarchies, it became possible to have same attribute both in the axis and in the WHERE clause. Multiselect deserves it own deep discussion, here we will focus on interaction of attributes between axis and WHERE clause.

Let's imagine, that we are using AdventureWorks cube, and we want to analyze Internet sales of Helmets. So we are moving [Product].[Subcategory].[Helmets] into the slicer (also known as Page Field in some tools), and we look at [Measures].[Internet Sales Amount] for measures. The typical query generated by normal tool would be something like (for clarity I will use fully qualified names as opposed to member unique names in all examples)

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

We get back single cell with value of $225,335.60 - and we know that this represents total of Internet Sales for Caps across all dimensions in the measure group. Now we want to get more insight into it, so we can analyze sales of caps against other dimensions, in a normal OLAP fashion. To see the distribution of sales by Ship Day of Week, we would use the following query

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,[Ship Date].[Day of Week].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

We get a nice breakdown by day of week, notice that most sales happen on Sunday, and the total is of course the same - $225,335.60. Now, instead of day of week analysis, we want to see breakdown by color. The query to do that would be

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,[Product].[Color].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

If we execute it, the results first seem to be OK - Red helmets account for $78,027.70, Blue helmets for $74,353.75 and Black helmets for $72,954.15. And the total is, of course, still the same - $225,335.60. But wait a second - Color attribute contains more then just 3 colors. There is also Grey and Silver and Yellow and so on. Where did all the other colors go ? We didn't use NON EMPTY clause in the query, so we cannot explain that the other colors were eliminated by it. What happens here, is that Analysis Services performs an implicit Exists operation between each of the query axes and the WHERE clause. We can easily confirm that by checking the result of this query

SELECT Exists([Product].[Color].MEMBERS,[Product].[Subcategory].[Helmets]) ON COLUMNS, {} ON ROWS
FROM [Adventure Works]

Indeed, only Red, Blue and Black show up. The reason why Analysis Services performs implicit Exists between axes and WHERE clause is one of the incarnations of autoexist behavior. The philosophy behind it is that only the tuple coordinate which show up in the dimension table exist in the multidimensional space, therefore there should be no way to get to the tuples (and therefore to cells) which have non-existing coordinates. The only place where Analysis Services doesn't perform autoexist is between different axes in the SELECT query, because it is impossible to do in general case. However, the cells at the intersection on non-existing tuple coordinates are always empty, it is not possible to assign any value to them.

OK, so far so good. Now let's try to see breakdown by product categories. We will put the [Product Model Categories] on the axis expanded up to the [Category] level, so the query will look like following:

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,Descendants([Product].[Product Model Categories].[All Products], [Product].[Product Model Categories].[Category], SELF_AND_BEFORE) ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

Let's run this query and... Wowowow - what did just happen ??? The results look absolutely unrelated to the question we were trying to ask. For the total we get $29,358,677.22 - that is 29 million dollars instead of 225 thousand that we were getting with all the queries above. And the only category showing up is Accessories (which makes sense because Helmet is in Accessory category), but the value for it is $700,759.96 - way below the total of 29 millions, but still more then three times above the expected $225,335.60 ! The first normal reaction to such results is to declare this a bug. Now, before I proceed, let me say upfront, that I absolutely agree that this behavior is broken and unusable. And at the end of this article you will find a solution how to get good and expected results without changing single thing about the query or the cube etc. The following paragraph is not an attempt to defend the clearly wrong results, it is an attempt to explain the logic behind the code that generated it.

To understand what happened in the last query, first thing we will notice that the hierarchy [Product Model Categories] used in the axis, and hierarchy [Subcategory] used in the WHERE clause share [Subcategory] attribute. The WHERE clause set this attribute to the [Helmets] coordinate, while both [All Products] and [Assessories] members of [Product Model Categories] hierarchy set it to the [All Subcategories] coordinate. So what should happen in case of such a conflict ? Normally, in calculations, if the current context has [Helmets] as current coordinate, and calculation specifies [All Subcategories], the calculation always overwrites the context. This is essential for MDX to work properly. I.e.

WITH MEMBER Measures.RatioToParent
  AS [Measures].[Internet Sales Amount] / ([Measures].[Internet Sales Amount],[Product].[Product Model Categories].Parent)
  , FORMAT_STRING = 'Percent'
SELECT Measures.RatioToParent ON COLUMNS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

The reason why this formula works and returns correct value of 32.16% is because after WHERE clause sets the current coordinate to (Category.Accessories, Subcategory.Helmets), the formula can overwrite it with [Product].[Product Model Categories].Parent, i.e. erasing the Helmets and leaving tuple to be just (Category.Accessories, Subcategory.[All Subcategories]).

Therefore, the thinking in design was that axes should behave similarly. I.e. MDX dictates the order in which parts of query are applied - WHERE clause is done first and set into current context. Then axes are executed in the context of WHERE clause. Therefore, if axes want to overwrite some attribute coordinates in the current context - they should be able to do so, just like calculations do ! This sounds like a logical thing to do, until we realize consequences. Basically, in our troubled query, the WHERE sets the context to (Category.Accessories, Subcategory.Helmets), but the ROWS axis overwrites it to coordinates (Category.Accessories, Subcategory.[All Subcategories]) and (Category.[All Categories], Subcategory.[All Subcategories]). Therefore the huge numbers that we saw - 29 million and 700 thousands are simply values for Internet Sales Amount for all sales and sales of Accessories respectively. And what makes the matters even worse is the fact, that even if there were no attribute overlap between axes and WHERE clause, like in the following query

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,[Product].[Category].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

Here, there is no attribute overlap, since rows axis only contains Category attribute, and WHERE clause only contains Subcategories attribute. Yet the results are exactly the same (wrong) as in the previous query. The reason is because Subcategories and Category attributes are related, and Analysis Services makes sure that changing one affects the other. So now this seems like complete nightmare. If in the previous query example, perhaps client tools could do something and not allow queries with overlapping attributes (although this would be a serious limitation), for this query the client tool would need to analyze the attribute relationships, something that definitely should be transparent. So all this seems to be pretty broken, but the explanation above did sound reasonable - where is the hole in the logic ? There is a hole in the logic, and it stems from what we stated just a few paragraphs above. Remember that there is an implicit Exists between the axes and WHERE clause. This is different from how calculations apply to the current context. I.e. in the following query

WITH MEMBER Measures.CapsSales AS ([Measures].[Internet Sales Amount],[Product].[Product Categories].[Caps])
SELECT Measures.CapsSales ON COLUMNS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]
Calculated member overwrites Helmets with Caps and it's fine, but in this query
SELECT ([Measures].[Internet Sales Amount],[Product].[Product Categories].[Caps]) ON COLUMNS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

We get empty set, because Caps don't exists together with Helmets. So it is clear that axes interact with current context differently from how calculations interact with it. Calculation overwrite current coordinates, and it is the desired behavior. But axes should not overwrite the current context !

So all this good and fine, but what are users to do ? Where is the solution ? Well, the solution here is simple. We realized that the way this shipped in AS2005 is clearly broken and inconsistent, and this is why this is fixed in SP2. SP2 should become available really soon now, and the change to the interaction between axes and WHERE is one of the most important changes there. Therefore, both of the queries below return the expected value of $225,335.60:

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,Descendants([Product].[Product Model Categories].[All Products], [Product].[Product Model Categories].[Category], SELF_AND_BEFORE) ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

SELECT
 {[Measures].[Internet Sales Amount]} ON COLUMNS
 ,[Product].[Category].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE [Product].[Subcategory].[Helmets]

Unfortunately, the fix only applies to the case of single member slicer, and doesn't quite work with multiselect, but as we noted at the beginning of the article, multiselect is a separate issue which should be discussed in a separate article.

Published Wednesday, November 01, 2006 11:02 PM by mosha
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement