THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Writing multiselect friendly MDX calculations

One very common problem with authoring MDX calculations historically has been the behavior of these calculations when the user who browses the cube selects multiple members in the slicer as opposed to single one. Of course this functionality (usually called multiselect) is present in all of the major AS client tools - Excel, OWC, Panorama, Proclarity, various Excel Add-Ins, Data Analyzer etc. It often happens that MDX calculations which work fine in single member slice mode don't work that well when multiselect is used, or do work with multiselect by some dimensions, but not by others. The problems manifest themselves either as cell errors, or, worse, as wrong (i.e. non-intended) results. This problem is common, yet, to the best of my knowledge, it has not been addressed before. So calculation authors are left struggling to resolve it, and usual resolution is to ignore the problem, and hope that users won't do multiselect over offending dimensions. In this article I will try to address some of the typical scenarios where this problem can occur and recommend possible approaches for solution. I will use Foodmart 2000 database for examples, but most of the content applies to Analysis Services 2005 only.

Our first scenario will be calculation which requires scan of children/descendants of the current member in some hierarchy. Typical examples include

  1. Count number of stores with declining sales compared to the previous time period:
    Count(Filter(Descendants(Store.CurrentMember, Store.[Store Name]), [Store Sales] < ([Store Sales],Time.PrevMember)))
  2. Average sales count for products in the current category
    Avg(Descendants(Product.CurrentMember,Product.[Product Name]), [Sales Count])
  3. Maximum store square footage
    Max(Descendants(Store.CurrentMember, Store.[Store Name]), Val(Store.CurrentMember.Properties("Store SQFT")))

When we use any of these calculated measures in Excel - they work fine. We can take the first one - Declining Stores Count - and put it into Excel PivotTable and do analysis by products. Here is how PivotTable looks like when we slice by stores in WA separately and stores in CA separately

Time Q3
Store WA
   
Declining Stores Count  
Product Family Total
Drink 4
Food 5
Non-Consumable 4
Grand Total 5
Time Q3
Store CA
   
Declining Stores Count  
Product Family Total
Drink 2
Food 1
Non-Consumable 2
Grand Total 2

But as soon as we try to analyze WA and CA together - in Analysis Services 2000 we get wrong results - counts are 0 everywhere, no matter what we choose:

Time Q3
Store (Multiple Items)
   
Declining Stores Count  
Product Family Total
Drink 0
Food 0
Non-Consumable 0
Grand Total 0

To understand what happens behind the scenes, we need to investigate the MDX query that was generated by Excel when multiselect happens. The MDX query for the last report is following:

WITH MEMBER 
  [Store].[XL_QZX] AS 'Aggregate ( { [Store].[All Stores].[USA].[WA] , [Store].[All Stores].[USA].[CA] } )' 
SELECT 
  NON EMPTY HIERARCHIZE(AddCalculatedMembers({DrillDownLevel({[Product].[All Products]})})) 
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME ON COLUMNS 
FROM [Sales] 
WHERE ([Measures].[Declining Stores Count], [Time].[1998].[Q3], [Store].[XL_QZX])
The key for the understanding is the query calculated member [XL_QZX] that is generated by Excel. It is Aggregate over WA and CA, and it is put into the WHERE clause of MDX query. Therefore, when AS needs to compute calculated measure [Declining Stores Count] - it applies the formula which refers to Store.CurrentMember. But Store.CurrentMember will be the XL_QZX calculated member - and trying to get its descendants at [Store Name] level will produce an empty set. Count of empty set is zero, this is why we are getting zeros everywhere - obviously not a desired answer. Other client tools generate similar MDX, in a sense that all of them create query calculated member in the Store hierarchy which does Aggregate over selected members - therefore same issue applies to all the client tools.

Now to make things more interesting, let's do the same exercise in Analysis Services 2005. If we migrate Foodmart 2000 cube to AS2005, define exactly same calculated measure for Declining Stores Count, and use same Excel - it will send exactly same query, but the result will be different. Instead of zeros, each cell will contain an error (represented as #VALUE! in Excel). The exact error message for the error says:

The MDX function CURRENTMEMBER failed because the coordinate for 'Store State' attribute contains a set.

Wow - this is pretty interesting, what does this error means ? To explain what is going on, we first need to go to the OLEDB for OLAP specification, the section about WHERE clause. From the very first version, OLEDB for OLAP had a provision for provider to specify not just a single tuple in the WHERE clause, but a set. The thinking was that by using set in the WHERE clause client applications could implement multiselect functionality (actually sets in WHERE clause allow more then that, but this would be a subject for another article). I.e. the following SQL query

SELECT [Product Family], Sum(Sales) FROM Sales WHERE [Store State] = 'WA' OR [Store State] = 'CA' GROUP BY [Product Family]

would be written in MDX as

SELECT Measures.Sales ON COLUMNS, Product.[Product Family].MEMBERS ON ROWS FROM Sales WHERE {Store.WA, Store.CA}

However, neither OLAP Services 7.0 nor Analysis Services 2000 ever implemented sets in the WHERE clause, and as far as I know neither did any other OLEDB for OLAP or XMLA provider. Therefore, client applications were forced to use Aggregate query calculated member in the WHERE clause technique described above. Analysis Services 2005 has completed most of OLEDB for OLAP compliance, and properly implements sets in the WHERE clause. So the preferred technique for client applications developed for AS2005 for implementing multiselect is to use sets in the WHERE clause. It gives both performance benefits over using calculated members as well as allows some additional interesting functionality (which is subject for another article). But obviously we wanted AS2005 to work well with existing client tools too. Therefore, AS's query engine recognizes the shape of the queries where there is query calculated member doing Aggregate over constant single grain set, and this calculated member (or members if there are multiple multiselects in different hierarchies) is in the WHERE clause. And when AS detects this situation, it replaces the calculated member in the WHERE clause with the corresponding set. This leads to one of the most fascinated consequences for the calculation model: Current coordinate in the cube in fact is not limited by single value in each attribute. Some attributes can have multiple values - i.e. sets. (Again, it is actually even more generic then that, since sets can freely cross attribute and dimension boundaries, but for the subject of multiselects the above is enough). The sets in the coordinate are preserved throughout all the calculation execution. Many MDX functions don't have problems with it, however when it comes to things which rely on hierarchy current member - such as Store.CurrentMember expression - it doesn't know what to do - since there are multiple current members for that coordinate. While theoretically (and perhaps in some future version) CurrentMember can be generalized to work over sets - in AS2005 it simply cannot work, and therefore returns abovementioned error - current coordinate contains a set.

Now, equipped with all this knowledge we can start thinking how to rewrite the expression for our calculated measure in such a way that it will continue to work and work correctly in the presence of sets in the coordinates on Store hierarchy. The rewrite is actually quite easy, once we notice, that Exists function and EXISTING operator are generalization of hierarchy related functions such as Descendants. Indeed, the process of building an hierarchy is nothing more then applying Exists between attributes which build individual levels of that hierarchy. Therefore all members of the hierarchy exist with each other almost by definition. But the same step can be done in runtime as well - so all we need to do is to replace Descendants function with EXISTING operator. Therefore the rewritten expression for Declining Stores Count will look the following, and the multiselect report in Excel which failed before now is going to succeed and return correct results.

Count(Filter(EXISTING Store.[Store Name], [Store Sales] < ([Store Sales],Time.PrevMember)))

Now, one word of caution: In this scenario such rewrite might not be for free. It seems like it comes with the cost of doing EXISTING operator every time the calculation applies. Obviously, Descendants function appears to be more optimal, since all the Exists happened once during hierarchy processing and Descendants just navigates through precomputed hierarchy. This may or may not be noticeable performance difference. In general, if dimension has good attribute relationships defined, doing Exists is very fast. Also it is less of a problem when dimension is not big. So it seems that one needs to weight the usefulness of multiselect working properly vs. potential slowdown of the calculation. However, the performance perspective has another twist - it may happen that instead of slowing things down - it can actually speed them up ! This can be best explained in AS2000 terms. If we will take the AS2000 version of the calculation, and put one of the virtual dimensions off the Store, for example [Store Type], into slicer and select single member, say Supermarket - what would happen in AS2000 ? Well, since Store hierarchy wasn't changed, the formula will get all of the descendants of [All Stores] member, which would be every single store, and run Filter over them. Of course, through the magic of virtual dimensions the [Store Sales] values on the stores which aren't Supermarket will end up being NULLs - so the answer will be correct - but it will still scan all the stores. Even more significant - for our last formula - maximum of store square footage - slicing on [Store Type] will have no effect whatsoever, since while we are iterating over stores, we are not looking at cell values, but rather at member property - therefore the answer in this scenario won't be correct ! Now enter AS2005 - which automatically applies EXISTING while iterates over the set. What it means is that it won't even look at the stores which aren't Supermarket, because they don't exist with Supermarket - therefore the set to iterate over will be smaller, and performance will be better. (AS2005 query optimizer of course is smart about not applying EXISTING if it is not necessary, i.e. when navigating the hierarchy while all other attributes are at All etc.)

OK, now let's review another typical scenario, which doesn't involve getting descendants of a member, but rather looks at the current member and examine some property. One example would be to define the following calculated measure to be used perhaps in financial reports

CREATE MEMBER VAmount = Iif(Account.CurrentMember.Properties("Account Type") = "Expense", -Measures.Amount, Measures.Amount);
What this expression does is inverts the sign for the Amount measure for the Expense accounts. Now it is easy to see, that if multiple accounts will be selected by the user - both AS2000 and AS2005 will raise an error (but for different reasons - AS2000 will complain that "Account Type" property is not defined, and AS2005 about set in the current coordinate). In this example, it is very easy to rewrite the definition of the calculated measure not only to solve the multiselect issue, but also to avoid expensive .Properties call as well as redundant Iif checks. The technique will be easy again - we simply need to get rid of .CurrentMember references:
CREATE MEMBER VAmount = Measures.Amount;
(Measures.VAmount, Account.[Account Type].Expense) = -Measures.Amount;

This rewrite is more efficient and clear. In fact, Currency Conversion Wizard uses exactly same approach to define different conversion rules for different account types. One word of warning though - This approach will work flawlessly if Account is not Parent-Child. If Account is Parent-Child, then usually it is also OK, as long as there are no accounts of type Expense which have children/descendants of type Income or Flow etc or vise versa. Since in real-life Account dimensions it is usually true - this technique is safe to apply to Parent-Child.

In conclusion - we saw few examples, but what is more important, we understood what happens under the cover both in client tools query generation and in AS engine. Of course there are more scenarios which weren't reviewed here, but given the knowledge how things work - it shouldn't be a challenge to apply this knowledge to them.

Published Friday, November 18, 2005 2:09 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement