THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

How to detect subselect inside MDX calculations (aka Multiselect in Excel 2007)

The subject of multiselect friendly calculations is a popular one in forums. I have written about it before here, here and here. In February 2008 I will write a blog which will give a definite answer on how AS2008 will deal with it (which is follow up on this post). But in the meantime, questions about AS2005 keep coming. By now, most people realize that it is possible to detect presence of set in WHERE clause by using EXISTING operator, which takes care of multiselect queries generated by Excel 2003 and other tools. However, Excel 2007 for multiselect uses subselects which are stealthier, their presence cannot be detected neither by EXISTING nor by any other MDX function. The standard answer for detecting subselects always was to use query scoped named sets, because they are subject of implicit autoexist with subselects. However, this approach isn't applicable to Excel 2007, since it is up to Excel 2007 to generate MDX queries, and the user can only control the expression for the calculation inside MDX Script.

So the question that gets asked is "Is it possible in AS2005 to detect and obtain subselect restriction from the MDX calculation?". Up until now the answer to this question was "No". I have thought about it, and I found a way how to make the answer to this question "Yes". I must warn, that the solution is neither elegant nor efficient, and probably is mostly of the theoretical interest. I wouldn't recommend using it in the real implementation. Still, it is interesting one in my opinion.

Let's start with the assumption, that the lowest attribute on which multiselect is possible has only few members. More precisely it should have 63 or less members. We will see how we can lift this limitation later, but for now let's go with it. For our example, we will choose [Ship Date].[Calendar Year] attribute, which only has 4 members. For the calculation itself, let's pick example of calculating the maximum yearly sales. The classic formula for this is below:

Max(Descendants([Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Calendar Year]), [Measures].[Internet Sales Amount])

Indeed, if we run the following query

WITH 
 MEMBER Measures.[Max Yearly Sales] AS Max(Descendants([Ship Date].[Calendar].CurrentMember, [Ship Date].[Calendar].[Calendar Year]), [Measures].[Internet Sales Amount])
SELECT
{Measures.[Max Yearly Sales]} ON 0
FROM (
 SELECT  { [Ship Date].[Calendar].[Calendar Year].&[2001], [Ship Date].[Calendar].[Calendar Year].&[2003] } ON 0
 FROM [Adventure Works])

When we run this query, the result is $10,158,562.38. Quick check query:

SELECT {[Measures].[Internet Sales Amount]} ON 0
, [Ship Date].[Calendar].[Calendar Year] ON 1
FROM [Adventure Works]

it shows that the number we got was actually for year 2004. But it wasn't even included by our subselect restriction which covered 2001 and 2003 only. So how does the solution works. First, we create another small table (or named query in DSV) which looks like following:

Year Encoding
2001 1
2002 2
2003 4
2004 8

In general case we want to list every member in the lowest attribute by which the multiselect is expected, and assign values which are powers of 2. The idea here is that each member then gets assigned a bit in the 64-bit representation of the integer. Adding Encoding values for different members will be equivalent for OR'ing their representative bits.

Next we build a small measure group off this table, by linking Year column to the [Calendar Year] attribute in the [Ship Date] dimension, and Encoding is used as a measure with Aggregation Funciton Sum. But due to a special nature of Encoding values, the Aggregation Function here really behaves as OR.

This is the root of the idea. When there is a subselect restriction, all physical measures get automatic Visual Totals applied to them if the coordinate [Calendar Year] hasn't been overwritten by formula. Applying Visual Totals to the Encoding measure will mean that it will be equal to a number, which has bits set to 1 only for those members which participated in subselect ! Now all we need to do is to decode back from the bitmask to the MDX set. It may seem that the easiest way to do it is to run Filter over [Calendar Year] and do filtering by AND'ing bitmasks. However, this won't work, since using Filter over [Calendar Year] will cause [Calendar Year] attribute to be overwritten, and Visual Totals won't get applied - the trick won't work. Therefore, we need to do something a little more sophisticated - write a special purpose stored procedure which will behave exactly like Filter, only without coordinate overwrite. Such stored procedure is simple to write, below is the full source code for it:

        public Set DecodeSet(Set set, System.UInt64 Encoding)
        {
            SetBuilder sb = new SetBuilder();
            System.UInt64 bit = 1;
            foreach (Tuple t in set.Tuples)
            {
                if ((bit & Encoding) != 0)
                    sb.Add(t);

                bit *= 2;
            }

            return sb.ToSet();
        }

Now, the expression for the calculated member will look like

Max(ASSP.ASStoredProcs.Util.DecodeSet([Ship Date].[Calendar].[Calendar Year], [Measures].[Encoding]), [Measures].[Internet Sales Amount])

And we can verify that it works correctly by running the following query:

WITH 
 MEMBER Measures.[Max Yearly Sales] AS Max(ASSP.ASStoredProcs.Util.DecodeSet([Ship Date].[Calendar].[Calendar Year], [Measures].[Encoding]), [Measures].[Internet Sales Amount])
SELECT
{Measures.[Max Yearly Sales]} ON 0
FROM (
 SELECT  { [Ship Date].[Calendar].[Calendar Year].&[2001], [Ship Date].[Calendar].[Calendar Year].&[2003] } ON 0
 FROM [Adventure Works])

This is nice, now back to the case when we have more than 63 members for the attribute. For example, we want to go over days, not over years, and in Adventure Works we have 4 years worth of data, which translates to 1158 days. 1158 > 63. Well, the only solution in this case is to break the attribute into groups of 63 members and assign dedicated measure to each group. In our case this makes 19 new measures. It is not pretty, but at least it can be automated. The sproc will have to change to accept set as a second parameter, where the formula will pass MeasureGroupMeasures("ShipDateMultiselect"), and sproc will have to make nested loops in order to move to the right member for the right measure. The exact code is left as exercise to the reader, if the reader is still interested.

Published Wednesday, September 26, 2007 8:38 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement