THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

MDX answer to NextAnalytics challenge

NextAnalytics is a young BI company founded by Ward Yaternick of OLAP@Work fame. He has a blog dedicated to his new product, and while I was reading it, I ran across the post named “Can a business intelligence product be used to answer analytic questions?”. The main premises of this post was that real-life non-trivial calculations are very difficult with traditional BI tools using SQL or MDX, but can be done in much simpler fashion using NextAnalytics product. Ward then gives very concrete example:

That’s the crux of the problem:  The nature of analytics often means that the need evolves.  It’s either because the analytics identified the problem or the analytic led to other questions.  By their nature, analytics are iterative and sequential. For example:

  • I want to see the fastest growing products but it's more complicated than just telling me one measurement.

  • for each region, I want to see which products were growing faster than the rolling six month average, at least six times out of the last N periods.

  • next, again for each region, of those products that were just discovered, which ones grew by more than twenty five percent from the start of the period to the end of the period.

  • I want to know the intersection of those two sets of products across all regions. 

  • By the way, I won’t need these questions answered until next year at this time, although I might change them a bit.

Certainly this looks like something that can be done with basic MDX. Ward, however, stated that

That’s an example of one analytic combined with another, iteratively and sequentially. Neither of which would be possible in MDX or SQL without the creation of a complex underlying data structure.  Notice that this isn’t simply an OLAP  “drill down” or a “slice and dice”. These are distinct questions, with the answers needing to be merged and intersected.  This kind of query would take only a few hours in nextanalytics, or a few months in a BI tool.

I left a comment to the blog saying that I disagreed with Ward’s statement, and unless by “complex underlying data structure” he meant regular OLAP cube, I thought this example can be easily done in MDX. Ward took his time to reply to my comment, going into detail how this would be done with nextanalytics:

Allow me to describe the processing a bit better in a pseudocode style:
// the column axis has time periods, the row axis has regions and products.
for each region
// part 1
calculate the rolling six month average
calcualte the growth of that
count each "growth" calc to the row average
keep only rows above average six or more times
// part 2
remove all time periods except the first and last
calculate growth
remove any below 25 %
// part 3
intersect part 1 and part 2

And then he threw a gauntlet, challenging me to do the same in MDX as easily as the pseudocode above.

Note that in part 1, the output of the rolling average becomes the input to the growth calculations.
Now that I've been more clear, do you still think this is achievable in MDX? Is it fair of me to qualify your claim by saying "In a single MDX expression?" "someone with a normal amount of MDX education" and "not having to create intermediate data structures in a server or disk"? Because, if it takes any of that, then my blog's point still stands IMO.
To do all that with nextanalytics is about just as many lines as what I just typed above. No external tables, no external cubes. And, the nextanalytics "code" is still readable a year later and easily modified.

Well, I decided to pick up this challenge, and write complete implementation of this problem in MDX using Adventure Works sample database. I also decided to time myself to see how much time it will actually take – to have real answer to the claim that it would take “few months in a BI tool”. This weekend it was mix of rain and clouds in Seattle, so my family didn’t go backpacking, and I thought that I would use 2 hour window between teaching my son to bike on two wheels and going to watch “Chronicles of Narnia: Prince Caspian”, to implement this. This blog presents the results of this experiment.

I used MDX Studio to develop all of the code. One nice thing about it, is that it allows mixing MDX Script type of statement, which I used to define calculated measures with SELECT statements. In order to do it, I just specified “Cube=Adventure Works” in the connection string, and everything worked fine. (Of course, I also could’ve used more traditional syntax of CREATE MEMBER statement, but I am too used to MDX Scripts now).

I went line by line over the Ward’s pseudocode writing MDX for it. I didn’t try to write the super-optimized MDX, and I didn’t do any smart tricks – I just wanted to write natural, most straightforward MDX, that average MDX developer would’ve done in my place.

calculate the rolling six month average

There are several techniques how to approach it, and I wrote both chapter in my “Fast Track to MDX” book and blog entry about it, but let’s take the simplest approach:

// calculate the rolling six month average 
CREATE Rolling6MonthAverage;
(Rolling6MonthAverage, [Date].[Month Name].[Month Name]) 
  = Avg([Date].[Month Name].Lag(5):[Date].[Month Name], [Measures].[Internet Sales Amount]);

This leaves the calculated measure to be NULL at the attributes above (or unrelated to) Month, and computes rolling 6 month average for Month and all attributes to which Month is related (i.e. Day etc).

calcualte the growth of that

That’s the part where Ward made a comment of “Note that in part 1, the output of the rolling average becomes the input to the growth calculations”. I just wanted to note, that this is absolutely routine in MDX

// calcualte the growth of that 
CREATE GrowthOfRolling6MonthAverage = Rolling6MonthAverage - (Rolling6MonthAverage, [Date].[Month Name].PrevMember);

count each "growth" calc to the row average

The way I interpret this is we need to compare sales growth to the growth of rolling average and then count how many times sales growth exceeded growth of rolling average. For simplicity of reading, I broke this into two calculated measures, one computing sales growth, and another one doing counting. Of course, it could’ve been done in one calculated measure too, but this way it is a little more natural to read.

// calculate growth
CREATE GrowthOfSales = [Measures].[Internet Sales Amount] - ([Measures].[Internet Sales Amount], [Date].[Month Name].PrevMember);

In the original posting, Ward said “for each region, I want to see which products were growing faster than the rolling six month average, at least six times out of the last N periods.” So I really had choose what N periods would be in my example – and I choose that we are interested in the months of 2003.

// count each "growth" calc to the row average 
CREATE HowManyTimeGrowthExceededRollingAverage = 
    Filter(([Date].[Calendar Year].&[2003],[Date].[Month Name].[Month Name]), GrowthOfSales > GrowthOfRolling6MonthAverage));

keep only rows above average six or more times

This is simple filter, but since we need to find products for every region, we also need to run Generate over regions. For Adventure Works, I choose Country attribute to represent regions:

// keep only rows above average six or more times 
CREATE SET ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes AS
 Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
        HowManyTimeGrowthExceededRollingAverage >= 6

My understanding of nextanalytics scripting language is that it computes static sets in certain context, and this is why I used CREATE SET statement in MDX. However, MDX is obviously capable of computing such a set completely dynamically, depending on the current context, slices, subcubes etc.

remove all time periods except the first and last
calculate growth

Aha, here we have an opportunity to replace two lines of nextanalytics script with just one line of MDX. We don’t need to remove anything to calculate growth from the start of period to its end. Assuming again that the period is months in 2003, we would just use OpeningPeriod and ClosingPeriod MDX functions.

// create growth from start to end of period
CREATE GrowthFromStartToEnd =
  ([Measures].[Internet Sales Amount], ClosingPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
- ([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]));

remove any below 25 %

One way to do it is to create calculated measure which computes relative growth as percentage, but then we would need to deal with division by zero handling, so it is simpler to code the 25% rule inside the Filter itself.

CREATE SET ProductsThatGrewMoreThan25PctFromStartToEnd AS
  Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
        GrowthFromStartToEnd > 
           0.25*([Measures].[Internet Sales Amount], 
             OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))

intersect part 1 and part 2

Just to prove the point, let’s use Intersect function. However, better approach would’ve been to use the Generate(Filter()), and combine the conditions inside Filter using AND. The following query shows final set of products by regions:

, Intersect(
    ProductsThatGrewMoreThan25PctFromStartToEnd ) ON 1
FROM [Adventure Works]

Hopefully this exercise will convince Ward that MDX is pretty powerful, yet simple language which allows for complex analysis. But how long did it take to implement all this logic. I measured myself, and it only took 21 minutes of my time to write all of the MDX statements. However, this also included about 4 minutes to arbitrage dispute between my kids, so net spent time was 17 minutes. Factoring that I am probably faster at writing MDX than average, I estimate that it would take around 1 hour to somebody with “normal amount of MDX education”. Much better than “few months in BI tool” and also somewhat better than “few hours in nextanalytics”.

Published Sunday, June 08, 2008 6:05 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement