THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Microsoft OLAP by Mosha Pasumansky

MDX answer to NextAnalytics challenge

NextAnalytics is a young BI company founded by Ward Yaternick of OLAP@Work fame. He has a blog dedicated to his new product, and while I was reading it, I ran across the post named “Can a business intelligence product be used to answer analytic questions?”. The main premises of this post was that real-life non-trivial calculations are very difficult with traditional BI tools using SQL or MDX, but can be done in much simpler fashion using NextAnalytics product. Ward then gives very concrete example:

That’s the crux of the problem:  The nature of analytics often means that the need evolves.  It’s either because the analytics identified the problem or the analytic led to other questions.  By their nature, analytics are iterative and sequential. For example:

  • I want to see the fastest growing products but it's more complicated than just telling me one measurement.

  • for each region, I want to see which products were growing faster than the rolling six month average, at least six times out of the last N periods.

  • next, again for each region, of those products that were just discovered, which ones grew by more than twenty five percent from the start of the period to the end of the period.

  • I want to know the intersection of those two sets of products across all regions. 

  • By the way, I won’t need these questions answered until next year at this time, although I might change them a bit.

Certainly this looks like something that can be done with basic MDX. Ward, however, stated that

That’s an example of one analytic combined with another, iteratively and sequentially. Neither of which would be possible in MDX or SQL without the creation of a complex underlying data structure.  Notice that this isn’t simply an OLAP  “drill down” or a “slice and dice”. These are distinct questions, with the answers needing to be merged and intersected.  This kind of query would take only a few hours in nextanalytics, or a few months in a BI tool.

I left a comment to the blog saying that I disagreed with Ward’s statement, and unless by “complex underlying data structure” he meant regular OLAP cube, I thought this example can be easily done in MDX. Ward took his time to reply to my comment, going into detail how this would be done with nextanalytics:

Allow me to describe the processing a bit better in a pseudocode style:
// the column axis has time periods, the row axis has regions and products.
for each region
{
// part 1
calculate the rolling six month average
calcualte the growth of that
count each "growth" calc to the row average
keep only rows above average six or more times
// part 2
remove all time periods except the first and last
calculate growth
remove any below 25 %
// part 3
intersect part 1 and part 2
}

And then he threw a gauntlet, challenging me to do the same in MDX as easily as the pseudocode above.


Note that in part 1, the output of the rolling average becomes the input to the growth calculations.
Now that I've been more clear, do you still think this is achievable in MDX? Is it fair of me to qualify your claim by saying "In a single MDX expression?" "someone with a normal amount of MDX education" and "not having to create intermediate data structures in a server or disk"? Because, if it takes any of that, then my blog's point still stands IMO.
To do all that with nextanalytics is about just as many lines as what I just typed above. No external tables, no external cubes. And, the nextanalytics "code" is still readable a year later and easily modified.

Well, I decided to pick up this challenge, and write complete implementation of this problem in MDX using Adventure Works sample database. I also decided to time myself to see how much time it will actually take – to have real answer to the claim that it would take “few months in a BI tool”. This weekend it was mix of rain and clouds in Seattle, so my family didn’t go backpacking, and I thought that I would use 2 hour window between teaching my son to bike on two wheels and going to watch “Chronicles of Narnia: Prince Caspian”, to implement this. This blog presents the results of this experiment.

I used MDX Studio to develop all of the code. One nice thing about it, is that it allows mixing MDX Script type of statement, which I used to define calculated measures with SELECT statements. In order to do it, I just specified “Cube=Adventure Works” in the connection string, and everything worked fine. (Of course, I also could’ve used more traditional syntax of CREATE MEMBER statement, but I am too used to MDX Scripts now).

I went line by line over the Ward’s pseudocode writing MDX for it. I didn’t try to write the super-optimized MDX, and I didn’t do any smart tricks – I just wanted to write natural, most straightforward MDX, that average MDX developer would’ve done in my place.

calculate the rolling six month average

There are several techniques how to approach it, and I wrote both chapter in my “Fast Track to MDX” book and blog entry about it, but let’s take the simplest approach:

// calculate the rolling six month average 
CREATE Rolling6MonthAverage;
(Rolling6MonthAverage, [Date].[Month Name].[Month Name]) 
  = Avg([Date].[Month Name].Lag(5):[Date].[Month Name], [Measures].[Internet Sales Amount]);

This leaves the calculated measure to be NULL at the attributes above (or unrelated to) Month, and computes rolling 6 month average for Month and all attributes to which Month is related (i.e. Day etc).

calcualte the growth of that

That’s the part where Ward made a comment of “Note that in part 1, the output of the rolling average becomes the input to the growth calculations”. I just wanted to note, that this is absolutely routine in MDX

// calcualte the growth of that 
CREATE GrowthOfRolling6MonthAverage = Rolling6MonthAverage - (Rolling6MonthAverage, [Date].[Month Name].PrevMember);

count each "growth" calc to the row average

The way I interpret this is we need to compare sales growth to the growth of rolling average and then count how many times sales growth exceeded growth of rolling average. For simplicity of reading, I broke this into two calculated measures, one computing sales growth, and another one doing counting. Of course, it could’ve been done in one calculated measure too, but this way it is a little more natural to read.

// calculate growth
CREATE GrowthOfSales = [Measures].[Internet Sales Amount] - ([Measures].[Internet Sales Amount], [Date].[Month Name].PrevMember);

In the original posting, Ward said “for each region, I want to see which products were growing faster than the rolling six month average, at least six times out of the last N periods.” So I really had choose what N periods would be in my example – and I choose that we are interested in the months of 2003.

// count each "growth" calc to the row average 
CREATE HowManyTimeGrowthExceededRollingAverage = 
  Count(
    Filter(([Date].[Calendar Year].&[2003],[Date].[Month Name].[Month Name]), GrowthOfSales > GrowthOfRolling6MonthAverage));

keep only rows above average six or more times

This is simple filter, but since we need to find products for every region, we also need to run Generate over regions. For Adventure Works, I choose Country attribute to represent regions:

// keep only rows above average six or more times 
CREATE SET ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes AS
 Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter(
        [Product].[Product].[Product], 
        HowManyTimeGrowthExceededRollingAverage >= 6
      )
    )
  );

My understanding of nextanalytics scripting language is that it computes static sets in certain context, and this is why I used CREATE SET statement in MDX. However, MDX is obviously capable of computing such a set completely dynamically, depending on the current context, slices, subcubes etc.

remove all time periods except the first and last
calculate growth

Aha, here we have an opportunity to replace two lines of nextanalytics script with just one line of MDX. We don’t need to remove anything to calculate growth from the start of period to its end. Assuming again that the period is months in 2003, we would just use OpeningPeriod and ClosingPeriod MDX functions.

// create growth from start to end of period
CREATE GrowthFromStartToEnd =
  ([Measures].[Internet Sales Amount], ClosingPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
- ([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]));

remove any below 25 %

One way to do it is to create calculated measure which computes relative growth as percentage, but then we would need to deal with division by zero handling, so it is simpler to code the 25% rule inside the Filter itself.

CREATE SET ProductsThatGrewMoreThan25PctFromStartToEnd AS
  Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter([Product].[Product].[Product], 
        GrowthFromStartToEnd > 
           0.25*([Measures].[Internet Sales Amount], 
             OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
      )
    )
  );

intersect part 1 and part 2

Just to prove the point, let’s use Intersect function. However, better approach would’ve been to use the Generate(Filter()), and combine the conditions inside Filter using AND. The following query shows final set of products by regions:

SELECT {} ON 0
, Intersect(
    ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes,
    ProductsThatGrewMoreThan25PctFromStartToEnd ) ON 1
FROM [Adventure Works]

Hopefully this exercise will convince Ward that MDX is pretty powerful, yet simple language which allows for complex analysis. But how long did it take to implement all this logic. I measured myself, and it only took 21 minutes of my time to write all of the MDX statements. However, this also included about 4 minutes to arbitrage dispute between my kids, so net spent time was 17 minutes. Factoring that I am probably faster at writing MDX than average, I estimate that it would take around 1 hour to somebody with “normal amount of MDX education”. Much better than “few months in BI tool” and also somewhat better than “few hours in nextanalytics”.


Published Sunday, June 08, 2008 6:05 PM by mosha
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SSAS-Info.com said:

Pingback - link to this post was added to our website in the [SSAS Articles]/[MDX] section: http://www.ssas-info.com/analysis-services-articles/50-mdx/938-mdx-answer-to-nextanalytics-challenge

June 8, 2008 10:35 PM
 

Chris Webb said:

Well said, Mosha.

I've done a fair amount of teaching MDX and the truth is that people do find even the basics very difficult to understand, but my feeling is that the problem people have with MDX is not the language as such but thinking a) multidimensionally and b) in sets. The question is then, does Ward's new language solve either of these problems? Or has he reinvented the wheel?

June 9, 2008 6:25 AM
 

Thomas Ivarsson said:

Hi,

Can you publish the entire script at the end of this blog post? My conclusion is that we need better clients that can crete these analytics sets at run time. End users will never do this and it is imposible to create all sets that the business users would like to have. It will fill up a cube in no time.

June 9, 2008 2:50 PM
 

Miky Schreiber said:

Hi,

I agree with Thomas. The question is how difficult it is to the end user. While the regular end user will never write MDX or NextAnalytics script, the advanced user will. Now, the question is what will be more simple for him - MDX or NextAnalytics script.

June 9, 2008 11:55 PM
 

Ward Yaternick said:

Great job, and very educational too!    Two points should be made in reply:

I don’t think you are acknowledging your level of genius when you suggest that other people could do this in an hour.  Maybe a day, maybe two. That just my opinion. Any comments from out there in the blogosphere?

I also don’t agree with your implicit assumption that the cube and other infrastructure parts are insignificant.   To put a cube together from scratch and maintain it is a big decision and cost. If these weren’t tangible, then Jaspersoft and Pentaho wouldn’t have a leg to stand on (As Open Source companies, all their revenue is labour, isn’t it?).

Therefore, I think you have conclusively proved that (1) if the cube exists and (2) someone can author MDX then MDX does deliver on complex analytics.

The distinction is important because at least some business analytics don’t have the expected ongoing payback to cost-justify the creation and maintenance of a cube.    This is because certain users want a question answered then to ask a completely different question. If you were the BI consultant and the IT shoppe who supported this kind of analysis, and if you had had to make or modify a cube each time, then it would probably not be cost-effective.  It wouldn’t be long before you ended up with quite a few cubes  (or expressions within a cube) and you’d never know which ones weren’t used anymore or which MDX worked with which cube.  In other words, your analytical development environment might end up being chaotic.

nextanalytics can help in these situations because the data can come directly from SQL or CSV files and the incremental IT overhead behind an analytic is very low.  You can feasibly write an analytic like what you did with MDX, run it against a CSV file or a simple SQL query, and then throw it away or keep it (all contained as lines in single text file) in a folder somewhere.

The other situation arises when analytics must come from cube technologies which haven’t implemented MDX the same as Microsoft OLAP.  e.g. SAP BW and maybe even some others. See the note from this blog:  http://www.panorama.com/blog/?p=44 These other servers can have access to this kind of sophisticated analytics, even if their server doesn’t support the kinds of operations you relied on in your MDX.

Another situation in which a cube isn’t ideal are when ISVs and consultants want to integrate analytics into their applications but they need to minimize the number of other vendor’s licenses and reduce the complexity and cost of the installed products.   This priority is the same when an IT budget is under constraint.  

Sometimes, the analytic data processing needs to run on a unix box against something like MySQL or Oracle i.e. where Microsoft OLAP isn’t available.  

Perhaps they need an Open Source client  to make integration easier with their dashboard or portal.  For this reason, they can’t use any of the popular OLAP Clients.

All this lands us where nextanalytics services it’s niches.  By now I hope you can see our positioning. We’re not competing with MDX especially in the conditions you wrote about.

To conclude and re-iterate: If you have a cube, and you can write MDX, then you probably don’t need nextanalytics.

If those other things I described are interesting to you, then perhaps nextanalytics has a place in your tookit.

June 10, 2008 8:45 AM
 

Microsoft OLAP by Mosha Pasumansky said:

My post “ MDX answer to NextAnalytics challenge ” triggered some very active discussion, and I would

June 11, 2008 2:27 AM
 

Ward Yaternick said:

Hi Folks,

Sorry for the delay in replying.  We're a small company and that means we typically have more than a few balls in the air.

What I am responding to are the requests (in this blog and my own blog) for us to show the nextanalytics version of the script.  I've posted it at:

http://www.nextanalytics.com/public/Examples/MDX-Challenge.html

Thanks for all the insights Mosha.  I have some other examples which I think are quite cool. Would you (or anyone) be interested in demo'ing the MDX for those? If so, let me know, I'll put them in my blog, one at a time.

Best regards,

Ward

June 11, 2008 12:15 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement