THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Follow up on NextAnalytics challenge

My post “MDX answer to NextAnalytics challenge” triggered some very active discussion, and I would like to address some of the comments

Chris Webb said:
I've done a fair amount of teaching MDX and the truth is that people do find even the basics very difficult to understand, but my feeling is that the problem people have with MDX is not the language as such but thinking a) multidimensionally and b) in sets. The question is then, does Ward's new language solve either of these problems? Or has he reinvented the wheel?

I agree with the a) part, but not with the b) part. Thinking multidimensional is hard and take some time to get used to. However thinking in sets is not hard at all. SQL language is mostly based on relational algebra and uses sets extensively. Practitioners don’t find sets difficult.

Thomas Ivarsson said:
Can you publish the entire script at the end of this blog post? My conclusion is that we need better clients that can create these analytics sets at run time. End users will never do this and it is impossible to create all sets that the business users would like to have. It will fill up a cube in no time.

I am pessimistic that we will ever see such sophisticated clients. If they were in demand, we would’ve already had them. I am afraid the users will have to code such analytics either using MDX or NextAnalytics script or something else. End users don’t have any problem building such sets in Excel, so maybe Excel is our hope ?

Here is the entire script:

// Nextanalytics challenge

// calculate the rolling six month average 
CREATE Rolling6MonthAverage;
(Rolling6MonthAverage, [Date].[Month Name].[Month Name]) 
  = Avg([Date].[Month Name].Lag(5):[Date].[Month Name], [Measures].[Internet Sales Amount]);
  
// calcualte the growth of that 
CREATE GrowthOfRolling6MonthAverage = Rolling6MonthAverage - (Rolling6MonthAverage, [Date].[Month Name].PrevMember);

// calculate growth
CREATE GrowthOfSales = [Measures].[Internet Sales Amount] - ([Measures].[Internet Sales Amount], [Date].[Month Name].PrevMember);

// count each "growth" calc to the row average 
CREATE HowManyTimeGrowthExceededRollingAverage = 
  Count(
    Filter(([Date].[Calendar Year].&[2003],[Date].[Month Name].[Month Name]), GrowthOfSales > GrowthOfRolling6MonthAverage));

CREATE SET ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes AS
 Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter(
        [Product].[Product].[Product], 
        HowManyTimeGrowthExceededRollingAverage >= 6
      )
    )
  );

SELECT {} ON 0
, ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes ON 1
FROM [Adventure Works]

// create growth from start to end of period
CREATE GrowthFromStartToEnd =
  ([Measures].[Internet Sales Amount], ClosingPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
- ([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]));

CREATE SET ProductsThatGrewMoreThan25PctFromStartToEnd AS
  Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter([Product].[Product].[Product], 
        GrowthFromStartToEnd > 0.25*([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
      )
    )
  );

SELECT {} ON 0
, ProductsThatGrewMoreThan25PctFromStartToEnd ON 1
FROM [Adventure Works]

SELECT {} ON 0
, Intersect(
    ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes,
    ProductsThatGrewMoreThan25PctFromStartToEnd ) ON 1
FROM [Adventure Works]

 

Miky Schreiber said:
I agree with Thomas. The question is how difficult it is to the end user. While the regular end user will never write MDX or NextAnalytics script, the advanced user will. Now, the question is what will be more simple for him - MDX or NextAnalytics script.
 

Valid point. But I don’t think there is an absolute measure of “what is simpler” – practitioners will tend to use the tool that they are more familiar with. MDX is probably more popular than nextanalytics, but SQL is way more popular than either one of these. And Excel will easily rival SQL in familiarity. On the other hand, developers will want to use “real” programming language. And the question of “what is simpler” will raise there as well. While functional languages like F# are clearly more suited for such tasks, more people are familiar with conventional languages like C#.

Update: Ward published the entire nextanalytics script for this problem, so constrast and compare it yourself: http://www.nextanalytics.com/public/Examples/MDX-Challenge.html

And finally Ward wrote long comment, from which I will cite selectively

Ward Yaternick said: 

I also don’t agree with your implicit assumption that the cube and other infrastructure parts are insignificant.   To put a cube together from scratch and maintain it is a big decision and cost.

Therefore, I think you have conclusively proved that (1) if the cube exists and (2) someone can author MDX then MDX does deliver on complex analytics.

I do agree with that. I wasn’t trying to say that setting up a cube was trivial. However, if the cube was set up for some particular purpose, it is very easy to extend it with MDX calculations to get additional value out of it.

nextanalytics can help in these situations because the data can come directly from SQL or CSV files and the incremental IT overhead behind an analytic is very low.  You can feasibly write an analytic like what you did with MDX, run it against a CSV file or a simple SQL query, and then throw it away or keep it (all contained as lines in single text file) in a folder somewhere.

This certainly looks like a more agile way of doing things – quickly throw small script together, run it and move on. I think this approach will be attractive to lots of users – from small IT shops to independent users in bigger organizations. This is at odds, however, with classic methodology of building centralized data warehouse, one version of truth and all that. It doesn’t make it less viable, of course, it is just a tradeoff that someone will have to make – tradeoff between chaos and control, or, in Eric Raymond’s words, between bazaar and cathedral.

Another situation in which a cube isn’t ideal are when ISVs and consultants want to integrate analytics into their applications but they need to minimize the number of other vendor’s licenses and reduce the complexity and cost of the installed products.   This priority is the same when an IT budget is under constraint.

I think this is somewhat weak argument, because it implies that both complexity and pricing of NextAnalytics is below this of other products, like Microsoft Analysis Services. But this is definitely something that practitioners will need to evaluate and compare.

Sometimes, the analytic data processing needs to run on a unix box against something like MySQL or Oracle i.e. where Microsoft OLAP isn’t available 

Just wanted to note, that even though data could be in Unix box, in Oracle or MySQL, Microsoft OLAP doesn’t have problem connecting to it. Oracle is fully supported data source, and many people were successful reading data from MySQL too.

Perhaps they need an Open Source client  to make integration easier with their dashboard or portal.  For this reason, they can’t use any of the popular OLAP Clients 

I didn’t quite get this. There are plenty of Open Source clients which connect just fine to Microsoft Analysis Services (usually through XMLA, but not only). Since you mentioned Pentaho earlier, I wanted to note that Pentaho Analysis Views are based on JPivot, which works fine with Microsoft OLAP. There are many other open source clients for Microsoft OLAP as well.

 

All this lands us where nextanalytics services it’s niches.  By now I hope you can see our positioning. We’re not competing with MDX especially in the conditions you wrote about.

 

Yes, I think I see the positioning, and I believe that both your product and your approach have good potential, regardless of what MDX can and cannot do. And I actually think that it does compete, not just with MDX, but with OLAP and with Data Warehousing in general. And this is a good thing. Competition is healthy, it will force everybody to improve their products.

Update: Nextanalytics replied at http://www.nextanalytics.com/public/Examples/MDX-Challenge.html

Published Tuesday, June 10, 2008 11:28 PM by mosha
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement