THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Microsoft OLAP by Mosha Pasumansky

Follow up on NextAnalytics challenge

My post “MDX answer to NextAnalytics challenge” triggered some very active discussion, and I would like to address some of the comments

Chris Webb said:
I've done a fair amount of teaching MDX and the truth is that people do find even the basics very difficult to understand, but my feeling is that the problem people have with MDX is not the language as such but thinking a) multidimensionally and b) in sets. The question is then, does Ward's new language solve either of these problems? Or has he reinvented the wheel?

I agree with the a) part, but not with the b) part. Thinking multidimensional is hard and take some time to get used to. However thinking in sets is not hard at all. SQL language is mostly based on relational algebra and uses sets extensively. Practitioners don’t find sets difficult.

Thomas Ivarsson said:
Can you publish the entire script at the end of this blog post? My conclusion is that we need better clients that can create these analytics sets at run time. End users will never do this and it is impossible to create all sets that the business users would like to have. It will fill up a cube in no time.

I am pessimistic that we will ever see such sophisticated clients. If they were in demand, we would’ve already had them. I am afraid the users will have to code such analytics either using MDX or NextAnalytics script or something else. End users don’t have any problem building such sets in Excel, so maybe Excel is our hope ?

Here is the entire script:

// Nextanalytics challenge

// calculate the rolling six month average 
CREATE Rolling6MonthAverage;
(Rolling6MonthAverage, [Date].[Month Name].[Month Name]) 
  = Avg([Date].[Month Name].Lag(5):[Date].[Month Name], [Measures].[Internet Sales Amount]);
  
// calcualte the growth of that 
CREATE GrowthOfRolling6MonthAverage = Rolling6MonthAverage - (Rolling6MonthAverage, [Date].[Month Name].PrevMember);

// calculate growth
CREATE GrowthOfSales = [Measures].[Internet Sales Amount] - ([Measures].[Internet Sales Amount], [Date].[Month Name].PrevMember);

// count each "growth" calc to the row average 
CREATE HowManyTimeGrowthExceededRollingAverage = 
  Count(
    Filter(([Date].[Calendar Year].&[2003],[Date].[Month Name].[Month Name]), GrowthOfSales > GrowthOfRolling6MonthAverage));

CREATE SET ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes AS
 Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter(
        [Product].[Product].[Product], 
        HowManyTimeGrowthExceededRollingAverage >= 6
      )
    )
  );

SELECT {} ON 0
, ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes ON 1
FROM [Adventure Works]

// create growth from start to end of period
CREATE GrowthFromStartToEnd =
  ([Measures].[Internet Sales Amount], ClosingPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
- ([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]));

CREATE SET ProductsThatGrewMoreThan25PctFromStartToEnd AS
  Generate([Customer].[Customer Geography].[Country], 
    CrossJoin([Customer].[Customer Geography],
      Filter([Product].[Product].[Product], 
        GrowthFromStartToEnd > 0.25*([Measures].[Internet Sales Amount], OpeningPeriod([Date].[Calendar].[Month], [Date].[Calendar].[Calendar Year].&[2002]))
      )
    )
  );

SELECT {} ON 0
, ProductsThatGrewMoreThan25PctFromStartToEnd ON 1
FROM [Adventure Works]

SELECT {} ON 0
, Intersect(
    ProductsThatGrewFasterThanRollingAverageMore6OrMoreTimes,
    ProductsThatGrewMoreThan25PctFromStartToEnd ) ON 1
FROM [Adventure Works]

 

Miky Schreiber said:
I agree with Thomas. The question is how difficult it is to the end user. While the regular end user will never write MDX or NextAnalytics script, the advanced user will. Now, the question is what will be more simple for him - MDX or NextAnalytics script.
 

Valid point. But I don’t think there is an absolute measure of “what is simpler” – practitioners will tend to use the tool that they are more familiar with. MDX is probably more popular than nextanalytics, but SQL is way more popular than either one of these. And Excel will easily rival SQL in familiarity. On the other hand, developers will want to use “real” programming language. And the question of “what is simpler” will raise there as well. While functional languages like F# are clearly more suited for such tasks, more people are familiar with conventional languages like C#.

Update: Ward published the entire nextanalytics script for this problem, so constrast and compare it yourself: http://www.nextanalytics.com/public/Examples/MDX-Challenge.html

And finally Ward wrote long comment, from which I will cite selectively

Ward Yaternick said: 

I also don’t agree with your implicit assumption that the cube and other infrastructure parts are insignificant.   To put a cube together from scratch and maintain it is a big decision and cost.

Therefore, I think you have conclusively proved that (1) if the cube exists and (2) someone can author MDX then MDX does deliver on complex analytics.

I do agree with that. I wasn’t trying to say that setting up a cube was trivial. However, if the cube was set up for some particular purpose, it is very easy to extend it with MDX calculations to get additional value out of it.

nextanalytics can help in these situations because the data can come directly from SQL or CSV files and the incremental IT overhead behind an analytic is very low.  You can feasibly write an analytic like what you did with MDX, run it against a CSV file or a simple SQL query, and then throw it away or keep it (all contained as lines in single text file) in a folder somewhere.

This certainly looks like a more agile way of doing things – quickly throw small script together, run it and move on. I think this approach will be attractive to lots of users – from small IT shops to independent users in bigger organizations. This is at odds, however, with classic methodology of building centralized data warehouse, one version of truth and all that. It doesn’t make it less viable, of course, it is just a tradeoff that someone will have to make – tradeoff between chaos and control, or, in Eric Raymond’s words, between bazaar and cathedral.

Another situation in which a cube isn’t ideal are when ISVs and consultants want to integrate analytics into their applications but they need to minimize the number of other vendor’s licenses and reduce the complexity and cost of the installed products.   This priority is the same when an IT budget is under constraint.

I think this is somewhat weak argument, because it implies that both complexity and pricing of NextAnalytics is below this of other products, like Microsoft Analysis Services. But this is definitely something that practitioners will need to evaluate and compare.

Sometimes, the analytic data processing needs to run on a unix box against something like MySQL or Oracle i.e. where Microsoft OLAP isn’t available 

Just wanted to note, that even though data could be in Unix box, in Oracle or MySQL, Microsoft OLAP doesn’t have problem connecting to it. Oracle is fully supported data source, and many people were successful reading data from MySQL too.

Perhaps they need an Open Source client  to make integration easier with their dashboard or portal.  For this reason, they can’t use any of the popular OLAP Clients 

I didn’t quite get this. There are plenty of Open Source clients which connect just fine to Microsoft Analysis Services (usually through XMLA, but not only). Since you mentioned Pentaho earlier, I wanted to note that Pentaho Analysis Views are based on JPivot, which works fine with Microsoft OLAP. There are many other open source clients for Microsoft OLAP as well.

 

All this lands us where nextanalytics services it’s niches.  By now I hope you can see our positioning. We’re not competing with MDX especially in the conditions you wrote about.

 

Yes, I think I see the positioning, and I believe that both your product and your approach have good potential, regardless of what MDX can and cannot do. And I actually think that it does compete, not just with MDX, but with OLAP and with Data Warehousing in general. And this is a good thing. Competition is healthy, it will force everybody to improve their products.

Update: Nextanalytics replied at http://www.nextanalytics.com/public/Examples/MDX-Challenge.html


Published Tuesday, June 10, 2008 11:28 PM by mosha

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Chris Webb said:

You and I might find thinking in sets easy, but I still maintain that a lot of people do find thinking in sets hard. Why else, in the SQL world, is such a lot of fuss made over 'set-based SQL'? It's because so many people coming from a procedural programming background want to write procedural SQL; the same people coming to MDX would have a bigger problem because it's just not possible to write procedural MDX.

To back up my case, take a look at the blurb from Joe Celko's latest SQL book "Thinking in Sets":

http://www.amazon.co.uk/Joe-Celkos-Thinking-Sets-Management/dp/0123741378/ref=sr_1_2?ie=UTF8&s=books&qid=1213173897&sr=8-2

I quote: "Perfectly intelligent programmers often struggle when forced to work with SQL. Why? Joe Celko believes the problem lies with their procedural programming mindset, which keeps them from taking full advantage of the power of declarative languages."

June 11, 2008 4:50 AM
 

Thomas Ivarsson said:

Maybe a sofisticated client will be to expensive because of a limited market. ProClarity has a add in, the Selector, that can create named sets for end users. It is also probably hard to build good enough user interfaces for complex clients.

If we will see something on the market it will probably be an add in to Excel or part of the next release of Performance Point.

Adding a lot of complicated sets to a cube can create a maintenance nightmare.

June 11, 2008 11:40 AM
 

Kory Skistad said:

Reading this post about NextAnalytics and the original questions posed makes me wonder how this differs, other than scale, than a spreadsheet itself?  Instead of sets, you have Ranges.  Formulas reference other cells (or Ranges), which themselves can be formulas.  Ranges can have names similar to Named Sets, and scripts can be written to automate the processes (VBA/Macros).  Ranges are even based on coordinates, simlar to Named Sets in Analysis Services.

Setting up this problem in Excel is trivial, and can be accomplished in probably the same amount of time it took Mosha to construct against AdventureWorks, and Excel can use CSV, SQL and Analysis Services sources.

But Excel by itself cannot analyze or manage terabytes of data, and even though memory and the size of the worksheet itself has increased, it is still limited in its finite coordinate system.  This is where Analysis Services and MDX are necessary, and Excel can still be used to present the results.

So,

Can the original problem be solved easily in MDX?  Mosha proves it can.

Can the problem be solved easily in NextAnalytics?  Yes.

Can the problem be solved easily in Excel?  Yes (depending on the volume of data to analyze)

Can the problem be solved with modern RDBMS languages like TSQL, PL\SQL, etc.?  I think stored procs (CLR or native) can be written that accomplish this.

I guess the ultimate answer is "pick your poison".

-Kory

June 13, 2008 3:51 PM
 

Ward Yaternick said:

Kory, my original blog included an important element which I need to keep at the forefront of people's minds when they read the comments in Mosha's blog, otherwise I'm being half-quoted.  

The original sentence (in my own blog) which piqued Mosha's (and Chris Webb's) interest says:  "That’s an example of one analytic combined with another, iteratively and sequentially. Neither of which would be possible in MDX or SQL without the creation of a complex underlying data structure. "  Later on, I gave a clarification which clearly stated "No external tables, no external cubes. And, the nextanalytics "code" is still readable a year later and easily modified."

The reason I think it's important to clarify this is that it's my opinion that some (if not most) analytics should be easily disposable or at least easily enhancable.   In this, a business user asks a question, they get the answer, they deal with the business issue (or opportunity), and then they move on to the next analytic query which might be a derivative of the first.  For this kind of agility, analytic queries must be lightweight in terms of producing the results and the cost of preparation, and no ongoing overhead especially if the query result isn't to be requested by the user. In contrast, if you built a cube for a single or small set of analytics, then it's highly likely that your IT workflows are going to keep building it and you never recoup the IT cost. If the business user needs a dramatically different analytic next reporting period, you're in a start-over position. In other words, and I don't know if there's already a name for this out there, but "throwaway analytics". I guess the closest thing is adhoc spreadsheets.

Anyway, in his response to the point about not relying on a complex data structure, Mosha made the valid point that, if the base cube does indeed exist, then it is useful to extend it with MDX.  I think that's a fair comment: If you have a sunk cost in a cube, then extending it with MDX fits the condition.  My only comment to that is that, if you DON"T have the cube, then maybe it's not worth it to build one.

Mosha did indeed demonstrate that MDX is very elegant, powerful and capable.  But, we need to be careful on one point: There was no consensus that his approach was easy (as you suggest he did).  There was some opposing discussion in fact.

Per your other points, doing this example analytic natively in Excel is far from easy and it has nothing to do with data volumes - the required formula and cross worksheet cell referencing that would be required would be impossible to implement properly.

Likewise, to do this with plain SQL, would take a lot of work, probably weeks, and you'd end up with a complex underlying structure which violates the original principle of the blog. Remember, the supplied nextanalytics script queried the original adventureworks tables, not the cube.  Doing it without nextanalytics, would be very very difficult.

June 17, 2008 3:30 AM
 

Ward Yaternick (extending the challenge, just a bit) said:

As background, Mosha's MDX didn't quite do what the pseudo-code was intended to suggest. I didn't draw attention to it before because Mosha had (to us all including me) proved that MDX was quite powerful, which was what he was intending to do.  

But, on further reflection, I thought it might be fun to ask if someone else could modify his MDX. The idea being that this would help to answer the question, was his MDX "easy".  So, if you decide to take up this assignment, here's the clarified challenge:

1) Mosha's original code had a very small math error. His growth calculated the difference between two periods, but that's not quite the same thing as growth.

AND

2) I'm used to be but am no longer an MDX expert, but it seems that Mosha's code didn't implement the "iterative" part of the pseudocode.  

Notice that the pseudocode has a "foreach region"  set of braces.   The foreach region was intended to request that the MDX produce a set of crosstabs, one for each region.   An analogy in excel would be to have a product by time worksheet for each region.   Two things consider:  The list of products are unlikely to be the same for each region and you don't know in advance the names of the regions.

If I were the one writng the MDX, I'd see if it was possible to put each region on a separate page axis, but that's just me. Perhaps you know of a better way - my knowledge of MDX is apparently stale.  The factor that's going to complicate this approach would be if pages support having different members on each page. That part was always fuzzy to me - I think it depends on the implementation - I think some servers don't even support pages.  

The question, the challenge is, does anyone know if using pages is possible to do this, and if pages support having different members on each page?

Remember, modifying the mdx is a challenge is for people other than Mosha, because the point in question is whether or not his finished result could be considered "easy" and perhaps to affirm his belief that someone else could produce the query in about an hour.  Since this challenge is working from his code, probably it should take less than that.

Looking forward to seeing people's MDX submissions. You could post them here or as a comment to my blog:

http://www.nextanalytics.com/MyBlog/MyBlog/Can-a-business-intelligence-product-be-used-to-answer-analytic-questions.html

BTW, Adding the foreach region part to the nextanalytics script would take one extra line, at the bottom of the script.

June 17, 2008 3:52 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement