THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

James Luetkehoelter

Nearly any SQL topic presented at times in a slightly eclectic manner.

5 Reasons why I think MDX is the Devil

First, an up front apology to Mosha - this is half in jest, half real. MDX is a powerful and useful language, but I find that its very existence has made my life very difficult when it comes to developing a solid Analysis Services solution. That's not to say I don't use it, or endorse it, but I worry greatly about its overall usage.

1) It tempts you by looking like SQL, but once your brought into it you realize how complex it is (if you don't have Mosha's MDX Studio, get it).

2) MDX allows you to do TOO much when it comes to a multi-dimensional structure, without first understanding multi-dimensional concepts. I've seen code that creates filters or subset cubes that overlap or end up cutting out measures implicitly. You have no warning about possibly undermining previous dimensional modeling - that would be a nice addition to see.

3) Along those lines, it enables sloppy dimensional modeling in the first place. "Throw it together, will add in things later". With a multi-dimensional database, it is very important to model it correctly, just like you would a relational structure. Unfortunately there are no Codd/Date rules of normalization, and often very little work done up front understanding the actually relationships and what the data is meant to represent. For example, instead of using a many-to-many cube dimension (what I call the dimesion to measure tie) to represent something like a bill of materials, I often see this done afterwards somehow via MDX calculations and such. Instead of going back to the dimensional modeling, the answer is usually "we can do this in MDX later".

4) Yet another language with its own syntactical oddities - damn you syntax, you will be the death of me!

5) I can't count the number of calculations I've seen in a cube that could have been done before the cube is loaded in the first place. An MDX calculation that does something like PRICE * QUANTITY = TOTAL SALES is just a waste of processing on the cube side. Remember, MDX is post aggregation - so the large the cube, the more dimensions it has and the level of drill down it allows all play a roll in the speed of things.

Those are 5 to start - anyone have anymore?

Published Sunday, January 04, 2009 12:24 PM by James Luetkehoelter
Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Peter said:

Context:

About two years ago I was put on a project to replace an existing SQL based based cube solution (which was performing fine). Analysis Server was 'sold' to the customer as they could also generate reports from it with their own reporting tools.

Upto then the reporting was done by the application and the SQL was generated dynamicaly. The same application would need to query the AS/Dimensional cubes instead by the end of the project.

The model was quite small but had a complex structure given the number of dimensions. There were two time base grouping 'entities' similar to what a supplier and a customergroup would be like. Then there was a large tree structure where reporting would be done against (quite deep, variable in depth).

I designed two fact tables binding all the dimensions with measures together. One fact table was for the time related groupings of suppliers and customers. The other was for storing the measures for recorded specific supplier/customer interactions that had taken place for a node in the large tree structure.

It looks like a simple schoolbook case and I understood the problem and existing solution in full. My expectations from Analysis server were simple. Provide a portal for multi-dimensional external reporting tools to work with the data in an intuitive way and allow more/interactive slicing and dicing with good performance.

How it wend down:

After playing a bit with MDX examples and trying to implement a working system using the Microsoft tools I ran into too many time consuming issues. Thus I took a 3 day Micrsoft course in the hope of learning something useful on top of what I had figured out thus far on my own...

How disapointed I was afterwards, part to the way it was given, but also because I witnesed that the product was not even finsihed (version 2005). Manualy editing XML files to do certain tasks so a project can be deployed etc...WTF! After this I spend another month reading up more to get things done.

Anyways...here is my list of largest griefs!

1. False and conflicting information EVERYWHERE

I left the project for what it was with the conviction that even the creators of the different parts of the system did not share the same design in mind or even had any design experience to speak of at all.

The amount of quirks and inconsistensies was mind blowing and I feld like I landed in the version 3.0 to version 4.0 browser wars where 50% did not work...properly...but this time it was from one source...Microsoft.

A lot of the time individual features comparable in complexity of no more then simple SQL with some formatting afterwards, worked. As soon as you want to make something useful from these building constucts things would break somewhere as restrictions kicked in. Feature rich, usability poor...not unlike the early version 4 browsers that would break down as soon as you combined working examples or the side effect you get when augmenting existing webpages.

Horror all over again...where 95% of your time is spend figuring out why things do not work as advertised. Make even a tiny change somewhere, you have to redo nearly everything from the ground up.

Thus fixing one thing usualy broke ** several ** other.

This was especialy present with the large and deep tree that formed a hierarchy. If you would use Excel to query the data, you needed different properties set for a given well formatted output then when you wanted to do the same with a single dynamical generated MDX.

Its anti-life itself!

2. Dimensional modeling

Why they did a monster/obscurifying implementation like they did is still beyond me. I expected it to be very elegant from base concepts and it should have been...but I left in horror.

In my view the modeling can be conceptualy as a 2D star diagram with some extra metainfo on the non-fact table entities to facilitate hierarchies (as they presented in their own interface). All done using the well known create table statements etc. And just like you add indexes to tables you should be able to correlate attributes into hierarchies in the same way from SQL augmentations.

The fact tables themself are just that...a growing table with unmodifyable data. Querying should be possible in flat 2D with SQL as well as in a higher dimension query language. It would be fairly simple to design such a system. The trick is ofcourse in the optimising physical storage, but that is the job of the server engine and not the conceptual designer of a paricular solution.

The current way of doing the modeling with the microsoft interace is total horror, even if you know what you are doing you have to go trough to many screens and dropdown controls...everything affects everyting as model and presentation are entangled in ways unhealthy.

This entanglement removes abstraction and hence one cannot reliably produce code to or report against the data as you never can proove something is correct. Just changing a next to meaningless representation of the top of a tree will alter the result of your queries beyond of merely the representaion. If I see such behaviour in work here form someone making code, I would work em out the door!

Also during the course I had several times I found things that were not well thought out to the point I would qualify it as a severe bug! Things were not working like lego, but instead worked like elastic spagetty with supergly at some points attached between the strings. Trying to correct one thing, was inevitably detroying everything else.

3. MDX

Don't even get me started...this must be one of the most horrible designd languages EVER. Even Perl would look structured next to it, and that is intened for programs no longer then a page.

Generating MDX queries is one of the most annoying things I have ever done with respect to queriying. Same comment as with all the rest, it looks natural at first but if you going to dig deeper or actualy know what you want to get out and need to formulate something explicitly, its like invisible jelly.

For even the most simple queries, you need to formulate rediculus things and be creative to sidestep unwanted effects. For querying the simple star diagram along dimentions and filtering using IDs I had to use like 8 derrived nested queries!!! Filter upon Filter as the syntax simply makes too many assumption on what I want and attaches behaviour to it. It does not allow me to express what is needed.

I am not even going to start about the depreciated stuff you seem to need for like just about everything! Its total pans and I definatly want to kick the creator of this in the nuts!

My conclusion.

Microsoft rushed in and delivered half a product, and even less of a design to get some market share. They even handing it out with SQL server at no extra.

Sounds familiar?

And the development teams of both their tools and the designer of MDX I like to kick in the nuts for wasting so much of my time too!

I ain't a happy camper, even after two years of not touching the product anymore. It's tainted for life and I hope it dies soon so something sensible can surface instead!

What I DID like!

Yes, I actually likes one thing that was like the last item of the course which subsequently got not much time allocated to it.

Data mining!!!

One word....Fantastic!

And it does not even require a multidimensional models to work. It is applicable to any well structured set of data and provides instand insight into things otherwise hidden in the data.

Why they put it into analysis server instead of making a seperate product of it s not clear to me. It deserves so much more attention!

January 5, 2009 12:16 PM
 

Peter said:

Small correction

There were 3 fact tables...two were for the grouping of customers into customergroups and the grouping Suppliers into Suppliergroups. Both depending on a time dimension ofcourse, which was also used for the measures involving specific suppliers/customers on a particular tree node (3rd fact table).

January 5, 2009 12:27 PM
 

James Luetkehoelter said:

Peter,

Some comments,

1) Yes, conflicting information is always a pain. And usually people don't really understand a "cube" in the first place. And yes, there are frustrating items when implementing - like having a properties page but yet another "edit" page for containers and tasks.

However I've been working with AS since 2000 (and different multi-dimensional before that) and if done correctly, the frustrations are minimized. But that's the rub - it's often hard to find useful training or knowledge transfer from a consultant. But I would not condemn the product to Hades...

2)Dimensional modeling to me is confusing until you understand it - once you do, it's an entirely different way of thinking about data. For a great explanation of the complex many-to-many, get Marco Russo's ebook off of sqlBI.eu - it has practical examples. To me dimesnional modeling is they key to the real power of SSAS. The point is that SSAS really isn't basing things on Kimball or Inmon as a data warehousing methodology. I agree with that 100%.

3) I would say it is a challenging language. For me the problem is not the language itself but the sheer breadth of what it can do. And it is syntactically fussy. The biggest issue, again, I think is training, knowledge of multi-dimensional data, etc. The thing that worries me is that people look to it as a cure all (which it often can be) when a more elegent solution is best done pre-processing. BTW I think PERL is a great language.

But "kick the creator of this in the nuts" is totally unacceptable language. The only reason I left this post up is to comment on that. You owe those people an apology - if you don't like the tool or language, don't use it. It's one thing to be frustrated but another to be angry and rude. Nothing positive comes from that type of comment.

BTW, Data Mining is something that scares me, knowing statistical modeling. I find too many people tempted to use the data mining "model" as prediction information - which with the exception of a time serious is 100% false.

January 5, 2009 1:49 PM
 

Peter said:

Ok, I appologise about the kick the nuts thing, I was quite mad after writing half the story with all the frustrations boiling up again. I also accept your personal experiences are different you are deeper into it and MDX as a language as well.

I did put a real lot of effort into it and as a perfectionist wanted things to be working right. In fact I did make it work, but don't trust the results for obvious reasons. To get it to work I had to let go of the primary objective, having it deliver representable data using both self generated dynamic MDX as well as external tools as Excel (Excel lost out, but we never deployed it).

In my experience the product can never deliver what I needed it to, as everyhting was affecting everything else. The whole model forced me to focus on a specific limited set of queries and one alteration and you need to start over. I started by thinking from the data perspective and model this in a way that would allow for easy reporting (filter/aggregation aka slicing/dicing) with minimal processing for the sort of queries I had in mind. Afterwards I feel this aproach does not work and I am unsure what will.

What I hoped the product could bring that cannot be done conventionaly is to support flexible reporting by different applications querying its model and perform aggregations based on applied filtering. This to me is its main purpose as the rest can, and has to be done before anyway in SQL.

BTW, I still see dimensional modelling is just that, modeling more then 2 dimensions/axis along which you can perform all kinds of operations using multi-dimensional sets of data (slices, subcubes, etc).

No explanation I read says its different, but they just use a whole lot of more words for it. I do have plenty of confusion about how cubes should are pictured (seen so many conflicting interpretations). In some, the hierarchies are themselfs considered a dimension. For my model this seems just to confuse as the properties of these hierarchies were tuned to deliver proper formatted output to Excel. But these settings also distort the querying process when you want raw data (hence I dropped Excel).

It does not work for me as I expect and I tried countless detailed interpretations based in subtelties expressed in wording of the documentation. There was always something that did quite add up...it felt a lot like trying to reverse engeneer something complex without any context.

I think the problem with documentation is one every writer has. If you try to keep the mesage simple/understandable to most, you are liley ending up not telling the message but something else, thus confusing everyone else.

I gave up on it and maybe its my perception of what the product is for. Even then there were undeniable aspects of the product that were limited, incomplete or buggy to the point I would consider it bad regardles of its intended purpose (think of backup, scheduling, sperate server from SQL, linked server/driver support).

January 6, 2009 7:11 AM
 

Peter said:

Small correction (i always have to reread my own writing a few times)

It does not work for me as I expect and I tried countless detailed interpretations based in subtelties expressed in wording of the documentation. There was always something that did ** not ** quite add up...it felt a lot like trying to reverse engeneer something complex without any context.

January 6, 2009 8:50 AM
 

James Luetkehoelter said:

Thank you Peter for taking the edge off - things are always more productive if we are at least civil.

I here your frustrations, and I have felt too. I would acknowledge that documentation for multi-dimensional modeling, MDX, the simplistic analogy of a cube are all very, very confusing and at times contradictory. You aren't alone in feeling that way. It takes a significant investment to really get a handle on the whole picture. Once you do though, it is a very powerful tool.

To me the most difficult thing to really grasp (which then makes MDX all the more difficult) is multi-dimensional space. The cube analogy makes it seem much simpler than it is, where as even an abstract mathmatical representation of 5+ dimensions is difficult to comprehend. Try bringing up to 20+. Whenever I've taught BI classes, the first thing I've tried to do is just get the basic concept of multi-dimensional data across. Out of a class of 12, usually only 1 or 2 get the "ding" sound at first pass.

If anything, I think the product is undersold in simplicity - it is far more complex than MS marketing (that's marketing, not the developers) would have us believe.

Again, thank you for a more civil response - you'll always get better responses that way. And thank you for apologizing. The world needs more people willing to do that.

January 6, 2009 7:03 PM
 

Pelican said:

I am actually quite a fan of dimensional modelling rather than MDX although MDX, when used carefully and wisely, does the job quite well such as budgeting and forecasting type application where budget and forecast numbers need to be allocated and spread across. I often get quite crossed when MDX is compared to SQL but the true fact is MDX is 99.99% different from SQL, it is only the first line, i.e. “Select ….from” is similar to SQL. Quite understandably, when MDX is used as a quick fix the whole application not only becomes hard to manage but that quick fix and number of many quick fixes need to be maintained throughout the organisation.

MDX almost cannot be used in a very large database and cube environment. I have had the opportunity to meet Nicholas Dritsas of Microsoft on one of the user group meeting. He has specialised in Analysis Services. He mentioned even a little calculation would kill this type of large database and cube applications. Calculated set cannot be used, “DistinctCount” is THE most offending item followed by “Except”.

Although, there may be no way out without being using MDX in many cases, such as time related calculations where PTD, YTD, TEI/TEC, Period-To-Period need to be computed. To some extend, time dimension table may be adjusted such a way that these calculations can be performed within SQL database and can be used as a base measures in cube. Not many resources available publicly and, like others, I do tend to fix it later by MDX. I would like to see these types of advanced data modelling articles in future.

Pelican

March 3, 2009 11:43 PM
 

Frank said:

Just a few comments...

1) I have seen very few poeple get the dimensional model correct the first time they implement it in a cube. It usually takes a few times massaging it to get it to where it works for the cube and for how the useres want to see the data.

2) I love MDX. When I first started using it I both fascinated and fustrated by it. By after using it for the last 5 years, I have become more profecient at it. And I almost always learn something new each day with it. Whether it is using a new function or how it can be used to manipulate data output. I will say that most of my knowledge has come from blogs, some MSDN articles, MDX Solutions book (great book!) and my own tinkering around. Obviously no one that I know of teaches classes on MDX and there are very few "real world" books out there on it except for the MDX Solutions book.

July 15, 2009 6:36 PM
 

Bassam Muhammad said:

I think the biggest mistake is to rely on the members list as a way of thinking to construct the mdx queries , i mean in SQL you just specify column names , NOT , the data itself then at WHERE you can filter it out , designers of MDX could have make it that easy but they did not and relied on a much harder concept , drawing members themselvs

example , what's wrong with designing mdx like that:

SELECT productcategory,productsubcategory at rows,

year , month at columns

where product category in ('food','computers') AND year in (2007,2008)

ATTRIBUTES productsubcategory.expand=true

from salescube

that way , logical thinking of puting attirbutes themselvs in the axis - NOT DATA Members - is much easier and like SQL

also this attribute thing can describe any properties of an attribute which something can be specific to dimentional layout

if mdx is designed with this concept then it would have been much easier for SQL developers to understand

Bassam

December 15, 2010 4:54 AM

Leave a Comment

(required) 
(required) 
Submit

About James Luetkehoelter

I am passionate about what I do - which is DBA, development, IT and IT business consulting. If you don't know me, haven't met me or have never heard me speak, I'm a little on the eccentric side. One attendee recently described me as being "over the top". Yup, that about says it - because I only speak on topics that I'm passionate about.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement