THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Updates about Multidimensional vs Tabular #ssas #msbi

I recently read the blog post from James Serra Tabular model: Not ready for prime time? (read also the comments because there are discussions about a few points raised by James) and the following post from Christian Wade Multidimensional or Tabular. In the last 2 years I worked with many companies adopting Tabular in different scenarios and I agree with some of the points expressed by James in his post (especially about missing features in Tabular if compared to Multidimensional), but I strongly disagree in others.

In general, Tabular is a good choice for a new project when:

  • the development team does not have a good knowledge of Multidimensional and MDX (DAX is faster to learn, not so easy as it is sold by MS, but definitely easier than MDX)
  • you don’t need calculations based on hierarchies (common in certain financial applications, but not so common as it could seem)
  • there are important calculations based on distinct count measures
  • there are complex calculations based on many-to-many relationships

Until now, I never suggested to migrate an existing Multidimensional model to a Tabular one. There should be very important reasons for that, such as performance issues in distinct count and many-to-many relationships that cannot be easily solved by optimizing the Multidimensional model, but I still never encountered this scenario.

I would say that in 80% of the new projects, you might use either Multidimensional or Tabular and the real difference is the time-to-market depending on the skills of the development team. So it’s not strange that who is used to Multidimensional is not moving to Tabular, not getting a particular benefit from the new model unless specific requirements exist. The recent DAXMD feature that allows using SharePoint Power View on Multidimensional is a really important one, even if I’d like having also Excel Power View enabled for this scenario (this should be just a question of time).

Another scenario in which I’m seeing a growing adoption of Tabular is in companies that creates models for their product/service and do that by using XMLA or Tabular AMO 2012. I am used to call them ISVs, even if those providing services cannot be really defined in this way. These companies are facing the multitenancy challenge with Tabular and even if this is a niche market, I see some potential here, because adopting Tabular seems a much more natural choice than Multidimensional in those scenario where an analytical engine has to be embedded to deliver one of the features of a larger product/service delivered to customers.

I’d like to see other feedbacks in the comments: tell your story of choosing between Tabular and Multidimensional in a BI project you started with SQL Server 2012, thanks!

Published Monday, November 11, 2013 3:38 PM by Marco Russo (SQLBI)
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

James Serra said:

Some great points Marco.  While I have not seen a large adoption of Tabular, other folks have.  My post was intended to inform those who are starting a new project of the limitations of Tabular so they don't go in blindly thinking Tabular can do everything that multidimensional can do.  If those limitations are not a problem, then Tabular, because of its ease of use, is the better option.  I was hoping my post started a discussion on the experiences many are seeing between the two models, and it seems I am succeeding :-)

November 11, 2013 10:58 AM
 

Marco Russo (SQLBI) said:

James,

thanks for your feedback - I hope we'll see other comments as well! :)

Marco

November 11, 2013 11:49 AM
 

Paulo Corrêa said:

Marco,

I agree with many of points raised by James. I think that if a team has knowledge about Multidimensional model, the best choice is the Multidimensional model even that the limitations are not a problem for the initial requirements. I think this because if requirements changes, the Tabular model might not be enough to support this changes. I think so that the fact of Tabular Model is not available in Standard Edition is a limiter for a more simple scenario where the Tabular Model would be the best choice for teams without knowledge of MDX. :)

Paulo Corrêa

November 11, 2013 2:17 PM
 

Marco Russo (SQLBI) said:

I agree that the lack of Tabular in standard edition is certainly something that limit the Tabular adoption in many IT shops. They use PowerPivot as a workaround, but it's not really a Tabular replacement.

November 11, 2013 4:02 PM
 

David said:

I'm ready to start adopting tabular if only because Microsoft seems to be investing in it rather than in multidimensional (MD).  Aside from over-hyped "Sharepoint PowerView on MD" or whatever, there has been little improvement made to multidimensional in recent years.  It is a bit discouraging to work with a platform that seems to be headed towards eventual deprecation.  We can't even get our MD-related bugs fixed anymore.

My theory is that SQL Server relational database is taking over the storage side of the BI equation (new indexing features, PDW, etc) and SSAS tabular is replacing MD  with its in-memory formula engine & lower cost of entry.  I suspect Microsoft wants to eventually squeeze MD out of the picture altogether.

November 11, 2013 10:44 PM
 

Marco Russo (SQLBI) said:

David, thanks for the feedback. I understand the pain of not having new MD features. For MD-related bugs, the best way to get attention from Microsoft is opening a support incident and not just waiting an answer on Connect items. If the bug affects a system in production, it gets higher attention. If the bug is the request of a new feature, that this gets a lower priority.

November 12, 2013 2:53 AM
 

Gilbert Quevauvilliers said:

I have to say that after reading a few of the blogs where different guys have given their different views of MultiDimensional (MD) vs Tabular  I have to say that in my experience it all depends on the reporting that you are trying to deliver to the client.

What I often do is I actually almost always develop my project in MD, because of the many features in MD. After which I often leverage off MD and write MDX queries which I then use to populate my Tabular Project. I am aware that this often means more time on the project, but I often find due to leveraging off of MD, I can build better and faster reports into Tabular.

I do find often that using the calculated columns, this enables me to make the data more meaningful, which is something that you cannot do in MD. After which using almost any of the Microsoft Reporting tools out there makes the visual aspect of the report very appealing to the clients.

I also sometimes will just create a POC in Excel using Power Pivot, after which I can then leverage it onto the Tabular server if required.

I think that there is a valid use for both MD and Tabular, it all depends on your requirements.

November 12, 2013 3:35 AM
 

Marco Russo (SQLBI) said:

Gilbert, your feedback is very interesting. It seems you're using MD as a data mart for many reports, but you can still query it directly for simple formulas.

Thanks!

November 12, 2013 3:43 AM
 

Greg Low said:

Keep in mind as well that MD has a standard edition offering, Tabular does not. I think this is an error in licensing policy.

November 13, 2013 11:06 PM
 

Marco Russo (SQLBI) said:

I agree Greg, Tabular on Standard would have been better (maybe with some limitation on db size if MS wanted to push BI/Enterprise editions) - I was just wondering whether DAXMD support is supported in Standard edition or not, I'll investigate on that.

November 14, 2013 2:48 AM
 

Chris Ross said:

The member-level permissions available in MD is almost reason enough for me avoid Tabular. My reporting layer is significantly more simple than otherwise as a single report can be built to accommodate the needs of multiple layers of a department or organisation. Even just the capabilities of a default member provides huge benefits as users can immediately see reports in their preferred default context any time the open a report or pivot table built on MD. Does anything else (besides custom architecture in SQL) do that?

November 16, 2013 1:46 AM
 

Marco Russo (SQLBI) said:

Chris, you put good points. Please note that many of these strenghts are related to MDX and not to Multidimensional. These features might appear in Tabular, too (and the default member could be defined in the MDX Script also in Tabular, even if not supported - it deserves a test...).

But yes, today this level of abstraction and metadata richness is provided only by Multidimensional.

November 16, 2013 1:56 AM
 

Imad said:

I think one of the most important and powerful feature in MD and I see it little covered when comparing to tabular is the aggregation navigator, aggregation designer and the ability to reference an aggregate based on a relationship. This single feature gives you the ability to compress your data infinitly to constant size that is independent of the transaction volume. As your data grows you will have little degradation in performance. As long as tabular is lacking this, it will never make to the industrial level league. It will stay in the realm of micky mouse projects, and sadly Microsoft has no idea of the harm they are causing with their uninformed marketing crew.

November 26, 2013 4:17 AM
 

Imad said:

Correction:

This single feature gives you the ability to compress the context space of  queries infinitly to constant size that is independent of the transaction volume.

It boggles my mind how you claim tabular is faster when it keeps recalculating with every query from the floor of the pyramid. Plus the floor of the pyramid is not constant in size, essentially the response time of queries becomes correlated to the transaction volume and degrades non linearly relative to the transaction volume size. This is huge risk to implement on a serious data warehouse.  We have done simulation and benchmarking on this comparing both models. Every statistical result pointed to MD beating tabular by big margin on every record.

November 26, 2013 6:09 AM
 

Marco Russo (SQLBI) said:

Imad,

thank for your feedback. It would be nice having your measures about the tests you've done, since this might be different depending on several scenarios and it would be interesting having concrete numbers to compare.

Roughly speaking, for a compressed in-memory database in the order of magnitude of a few GBs (which is in the range of 10-100 GB of space in the relational data warehouse) Tabular performs pretty well, faster or on pair with MD. When you go over the limit of 100GBs of compressed in-memory database (1-10TB or more of relational data warehouse) predefined aggregations might perform better on well-known queries, but when you receive a query that doesn't hit the aggregation, the first query is slower than the equivalent in Tabular (after data being in cache, MD and Tabular might be similar or MD could be better since it has a cache at the formula engine level that is not available in Tabular).

I look forward to see your numbers, if you can share.

Thanks!

Marco

November 26, 2013 9:03 AM
 

Imad said:

Marco,

You are downplaying the true function of a BI data model, it is pyramid and not flat like tabular. This means you should accomplish 80-90% aggregate cache hit ratio on user queries, those aggregates are all in memory and compressed million times more than leaf level like tabular, plus they are pre processed. You will only need to go to the leaf in a scoped drill down and that would be 10-15% of user traffic. Now, if you are worried about the 10-15% leaf access, you can push the floor to in memory in SQL vertipaq and you will get close response like tabular for remaining 10-15% of queries (scoped drill down).  

So with the 85% of the queries, MD was 10 to 100 times faster than tabular and once we moved the floor to sql in memory we got very close response to tabular on the remaining 10-15% leaf drill down queries.

We did the probability distribution on response time from the logs of both systems and the difference was shocking.  It is true the MD distribution have little outliers, but as I mentioned above that can be fixed by moving the floor to in memory in SQL.

MD Percentile

50% Percentile: 10 seconds

80% Percentile: 30 seconds

Tabular Percentile

50% Percentile: 90 seconds

80% Percentile: 150 seconds

A YTD Gross margin calculation on a 7 level hierarchy that depends on additive and non additive measures with couple billion transactions took 100 times more time on tabular, on MD, it is almost instant. Plus I don't wanna go to non linear calculations like aggregating a multiplication, the difference is staggering between pre processed and  real time processing.

One other thing is the server utilization, on tabular user queries were recalculating the same metric with every queries, it is like you keep banging your head over and over again to do the same thing.  

Oracle is not stupid to have aggregate navigator and designer with their in-memory system, because they know you can beat that with some fancy marketing.

November 26, 2013 11:43 AM
 

Marco Russo (SQLBI) said:

Imad,

it would be nice looking at the MDX and DAX queries you tried. The execution time of queries in Tabular for 2 billion transactions is very high. I have a database with 4 billion records with queries that return in less than 3 seconds in Tabular and up to 8 seconds when many-to-many relationships are involved. MD can be optimized for regular measures, but not on M2M scenarios. I agree that MD can be faster than Tabular, but the numbers I see are strange and I'd like to see which type of workload have been used.

Tabular has its own limits, but has also many strenghts. I've been surprised several times by performance that can be obtained in Tabular, even if DAX optimization and data distribution are very important factors to consider.

I don't know if you can share more info about the tests you performed (such as data model, tables' density, queries used) but any more information you could share would be very helpful to the community.

Thanks,

Marco

November 26, 2013 4:01 PM
 

Imad said:

I am not talking about specific queries, there is no dax or mdx, pure measures.  I am talking about system throughput. You can't look at individual queries and make generalized claim, this is not scientific.  

I said couple billion, close to 6 billion.

Until I see a scientific discussion supplied with mathematical evidence on how leaf level calculation will beat a pre processed aggregate calculation, I won't share anything and I won't waste my time.

Thanks for your opinion, appreciated.

November 26, 2013 4:35 PM
 

Marco Russo (SQLBI) said:

Imad,

I can base my feedback on my experience. From a theorical point of view, you are right: a pre processed aggregate calculation requires less computation than leaf level calculation. With the current hardware and current implementation, there are scenarios in which the result might be different. Consider that:

- access to RAM is expensive of some order of magnitude compared to data stored in cache

- sequential access to RAM is faster than random access to RAM (I didn't believe to that, but it happens for a number of reasons)

- the number of aggregations required to provide quick answer to any query might be huge, and the time required to locate and retrieve the best aggegation for the query requested might be more expensive than doing the leaf level calculation, depending on the size of the columns involved and their compression ratio

- certain aggregations are not useful for queries made at different granularities (think to distinct count and many-to-many calculations)

So the point is that I had your same point of view a few years ago, but after several empirical evidence of the performance that you can achieve with Tabular, I always evaluate carefully the database, the requirements and the calculations involved before suggesting between Tabular and Multidimensional.

The number you provided in one of your comments are so high (90 seconds, 150 seconds) that I was wondering what calculation were issued. In the current implementation, DAX and Tabular performance might be very sensible to the DAX syntax, and different syntaxes providing the same result can have wide differences in performance.

I'm really sorry you don't want to share your experience. The number you provided are interesting and deserve further investigation and I would be very happy if you change your position and would like to collaborate in this research.

Thanks,

Marco

November 30, 2013 5:46 AM
 

Matthew said:

Hi Marco

I'm currently working on a Tabular Project for an online and High Street Retailer. The solution replaces an existing Multidimensional solution. Tabular was chosen for the following:

- The speed and ease of getting a functioning solution 'up and running'

- Many-to-Many capabilities (basket analysis etc)

- Flexibility of the data model (very easy and quick to deploy changes)

- The impressive speed of DAX queries, especially as the users like to drill down to individual transactions

Many thanks for passing on your knowledge via your blog, I've found it very helpful.

December 2, 2013 6:02 AM
 

Marco Russo (SQLBI) said:

Matthew,

thanks for your feedback!

Marco

December 2, 2013 11:39 AM
 

John Simon said:

Hi Marco,

I run a large BI team in a Telco and we are using Tabular by default, MD when required.

One issue we face with Tabular is when the calculations are trying to do a lot of IF statements because we cannot use Unary operators in hierarchies.

However, we chose Tabular by default because it is faster to develop, more flexible and we get great compression rates.

December 2, 2013 8:50 PM
 

Marco Russo (SQLBI) said:

John,

thanks for your feedback. Can you provide some example of the calculation that requires unary operators?

Have you seen the Alberto's blog post on this topic?

http://sqlblog.com/blogs/alberto_ferrari/archive/2011/03/01/powerpivot-parent-child-and-unary-operators.aspx

Marco

December 3, 2013 1:39 AM
 

John Simon said:

Hi Marco,

I have read that article.

It is not so much having unary operators, but having multiple hierarchies using them. So the calculations need to run IF statements against multiple hierarchies which slows them down, especially when we have 2 billion rows in the fact table.

Interestingly enough, the multi dimensional cube was 96Gb, and the tabular model 20Gb. The MD cube also processed much faster, and in this case performed better.

However, even given this, we are still using tabular by default as in general it suits our purposes.

We probably now have 90% tabular models and 10% MD cubes. I think there is definitely room for both in any implementation.

December 8, 2013 3:19 PM
 

Marco Russo (SQLBI) said:

Hi John,

I understand the problem of multiple hierarchies with Unary operator - in this scenario MD is much simpler to handle.

The comparison in size between MD and Tabular is interesting - does the MD have 96Gb including aggregations? The process time really depends on the hardware, but unless you create many aggregations the speed of MD might be better, also because Tabular cannot read multiple partitions from the same table at the same time.

Thank you very much for the feedback!

Marco

December 9, 2013 12:30 AM
 

Maria said:

Hi Marco,

For me one of the main limitations of Tabular is inability to support multi-user scenarios.

From official guide:

Problem: Analysis Services in tabular mode is not optimized for multiuser scenarios. As a result, an expensive query could starve all other user queries of resources.

Cause: By design, each user’s query should be so fast that the engine should just finish one query before moving on to the next on. This design gives the CPU and caches and threads the least amount of context switching.

Workaround: Design queries so that they do not run too long.

And though performance of single query is acceptable (but in distinct counts over rolling weeks it is still far from instantaneous for models with ~ 8 bln rows), when few queries are executed in parallel - I see significant difference in run-time what stops me from using Tabular in production.

What do you think?

December 10, 2013 9:31 PM
 

Marco Russo (SQLBI) said:

Maria,

this is an interesting scenario and the problem you describe is a well-known one. A complete answer to this would be very long, I try to summarize:

- scale-out on multiple server can serve more users, like Multidimensional

- real issue is how to guarantee the latency answering each single query; one way is to connect more demanding queries/users to a separate server, so that they have a limited effect on simpler queries

- distinct count optimization on 8 billion rows model is challenging, but still possible (within certain limits) - we are working on a White paper on this topic that we'll publish in the next weeks/months, contact me directly if you want to discuss it deeper

- do you have an alternate method for the distinct count calculation that is faster? In my experience, Multidimensional scales better but is also slower in this specific scenario (distinct count on large fact tables).

There is no silver bullet here - there are many optimizations possible, but the core of the issue you described exists.From my point of view, it also exists with Multidimensional, so it would be interesting having data to compare scalability performance between the two implementations (comparing average response time observing different number of concurrent users). I would be very much interested in this analysis if you have some number to share (doing these tests in a comparable way is not easy, so having the ability to discuss them is very interesting to me).

Thanks!

Marco

December 11, 2013 12:59 AM
 

Maria said:

Hi Marco,

As you said, there are many optimisations possible. M2M Distinct Count works reasonably well for me (much better then native Distinct Count which in general I'm trying to avoid for a number of reasons), but of course, for big time periods, run-time is in minutes not seconds.

In some limited scenarios, it may outperform Tabular solution, but on average performance of a single-query in Tabular is faster.

Unfortunately, I don't have scalability test results but I will contact you to discuss some numbers.

Maria

December 11, 2013 6:39 PM
 

Marco Russo (SQLBI) said:

Maria,

you can write to marco(dot)russo(at)sqlbi(dot)com

Thanks!

Marco

December 11, 2013 6:51 PM
 

Chris Ross said:

This is a really great thread!  What does concern me though is that MS seems to abandon a robust, working, useful, product (eg MD, SSRS) for a potentially improved but definitely underdeveloped one (eg tabular, power BI).

MS seems to be years behind tableu on the visualization end and while PowerBI is very cool but it doesn't come close to meeting the needs of many of the reports built even in SSRS. I just don't want to see this happening in the data layer because I love my MS-based data warehouse!

December 12, 2013 5:09 AM
 

Chris Ross said:

PS - are there any signicant data mining capabilities in tabular? If we're going to talk about how competitive a product is, that and other 'big data' topics should be in the top concerns.

December 12, 2013 5:18 AM
 

Marco Russo (SQLBI) said:

Hi Chris, thanks for the feedback!

I understand you're worried for MS corporate BI tools evolution - it seems they are dedicating more resources to self-service than to corporate BI in these days. I agree that a more balanced approach is required. I hope MS is already working on that.

Marco

December 12, 2013 5:18 AM
 

Marco Russo (SQLBI) said:

Today, there are no specific data mining capabilities in a Tabular instance. You can always use a Multidimensional instance with existing Data Mining tools.

December 12, 2013 5:21 AM
 

Mike Cornell said:

Hi Marco -

About a year-and-a-half ago, my company stood up both MD and Tabular instances of SSAS.  That said, every project over the last year has been in Tabular.  I attribute that to many of the things you call out:

• We had no pre-existing MD models

• While some members of our team are well versed in MD, it's been very easy to pick up Tabular and establish efficient dev processes with it

• Our data and models typically utilize complex many-to-many calculations and distinct count calculations

One of the biggest aspects of Tabular that has made it work so well in our organization is less technical and more on the development methodology side.  Gilbert touched on it briefly in his comment, but for us, the ability to quickly build prototypes utilizing Power Pivot has been huge.  It's great to get a group of SMEs from a department into a room and begin putting together a small-scale model with their data up on a projector right in front of their eyes.  The context, immediate feedback, and insights gained from those sessions lead to a much more efficient build-out of the larger, enterprise solution and ultimately ensure a more business-driven outcome.

The tools, language, and modeling techniques of MD do not yield themselves as well to this sort of interaction.

Thanks! - Mike

January 1, 2014 11:32 PM
 

Marco Russo (SQLBI) said:

Hi Mike, thanks for the feedback and Happy New Year! :)

January 2, 2014 2:42 AM
 

Sunil Kadimdiwan said:

I agree with Chris Ross.

Microsoft seems to be ignoring very useful products SSAS-MD and SSRS. We have not seen changes since 2008-R2 for SSRS. I also do not see any in the upcoming 2014-version. Perhaps someone can shed light on the future direction on these 2 products.

Thanks,

Sunil.

January 8, 2014 5:50 PM
 

Marco Russo (SQLBI) said:

Hi Sunil,

I share your pain and I don't have news to share, unfortunately.

Marco

January 10, 2014 11:36 AM
 

Terje said:

Hi,

I was wondering about what about user Security in tabular vs MD. I'm looking at implementing user Access by user name, using bridge tbl and a user hierarchy..in MD this Works well, but not sure how this will work in Tabular..

Regards

Terje

February 25, 2014 8:02 AM
 

Marco Russo (SQLBI) said:

Terje,

Tabular implements a row-level security model: every user/Group can have a dax expression in every table that defins with rows are visible to that user. You can implement bridge table, but you have to define the access rule with a DAX expressions instead than an MDX one.

Marco

February 25, 2014 8:09 AM
 

Steven Neumersky said:

IMO you (and James) both have good points, and I truly believe the most important thing is to not OVERSELL Tabular in the large enterprise. However, it and PowerPivot are still great tools for any enterprise--especially in the area of POCs, time-to-market, and supplemental communication of requirements between IT and Power Users (in addition to already existing perks of "in-memory" analytic engines).

March 26, 2014 12:08 PM
 

Marco Russo (SQLBI) said:

Steven, thanks for the feedback. I agree that overselling is never a good strategy!

March 26, 2014 1:37 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement