
SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Large Dimensions in SSAS Tabular #ssas #vertipaq

After many years of helping companies around the world build small and large data models with SQL Server Analysis Services Tabular, I’ve seen a common performance issue that is underestimated at design time. The VertiPaq engine in SSAS Tabular is amazingly fast: you can have billions of rows in a table and query performance is incredible. However, in certain conditions queries over tables with just a few million rows are very slow. Why is that?

Sometimes the problem is caused by DAX expressions that can be optimized. But if the problem is in the storage engine (something you can measure easily with DAX Studio), then you might be in bigger trouble. However, if you are not materializing too much (a topic for another post, or for the Optimizing DAX course), chances are that you are paying the price of expensive relationships in your data model.
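For example, you can run the slow query in DAX Studio with the Server Timings trace enabled and compare the storage engine (SE) time against the formula engine (FE) time. The following is only a minimal sketch of such a test query; the Sales and Date table and column names are placeholders and do not come from any specific model:

EVALUATE
ADDCOLUMNS (
    VALUES ( 'Date'[Calendar Year] ),
    "Sales Amount", CALCULATE ( SUM ( Sales[Amount] ) )   -- replace with the measure that is slow in your model
)

If most of the duration is reported as SE time, the bottleneck is in the storage engine rather than in the DAX expression itself.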

The rule of thumb is very simple: a relationship based on a column that has more than 10 million unique values will likely be slow (hopefully this will improve in future versions of Analysis Services – this information is correct for SSAS 2012/2014). You might observe slower performance already at 1 million unique values in the column defining the relationship. As a consequence, if you have a star schema with a large dimension, you have to consider specific optimizations (watch my session at Microsoft Ignite to get some hints about that).
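A quick way to check whether a relationship falls in this range is to count the distinct values of the column on the one side of the relationship. This is just a sketch; Customer[CustomerKey] is a hypothetical dimension key, to be replaced with the column your relationship actually uses:

EVALUATE
ROW ( "Distinct Keys", COUNTROWS ( VALUES ( Customer[CustomerKey] ) ) )   -- cardinality of the relationship column

If the result is above a few million, the relationship is a likely suspect for the slow storage engine behavior described above.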

If you want to know more, read my article on SQLBI about the Costs of Relationships in DAX, which includes a more complete discussion of the problem and a few measurements of the timings involved.

Published Thursday, August 20, 2015 11:24 AM by Marco Russo (SQLBI)


Comments

 

Ben Lezin said:

Hello Marco,

It was a pleasure meeting you and hearing you and Alberto speak at the PASS Summit.

You may recall I asked you a question regarding this specific issue for one of my models that runs out of memory during processing with 50 million rows.  I was able to find a workaround by partitioning the problematic table into monthly partitions, but you had mentioned that the calculated columns will process the entire table in spite of the partitions.

I have subsequently installed your DAX Studio tool, but it's not obvious to me how to transform the DAX for the calculated column into the DAX Studio format for analysis.

Could you please explain how to transform the following calculated column so that it's compatible with your DAX Studio tool?

=MAXX(FILTER(PatientAcctBalHistory,[HistoryDateAcctID]=EARLIER([PreviousHistoryDateAcctID])),PatientAcctBalHistory[PatientAccountBalance])

My hunch is that the performance impact is caused by the high cardinality of the concatenated HistoryDateAcctID and PreviousHistoryDateAcctID columns that I created so that I could dynamically get the previous month's status for each account.  My next step is to use T-SQL window functions in the ETL to handle this, but I would rather use the model. So if you can also think of a better way to achieve the same result using a measure or a different type of calculated column, it would be much appreciated!

Thanks!

Ben

November 1, 2015 3:43 PM
 

Marco Russo (SQLBI) said:

Use this syntax:

EVALUATE ADDCOLUMNS ( <tablename>, "newcolumn", <expression> )

You probably want to replace <tablename> with a FILTER in order to retrieve only a small number of rows (you don't want to evaluate the expression for 50 million rows in DAX Studio).
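For example, applied to the calculated column in the previous comment (only a sketch: the filter value is a placeholder to be replaced with any condition that returns a small number of rows, and the column references are fully qualified for use in a query):

EVALUATE
ADDCOLUMNS (
    FILTER (
        PatientAcctBalHistory,
        PatientAcctBalHistory[HistoryDateAcctID] = "<one key value>"   -- placeholder condition, keep only a few rows
    ),
    "newcolumn",
        MAXX (
            FILTER (
                PatientAcctBalHistory,
                PatientAcctBalHistory[HistoryDateAcctID]
                    = EARLIER ( PatientAcctBalHistory[PreviousHistoryDateAcctID] )
            ),
            PatientAcctBalHistory[PatientAccountBalance]
        )
)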

November 1, 2015 4:11 PM


About Marco Russo (SQLBI)

Marco Russo is a consultant, writer, and trainer specializing in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distributing resources useful for BI developers, such as Integration Services components, Analysis Services models, tools, and technical information. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.
