THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alberto Ferrari

SSAS: Reference materialized dimension might produce incorrect results

Reference dimensions let you create a relationship between a measure group and a dimension using an intermediate dimension to act as a bridge between them. In the Adventure Works demo, for example, they are used to link reseller sales and orders to the geography dimension.

mg

Now, something that is less evident is how this specific kind of relationship is handled by SSAS. Let us look further in the definition of the relationship:

mat

If the relationship is not materialized (something SSAS will dislike) then the relationship will be resolved during query time, with slow performances. Nevertheless, everything will work fine.

If, on the other hand, we follow SSAS suggestion and decide to materialize it, we will need to understand what is going on under the cover, to avoid a very subtle misfeature of SSAS.

When the relationship is materialized, SSAS will resolve it during processing time issuing a more complex query, during the partition processing (not dimension, I have written the right word: partition!) issuing a JOIN between the fact table and the intermediate dimension, to gather the value of the key for the final dimension from the intermediate one. This will lead to poor processing performances, but it is something that we already know. Anyway, what is important to point out is that the relationship is resolved during partition, not dimension processing.

The real problem comes when we need to update the relationship. We might think (at least this is what I thought) that issuing a ProcessUpdate on the intermediate dimension would refresh the relationship and perform all the necessary updates. This is not true. Since the materialized relationship is resolved during partition processing, any dimension processing will not refresh it. You can try to ProcessUpdate the intermediate or the final dimension: no way, the relationship still uses the data that was there when the partition was processed first time. If we want to refresh this relationship, the only way is to reprocess the partition, at that point we will have the new data correctly available.

This might be a very big problem if, for very large cubes, we decide to reprocess only the newest partitions, reprocessing only the needed aggregations for the older ones. If older partition will not be processed they will use the older version of the relationship. We will end up with an inconsistent situation where some partitions use a relationship and other use a different one, depending on when they have been last processed.

Clearly, if we rely on non-materialized relationships, everything will work fine since the relationship will rely only on dimensional data and does not need to be resolved during partition processing. In fact, issuing a ProcessUpdate on the intermediate dimension will imediately refresh the non materialized relationship, as it is supposed to do.

I think BOL should be much clearer on this, materializing a relationship is not just a matter of query speed, it has subtle consequences that, at their extreme, might lead to incorrect results from the cube.

Published Wednesday, February 25, 2009 3:59 PM by AlbertoFerrari

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SQLBI - Marco Russo said:

Alberto Ferrari wrote about an interesting issue he discovered working on optimization of dimension process

February 25, 2009 10:01 AM
 

Miky Schreiber said:

Hey Alberto,

Very strong post. I learned a lot from it.

Maybe setting a partition/measure group which is dedicated for the problematic dimension will do?

Miky

February 25, 2009 3:25 PM
 

James Luetkehoelter said:

Great post Alberto, especially clarifying the processing portion of it (partition vs dimension).

February 25, 2009 6:26 PM
 

Dan English said:

Very nice post Alberto, thanks.  Great content and explanation

February 27, 2009 7:13 AM
 

Jesse Orosz said:

I went through a very similar issue a couple years back:

Problem:

http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!250.entry

Solution:

http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!245.entry

March 30, 2009 5:06 PM
 

AlbertoFerrari said:

Jesse, I looked at your posts, it seems that reference dimensions hide more than one problem. I never thought about interferences between aggregations and reference dimension, but reading your experience has been very teaching. :)

Thanks.

Alberto

March 31, 2009 1:32 AM
 

Mathieu DESPRIEE said:

Very interesting post.

I might add that they're also issues with unknown members if you materialize the relationship.

Especially when in a relationship Fact -> DimA -> DimB, there's no support of unknown member in DimA, but you need one in DimB.

November 5, 2009 9:32 AM
 

Rohan said:

Hi Alberto ,

I have a situation where in

i join between the fact and dimension on product sub category number which is number from 1-n.

Each product has product subcatogory  ranged from 1-n

fact                         Dimension

sub_cat_code  measures           Category    subcategory sold flag

1                                  a              1         1

1                                  a              2         1

2                                  a              3         0

3                                  b              1         0

                                  b              2         1

                                  b              3         1

                                  c              1         1

                                  c              2         0

                                  c              3         1

When the cube is processed

and

i select product category , sub category and sold flag as 1

i get

category subcategory measure

a           1         10

b           2         20

c           3         12

But for

Category Subcategory

a         2

b         3

c         1

Go missing as they have been allocated to category b,c,a respectively

Could you please suggest a work around or a solution to get the entire data

Thanks

Rohan

November 16, 2009 11:39 PM
 

AlbertoFerrari said:

Rohan,

It is pretty hard to understand the problem with so few information. I don't think this is the right place where to discuss it, since it seems a modeling problem and not a reference dimension one. :)

Plese drop me an e.mail if you want to further investigate on it.

November 17, 2009 10:02 AM
 

Amit said:

November 2, 2010 6:40 AM
 

ashutosh said:

ver nice article

June 28, 2012 3:46 AM
 

StJohn Kettle said:

Alberto

I'm becoming somewhat wary of using referenced dimensions at all. If they are materialised you say that there is a perfomance hit. But that hit is bigger than it needs to be because (at least SSAS that ocmes with SQL Server 2008 R2) obtains the required foreign key from the intermediate dimension in a very inefficient way. See this post for why: http://bifuture.blogspot.com.au/2012/02/ssas-optimizing-processing-query-using.html.

I have followed Hennie's advice, replacing the referenced dimensions with ordinary ones (which has the unpleasant effect of adding lots of redundant joins to the data source view) and reduced processing time by a factor of 40.

Your book Expert Cube development . . . could perhaps stress more the performance impact (page 105).

August 29, 2012 7:13 PM
 

Durga said:

Detailed post, Thanks.

I never find like this in any other blogs/sites.

January 29, 2013 1:21 AM

Leave a Comment

(required) 
(required) 
Submit

About AlbertoFerrari

Alberto Ferrari is a Business Intelligence consultant. He his interests lie in two main areas: BI development lifecycle methodologies and performance tuning of ETL and SQL code. His main activities are with SSIS and SSAS for the banking, manufacturing and statistical sectors. He is also a speaker in international conferences like European PASS Conference and PASS Summit.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement