THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Averages, ratios, division by zero and NON_EMPTY_BEHAVIOR

How to get averages and other ratio based MDX calculations demonstrate great performance ? I thought I covered this subject in great depth before, but I probably wasn't concise enough. Last week I was asked to help with performance to very big and important customer (who shall remain unnamed). When I arrived on site, I found the project staff to be very knowledgeable about Analysis Services and MDX. The model was designed following all the best practices, they have studied the Performance Guide and implemented all the relevant changes. Yet they couldn't get a very simple MDX calculation of average to perform well in medium size Excel reports. They also studied all of my earlier blog on this subject ("How to check if cell is empty", "Budget Variance and NEB" and "Performance of IIF" blogs), yet it wasn't working for them. We have found a solution which satisfied both the business needs and delivered great performance, so I thought I should share it, since it is such a common scenario.

The problem applies to pretty much any kind of ratio, but we will pick the example of average to make it more specific. Let's suppose, that we keep in the data warehouse two measures about the product sales - SalesAmount and SalesCount. Now we want to compute the SalesAverage, so the most straightforward way to do it in MDX Script would be:

Create SalesAverage = Measures.SalesAmount / Measures.SalesCount;

The careful designer, however, would be concerned about division by zero, so following all the best practices the more robust expression would seem to be:

Create SalesAverage = IIF( Measures.SalesCount = 0, NULL, Measures.SalesAmount / Measures.SalesCount );

Everything seems to be fine so far. But if we will put the SalesAverage measure into the report which crossjoins several attributes using NON EMPTY, we will find out that even though the result of the report is small, it still takes too long. This is, of course, a well known situation, and NON_EMPTY_BEHAVIOR performance hint is the usual way of dealing with it. So everybody's first impulse would be to define NON_EMPTY_BEHAVIOR either to Measures.SalesAmount or to Measures.SalesCount. But is it correct ?

Before reading any further, it is strongly advised that the reader familiarizes himself with the section of Performance Guide which explains NON_EMPTY_BEHAVIOR taxonomy.

The answer to this question is "it depends". It depends on how SalesCount was built and what it was built on.  But first let's review what would happen when SalesCount has a value of 0. Then according to our expression, the SalesAverage is NULL. This violates the contract of the second taxonomy of NON_EMPTY_BEHAVIOR which states that calculated measure is NULL if and only if the measure defined as NON_EMPTY_BEHAVIOR is NULL. Here we violate the "and only if" part of this contract. Indeed, we got SalesAverage to be NULL while the base measure is not NULL - it is 0, which is different from NULL. Trying to force NON_EMPTY_BEHAVIOR here would result in undeterministic wrong results! The irony is that the other taxonomy of NON_EMPTY_BEHAVIOR is not as restrictive and only requires calculation result to be NULL if the expression of NON_EMPTY_BEHAVIOR is NULL. It is certainly my wish and desire that both taxonomies get unified and both not require the "and only if" part. Since this is not a very difficult change, I hope that this improvement will be implemented in the next version of Analysis Services. I would definitely vote for this feature on connect. Anyway, getting back to our problem that we have to solve with the current version. If SalesCount can never be 0, but only NULL, then we do not have a problem. If the SalesCount was defined as a measure with AggregationFunction=Count, counting transactions in the fact table - it seems to be safe. Even if it was defined as AggregationFunction=Sum on top of column in the fact table which never gets value of 0, then it is safe again. Unfortunately, in our case, the SalesCount measure could have values of 0 in the fact table.

One proposed solution to resolve this was to rewrite the MDX as following

Create SalesAverage = IIF( NOT IsEmpty(Measures.SalesCount) AND Measures.SalesCount = 0, 
  0, 
  Measures.SalesAmount / Measures.SalesCount );

This trick guarantees, that when SalesCount is equal to 0 (but not NULL), the ratio translates into 0, therefore it is safe to define NON_EMPTY_BEHAVIOR = Measures.SalesCount. However, the business people didn't accept such modification. They wanted the averages to be NULL and filtered out by NON EMPTY whenever SalesCount was either 0 or NULL.

Given that requirement, I proposed another solution.

1. Go to DSV, and insert the named calculation which converted 0's in the fact table for SalesCount column to NULL

2. Changed the source of the SalesCount measure to the named calculation created as step 1

3. Changed SalesCount's NullProcessing property to 'Preserve' value

Making these changes ensured that SalesCount measure will never be 0 in the cube, since all the 0's were translated to NULLs and measure was set up to preserve these NULLs. Now finally it was OK to define NON_EMPTY_BEHAVIOR to be Measures.SalesCount.

But this is not the end of it. Now we notice that since SalesCount is never zero, we will never get division by zero. SalesCount still can be NULL though, and so we still have to deal with division by NULL. But while NULL behaves similarly to zero in arithmetic computations, they are not identical. The most important difference is that 0/0 results in undefined value #1.IND, but NULL/NULL results in NULL. Since SalesAmount and SalesCount come from the same measure group, when the transaction is missing from the fact table - both measures are NULL together, so we are in the NULL/NULL case. This leaves only scenarios when SalesCount is 0 in the fact table. But when the SalesCount is 0, the SalesAmount is 0 as well, since we can only get zero money if we sold zero items. Therefore doing the same trick in DSV named calculation and NullProcessing=Preserve, we get SalesAmount to be NULL always when SalesCount is NULL.

Therefore we don't even need to check for the division by zero, since it is always safe. The final MDX Script would be:

CREATE MEMBER CurrentCube.Measures.SalesAverage AS 
  Measures.SalesAmount / Measures.SalesCount
  , NON_EMPTY_BEHAVIOR = Measures.SalesCount;
Published Tuesday, March 06, 2007 7:58 PM by mosha
Filed under: ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement