THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
 in Microsoft OLAP by Mosha Pasumansky (Entire Site) Search

Gross margin - dense vs. sparse block evaluation mode in MDX

Gross margin (also known as Gross profit margin or Gross profit rate) is defined as (Revenue – Cost of Sales)/Revenue. In terms of Adventure Works sample database we can write this in MDX as

`[Gross Margin] = ([Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost]) / [Measures].[Internet Sales Amount];`

While this expression is simple enough, it might be tempting to try to optimize it. If we look at the evaluation tree for this expression – we will see 5 nodes in it: Fetching values for Internet Sales Amount twice – because it appears twice in the expression, fetching values for Total Cost, one minus operation and one division operation. Accessing Internet Sales Amount twice is not really a problem, because second access is going to come from cache anyway, but one might think that removing this extra operation would still improve thing, if only a little.

So, perhaps we can rewrite the formula using simple math equivalence: (a-b)/a = 1 – b/a

The first problem with this rewrite is that it is not always correct. In math it relies on the fact that a/a=1, but is is only true when a <> 0. The result of a/a is undefined when a=0. In MDX, 0/0 will also result in undefined number, usually formatted as “-1.#IND”. But more interesting question is what would happen in MDX when a is empty, i.e. has value of NULL. In MDX NULL/NULL = NULL, therefore when both Sales and Cost are empty, i.e. there is no record in the fact table, the two formulas are going to give different results.

```(a-b)/a = NULL when a = NULL and b = NULL
1 – b/a = 1    when a = NULL and b = NULL```

So at very least someone first need to set exact definition of Gross margin at the coordinates where no sales occurred. Is Gross margin 100% there or is it empty as well ?

But I want to direct your attention to the performance aspect of this change. Remember, it seemed that 1-b/a would perform a little bit better – is it really so ? Let’s compare both approaches side by side in AS2005 using MDX Studio.

```// Gross margin using 1-a/b
WITH
MEMBER [Gross Margin] AS
1
-
[Measures].[Internet Total Product Cost]
/
[Measures].[Internet Sales Amount]
, FORMAT_STRING = 'Percent'
MEMBER [Max Gross Margin] AS
Max
(
(
[Customer].[Customer Geography].[Customer]
,[Product].[Product Categories].[Subcategory]
,[Date].[Calendar].[Calendar Year]
)
,[Measures].[Gross Margin]
)
SELECT
[Measures].[Max Gross Margin] ON 0
```Time              : 6 sec 640 ms
Calc covers       : 6
Cells calculated  : 1
Sonar subcubes    : 2
NON EMPTYs        : 0
Autoexists        : 1
EXISTINGs         : 0
SE queries        : 2
Cache hits        : 3
Cache misses      : 1
Cache inserts     : 1
Cache lookups     : 4
Memory Usage KB   : 88840```
```WITH
MEMBER [Gross Margin] AS
(
[Measures].[Internet Sales Amount]
-
[Measures].[Internet Total Product Cost]
)
/
[Measures].[Internet Sales Amount]
, FORMAT_STRING = 'Percent'
MEMBER [Max Gross Margin] AS
Max
(
(
[Customer].[Customer Geography].[Customer]
,[Product].[Product Categories].[Subcategory]
,[Date].[Calendar].[Calendar Year]
)
,[Measures].[Gross Margin]
)
SELECT
[Measures].[Max Gross Margin] ON 0
```Time              : 234 ms
Calc covers       : 7
Cells calculated  : 1
Sonar subcubes    : 1
NON EMPTYs        : 0
Autoexists        : 0
EXISTINGs         : 0
SE queries        : 3
Cache hits        : 3
Cache misses      : 1
Cache inserts     : 1
Cache lookups     : 4
Memory Usage KB   : 0```

The results come shockingly different. Almost 7 seconds for the “optimized” approach, and mere 234 ms for the original approach. Let’s try to understand why. First let’s compare the number of SE queries. It is 3 in the (a-b)/a approach, and 2 in the 1-b/a – so our optimization to reduce number of SE queries to fetch Internet Sales Amount indeed worked (and as we can see from hierarchical profiling), on cold cache run, only first SE query went for partitions, the other two came from cache).

Usually, when we get such a big difference in the results, we suspect that slow query executes in the cell-by-cell mode, and fast query executes in the block mode. However, there is no reason to believe that the slow query went through cell-by-cell mode. Both expressions use exactly same operands: minus and divide, and both of these are optimized for block mode. Perfmon counters from the run also don’t give evidence for the cell-by-cell mode. But then, if both queries indeed went through block mode, why one query is so much significantly slower than the other one ?

The answer lies in the density and sparsity characteristics of the space. Let’s take closer look at both evaluation subtrees:

```Operator "/"
|
+-- Operator "-"
|    |
|    +-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)
|    |
|    +-- SE Data (Internet Total Product Cost, Customer, Subcategory, Year)
|
+-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)```

In this tree every leaf element is SE Data, and it is sparse – i.e. even though the query subcubes cover big space, the SE Data node only fetches records that exist in fact table, i.e. non empty data. Both “/” and “-” operators operate efficiently on sparse data, because they can iterate only over non-empty cells (the algorithm for operator “*” was explained here, the algorithms for “/” and “-” are slightly more complicated, but similar).

Now the evaluation subtree for the slow query:

```Operator "-"
|
+-- Constant (1) (Customer, Subcategory, Year)
|
+-- Operator "/"
|
+-- SE Data (Internet Total Product Cost, Customer, Subcategory, Year)
|
+-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)```

Here we have two leaf nodes of SE Data – no problem with them, but another leaf node is constant 1. Since it is constant, it will evaluate the same value over all the cells in its subcube – and this value, 1, is not NULL. Therefore, this node evaluates to non empty result in every single cell. So while the query plan is still using block mode, it doesn’t get much benefit out of it, because the “block” is dense, there is no single empty cell in it, and while algorithm says to iterate over non empty cells, it ends up iterating over every cell, which makes it as slow as if it was in cell by cell mode.

Published Saturday, November 1, 2008 4:23 PM by mosha
Filed under: ,