As the readers of my blog know, bulk evaluation mode (called "block computation mode" in Katmai) delivers much better performance in MDX than the cell-by-cell evaluation mode. Therefore the most important optimization technique with MDX in Analysis Services is to rewrite MDX in such a way that makes block computations possible. Easy to say, but not always easy to do. With the release of Katmai's CTP5, Microsoft published the BOL article outlining conditions when block computations are and are not possible. Chris Webb picked on this article and mentioned in his blog, that "*the list of set functions is a bit limited (where is Filter?)*". In reply I said that I didn't think that Filter function was that common inside the MDX calculations, but Greg Galloway immediatelly came up with a good example, one that involves Count(Filter(...)). This is indeed a common calculation - every time we want to know how many entities are there which satisfy certain condition. (Greg's example was to find out how many physicians performed 10 or more cases during certain time period.)

Let's build an example using Adventure Works sample cube and see how we can optimize it. In Adventure Works terms, our task would be to find the number of products which had more than 5 orders placed over the Internet. Such calculation can be written as

Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] > 5))

Indeed, neither in AS2005 nor in AS2008 the Filter function is optimized to work in the block computation mode, therefore the query involving this calculation will execute in the cell by cell mode:

WITH MEMBER [Measures].[High Volume Products Count] AS

Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] > 5))

SELECT [Customer].[Customer Geography].[Country] ON 0

, [Date].[Calendar].[Date].MEMBERS ON 1

FROM [Adventure Works]

WHERE [Measures].[High Volume Products Count]

If we execute this query in MDX Studio, we will get the following stats:

Time : 30 sec 781 ms

Calc covers : 4

Cells calculated : 4217436

Sonar subcubes : 2

SE queries : 1

Cache hits : 1

Cache misses : 1

Cache inserts : 1

Cache lookups : 2

Memory Usage KB : 4160

The best hint here that this query indeed executed in the cell-by-cell mode (beyong the slow execution time) is value of "Cells calculated" perfmon counter. Now, the way most people approach optimization for such MDX is trying to reduce the number of cells to iterate. One way to do it is to eliminate manually all the empty cells from Filter. I.e., if value of [Internet Order Quantity] is NULL for certain product, it is definitely less than 5. With this in mind, one possible rewrite for the query would be

WITH MEMBER [Measures].[High Volume Products Count] AS
Count(Filter(
Exists([Product].[Product].[Product],,"Internet Sales")
,[Measures].[Internet Order Quantity] > 5))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]

Here Exists inside Filter eliminates the empty space. Let's run this again in MDX Studio and look at the new stats

Time : 12 sec 46 ms
Calc covers : 954
Cells calculated : 50036
Sonar subcubes : 7900
SE queries : 7899
Cache hits : 7899
Cache misses : 1
Cache inserts : 1
Cache lookups : 7900
Memory Usage KB : 43936

The time indeed decreased, from 30 seconds to 12 seconds. This is good, but something doesn't add up. The number of cells went down from 4 million to only 50 thousands, i.e. about 85 times, yet the execution time decreased by only 2 times. The explanation is that putting Exists (or NonEmpty) inside MDX calculation is usually a bad idea - since now for every cell, there is an SE query being sent (it can be seen through "Query Subcube" event in trace too). In our case there were 7900 such queries sent. Because the set inside Exists was always the same and the context was fixed, only 1 such query actually had to go to disk, and other 7899 hit the cache, yet the overhead of issuing SE query is non-trivial, even when it is answered from the cache.

Conclusion is that even though we get some performance gain from this approach, it doesn't play well with the rest of the system, and only drives us further from the goal of switching to the superior block computation mode. We need to rewrite the calculation in such a way that we eliminate Filter. Fortunately, it is possible to do. Let's recall that Count function returns number of tuples in the set, and Filter return set of tuples which satisfy certain condition. I.e. we are counting how many tuples satisfy a condition. If we convert count to sum, and sum up 1's every time when the condition is met and 0's every time when the condition is not met, we will get the same result. I.e.

Count(Filter(set, condition)) = Sum(set, Iif(condition, 1, 0))

Using this formula we can rewrite our calculation as following:

WITH MEMBER [Measures].[High Volume Products Count] AS
Sum(
[Product].[Product].[Product],
Iif([Measures].[Internet Order Quantity] > 5,1,0))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]

Executing this query yields the following stats:

Time : 6 sec 375 ms
Calc covers : 4
Cells calculated : 6948
Sonar subcubes : 1
SE queries : 1
Cache hits : 1
Cache misses : 1
Cache inserts : 1
Cache lookups : 2
Memory Usage KB : 0

This is definitely much better. The time is down to 6 seconds, and we can tell that within every cell the Sum works very efficiently, because we only see 6948 cell calculated (this is the exact number of cell in the resulting cellset). Yet, we are not in true block computation mode yet. We are still running Sum for every cell, instead of computing the entire query in one operation. What prevents us from doing it now ? Now the problem is with Iif function inside the Sum. I have written about Iif function and its interaction with block mode in the past. If we reread that article, we will see that we are in the scenario where condition inside Iif looks at the cell values and not at the attribute coordinates, and the only thing we can do here is to have one of the branches to return NULL. And it probably makes sense to define our calculated member to return NULL instead of 0 when there are no products which satisfy our condition. After this rewrite we get

WITH MEMBER [Measures].[High Volume Products Count] AS
Sum(
[Product].[Product].[Product],
Iif([Measures].[Internet Order Quantity] > 5,1,NULL))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]

This gets us down to 4 seconds, which is yet another improvement, but still doesn't yet reach the best execution plan. Now it is time to use the performance optimization hints. Looking at the expression Iif([Measures].[Internet Order Quantity] > 5,1,NULL) we observe that it is guaranteed to be NULL when [Measures].[Internet Order Quantity] is NULL. Therefore we can define the MDX script flavor of NON_EMPTY_BEHAVIOR for it. Since NEB cannot be defined on subexpressions, we will separate it into special calculated measure. Let's write the following fragment inside MDX Script:

CREATE HIDDEN Summator;
[Measures].[Summator] = Iif([Measures].[Internet Order Quantity] > 5,1,NULL);
NON_EMPTY_BEHAVIOR([Measures].[Summator]) = [Measures].[Internet Order Quantity];

And then use it inside our calculation:

WITH
MEMBER [Measures].[High Volume Products Count] AS
Sum([Product].[Product].[Product], [Measures].[Summator])
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]

Finally, when we execute this query, it finishes in about 0.2 second (218 milliseconds). This is the performance we expect to observe from the block computation mode. From 32 seconds to 0.2 seconds - improvement of over 160 times !

Now the even better news are, that starting with Katmai November CTP5 (which was released couple of days ago, so I finally can publicly speak about it), the trick with explicitly defining NON_EMPTY_BEHAVIOR is not required, the engine recognizes it itself. Therefore, even our previous query (the one which took 4 seconds in AS2005) returns in 0.2 seconds in AS2008. (This improvement in query optimizer in Katmai is just a tiny little tip of the iceberg around improvements in block computation query plans, and I intend to cover this subject in more depth in the upcoming blogs, if I will have enough time for that).