THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Optimizing MDX aggregation functions

One of the most significant changes in Analysis Services 2008 was improving performance of MDX queries and calculations. In particular, query optimizer can choose block (a.k.a. subspace) computation mode in query plan more often than in AS2005, and usually using block/subspace computation mode brings performance orders of magnitude better than without it. However, even in AS2008, query optimizer is not always capable of using block mode. The “Performance Improvements for MDX in SQL Server 2008 Analysis Services” whitepaper documents which MDX constructs are optimized and which not, and MDX Studio automates process of analyzing MDX and figuring out which fragments are not optimized, and offers advice how to optimize them. In this article we will cover techniques related to the optimization of aggregation functions in MDX, i.e. Sum, Min, Max, Aggregate and (in AS2008) Avg. All examples were done on AS2008 version of Adventure Works (some of these techniques will work with AS2005, but others won’t).

1. Aggregating over Filter

Scenario: We want to compute average sales for the products which increased its sales since same date last month. The normal approach is to use Filter function to determine which products grew their sales, and then apply Avg to the resulting set. Now applying this to every day in the Year 2003, we will get

WITH 
  MEMBER [Measures].AvgGrowingProducts AS 
    Avg
    (
      Filter
      (
        [Product].[Product].[Product].MEMBERS
       ,[Measures].[Sales Amount] > ([Measures].[Sales Amount],ParallelPeriod([Date].[Calendar].[Month]))
      )
     ,[Measures].[Sales Amount]
    ) 
SELECT 
  [Measures].AvgGrowingProducts ON 0
 ,Descendants
  (
    [Date].[Calendar].[Calendar Year].&[2003]
   ,[Date].[Calendar].[Date]
  ) ON 1
FROM [Adventure Works];

Running this query took more than 7 seconds on my laptop, and the culprit is clear from the perfmon counters – there were 295011 cells calculated – this is a sign of cell-by-cell iterations as opposed to working in the block mode. And if we run “Analyze” function of MDX Studio, it will tell us why this is happening: Function ‘Filter’ was used inside aggregation function – this disables block mode. How do we get rid of Filter ? I actually has written about it before – in the “Optimizing Count(Filter(...)) expressions in MDX” with respect to the Count function, but same technique can be applied to any aggregation function, including Avg (in AS2008). The idea is simple – even though Filter reduces the set of products, in order to compute the Filter in the first place, we really have to check sales of every single product, so we may as well just run Avg over all products, but with smart expression which will return NULL in case product should not be included – and since Avg ignores NULLs, this will work correctly. Here is how rewritten MDX will look like:

WITH 
  MEMBER [Measures].Growth AS 
    IIF
    (
      [Measures].[Sales Amount] > ([Measures].[Sales Amount] ,ParallelPeriod([Date].[Calendar].[Month]))
     ,[Measures].[Sales Amount]
     ,NULL
    ) 
   ,FORMAT_STRING = 'Currency' 
  MEMBER [Measures].AvgGrowingProducts AS 
    Avg
    (
      [Product].[Product].[Product].MEMBERS
     ,[Measures].Growth
    ) 
SELECT 
  [Measures].AvgGrowingProducts ON 0
 ,Descendants
  (
    [Date].[Calendar].[Calendar Year].&[2003]
   ,[Date].[Calendar].[Date]
  ) ON 1
FROM [Adventure Works];

Now it executes in mere 200 ms, and number of cells calculated is 365 – exactly the same number as number of cells in the result – which is the best theoretical value we can get !

2. Aggregating over NonEmpty

Scenario: Compute average sales for all products. The solution is straightforward:

WITH 
  MEMBER [Measures].AvgProductSales AS 
    Avg
    (
      [Product].[Product].[Product].MEMBERS
     ,[Measures].[Sales Amount]
    ) 
SELECT 
  [Measures].AvgProductSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

It is very efficient – only 100 ms and single SE query. But what if someone wants to optimize it even more ? Is this possible ? Poking with MDX Studio expression debugger, we can notice, that the sets over which Avg is run are pretty sparse. For example, on August 7, 2001 – there were only 3 products which had sales, the rest was empty. Year later, on August 7, 2002 there were only 5 products which sold. So, it might be tempting to add NonEmpty function over the products, in a hope to reduce the size of the set before it is fed to Avg:

WITH 
  MEMBER [Measures].AvgProductSales AS 
    Avg
    (
      NonEmpty
      (
        [Product].[Product].[Product].MEMBERS
       ,[Measures].[Sales Amount]
      )
     ,[Measures].[Sales Amount]
    ) 
SELECT 
  [Measures].AvgProductSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

This turns out to be a big mistake. The execution time jumps 18-fold to 1 sec 796 ms, and the all the perfmon counters jump as well, with number of SE queries going to 1189 from 1. I have written about this before, but it is worthwhile to reiterate. Unfortunately, this kind of wrong advice “reduce the space with NonEmpty before applying computation” is a popular myth. I see it mentioned in various presentations, and it even shows up in otherwise good “MDX Performance Hints” document. The irony is that the tip which is supposed to improve performance, actually makes it worse. The reason here is twofold:

1) Since NonEmpty is used inside calculated member, it will be computed every time this calculated member is invoked. No block mode. Each call to NonEmpty triggers at least one SE query. This is why we see so many SE queries in perfmon – one for every cell which uses AvgProductSales calculated member.

2) The result of NonEmpty could be a set of any shape and form. It is very difficult for Avg to work with such unpredictable sets. In our example we got lucky, that since we only had single hierarchy inside NonEmpty – we didn’t end up with arbitrary shape set – otherwise performance would’ve been even worse.

So the guidance is to never use NonEmpty inside calculations, but it is OK to use NonEmpty while building axes or named sets – because then it will be only called once and not for every cell.

3. Aggregating over Union

Scenario: Running sum by day. Typical and very sensible approach to this problem is to call Sum over all the preceding days. Assuming that our calculation will only ever be called on Date attribute (this is easy to ensure with SCOPE assignment, but for simplicity I will put it here as regular calculated member):

WITH 
  MEMBER [Measures].RunSales AS 
    Sum
    (
      NULL : [Date].[Date].CurrentMember
     ,[Measures].[Sales Amount]
    ) 
SELECT 
  [Measures].RunSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

This works OK – about 2 seconds execution time, and in block mode. We still want to optimize it. The common optimization technique for running sum is to do summation at higher grains whenever possible. I.e. instead of running over days, we would like to run over months until the current month, and only from there over the remaining days in this month. This approach makes a lot of sense, but the implementation should be done carefully. Let’s see what will happen if we modify the set to be union of months and days:

WITH 
  MEMBER [Measures].RunSales AS 
    Sum
    (
      Union
      (
        NULL : [Date].[Calendar].Parent.PrevMember
       ,
        [Date].[Calendar].FirstSibling : [Date].[Calendar].CurrentMember
      )
     ,[Measures].[Sales Amount]
    ) 
SELECT 
  [Measures].RunSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

The results are disastrous ! The execution time rose above 3 seconds, but more alarmingly, both number of cells calculated and SE queries jumped to thousands. Why is that ? After all, if we read the official documentation, it says that Sum over Union is in fact optimized for block mode, but the results we see here clearly show the opposite. The root cause is a little bit deeper. It is true that Sum(Union()) is optimized, but one of the other conditions for aggregation functions to work optimally, is to have the input set in a good shape, i.e. not arbitrary shaped set. Our Union combines sets from different granularities, so we end up with the mixed grain set, which is classified as arbitrary shape. How can we fix it ? One way is to break single Sum into two Sum’s, each one over the set of uniform single grain, i.e.

WITH 
  MEMBER [Measures].RunSales AS 
      Sum
      (
        NULL : [Date].[Calendar].Parent.PrevMember
       ,[Measures].[Sales Amount]
      )
    + 
      Sum
      (
        [Date].[Calendar].FirstSibling : [Date].[Calendar].CurrentMember
       ,[Measures].[Sales Amount]
      ) 
SELECT 
  [Measures].RunSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

Now the results are great – the runtime is about 400 ms, and other stats look good too. It is possible to optimize it even further using techniques described in the “Take advantage of FE caching to optimize MDX performance” article. We notice that the first component of plus is the same for all days in the month, so we can take advantage of it by caching it at the first day and reusing for all other days as following:

WITH 
  MEMBER [Measures].RunMonthSales AS
      Sum
      (
        NULL : [Date].[Calendar].CurrentMember
       ,[Measures].[Sales Amount]
      )
  MEMBER [Measures].RunSales AS 
      ([Measures].RunMonthSales, [Date].[Calendar].Parent.PrevMember)
    + 
      Sum
      (
        [Date].[Calendar].FirstSibling : [Date].[Calendar].CurrentMember
       ,[Measures].[Sales Amount]
      ) 
SELECT 
  [Measures].RunSales ON 0
 ,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works];

This gives better runtime of 280 ms, on bigger cubes the difference will be more significant.

4. Aggregating over Exists or CrossJoin ?

To be written when I get more time.

5. Aggregting over Descendants or EXISTING ?

To be written when I get more time.

Published Wednesday, October 22, 2008 1:32 PM by mosha
Filed under: ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement