THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Moving averages in MDX

Moving averages is an important statistical technique for analyzing time series data. It is often used to do forecasting or trend estimations, and it is especially popular for analyzing financial data. There are many different types of moving averages, from simple to weighted to exponential etc. "Moving average" article in Wikipedia is a good starting point to learn about them. In this article we will discuss the implementation of moving averages in MDX.

Simple Moving Average

The classic way to compute moving average is explained in my "Fast Track to MDX" book, chapter 8. Translated to Adventure Works example, the moving average over last 12 months for the [Internet Sales Amount] can be written as following:

Avg(
  [Ship Date].[Month Name].CurrentMember.Lag(11):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
)

Here we use Lag function to go 11 months back, then put a range from that 11 months ago month to the current one, and compute average over it. To get a sense how moving average behaves when we look at last 12, 6 or 3 months, we can run the following query:

WITH 
 MEMBER [Measures].[SMA12] AS Avg(
  [Ship Date].[Month Name].CurrentMember.Lag(11):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
 )
 MEMBER [Measures].[SMA6] AS Avg(
  [Ship Date].[Month Name].CurrentMember.Lag(5):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
 )
 MEMBER [Measures].[SMA3] AS Avg(
  [Ship Date].[Month Name].CurrentMember.Lag(2):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
 )
SELECT 
 {[Measures].[Internet Sales Amount], [Measures].[SMA12], [Measures].[SMA6], [Measures].[SMA3]} ON 0
 ,[Ship Date].[Calendar].[Month] ON 1
FROM [Adventure Works]

And it's result can be seen in the graph below :

The above formulas for moving average are straightforward. In fact, the BIDS in the Calculation tab, at the left side of the screen has a Templates tab, which contains templates for common calculations. Moving average is among them, and the MDX code in the template is the same as above:

//
/*Returns the average value of a member over a specified time interval.*/
CREATE MEMBER CURRENTCUBE.[MEASURES].[Moving Average]
AS Avg
(
   [<<Target Dimension>>].[<<Target Hierarchy>>].CurrentMember.Lag(<<Periods to Lag>>) :
   [<<Target Dimension>>].[<<Target Hierarchy>>].CurrentMember,
   [Measures].[<<Target Measure>>]
)
// This calculation returns the average value of a member over the specified time interval.
            ,
FORMAT_STRING = "Standard";

The use of MDX function Avg here is justified, because we are talking about moving average. However, we should note how the Avg function treats cells with NULL value. For the compatibility with Excel, when the cell contains no value, i.e. it contains NULL, Avg treats it as if the cell didn't exist. I.e. it doesn't count this cell in the total number of cells to be used in the denominator. If such behavior is not desirable, instead of Avg(set, exp) one could write Sum(set, exp)/Count(set). Our example then will become

Sum(
  [Ship Date].[Month Name].CurrentMember.Lag(11):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
) 
/
Count([Ship Date].[Month Name].CurrentMember.Lag(11):[Ship Date].[Month Name])

Or, if it is OK to ignore the values near the beginning of the Time dimension, then instead of Count() we can write the constant 12 in the denominator

Sum(
  [Ship Date].[Month Name].CurrentMember.Lag(11):[Ship Date].[Month Name],
  [Measures].[Internet Sales Amount]
) / 12 

If the difference in semantics w.r.t. NULL treatment between Avg and Sum is not important (for example, it is known that NULLs won't happen anyway), then Sum is always preferable in Analysis Services 2005, since it can have better performance.  

Speaking of performance, let's see how well this expression behaves. Teo Lachev have provided the query below as an Adventure Works compatible example showing problematic performance:

WITH MEMBER [Measures].[SMA12] AS
 Sum
 (
        [Ship Date].[Calendar].CurrentMember.Lag(11):[Ship Date].[Calendar],
        [Measures].[Internet Sales Amount]
 )
  /
 Count([Ship Date].[Calendar].CurrentMember.Lag(11):[Ship Date].[Calendar])
SELECT [Measures].[SMA12] ON 0,
[Customer].[Customer].[Customer].Members on 1
FROM [Adventure Works]
WHERE ([Ship Date].[Calendar].[Month].&[2004]&[8])

This query executes for about 13 seconds on my laptop. This is definitely too long - there are about only 18,484 customers in Adventure Works, and since we are getting data for 12 months - it would still be about 17,000 cells per second, which is suspiciously low. What is even more interesting, is if we compare this to the calculation for running sum from my blog entry, we will get the following queries to compare:

WITH MEMBER [Measures].[Sum 12] AS
Sum
(
        [Ship Date].[Calendar].CurrentMember.Lag(11):[Ship Date].[Calendar],
        [Measures].[Internet Sales Amount]
)
SELECT [Measures].[Sum 12] ON 0,
[Customer].[Customer].[Customer].Members on 1
FROM [Adventure Works]
WHERE ([Ship Date].[Calendar].[Month].&[2004]&[8])

WITH MEMBER [Measures].[RSum] AS
Sum
(
        [Ship Date].[Calendar].[Month].Item(0):[Ship Date].[Calendar],
        [Measures].[Internet Sales Amount]
)
SELECT [Measures].[RSum] ON 0,
[Customer].[Customer].[Customer].Members on 1
FROM [Adventure Works]
WHERE ([Ship Date].[Calendar].[Month].&[2004]&[8])

It is immediately clear from here that the only difference between the two queries is .Item(0) instead of Lag(11), i.e. running sum query sums up all the months from the beginning of the time, while the first query only looks at most recent 12 months. The common sense would tell us that the first query should execute faster, because it looks at less data (12 months vs 38 months). Yet, the first query takes 13 seconds, and the second one takes only 1.3 seconds - completely against the common sense.

To understand what's going on here, let's take a look at the performance counters. The interesting counter here to watch is MSAS 2005: MDX\Total calculation covers. When we execute the first query, it goes to 18487. But for the second query it stays low at 4. The high number of calculation covers is a clear indication that something went wrong with the execution plan. As we know, AS2005 query optimizer has two major modes in which it evaluates subspaces - the one I used to call "bulk evaluation mode" (it Katmai marketing materials this mode received an official name - "block computation") and cell by cell mode. (If you are interested in detailed explanations on these modes and other internals of the execution plans it might be worthwhile attending my preconference session "Deep Dive into MDX" on PASS 2007 summit in two weeks). But even in the cell by cell mode, the number of calculation covers should have been smaller. Number 18487 is suspiciously close to the number 18484, which is number of customers, which leads us to assumption that there was calculation cover created for every customer. Running query trace with "Query Subcube Verbose" enabled confirms this assumption - we see SE subcube query generated for every single customer. The conclusion is that use aggregation function such as Aggregate or Sum over a range set which uses Lag function produces horrible execution plan. Lag by itself is not a bad function, but in this combination, AS2005 misses good query plan. This particular scenario, as well as many others is fixed in Katmai, but what can we do in the meantime ? We could work around use of Lag function and use another MDX function which will position us on the desired member. The most obvious function to do that is ParallelPeriod.

The query rewritten to use ParallelPeriod will look the following:

WITH MEMBER [Measures].[SMA12] AS
 Sum
 (
        ParallelPeriod([Ship Date].[Calendar].[Month],11):[Ship Date].[Calendar],
        [Measures].[Internet Sales Amount]
 )
 /
 Count(ParallelPeriod([Ship Date].[Calendar].[Month],11):[Ship Date].[Calendar])
SELECT [Measures].[SMA12] ON 0
,[Customer].[Customer].[Customer].Members on 1
FROM [Adventure Works]
WHERE ([Ship Date].[Calendar].[Month].&[2004]&[8])

And it executes in about 1 second. 

Weighted Moving Average

Weighted moving average (WMA) is moving average where the values for the previous time periods are multiplied by weight before being summed up. Usually, the more recent time periods get bigger weight. One very common distribution of weights is when they decrease arithmetically, i.e. WMA is computed by the following formula: 

 

The denominator is a simple arithmetic progression which can be calculated using the following formula:

 

How can we implement this formula in MDX ? There are several possible solutions. The most straightforward one is during iteration over the set to check the position inside the set and multiply by the appropriate weight. AS2005 added a new MDX function to get the index of tuple during iteration over the set - CurrentOrdinal. For the 6 month WMA the MDX will look like following:

WITH MEMBER Measures.WMA6 AS
  Sum(
    ([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember) as Iter,
    Iter.CurrentOrdinal * [Measures].[Internet Sales Amount]
  ) / (6*7/2)
  , FORMAT_STRING = 'Currency'
SELECT {[Measures].[Internet Sales Amount], Measures.WMA6} ON 0
, [Ship Date].[Calendar].[Month].MEMBERS on 1
FROM [Adventure Works]

Here n=6, therefore n*(n+1)/2 = 21. However, this gives inaccurate results at the beginning of Time, for the first 5 months, since the set actually contains less then 6 members in it. If this is important, the formula can be rewritten to be more precise:

WITH 
 MEMBER Measures.WMA6 AS
  Sum(
    ([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember) as Iter,
    Iter.CurrentOrdinal * [Measures].[Internet Sales Amount]
  ) 
  /
  (  Count([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember)
   *(Count([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember)+1)
   /2
  )
  , FORMAT_STRING = 'Currency'
SELECT {[Measures].[Internet Sales Amount], Measures.WMA6} ON 0
, [Ship Date].[Calendar].[Month].MEMBERS on 1
FROM [Adventure Works]

We can improve this a little by explicitly caching the common subexpression so it will be calculated only once (more on caching techniques during "Deep dive to MDX" PASS preconference session):

WITH 
 MEMBER Measures.IterCount AS Count([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember), VISIBLE = false
 MEMBER Measures.WMA6 AS
  Sum(
    ([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember) as Iter,
    Iter.CurrentOrdinal * [Measures].[Internet Sales Amount]
  ) 
  /
  (IterCount*(IterCount+1)/2)
  , FORMAT_STRING = 'Currency'
SELECT {[Measures].[Internet Sales Amount], Measures.WMA6} ON 0
, [Ship Date].[Calendar].[Month].MEMBERS on 1
FROM [Adventure Works]

Now, let's play a closer attention to the results of this query. If we look at the numbers, we will notice, that they seem to be way too big. In fact they are exactly 10,000 times bigger than what they are supposed to be ! This is a known bug in AS2005 with respect to the mathematical operations with Currency data type. Until this bug is fixed, we will have to compensate for it by dividing the result by 10,000:

WITH MEMBER Measures.WMA6 AS
  Sum(
    ([Ship Date].[Calendar].Lag(6):[Ship Date].[Calendar].PrevMember) as Iter,
    Iter.CurrentOrdinal * [Measures].[Internet Sales Amount]
  ) / 21 / 10000
  , FORMAT_STRING = 'Currency'
SELECT {[Measures].[Internet Sales Amount], Measures.WMA6} ON 0
, [Ship Date].[Calendar].[Month].MEMBERS on 1
FROM [Adventure Works]

Exponential  Moving Average

Exponential moving average (EMA) is a special case of WMA, where the weights decrease exponentially. There are few variations on the exact formula, below is one common definition:

The parameter "alpha" is called smoothing factor, and it relates to the number of periods to go back in the moving average computation as

 

Since the above definition is a recursive one, we can write this in MDX using recursion as well:

WITH 
 MEMBER Measures.Alpha AS 2/(6+1)
 MEMBER Measures.EMA6 AS 
  IIF( [Ship Date].[Calendar].CurrentMember IS [Ship Date].[Calendar].CurrentMember.Level.Members(0)
       ,[Measures].[Internet Sales Amount]
       ,[Measures].[Internet Sales Amount]*Alpha + (Measures.[EMA6], [Ship Date].[Calendar].PrevMember)*(1-Alpha) )
  , FORMAT_STRING = 'Currency'
SELECT {[Measures].[Internet Sales Amount], Measures.EMA6} ON 0
, [Ship Date].[Calendar].[Month].MEMBERS on 1
FROM [Adventure Works]

Recursive calculations are not efficient in AS2005, since query optimizer doesn't know how to optimize them. In order to improve performance, the EMA can be rewritten using non-recursive approach. The math works out to the following formula:

I leave the translation of this formula into MDX as an exercise for the reader.

The chart below shows comparison between SMA, WMA and EMA in a graphical way.

Published Tuesday, September 04, 2007 2:29 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement