THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Counting 'in-flight' events in MDX

Time dimension has special meaning in OLAP and DW. The classic problems involve looking at previous period, parallel period, computing moving averages and running sums. Today we will look into less common, but nevertheless interesting problem of 'in-flight' events. This scenario arises when there is more than one date/time attribute associated with the fact. For example, for the marketing campaign we can have creation date, start and end date for the campaign; for airplanes there is departure and arrival time; in Adventure Works for each order we track order creation date, order shipping date and order delivery date. Now, we can put role playing dimensions for Time, and include all of them in the cube. This allows slicing by any one of them and analysis such as - how many orders were created on particular date or date range, how many orders shipped on particular date or date range, how many orders were delivered on particular date or date range etc. We could also load into the cube interesting metrics such as how long did it take to between order shipped until it was delivered etc. But what this article is going to discuss is the following question: On any particular date or date range how many orders are in-delivery, i.e. they were already shipped but weren't delivered yet. We will use Adventure Works cube as an example. Let's start with the report where we want to see how many orders were in-delivery from May 5th 2003 to May 9th 2003. In order to find such orders, we need to look at the orders which had Ship date prior to May 9th, but Delivery date post May 5th. Both of these sets can be written using Range operator:

Ship dates prior to May 9th - NULL : [Ship Date].[Calendar].[Date_].[May 9, 2003]

Delivery dates post May 5th - [Delivery Date].[Calendar].[Date_].[May 5, 2003] : NULL

Since we want to apply both conditions together, we should just CrossJoin them together (it is equivalent of logical AND operator). Now if we put this set into WHERE clause of the MDX SELECT statement, we will get only data which satisfy our requirements:

SELECT Measures.[Internet Transaction Count] ON 0
FROM [Adventure Works]
WHERE (NULL : [Ship Date].[Calendar].[Date_].[May 9, 2003], [Delivery Date].[Calendar].[Date_].[May 5, 2003] : NULL)

Let's discuss the performance of this query. It may look like the query will have to touch a lot of data, because the conditions Ship Date < May 9th and Delivery Date > May 5th cover large set of dates. However, we apply both conditions together, and Analysis Services uses bitmap indexes which will be intersected. Therefore the amount of data to be scanned and therefore the performance of the query directly depends on how big the date range is. It will be the best if we are looking at single day, and it will be the worst if we are looking at all years, because in the later case it will have to scan pretty much the entire fact table.

The query above worked fine, but it had the date range hardcoded. What if we wanted to create a calculated measure which will count number of in-delivery orders dynamically. In order to do that, we need to use another Time dimension relative to which we will calculate this metric. Let's suppose that we will pick order creation date dimension for that purpose. If user has chosen the whole year - we will define date range from the beginning of the year to its end. If it is quarter - then from the beginning of the quarter to the end etc. The MDX functions which allow us to navigate to the beginning and end of arbitrary time period are OpeningPeriod and ClosingPeriod respectively. I.e. to get the first day of the period we will use OpeningPeriod([Date_].[Calendar].[Date_]) and to get the last day we will use ClosingPeriod([Date_].[Calendar].[Date_]). Now we know the days of the interval, but we need to translate them from [Date_] dimension to the corresponding days in [Ship Date] and [Delivery Date] dimensions. The best way to do it in the properly designed Time dimension when the keys are unique inside the level (i.e. when the hierarchy is natural) is to use LinkMember function. LinkMember maps members using their keys. If the keys are not unique in the level (i.e. when the hierarchy is unnatural), the result is unpredictable. Luckily our Calendar hierarchy is OK, so we can use LinkMember(ClosingPeriod([Date_].[Calendar].[Date_]),[Ship Date].[Calendar]) to get to the last day of the selected time period in [Ship Date] dimension, and LinkMember(OpeningPeriod([Date_].[Calendar].[Date_]),[Delivery Date].[Calendar]) to get the first day of the selected time period in [Delivery Date] dimension.

Finally, we need to run Aggregate over the CrossJoin similar to one in the first report. So it seems like the following expression will work

Aggregate(
 CrossJoin(
   NULL:LinkMember(ClosingPeriod([Date_].[Calendar].[Date_]),[Ship Date].[Calendar]),
   LinkMember(OpeningPeriod([Date_].[Calendar].[Date_]),[Delivery Date].[Calendar]):NULL
 ), Measures.[Internet Transaction Count]
)

However, there is an important nuance here. Although it looks like we shifted coordinate in the [Date_] dimension, actually we did it inside [Ship Date] and [Delivery Date] dimensions, and coordinate inside [Date_] remained the same. Since shipment of the order can happen with delay after order was created, we will get skewed results. In the extreme, if there is always at least one day passed before order is shipped - running the above expression at the day level will always result in NULL. Therefore for the above formula to work we need to shift [Date_] dimension to the neutral state - meaning at its Root.

The following expression demonstrates the fix:

Aggregate(
 (
   NULL:LinkMember(ClosingPeriod([Date_].[Calendar].[Date_]),[Ship Date].[Calendar]),
   LinkMember(OpeningPeriod([Date_].[Calendar].[Date_]),[Delivery Date].[Calendar]):NULL,
   Root([Date_])
 ), Measures.[Internet Transaction Count]
)

We can test this expression with the following query

WITH 
MEMBER Measures.ActiveOrders AS 
Aggregate(
 (
   NULL:LinkMember(ClosingPeriod([Date_].[Calendar].[Date_]),[Ship Date].[Calendar]),
   LinkMember(OpeningPeriod([Date_].[Calendar].[Date_]),[Delivery Date].[Calendar]):NULL,
   Root([Date_])
 ), Measures.[Internet Transaction Count]
)
SELECT {Measures.[Internet Transaction Count],Measures.ActiveOrders} ON 0
, Descendants([Date_].[Calendar], [Date_].[Calendar].[Date_], BEFORE) ON 1
FROM [Adventure Works]

 

Published Friday, June 01, 2007 7:20 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement