THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

MDX and partitioning

Partitions have been a built-in feature of Analysis Services ever since the first version. Naturally, all the engine subsystems, including the MDX query optimizer, work well with partitions. There are rare cases, however, when over-aggressive prefetching can generate query plan which is not optimal with respect to the partitioning scheme. Today we will review one such scenario and see how MDX could be rewritten to suit the partitioning scheme better.

Let's consider scenario where we need to determine the last date for which there is data in certain measure group. The example will be built around Adventure Works sample database, using Date dimension and Internet Sales Amount as a measure.

One more thing to note: while writing this post I used MDX Studio which simplifies many operations, so when below I say "look at the set", it means inspecting it in the Watch window of MDX Studio, when I say "clear the cache", it means using MDX Studio's cache clear button etc. The only other tool that needs to be run is Profiler, since AS traces are not yet integrated into MDX Studio, but the MDX Studio development team is working on this feature. Also, in order to be able to run MDX Script statements from within MDX Studio, I added "Cube=Adventure Works" in the Connection Properties field of the Connect dialog.

Due to limitations of sqlblog.com, in all examples below remove the space before the square braket in the [Date ] fragments

The most straightforward way to compute the last date is by using the following formula:

CREATE SET CurrentCube.LastDate as Tail(NonEmpty([Date ].[Date ].[Date ], [Measures].[Internet Sales Amount]), 1)

We can see that the result set will have a single member in it - July 31, 2004. But if we execute this statement over the clean cache, we can see the following events in the trace:

Query Begin    0 - MDXQuery    MDX Studio v0.2.6.0        
Progress Report Begin    14 - Query    Started reading data from the 'Internet_Sales_2001' partition.  
Progress Report Begin    14 - Query    Started reading data from the 'Internet_Sales_2002' partition.  
Progress Report Begin    14 - Query    Started reading data from the 'Internet_Sales_2003' partition.  
Progress Report End    14 - Query    Finished reading data from the 'Internet_Sales_2001' partition.   
Progress Report Begin    14 - Query    Started reading data from the 'Internet_Sales_2004' partition.  
Progress Report End    14 - Query    Finished reading data from the 'Internet_Sales_2002' partition.  
Progress Report End    14 - Query    Finished reading data from the 'Internet_Sales_2003' partition.  
Progress Report End    14 - Query    Finished reading data from the 'Internet_Sales_2004' partition.  
Query End    0 - MDXQuery   

We see that all partitions got queried, even though the last non empty date is in 2004, and therefore it should've been enough to query just the Internet_Sales_2004 partition. Unfortunately, the query optimizer gets carried away in this example, decides to ignore the outer Tail(..., 1) and executes NonEmpty over all dates literally over all dates. So, we need a different strategy here. The second obvious strategy for finding last non empty date is the recursive approach - start from the last date in the dimension and move back by one date until we run into a date which has data. The formula for that is

CREATE
 MEMBER CurrentCube.LastDateIndex AS
   Iif(
    IsEmpty([Measures].[Internet Sales Amount]), 
    [Date ].[Calendar].PrevMember,
    Rank([Date ].[Calendar].CurrentMember, [Date ].[Calendar].[Date ])
   )
 SET LastDate AS [Date ].[Calendar].[Date ].Item((LastDateIndex, Tail([Date ].[Calendar].[Date ],1).Item(0))-1)

Here we implemented the recursive logic inside calculated member LastDateIndex, and then we position it on the last date (using Tail(1) over level of dates for that). The result is the same as before, but trace looks very different:

Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0		
Query Subcube	2 - Non-cache data August 31, 2004	
Query Subcube	2 - Non-cache data August 30, 2004	
Query Subcube	2 - Non-cache data August 29, 2004	
Query Subcube	2 - Non-cache data August 28, 2004	
Query Subcube	2 - Non-cache data August 27, 2004	
Query Subcube	2 - Non-cache data August 26, 2004	
Query Subcube	2 - Non-cache data August 25, 2004	
Query Subcube	2 - Non-cache data August 24, 2004	
Query Subcube	2 - Non-cache data August 23, 2004	
Query Subcube	2 - Non-cache data August 22, 2004	
Query Subcube	2 - Non-cache data August 21, 2004	
Query Subcube	2 - Non-cache data August 20, 2004	
Query Subcube	2 - Non-cache data August 19, 2004	
Query Subcube	2 - Non-cache data August 18, 2004	
Query Subcube	2 - Non-cache data August 17, 2004	
Query Subcube	2 - Non-cache data August 16, 2004	
Query Subcube	2 - Non-cache data August 15, 2004	
Query Subcube	2 - Non-cache data August 14, 2004	
Query Subcube	2 - Non-cache data August 13, 2004	
Query Subcube	2 - Non-cache data August 12, 2004	
Query Subcube	2 - Non-cache data August 11, 2004	
Query Subcube	2 - Non-cache data August 10, 2004	
Query Subcube	2 - Non-cache data August 9, 2004	
Query Subcube	2 - Non-cache data August 8, 2004	
Query Subcube	2 - Non-cache data August 7, 2004	
Query Subcube	2 - Non-cache data August 6, 2004	
Query Subcube	2 - Non-cache data August 5, 2004	
Query Subcube	2 - Non-cache data August 4, 2004	
Query Subcube	2 - Non-cache data August 3, 2004	
Query Subcube	2 - Non-cache data August 2, 2004	
Query Subcube	2 - Non-cache data August 1, 2004	
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data July 31, 2004	
Query End	0 - MDXQuery	

Now we see lots of non-cached Query Subcube requests for the dates going from August 31, 2004 down to July 31, 2004. Interestingly, only the last one triggers reading of the partition, since for the others, partition autoslice automatically detects that they don't belong to any partition. So in certain sense this is a good result, since we only touched one partition, but in another sense, this query plan with lots of Query Subcube requests doesn't look good. We were lucky, that in Adventure Works there were only 31 days at the end of Date dimension without data, so recursion went 31 step only. But it is not uncommon to have Date dimensions looking several years ahead, in which case recursion could go on for thousands of steps, and overhead of thousands of Query Subcube requests could become substantial, not to mention the overhead of the depth of recursion.

This leads us to another approach. Instead of iterating by days, we can iterate by partitions. Since in Adventure Works partitioning scheme is by years, we will iterate by years, and then find the last non empty date within a year. The MDX that implements this approach is below:

CREATE
 MEMBER CurrentCube.LastYearIndex AS
  Iif(
   IsEmpty([Measures].[Internet Sales Amount]), 
   [Date ].[Calendar Year].PrevMember,
   Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year])
  )
 SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1)
 SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)

Again, this gives us the desired result, and now the trace looks like following:

Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,000000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query End	0 - MDXQuery

We eliminated lots of Query Subcube requests, but we still have two, and now each one triggers query of partition. Why there are two of them ? This is easily answered by inspecting the Query Subcube granularity bitmasks. We see that the difference between them is in the Date dimension. First Query Subcube doesn't have granularity on Date attribute, while the second one has. It is clear now, that the first Query Subcube request is triggered by IsEmpty([Measures].[Internet Sales Amount]) check inside Iif - because it happens at the Year granularity, and the second Query Subcube is due to NonEmpty over dates with the filter on the specific year.

We don't like the fact that we go to the partition twice, essentially for the same data but on different granularity. So we can do a little trick in MDX. Since we will have to go to the Date granularity within a year anyway, let's do it sooner rather than later - and expand granularity already inside Iif check. The resulting MDX will look like following:

CREATE
 MEMBER CurrentCube.LastYearIndex AS
  Iif(
   IsEmpty(Aggregate(Descendants([Date ].[Calendar], [Date ].[Calendar].[Date ]), [Measures].[Internet Sales Amount])),
   [Date ].[Calendar Year].PrevMember,
   Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year])
  )
 SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1)
 SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)

Let's see what we get in the trace now:

Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,011010000001100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query Subcube	1 - Cache data
00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query End	0 - MDXQuery

This is perfect - exactly what we wanted. The second Query Subcube which used to be Non-cache data, now turned into cached one, and there is only one query to only one partition going on.

This post was written as a result of discussion on sql.ru forum

Published Tuesday, December 18, 2007 9:21 PM by mosha
Filed under: ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement