<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://sqlblog.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Microsoft OLAP by Mosha Pasumansky : performance</title><link>http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx</link><description>Tags: performance</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP2 (Build: 61129.1)</generator><item><title>Gross margin - dense vs. sparse block evaluation mode in MDX</title><link>http://sqlblog.com/blogs/mosha/archive/2008/11/01/gross-margin-dense-vs-sparse-block-evaluation-mode-in-mdx.aspx</link><pubDate>Sat, 01 Nov 2008 22:23:35 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9822</guid><dc:creator>mosha</dc:creator><slash:comments>2</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/9822.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=9822</wfw:commentRss><description>&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Gross_margin"&gt;Gross margin&lt;/a&gt; (also known as &lt;a href="http://en.wikipedia.org/wiki/Gross_profit_margin"&gt;Gross profit margin&lt;/a&gt; or Gross profit rate) is defined as (Revenue – Cost of Sales)/Revenue. In terms of Adventure Works sample database we can write this in MDX as&lt;/p&gt;  &lt;pre class="code"&gt;[Gross Margin] = (&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Sales Amount] &lt;/span&gt;- &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Total Product Cost]&lt;/span&gt;) / &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Sales Amount];&lt;/span&gt;&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;While this expression is simple enough, it might be tempting to try to optimize it. If we look at the evaluation tree for this expression – we will see 5 nodes in it: Fetching values for Internet Sales Amount twice – because it appears twice in the expression, fetching values for Total Cost, one minus operation and one division operation. Accessing Internet Sales Amount twice is not really a problem, because second access is going to come from cache anyway, but one might think that removing this extra operation would still improve thing, if only a little.&lt;/p&gt;

&lt;p&gt;So, perhaps we can rewrite the formula using simple math equivalence: (a-b)/a = 1 – b/a&lt;/p&gt;

&lt;p&gt;The first problem with this rewrite is that it is not always correct. In math it relies on the fact that a/a=1, but is is only true when a &amp;lt;&amp;gt; 0. The result of a/a is undefined when a=0. In MDX, 0/0 will also result in undefined number, usually formatted as “-1.#IND”. But more interesting question is what would happen in MDX when a is empty, i.e. has value of NULL. In MDX NULL/NULL = NULL, therefore when both Sales and Cost are empty, i.e. there is no record in the fact table, the two formulas are going to give different results.&lt;/p&gt;

&lt;pre&gt;(a-b)/a = NULL when a = NULL and b = NULL
1 – b/a = 1    when a = NULL and b = NULL&lt;/pre&gt;

&lt;p&gt;So at very least someone first need to set exact definition of Gross margin at the coordinates where no sales occurred. Is Gross margin 100% there or is it empty as well ?&lt;/p&gt;

&lt;p&gt;But I want to direct your attention to the performance aspect of this change. Remember, it seemed that 1-b/a would perform a little bit better – is it really so ? Let’s compare both approaches side by side in AS2005 using &lt;a href="http://www.mdxstudio.com"&gt;MDX Studio&lt;/a&gt;.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:green;"&gt;// Gross margin using 1-a/b
&lt;/span&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Gross Margin] &lt;span style="color:blue;"&gt;AS 
       &lt;/span&gt;&lt;span style="color:silver;"&gt;1
    &lt;/span&gt;- 
        &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Total Product Cost]
      &lt;/span&gt;/ 
        &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Sales Amount] 
    &lt;/span&gt;, FORMAT_STRING = &lt;span style="color:red;"&gt;'Percent'
  &lt;/span&gt;&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Max Gross Margin] &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Max
    (
      (
        &lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Customer Geography]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Customer]
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product Categories]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]
      &lt;/span&gt;)
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gross Margin]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Max Gross Margin] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;pre&gt;Time              : 6 sec 640 ms
Calc covers       : 6
Cells calculated  : 1
Sonar subcubes    : 2
NON EMPTYs        : 0
Autoexists        : 1
EXISTINGs         : 0
SE queries        : 2
Cache hits        : 3
Cache misses      : 1
Cache inserts     : 1
Cache lookups     : 4
Memory Usage KB   : 88840&lt;/pre&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Gross Margin] &lt;span style="color:blue;"&gt;AS 
      &lt;/span&gt;(
        &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Sales Amount]
      &lt;/span&gt;- 
        &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Total Product Cost]
      &lt;/span&gt;)
    / 
      &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Internet Sales Amount] 
    &lt;/span&gt;, FORMAT_STRING = &lt;span style="color:red;"&gt;'Percent'
  &lt;/span&gt;&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Max Gross Margin] &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Max
    (
      (
        &lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Customer Geography]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Customer]
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product Categories]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]
      &lt;/span&gt;)
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gross Margin]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Max Gross Margin] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;pre&gt;Time              : 234 ms
Calc covers       : 7
Cells calculated  : 1
Sonar subcubes    : 1
NON EMPTYs        : 0
Autoexists        : 0
EXISTINGs         : 0
SE queries        : 3
Cache hits        : 3
Cache misses      : 1
Cache inserts     : 1
Cache lookups     : 4
Memory Usage KB   : 0&lt;/pre&gt;

&lt;p&gt;The results come shockingly different. Almost 7 seconds for the “optimized” approach, and mere 234 ms for the original approach. Let’s try to understand why. First let’s compare the number of SE queries. It is 3 in the (a-b)/a approach, and 2 in the 1-b/a – so our optimization to reduce number of SE queries to fetch Internet Sales Amount indeed worked (and as we can see from hierarchical profiling), on cold cache run, only first SE query went for partitions, the other two came from cache).&lt;/p&gt;

&lt;p&gt;Usually, when we get such a big difference in the results, we suspect that slow query executes in the cell-by-cell mode, and fast query executes in the block mode. However, there is no reason to believe that the slow query went through cell-by-cell mode. Both expressions use exactly same operands: minus and divide, and both of these are optimized for block mode. Perfmon counters from the run also don’t give evidence for the cell-by-cell mode. But then, if both queries indeed went through block mode, why one query is so much significantly slower than the other one ?&lt;/p&gt;

&lt;p&gt;The answer lies in the density and sparsity characteristics of the space. Let’s take closer look at both evaluation subtrees:&lt;/p&gt;

&lt;pre&gt;Operator &amp;quot;/&amp;quot;
 |
 +-- Operator &amp;quot;-&amp;quot;
 |    |
 |    +-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)
 |    |
 |    +-- SE Data (Internet Total Product Cost, Customer, Subcategory, Year)
 |
 +-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)&lt;/pre&gt;

&lt;p&gt;In this tree every leaf element is SE Data, and it is sparse – i.e. even though the query subcubes cover big space, the SE Data node only fetches records that exist in fact table, i.e. non empty data. Both “/” and “-” operators operate efficiently on sparse data, because they can iterate only over non-empty cells (the algorithm for operator “*” was explained &lt;a href="http://sqlblog.com/blogs/mosha/archive/2005/12/06/performance-of-multiplication-in-mdx.aspx"&gt;here&lt;/a&gt;, the algorithms for “/” and “-” are slightly more complicated, but similar).&lt;/p&gt;

&lt;p&gt;Now the evaluation subtree for the slow query:&lt;/p&gt;

&lt;pre&gt;Operator &amp;quot;-&amp;quot;
 |
 +-- Constant (1) (Customer, Subcategory, Year)
 |
 +-- Operator &amp;quot;/&amp;quot;
      |
      +-- SE Data (Internet Total Product Cost, Customer, Subcategory, Year)
      |
      +-- SE Data (Internet Sales Amount, Customer, Subcategory, Year)&lt;/pre&gt;

&lt;p&gt;Here we have two leaf nodes of SE Data – no problem with them, but another leaf node is constant 1. Since it is constant, it will evaluate the same value over all the cells in its subcube – and this value, 1, is not NULL. Therefore, this node evaluates to non empty result in every single cell. So while the query plan is still using block mode, it doesn’t get much benefit out of it, because the “block” is dense, there is no single empty cell in it, and while algorithm says to iterate over non empty cells, it ends up iterating over every cell, which makes it as slow as if it was in cell by cell mode.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=9822" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Optimizing order of sets in MDX crossjoins</title><link>http://sqlblog.com/blogs/mosha/archive/2008/10/24/optimizing-order-of-sets-in-mdx-crossjoins.aspx</link><pubDate>Fri, 24 Oct 2008 07:00:05 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9634</guid><dc:creator>mosha</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/9634.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=9634</wfw:commentRss><description>&lt;p&gt;For scalar values a*b is the same as b*a. (although we saw that performance of these two could be different in MDX in “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2005/12/06/performance-of-multiplication-in-mdx.aspx"&gt;Performance of multiplication in MDX&lt;/a&gt;” article). But if a and b are sets, then obviously results are different – since the order of tuples in the resulting set will be different. But does it matter if we were to perform some other operation on top of result, which wouldn’t depend on order of the tuples ? Semantically – the answer is no, but there could be a big difference in performance. Let’s take the following example:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].x &lt;span style="color:blue;"&gt;AS
 Count&lt;/span&gt;(
   (
     &lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[City]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[City]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gender]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gender]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Education]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Education]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Color]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Color]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Size]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Size]
   &lt;/span&gt;)
 )
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;x &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]
&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;This query runs in 78 ms – no problem. But if we do innocently looking change – move one set to another position -&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].x &lt;span style="color:blue;"&gt;AS
 Count&lt;/span&gt;(
   (
     &lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[City]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[City]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Subcategory]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gender]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Gender]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Customer]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Education]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Education]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Color]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Color]
    &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Size]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Size]
   &lt;/span&gt;)
 )
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;x &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;Now this query takes 13 seconds, and also grabs significant memory. So what happened ? To understand this better, let’s go back to the first query and pay closer attention to what &lt;a href="http://www.mdxstudio.com"&gt;MDX Studio&lt;/a&gt; told us about the execution. We notice, that in PerfMon tab, it reported that there were 2 autoexists, and also hierarchical profiler shows 2 NON EMPTY events (each one corresponding to autoexist operation). Why does it report 2 ? The query has just single crossjoin, so there should’ve been only 1 autoexist. What happens here is that crossjoin detects that the set that it needs to apply autoexists to, can be nicely split into two parts – first part related to Customer dimension, and second part to Product dimension. Then it can apply the following formula&lt;/p&gt;

&lt;p&gt;JOIN( c1, c2, c3, p1, p2, p3 ) = CROSSJOIN( INNERJOIN(c1,c2,c3), INNERJOIN(p1,p2,p3) )&lt;/p&gt;

&lt;p&gt;I.e. it can run inner join against each dimension table separately, and then do full cross join between results. Inner join against single dimension table is very efficient since even in worse case, we don’t need to go deeper than the dimension key to which all other attributes relate – so this isn’t going to take additional memory or much time.&lt;/p&gt;

&lt;p&gt;However, in second case, the equation doesn’t hold anymore, since the order of dimensions inside crossjoin is mixed. So some other, much less efficient algorithm will be needed, probably loop join, with crossjoin now being pushed to the earlier stages. This is very inefficient and also going to take lots of additional memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: Inside crossjoin, cluster all the sets from the same dimension together – this will result in great benefit to performance.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=9634" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Optimizing MDX aggregation functions</title><link>http://sqlblog.com/blogs/mosha/archive/2008/10/22/optimizing-mdx-aggregation-functions.aspx</link><pubDate>Wed, 22 Oct 2008 19:32:01 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9604</guid><dc:creator>mosha</dc:creator><slash:comments>13</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/9604.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=9604</wfw:commentRss><description>&lt;p&gt;One of the most significant changes in Analysis Services 2008 was improving performance of MDX queries and calculations. In particular, query optimizer can choose block (a.k.a. subspace) computation mode in query plan more often than in AS2005, and usually using block/subspace computation mode brings performance orders of magnitude better than without it. However, even in AS2008, query optimizer is not always capable of using block mode. The “&lt;a href="http://msdn.microsoft.com/en-us/library/bb934106.aspx"&gt;Performance Improvements for MDX in SQL Server 2008 Analysis Services&lt;/a&gt;” whitepaper documents which MDX constructs are optimized and which not, and &lt;a href="http://www.mdxstudio.com"&gt;MDX Studio&lt;/a&gt; automates process of analyzing MDX and figuring out which fragments are not optimized, and offers advice how to optimize them. In this article we will cover techniques related to the optimization of aggregation functions in MDX, i.e. Sum, Min, Max, Aggregate and (in AS2008) Avg. All examples were done on AS2008 version of Adventure Works (some of these techniques will work with AS2005, but others won’t).&lt;/p&gt;  &lt;h3&gt;1. Aggregating over Filter&lt;/h3&gt;  &lt;p&gt;Scenario: We want to compute average sales for the products which increased its sales since same date last month. The normal approach is to use Filter function to determine which products grew their sales, and then apply Avg to the resulting set. Now applying this to every day in the Year 2003, we will get &lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].AvgGrowingProducts &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Avg
    (
      Filter
      (
        &lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount] &lt;/span&gt;&amp;gt; (&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]&lt;/span&gt;,ParallelPeriod(&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Month]&lt;/span&gt;))
      )
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.AvgGrowingProducts &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,Descendants
  (
    &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&amp;amp;[2003]
   ,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]
  &lt;/span&gt;) &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;Running this query took more than 7 seconds on my laptop, and the culprit is clear from the perfmon counters – there were 295011 cells calculated – this is a sign of cell-by-cell iterations as opposed to working in the block mode. And if we run “Analyze” function of MDX Studio, it will tell us why this is happening: Function ‘Filter’ was used inside aggregation function – this disables block mode. How do we get rid of Filter ? I actually has written about it before – in the “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx"&gt;Optimizing Count(Filter(...)) expressions in MDX&lt;/a&gt;” with respect to the Count function, but same technique can be applied to any aggregation function, including Avg (in AS2008). The idea is simple – even though Filter reduces the set of products, in order to compute the Filter in the first place, we really have to check sales of every single product, so we may as well just run Avg over all products, but with smart expression which will return NULL in case product should not be included – and since Avg ignores NULLs, this will work correctly. Here is how rewritten MDX will look like:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].Growth &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;IIF
    (
      &lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount] &lt;/span&gt;&amp;gt; (&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount] &lt;/span&gt;,ParallelPeriod(&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Month]&lt;/span&gt;))
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
     &lt;/span&gt;,&lt;span style="color:blue;"&gt;NULL
    &lt;/span&gt;) 
   ,FORMAT_STRING = &lt;span style="color:red;"&gt;'Currency' 
  &lt;/span&gt;&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Measures].AvgGrowingProducts &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Avg
    (
      &lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS
     &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.Growth
    ) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.AvgGrowingProducts &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,Descendants
  (
    &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&amp;amp;[2003]
   ,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]
  &lt;/span&gt;) &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;Now it executes in mere 200 ms, and number of cells calculated is 365 – exactly the same number as number of cells in the result – which is the best theoretical value we can get !&lt;/p&gt;

&lt;h3&gt;2. Aggregating over NonEmpty&lt;/h3&gt;

&lt;p&gt;Scenario: Compute average sales for all products. The solution is straightforward:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].AvgProductSales &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Avg
    (
      &lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS
     &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.AvgProductSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;It is very efficient – only 100 ms and single SE query. But what if someone wants to optimize it even more ? Is this possible ? Poking with MDX Studio expression debugger, we can notice, that the sets over which Avg is run are pretty sparse. For example, on August 7, 2001 – there were only 3 products which had sales, the rest was empty. Year later, on August 7, 2002 there were only 5 products which sold. So, it might be tempting to add NonEmpty function over the products, in a hope to reduce the size of the set before it is fed to Avg:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].AvgProductSales &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Avg
    (
      NonEmpty
      (
        &lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Product]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
      &lt;/span&gt;)
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.AvgProductSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;This turns out to be a big mistake. The execution time jumps 18-fold to 1 sec 796 ms, and the all the perfmon counters jump as well, with number of SE queries going to 1189 from 1. I have written about this &lt;a href="http://sqlblog.com/blogs/mosha/archive/2006/10/09/mdx-nonempty-exists-and-evil-nonemptycrossjoin.aspx"&gt;before&lt;/a&gt;, but it is worthwhile to reiterate. Unfortunately, this kind of wrong advice “reduce the space with NonEmpty before applying computation” is a popular myth. I see it mentioned in various presentations, and it even shows up in otherwise good “&lt;a href="http://www.sqlserveranalysisservices.com/OLAPPapers/MDX%20Performance%20Hintsv2.htm"&gt;MDX Performance Hints&lt;/a&gt;” document. The irony is that the tip which is supposed to improve performance, actually makes it worse. The reason here is twofold:&lt;/p&gt;

&lt;p&gt;1) Since NonEmpty is used inside calculated member, it will be computed every time this calculated member is invoked. No block mode. Each call to NonEmpty triggers at least one SE query. This is why we see so many SE queries in perfmon – one for every cell which uses AvgProductSales calculated member.&lt;/p&gt;

&lt;p&gt;2) The result of NonEmpty could be a set of any shape and form. It is very difficult for Avg to work with such unpredictable sets. In our example we got lucky, that since we only had single hierarchy inside NonEmpty – we didn’t end up with arbitrary shape set – otherwise performance would’ve been even worse.&lt;/p&gt;

&lt;p&gt;So the guidance is to never use NonEmpty inside calculations, but it is OK to use NonEmpty while building axes or named sets – because then it will be only called once and not for every cell.&lt;/p&gt;

&lt;h3&gt;3. Aggregating over Union&lt;/h3&gt;

&lt;p&gt;Scenario: Running sum by day. Typical and very sensible approach to this problem is to call Sum over all the preceding days. Assuming that our calculation will only ever be called on Date attribute (this is easy to ensure with SCOPE assignment, but for simplicity I will put it here as regular calculated member):&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].RunSales &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Sum
    (
      &lt;span style="color:blue;"&gt;NULL &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember
     &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.RunSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;This works OK – about 2 seconds execution time, and in block mode. We still want to optimize it. The common optimization technique for running sum is to do summation at higher grains whenever possible. I.e. instead of running over days, we would like to run over months until the current month, and only from there over the remaining days in this month. This approach makes a lot of sense, but the implementation should be done carefully. Let’s see what will happen if we modify the set to be union of months and days:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].RunSales &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Sum
    (
      Union
      (
        &lt;span style="color:blue;"&gt;NULL &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;Parent&lt;/span&gt;.&lt;span style="color:blue;"&gt;PrevMember
       &lt;/span&gt;,
        &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;FirstSibling &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember
      &lt;/span&gt;)
     ,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
    &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.RunSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;The results are disastrous ! The execution time rose above 3 seconds, but more alarmingly, both number of cells calculated and SE queries jumped to thousands. Why is that ? After all, if we read the official documentation, it says that Sum over Union is in fact optimized for block mode, but the results we see here clearly show the opposite. The root cause is a little bit deeper. It is true that Sum(Union()) is optimized, but one of the other conditions for aggregation functions to work optimally, is to have the input set in a good shape, i.e. not arbitrary shaped set. Our Union combines sets from different granularities, so we end up with the mixed grain set, which is classified as arbitrary shape. How can we fix it ? One way is to break single Sum into two Sum’s, each one over the set of uniform single grain, i.e.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].RunSales &lt;span style="color:blue;"&gt;AS 
      &lt;/span&gt;Sum
      (
        &lt;span style="color:blue;"&gt;NULL &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;Parent&lt;/span&gt;.&lt;span style="color:blue;"&gt;PrevMember
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
      &lt;/span&gt;)
    + 
      Sum
      (
        &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;FirstSibling &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
      &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.RunSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;Now the results are great – the runtime is about 400 ms, and other stats look good too. It is possible to optimize it even further using techniques described in the “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2008/03/28/take-advantage-of-fe-caching-to-optimize-mdx-performance.aspx"&gt;Take advantage of FE caching to optimize MDX performance&lt;/a&gt;” article. We notice that the first component of plus is the same for all days in the month, so we can take advantage of it by caching it at the first day and reusing for all other days as following:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Measures].RunMonthSales &lt;span style="color:blue;"&gt;AS
      &lt;/span&gt;Sum
      (
        &lt;span style="color:blue;"&gt;NULL &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
      &lt;/span&gt;)
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Measures].RunSales &lt;span style="color:blue;"&gt;AS 
      &lt;/span&gt;(&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.RunMonthSales, &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;Parent&lt;/span&gt;.&lt;span style="color:blue;"&gt;PrevMember&lt;/span&gt;)
    + 
      Sum
      (
        &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;FirstSibling &lt;/span&gt;: &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember
       &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Sales Amount]
      &lt;/span&gt;) 
&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.RunSales &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
 &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;This gives better runtime of 280 ms, on bigger cubes the difference will be more significant.&lt;/p&gt;

&lt;h3&gt;4. Aggregating over Exists or CrossJoin ?&lt;/h3&gt;

&lt;p&gt;To be written when I get more time.&lt;/p&gt;

&lt;h3&gt;5. Aggregting over Descendants or EXISTING ?&lt;/h3&gt;

&lt;p&gt;To be written when I get more time.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=9604" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Get most out of partition slices</title><link>http://sqlblog.com/blogs/mosha/archive/2008/10/14/get-most-out-of-partition-slices.aspx</link><pubDate>Tue, 14 Oct 2008 06:36:33 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:9471</guid><dc:creator>mosha</dc:creator><slash:comments>3</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/9471.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=9471</wfw:commentRss><description>&lt;p&gt;Setting partition slice has always been an important optimization technique in Analysis Services. Every presentation talked about it and every whitepaper mentioned it, for example the &lt;a href="http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx"&gt;Microsoft SQL Server 2000 Analysis Services Performance Guide&lt;/a&gt; contains a chapter appropriately named “Define the Data Slice for Each Partition”, here is a quote from it:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;“If the data slice value for a partition is set properly, Analysis Services can quickly eliminate irrelevant partitions from the query processing and significantly reduce the amount of physical I/O and processor time needed for many queries issued against MOLAP and HOLAP partitions &amp;lt;skip&amp;gt; &lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;em&gt;Caution: Creating a partition without setting the data slice is not a good practice, and can result in considerable overhead being added to the system (artificially increasing response times). Without the data slice, Analysis Services cannot limit a query to the appropriate partitions and must scan each partition even if zero cells will be returned. &lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;em&gt;The data slice enables Analysis Services to determine which partitions contain data relevant to the query”&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;However, after AS2005 was released, there was a new tip making it into presentations - “For MOLAP partitions no need to specify slice property, it is detected automatically”. Even though I briefly debunked it before &lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/06/02/notes-about-notes-from-microsoft-bi-conference.aspx"&gt;here&lt;/a&gt;,&amp;#160; this rumor just kept popping up, and it even made it as far as official “&lt;a href="http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/olapdbpssas2005.mspx"&gt;OLAP Design Best Practices for Analysis Services 2005&lt;/a&gt;” document – here is the quote:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;“For MOLAP partitions, you do not have to specify the slice because the server will be able to figure out the relevant partitions after the data has been processed”&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Even worse, the &lt;a href="http://msdn.microsoft.com/en-us/library/ms174764(SQL.90).aspx"&gt;official documentation&lt;/a&gt; on data slices goes as far as stating:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;“&lt;em&gt;Data slices are applicable only to objects that use the ROLAP storage mode”&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;But this is completely not true, and many people already discovered it. This Jesse Orosz’s &lt;a href="http://jesseorosz.spaces.live.com/blog/cns!E322FD91218E57CF!304.entry"&gt;blog entry&lt;/a&gt; has great description of 4 reasons why setting partition slice in MOLAP is important. In my blog I want to focus on #3 from his list – automatic slice detection is not perfect.&lt;/p&gt;  &lt;p&gt;Here is how automatic slice detection works – during building indexes for the partition, for every attribute it determines the minimum and maximum data id values in this attribute (for more detailed explanation see section “Partition Slice” in the &lt;a href="http://www.amazon.com/Microsoft-Server-2005-Analysis-Services/dp/0672327821"&gt;Microsoft SQL Server 2005 Analysis Services book&lt;/a&gt;, page 355). It is possible to discover what values exactly Analysis Services detected, either doing it hard way as described &lt;a href="http://www.ssas-info.com/analysis-services-faq/29-mgmt/289-how-to-find-slice-values-that-analysis-services-assigned-partition-dimensions"&gt;here&lt;/a&gt;, or doing it easy way as described &lt;a href="http://cwebbbi.spaces.live.com/Blog/cns!7B84B0F2C239489A!1478.entry"&gt;here&lt;/a&gt;. In either case, it is clear, that since slice autodetection operates with ranges of data ids, it is totally possible that partition which contains only two attribute members, but one with the lowest possible data id (2), and another one with highest possible data id, and as a result the range will cover all possible values, i.e. will be useless.&lt;/p&gt;  &lt;p&gt;On the other hand, if for certain attribute the data in partition contains only single value of member’s data id, then the range will be very effective – since such a range will be equivalent to slice on a single member.&lt;/p&gt;  &lt;p&gt;So this must be the root of all the confusion in documentation and in whitepapers. Prior to AS2005, partition slice could have only been a single member. And specifying partition slice of a single member is really not necessary in MOLAP, since it will always be automatically discovered by autoslice (well, except for other 3 reasons that Jesse lists in his blog). But starting with AS2005, it is possible to specify MDX sets for partition slice, and that’s exactly the case where setting partition slice manually makes sense.&lt;/p&gt;  &lt;p&gt;Analysis Services Stored Procedures Project on Codeplex features &lt;a href="http://www.codeplex.com/ASStoredProcedures/Wiki/View.aspx?title=PartitionHealthCheck"&gt;PartitionHealthCheck&lt;/a&gt; function, which shows the overlaps in data id ranges between partitions, but it is no clear what action the user can take when he discovers that the autodetected ranges are not optimal. User doesn’t have direct control over assignments of data ids to dimension members. What this blog entry is going to explore is how to avoid relying on autoslice altogether, and put the precise control over partition slices into the hands of the user.&lt;/p&gt;  &lt;h3&gt;Almost related attributes&lt;/h3&gt;  &lt;p&gt;First scenario to consider is what I call “almost related attributes”. I.e. we have attributes, which are not truly related to each other, but, on the other hand are not completely unrelated. Examples of such attributes are Zip code to State – it is almost many to one, but sometimes there are Zip codes which cross State boundaries. Another example is Week to Year. Year mostly contains 52 weeks, but the weeks at the beginning and end of year can cross to previous or next year.&amp;#160; &lt;/p&gt;  &lt;p&gt;Let’s use Adventure Works 2008 as our sample database. We have ‘Calendar Year’ and ‘Calendar Week’ attributes. Let’s first query for specific Calendar Year:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Reseller Sales Amount] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]
&lt;/span&gt;&lt;span style="color:blue;"&gt;WHERE 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[CY 2004]&lt;/span&gt;;&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;Running this query under &lt;a href="http://sqlblog.com/blogs/mosha/archive/2008/09/21/hierarchical-trace-profiling-with-mdx-studio.aspx"&gt;Hierarchical Profiler&lt;/a&gt; of &lt;a href="http://www.mdxstudio.com"&gt;MDX Studio&lt;/a&gt; shows that only one partition – Reseller_Sales_2004 – was read for that query. This is because that partition contains data just for year 2004, and therefore the automatic slice detection worked just fine. But let’s query for a specific week in 2004:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Reseller Sales Amount] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]
&lt;/span&gt;&lt;span style="color:blue;"&gt;WHERE 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&amp;amp;[17]&amp;amp;[2004];&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;Of course, we would like again to read just 2004 partition, but this time profiler shows that also partitions Reseller_Sales_2002 and Reseller_Sales_2003 were touched. This is pretty bad, this week 17 of 2004 is well inside year 2004 with no hope of crossing into 2003, let alone 2002 !&lt;/p&gt;

&lt;p&gt;So why did automatic slice detection did such a poor job here. To answer this question, let’s look at data id values for the Calendar Week attribute:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].CalendarWeekDataID &lt;span style="color:blue;"&gt;AS
  &lt;/span&gt;DataId(&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember&lt;/span&gt;)
 ,CAPTION = &lt;span style="color:red;"&gt;'Data ID of Calendar Week'
&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[CalendarWeekDataID] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;, &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;1
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]&lt;/span&gt;;&lt;/pre&gt;

&lt;p&gt;This query shows the data id for every calendar week, and the problem is now obvious. The calendar weeks don’t have nice ordering by the years, instead first we get Week 1 of years 2002, 2003 and 2004 (this week didn’t exist in 2001), then Week 2 for these years etc. No wonder the range of data ids for calendar week, included week 17 of 2004 also in 2003 and 2002. We could fix the Calendar Week attribute, but we would still be at the mercy of Analysis Services about how it decides to assign data ids. Instead, we can explicitly set the slice on every partition in the ‘Reseller Sales’ measure group.&lt;/p&gt;

&lt;p&gt;And what is better way to build this slice if not by using another MDX query !&lt;/p&gt;

&lt;p&gt;&amp;#160;&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  SET &lt;/span&gt;WeeksIn2004 &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;Exists(
       &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&lt;span style="color:blue;"&gt;MEMBERS
      &lt;/span&gt;,&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Year]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[CY 2004]&lt;/span&gt;)
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Measures].PartitionSlice &lt;span style="color:blue;"&gt;AS
   &lt;/span&gt;&lt;span style="color:red;"&gt;&amp;quot;{&amp;quot; &lt;/span&gt;+
   Generate(
     WeeksIn2004 &lt;span style="color:blue;"&gt;AS &lt;/span&gt;itr, 
         &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&lt;span style="color:blue;"&gt;CurrentMember&lt;/span&gt;.&lt;span style="color:blue;"&gt;UniqueName 
       &lt;/span&gt;+ IIF(itr.&lt;span style="color:blue;"&gt;CurrentOrdinal &lt;/span&gt;= WeeksIn2004.&lt;span style="color:blue;"&gt;Count&lt;/span&gt;, &lt;span style="color:red;"&gt;&amp;quot;&amp;quot;&lt;/span&gt;, &lt;span style="color:red;"&gt;&amp;quot;,&amp;quot;&lt;/span&gt;))
   + &lt;span style="color:red;"&gt;&amp;quot;}&amp;quot;
&lt;/span&gt;&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.PartitionSlice &lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This returns long string representing MDX set with all the weeks in 2004. We just need to make sure that the resulting string is XML’ized, i.e. ‘&amp;amp;’ is replaced with ‘&amp;amp;amp;’ if it is to be pasted directly into XMLA script for ALTER partition, and we are good to go.&lt;/p&gt;

&lt;p&gt;When we apply this procedure to every partition, rerunning the query&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;SELECT 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Measures]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Reseller Sales Amount] &lt;/span&gt;&lt;span style="color:blue;"&gt;ON &lt;/span&gt;&lt;span style="color:silver;"&gt;0
&lt;/span&gt;&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;&lt;span style="color:navy;"&gt;[Adventure Works]
&lt;/span&gt;&lt;span style="color:blue;"&gt;WHERE 
  &lt;/span&gt;&lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&amp;amp;[17]&amp;amp;[2004];&lt;/pre&gt;

&lt;p&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;Now only touches 2004 partition.&lt;/p&gt;

&lt;h3&gt;Related attributes&lt;/h3&gt;

&lt;p&gt;So far we considered “almost related attributes”, as for truly related attributes one may think there is no problem. But in fact, in Adventure Works 2008, Calendar Week 2004 is defined as related to Calendar Year. So why when we sliced on &lt;span style="color:navy;"&gt;[Date]&lt;/span&gt;.&lt;span style="color:navy;"&gt;[Calendar Week]&lt;/span&gt;.&amp;amp;[17]&amp;amp;[2004] it didn’t decode the year ? If we run coordinate decoding in MDX Studio, it does properly show ‘CY 2004’ being decoded. The problem here lies in the fact that FE does coordinate decoding lazily, only if it really needs to. This makes sense – otherwise there will be a lot of time spent decoding attributes that nobody needs. So in our query, FE noticed that there were no calculations looking at Calendar Year’s CurrentMember, and it left it non decoded, as showed by verbose data in profiler in MDX Studio:&lt;/p&gt;

&lt;pre&gt;Dimension 9 [Date] (0 0 0 0 0 0 0 0 0 0 52 0 0 0 0 0 0 0 0 0 0)  

[Calendar Week]:[Week 17 CY 2004]  
[Calendar Year]:0  &lt;/pre&gt;

&lt;p&gt;But in this case, it looks like SE could benefit from attribute decoding on query subcube – this could be useful performance optimization.&lt;/p&gt;

&lt;h3&gt;What about multiple attributes&lt;/h3&gt;

&lt;p&gt;We now successfully dealt with Calendar Week attribute, but Date dimension contains plenty of other attributes. If their Data IDs are as messed up as with Calendar Week – we need to do something about them. Unfortunately here we run into unintentional, yet very unfortunate limitation of Analysis Services. It is impossible to set partition slice by multiple attributes if at least one of them is a set and not a single member (which is the only useful case for us !). This limitation stems from unfortunate combination of few different factors:&lt;/p&gt;

&lt;p&gt;1. Sets used in partition slice have to comply with limitations of SetToStr(…, CONSTRAINED) (see &lt;a href="http://msdn.microsoft.com/en-us/library/microsoft.analysisservices.partition.slice(ide).aspx"&gt;documentation&lt;/a&gt;). There is a good reason for that – partition slices are computed when cube is not really ready for querying, so putting just any MDX set could trigger FE wanting to get data from the cube, which is not yet ready with unpredictable results.&lt;/p&gt;

&lt;p&gt;2. SetToStr(…, CONSTRAINED) doesn’t allow crossjoin of sets using neither CrossJoin function, nor * operator, no tuple-like form – i.e. it is not possible to specify crossjoin&lt;/p&gt;

&lt;p&gt;3. While it is possible to specify set of tuples for SetToStr(…, CONSTRAINED), it still doesn’t help, since FE is not capable of building a subcube out of set of tuples (instead it immediatelly resorts to dreaded arbitrary shaped form).&lt;/p&gt;

&lt;p&gt;It would be useful if any one of these limitations was relaxed – because it would enable true great control over partition slices in non-trivial cubes.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=9471" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/ssas/default.aspx">ssas</category></item><item><title>Product Volatility: Optimizing MDX with MDX Studio</title><link>http://sqlblog.com/blogs/mosha/archive/2008/07/29/product-volatility-optimizing-mdx-with-mdx-studio.aspx</link><pubDate>Wed, 30 Jul 2008 03:38:16 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:8120</guid><dc:creator>mosha</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/8120.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=8120</wfw:commentRss><description>&lt;p&gt;&lt;/p&gt;  &lt;p&gt;In this article we will show how to take typical, everyday MDX and optimize it by simply following advices of MDX Studio Analyzer.&lt;/p&gt;  &lt;p&gt;As a starting point, I will take “&lt;a href="http://thomasianalytics.spaces.live.com/Blog/cns!B6B6A40B93AE1393!589.entry?9ad3d6b0"&gt;Product Volatility&lt;/a&gt;” example as published by Thomas Ivarson in his blog. It is a perfect example, because Thomas took a real business problem, and implemented it using very clean and straightforward MDX, just like I expect many MDX practitioners do. This MDX can be optimized, but some of these optimizations might not be obvious. Running MDX Studio Analyzer and following its advices takes the guesswork out of equation.&lt;/p&gt;  &lt;p&gt;Here is the MDX that computes total number of products, number of products with sales increased from last month and number of products with sales decreased from last month:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;Measures.TotNumberOfProducts &lt;span style="color:blue;"&gt;AS 
    Count
    &lt;/span&gt;(
      &lt;span style="color:blue;"&gt;Descendants
      &lt;/span&gt;(
        [Product].[Product Categories].&lt;span style="color:maroon;"&gt;CurrentMember
       &lt;/span&gt;,[Product].[Product Categories].[Product]
      )
    ) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.NumberOfSoldProductsIncreasing &lt;span style="color:blue;"&gt;AS 
    Count
    &lt;/span&gt;(
      &lt;span style="color:maroon;"&gt;Filter
      &lt;/span&gt;(
        &lt;span style="color:maroon;"&gt;NonEmpty
        &lt;/span&gt;(
          &lt;span style="color:blue;"&gt;Descendants
          &lt;/span&gt;(
            [Product].[Product Categories].&lt;span style="color:maroon;"&gt;CurrentMember
           &lt;/span&gt;,[Product].[Product Categories].[Product]
          )
         ,[Measures].[Internet Sales Amount]
        )
       ,
            (
              [Date].[Calendar].&lt;span style="color:maroon;"&gt;CurrentMember
             &lt;/span&gt;,[Measures].[Internet Sales Amount]
            )
          - 
            (
              [Date].[Calendar].&lt;span style="color:maroon;"&gt;PrevMember
             &lt;/span&gt;,[Measures].[Internet Sales Amount]
            )
        &amp;gt; 0
      )
    ) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.NumberOfSoldProductsDecreasing &lt;span style="color:blue;"&gt;AS 
    Count
    &lt;/span&gt;(
      &lt;span style="color:maroon;"&gt;Filter
      &lt;/span&gt;(
        &lt;span style="color:maroon;"&gt;NonEmpty
        &lt;/span&gt;(
          &lt;span style="color:blue;"&gt;Descendants
          &lt;/span&gt;(
            [Product].[Product Categories].&lt;span style="color:maroon;"&gt;CurrentMember
           &lt;/span&gt;,[Product].[Product Categories].[Product]
          )
         ,[Measures].[Internet Sales Amount]
        )
       ,
            (
              [Date].[Calendar].&lt;span style="color:maroon;"&gt;CurrentMember
             &lt;/span&gt;,[Measures].[Internet Sales Amount]
            )
          - 
            (
              [Date].[Calendar].&lt;span style="color:maroon;"&gt;PrevMember
             &lt;/span&gt;,[Measures].[Internet Sales Amount]
            )
        &amp;lt; 0
      )
    ) 
&lt;span style="color:blue;"&gt;SELECT
  &lt;/span&gt;{
    [Measures].[TotNumberOfProducts]
   ,[Measures].[NumberOfSoldProductsIncreasing]
   ,[Measures].[NumberOfSoldProductsDecreasing]
  } &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
 ,[Date].[Calendar].[Month] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;Executing this query with hot cache in MDX Studio produces the following perfmon counter statistics:&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;font face="Lucida Console" size="2"&gt;Time&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 359 ms 
    &lt;br /&gt;Calc covers&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 85 

    &lt;br /&gt;Cells calculated : 7672 

    &lt;br /&gt;Sonar subcubes&amp;#160;&amp;#160; : 79 

    &lt;br /&gt;SE queries&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 76 

    &lt;br /&gt;Cache hits&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 76 

    &lt;br /&gt;Cache misses&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 3 

    &lt;br /&gt;Cache inserts&amp;#160;&amp;#160;&amp;#160; : 3 

    &lt;br /&gt;Cache lookups&amp;#160;&amp;#160;&amp;#160; : 79&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The time looks good, but only because both the cube and the query are small. The number of SE queries should raise a red flag here. With only 114 cells in the result, why there were 76 SE queries ? Obviously, on more serious cubes, this will become a major performance issue. So, let’s run MDX Studio Analyzer and see what it has to say about this query. We get back the following list:&lt;/p&gt;

&lt;table style="border-right:1px solid;border-top:1px solid;font-size:10pt;border-left:1px solid;border-bottom:1px solid;font-family:calibri;" cellspacing="0" cellpadding="2"&gt;
    &lt;tr&gt;
      &lt;td&gt;Line&lt;/td&gt;

      &lt;td&gt;Col&lt;/td&gt;

      &lt;td&gt;Message&lt;/td&gt;

      &lt;td&gt;Link&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;3&lt;/td&gt;

      &lt;td style="width:50px;"&gt;6&lt;/td&gt;

      &lt;td&gt;If you are trying to count number of members in current selection - consider introducing special measure group for this dimension&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/05/27/counting-days-in-mdx.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;7&lt;/td&gt;

      &lt;td style="width:50px;"&gt;41&lt;/td&gt;

      &lt;td&gt;MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;12&lt;/td&gt;

      &lt;td style="width:50px;"&gt;6&lt;/td&gt;

      &lt;td&gt;Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;20&lt;/td&gt;

      &lt;td style="width:50px;"&gt;45&lt;/td&gt;

      &lt;td&gt;MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;27&lt;/td&gt;

      &lt;td style="width:50px;"&gt;34&lt;/td&gt;

      &lt;td&gt;MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;32&lt;/td&gt;

      &lt;td style="width:50px;"&gt;34&lt;/td&gt;

      &lt;td&gt;MDX function 'PrevMember' may raise an error or produce non-desired result when user applies multiselect&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;39&lt;/td&gt;

      &lt;td style="width:50px;"&gt;6&lt;/td&gt;

      &lt;td&gt;Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;47&lt;/td&gt;

      &lt;td style="width:50px;"&gt;45&lt;/td&gt;

      &lt;td&gt;MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect&lt;/td&gt;

      &lt;td&gt;&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;More Info&lt;/a&gt;&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
      &lt;td style="width:50px;"&gt;53&lt;/td&gt;

      &lt;td style="width:50px;"&gt;14&lt;/td&gt;

      &lt;td&gt;Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache&lt;/td&gt;

      &lt;td&gt;&amp;#160;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/table&gt;

&lt;p&gt;Let’s go over this list and fix issue by issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Line 3, Col 6: &lt;em&gt;If you are trying to count number of members in current selection - consider introducing special measure group for this dimension&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;The link points to “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/05/27/counting-days-in-mdx.aspx"&gt;Counting days in MDX&lt;/a&gt;” blog. While we count here Products and not Days, the same technique applies here. These types of questions are best done not in MDX, but by introducing measure group, which only contains Products dimension, and single measure [Number Of Products] of type Count. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Line 7, Col 41: MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;Now, this is no longer a problem because of how we solved the previous warning ! Indeed, if we wanted to see total number of products across Bikes and Accessories categories, we would’ve added the following WHERE clause:&lt;/p&gt;

&lt;pre&gt;WHERE {[Product].[Category].&amp;amp;[1],[Product].[Category].&amp;amp;[4]}&lt;/pre&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;It fails with the original query, but works fine with Count measure [Number of Products]&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lines 12/39, Col 6: Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is a common construct, and as shown in the link to the “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx"&gt;Optimizing Count(Filter(…)) expressions in MDX&lt;/a&gt;”, it performs much better if rewritten to use Sum. But before we do this rewrite, let’s take a look at the next warning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Line 53, Col 14: Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If we click on this message, the expression in question is the following:&lt;/p&gt;

&lt;pre&gt;            (
              [Date].[Calendar].CurrentMember
             ,[Measures].[Internet Sales Amount]
            )
          - 
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )&lt;/pre&gt;

&lt;p&gt;Indeed, this expression represents Sales Growth, so we can isolate it into separate calculated member. But before we write the exact expression for the calculated member, let’s look deeper into the original expression. It features Count(Filter(NonEmpty(…))) construct, i.e. only products which had sales in current month are considered for increasing/decreasing calculation. I.e. even if product had sales in the previous month, but not in the current month, it won’t be counted in the number of products with decreasing sales. In order to repeat the same logic, the expression would be&lt;/p&gt;

&lt;pre&gt;  [Sales Growth] =
    IIF
    (
      IsEmpty([Measures].[Internet Sales Amount])
     ,NULL
     ,
        [Measures].[Internet Sales Amount]
      - 
        (
          [Date].[Calendar].PrevMember
         ,[Measures].[Internet Sales Amount]
        )
    ) &lt;/pre&gt;

&lt;p&gt;For more details why IIF(cond, NULL, exp) is good for performance, read “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/28/performance-of-iif-function-in-mdx.aspx"&gt;Performance of IIF function in MDX&lt;/a&gt;” blog. Now, putting all of together so far, we will get the following query:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
  MEMBER &lt;/span&gt;[Sales Growth] &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF
    &lt;/span&gt;(
      &lt;span style="color:maroon;"&gt;IsEmpty&lt;/span&gt;([Measures].[Internet Sales Amount])
     ,&lt;span style="color:blue;"&gt;NULL
     &lt;/span&gt;,
        [Measures].[Internet Sales Amount]
      - 
        (
          [Date].[Calendar].&lt;span style="color:maroon;"&gt;PrevMember
         &lt;/span&gt;,[Measures].[Internet Sales Amount]
        )
    ) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;PositiveGrowth &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF
    &lt;/span&gt;(
      [Measures].[Sales Growth] &amp;gt; 0
     ,1
     ,&lt;span style="color:blue;"&gt;NULL
    &lt;/span&gt;) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.NumberOfSoldProductsIncreasing &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;&lt;span style="color:maroon;"&gt;Sum
    &lt;/span&gt;(
      &lt;span style="color:blue;"&gt;Descendants
      &lt;/span&gt;(
        [Product].[Product Categories].&lt;span style="color:maroon;"&gt;CurrentMember
       &lt;/span&gt;,[Product].[Product Categories].[Product]
      )
     ,[Measures].[PositiveGrowth]
    ) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;NegativeGrowth &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF
    &lt;/span&gt;(
      [Measures].[Sales Growth] &amp;lt; 0
     ,1
     ,&lt;span style="color:blue;"&gt;NULL
    &lt;/span&gt;) 
  &lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.NumberOfSoldProductsDecreasing &lt;span style="color:blue;"&gt;AS 
    &lt;/span&gt;&lt;span style="color:maroon;"&gt;Sum
    &lt;/span&gt;(
      &lt;span style="color:blue;"&gt;Descendants
      &lt;/span&gt;(
        [Product].[Product Categories].&lt;span style="color:maroon;"&gt;CurrentMember
       &lt;/span&gt;,[Product].[Product Categories].[Product]
      )
     ,[Measures].[NegativeGrowth]
    ) 
&lt;span style="color:blue;"&gt;SELECT
  &lt;/span&gt;{
    [Measures].[Number Of Products]
   ,[Measures].[NumberOfSoldProductsIncreasing]
   ,[Measures].[NumberOfSoldProductsDecreasing]
  } &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
 ,[Date].[Calendar].[Month] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works];&lt;/pre&gt;

&lt;p&gt;When executed in MDX Studio, we get the following perfmon statistics:&lt;/p&gt;

&lt;p&gt;&lt;font face="Lucida Console" size="2"&gt;Time&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 171 ms 
    &lt;br /&gt;Calc covers&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 15 

    &lt;br /&gt;Cells calculated : 1993 

    &lt;br /&gt;Sonar subcubes&amp;#160;&amp;#160; : 6 

    &lt;br /&gt;SE queries&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 3 

    &lt;br /&gt;Cache hits&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 7 

    &lt;br /&gt;Cache misses&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 5 

    &lt;br /&gt;Cache inserts&amp;#160;&amp;#160;&amp;#160; : 5 

    &lt;br /&gt;Cache lookups&amp;#160;&amp;#160;&amp;#160; : 12&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;This is now much better than before, there are only 3 SE queries. All of the performance warnings have been addressed, but we still have warnings about CurrentMember not working well with multiselect. Indeed, when we put multiselect on product categories on Bikes and Accessories, we get errors for number of increasing and decreasing products. Links to “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2005/11/18/writing-multiselect-friendly-mdx-calculations.aspx"&gt;Writing multiselect friendly MDX calculations&lt;/a&gt;” and “&lt;a href="http://sqlblog.com/blogs/mosha/archive/2007/01/13/multiselect-friendly-mdx-for-calculations-looking-at-current-coordinate.aspx"&gt;Multiselect friendly MDX for calculations looking at current coordinate&lt;/a&gt;” blogs are helpful to understand what the problem is. Unfortunately, there is no universal solution, and each case needs to be evaluated differently. &lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;In this scenario, we can use apply to Sum(Descendants(…), exp) very similar technique that we applied to Count(Descendants(…)). The idea is not to use Descendants function, which forces us to use CurrentMember, which in turn fails on multiselect, but to put the expression “exp” directly on the Product attribute. In order to do that, we need to create fake measure in the measure group - “Num Products Increased”. It is bound to NULL expression, so this measure will always be NULL, and therefore won’t take any space in storage. But since this measure will have aggregation function Sum, it will properly aggregate when assigned expressions inside MDX Script.&lt;/p&gt;

&lt;p&gt;We will use MDX Studio to emulate MDX Script too (MDX Studio has a feature, where it automatically uses cube selected in the dropdown list to resolve current cube used in script statements).&lt;/p&gt;

&lt;p&gt;Here is the MDX:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE 
  &lt;/span&gt;[Sales Growth] = 
    &lt;span style="color:maroon;"&gt;IIF
    &lt;/span&gt;(
      &lt;span style="color:maroon;"&gt;IsEmpty&lt;/span&gt;([Measures].[Internet Sales Amount])
     ,&lt;span style="color:blue;"&gt;NULL
     &lt;/span&gt;,
        [Measures].[Internet Sales Amount]
      - 
        (
          [Date].[Calendar].&lt;span style="color:maroon;"&gt;PrevMember
         &lt;/span&gt;,[Measures].[Internet Sales Amount]
        )
    );&lt;/pre&gt;

&lt;pre class="code"&gt;
(
  [Product].[Product Categories].[Product]
 ,[Measures].[Num Products Increased]
) = 
  &lt;span style="color:maroon;"&gt;IIF
  &lt;/span&gt;(
    [Measures].[Sales Growth] &amp;gt; 0
   ,1
   ,&lt;span style="color:blue;"&gt;NULL
  &lt;/span&gt;);&lt;/pre&gt;

&lt;pre class="code"&gt;
&lt;span style="color:blue;"&gt;SELECT
  &lt;/span&gt;{
    [Measures].[Number Of Products]
   ,[Measures].[Num Products Increased]
  } &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
 ,[Date].[Calendar].[Month] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]
&lt;span style="color:blue;"&gt;WHERE 
  &lt;/span&gt;{
    [Product].[Category].&amp;amp;[1]
   ,[Product].[Category].&amp;amp;[4]
  };&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;It’s performance characteristics are even slightly better than the query before:&lt;/p&gt;

&lt;p&gt;&lt;font face="Lucida Console" size="2"&gt;Time&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 31 ms
    &lt;br /&gt;Calc covers&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 6

    &lt;br /&gt;Cells calculated : 76

    &lt;br /&gt;Sonar subcubes&amp;#160;&amp;#160; : 1

    &lt;br /&gt;SE queries&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 1

    &lt;br /&gt;Cache hits&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 2

    &lt;br /&gt;Cache misses&amp;#160;&amp;#160;&amp;#160;&amp;#160; : 0

    &lt;br /&gt;Cache inserts&amp;#160;&amp;#160;&amp;#160; : 0

    &lt;br /&gt;Cache lookups&amp;#160;&amp;#160;&amp;#160; : 2&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;Not to mention the fact that it works just fine with multiselect on products ! This is a great example, where fixing MDX to work properly in multiselect scenarios, has the side effect of improving performance. &lt;/p&gt;

&lt;p&gt;We still have warnings about PrevMember with respect to multiselect on the Time dimension. It is more difficult problem, since the definition of “Sales Growth” when there is multiselect on the Time dimension is ambiguous.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=8120" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx+studio/default.aspx">mdx studio</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>MDX and partitioning</title><link>http://sqlblog.com/blogs/mosha/archive/2007/12/18/mdx-and-partitioning.aspx</link><pubDate>Wed, 19 Dec 2007 04:21:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:4093</guid><dc:creator>mosha</dc:creator><slash:comments>5</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/4093.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=4093</wfw:commentRss><description>&lt;P&gt;Partitions have been a built-in feature of Analysis Services ever since the first version. Naturally, all the engine subsystems, including the MDX query optimizer,&amp;nbsp;work well with partitions. There are rare cases, however, when over-aggressive prefetching can generate query plan which is not optimal with respect to the partitioning scheme. Today we will review one such scenario and see how MDX could be rewritten to suit the partitioning scheme better.&lt;/P&gt;
&lt;P&gt;Let's consider scenario where we need to determine the last date for which there is data in certain measure group. The example will be built around Adventure Works sample database, using Date dimension and Internet Sales Amount as a measure. &lt;/P&gt;
&lt;P&gt;&lt;EM&gt;One more thing to note: while writing this post I used MDX Studio which simplifies many operations, so when below I say "look at the set", it means inspecting it in the Watch window of MDX Studio, when I say "clear the cache", it means using MDX Studio's cache clear button etc. The only other tool that needs to be run is Profiler, since AS traces are not yet integrated into MDX Studio, but the MDX Studio development team is working on this feature. Also, in order to be able to run MDX Script statements from within MDX Studio, I added "Cube=Adventure Works" in the Connection Properties field of the Connect dialog.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Due to limitations of sqlblog.com, in all examples below remove the space before the square braket in the&amp;nbsp;[Date ] fragments&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The most straightforward way to compute the last date is by using the following formula:&lt;/P&gt;&lt;PRE&gt;CREATE SET CurrentCube.LastDate as Tail(NonEmpty([Date ].[Date ].[Date ], [Measures].[Internet Sales Amount]), 1)
&lt;/PRE&gt;
&lt;P&gt;We can see that the result set will have a single member in it - July 31, 2004. But if we execute this statement over the clean cache, we can see the following events in the trace:&lt;/P&gt;&lt;PRE&gt;Query Begin&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 - MDXQuery&amp;nbsp;&amp;nbsp;&amp;nbsp; MDX Studio v0.2.6.0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR&gt;Progress Report Begin&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Started reading data from the 'Internet_Sales_2001' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report Begin&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Started reading data from the 'Internet_Sales_2002' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report Begin&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Started reading data from the 'Internet_Sales_2003' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report End&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Finished reading data from the 'Internet_Sales_2001' partition.&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report Begin&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Started reading data from the 'Internet_Sales_2004' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report End&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Finished reading data from the 'Internet_Sales_2002' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report End&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Finished reading data from the 'Internet_Sales_2003' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Progress Report End&amp;nbsp;&amp;nbsp;&amp;nbsp; 14 - Query&amp;nbsp;&amp;nbsp;&amp;nbsp; Finished reading data from the 'Internet_Sales_2004' partition.&amp;nbsp;&amp;nbsp;&lt;BR&gt;Query End&amp;nbsp;&amp;nbsp;&amp;nbsp; 0 - MDXQuery&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/PRE&gt;
&lt;P&gt;We see that all partitions got queried, even though the last non empty date is in 2004, and therefore it should've been enough to query just the Internet_Sales_2004 partition. Unfortunately, the query optimizer gets carried away in this example, decides to ignore the outer Tail(..., 1) and executes NonEmpty over all dates literally over all dates. So, we need a different strategy here. The second obvious strategy for finding last non empty date is the recursive approach - start from the last date in the dimension and move back by one date until we run into a date which has data. The formula for that is &lt;/P&gt;&lt;PRE&gt;CREATE
 MEMBER CurrentCube.LastDateIndex AS
   Iif(
    IsEmpty([Measures].[Internet Sales Amount]), 
    [Date ].[Calendar].PrevMember,
    Rank([Date ].[Calendar].CurrentMember, [Date ].[Calendar].[Date ])
   )
 SET LastDate AS [Date ].[Calendar].[Date ].Item((LastDateIndex, Tail([Date ].[Calendar].[Date ],1).Item(0))-1)&lt;/PRE&gt;
&lt;P&gt;Here we implemented the recursive logic inside calculated member LastDateIndex, and then we position it on the last date (using Tail(1) over level of dates for that). The result is the same as before, but trace looks very different:&lt;/P&gt;&lt;PRE&gt;Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0		
Query Subcube	2 - Non-cache data August 31, 2004	
Query Subcube	2 - Non-cache data August 30, 2004	
Query Subcube	2 - Non-cache data August 29, 2004	
Query Subcube	2 - Non-cache data August 28, 2004	
Query Subcube	2 - Non-cache data August 27, 2004	
Query Subcube	2 - Non-cache data August 26, 2004	
Query Subcube	2 - Non-cache data August 25, 2004	
Query Subcube	2 - Non-cache data August 24, 2004	
Query Subcube	2 - Non-cache data August 23, 2004	
Query Subcube	2 - Non-cache data August 22, 2004	
Query Subcube	2 - Non-cache data August 21, 2004	
Query Subcube	2 - Non-cache data August 20, 2004	
Query Subcube	2 - Non-cache data August 19, 2004	
Query Subcube	2 - Non-cache data August 18, 2004	
Query Subcube	2 - Non-cache data August 17, 2004	
Query Subcube	2 - Non-cache data August 16, 2004	
Query Subcube	2 - Non-cache data August 15, 2004	
Query Subcube	2 - Non-cache data August 14, 2004	
Query Subcube	2 - Non-cache data August 13, 2004	
Query Subcube	2 - Non-cache data August 12, 2004	
Query Subcube	2 - Non-cache data August 11, 2004	
Query Subcube	2 - Non-cache data August 10, 2004	
Query Subcube	2 - Non-cache data August 9, 2004	
Query Subcube	2 - Non-cache data August 8, 2004	
Query Subcube	2 - Non-cache data August 7, 2004	
Query Subcube	2 - Non-cache data August 6, 2004	
Query Subcube	2 - Non-cache data August 5, 2004	
Query Subcube	2 - Non-cache data August 4, 2004	
Query Subcube	2 - Non-cache data August 3, 2004	
Query Subcube	2 - Non-cache data August 2, 2004	
Query Subcube	2 - Non-cache data August 1, 2004	
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data July 31, 2004	
Query End	0 - MDXQuery	
&lt;/PRE&gt;
&lt;P&gt;Now we see lots of non-cached Query Subcube&amp;nbsp;requests for the dates going from August 31, 2004 down to July 31, 2004. Interestingly, only the last one triggers reading of the partition, since for the others, partition autoslice automatically detects that they don't belong to any partition. So in certain sense this is a good result, since we only touched one partition, but in another sense, this query plan with lots of Query Subcube requests doesn't look good. We were lucky, that in Adventure Works there were only 31 days at the end of Date dimension without data, so recursion went 31 step only. But it is not uncommon to have&amp;nbsp;Date dimensions&amp;nbsp;looking several years ahead, in which case recursion could go on for thousands of steps, and overhead of thousands of Query Subcube requests could become substantial, not to mention the overhead of the depth&amp;nbsp;of recursion.&lt;/P&gt;
&lt;P&gt;This leads us to another approach. Instead of iterating by days, we can iterate by partitions. Since in Adventure Works partitioning scheme is by years, we will iterate by years, and then find the last non empty date within a year. The MDX that implements this approach is below:&lt;/P&gt;&lt;PRE&gt;CREATE
 MEMBER CurrentCube.LastYearIndex AS
  Iif(
   IsEmpty([Measures].[Internet Sales Amount]), 
   [Date ].[Calendar Year].PrevMember,
   Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year])
  )
 SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1)
 SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)
&lt;/PRE&gt;
&lt;P&gt;Again, this gives us the desired result, and now the trace looks like following:&lt;/P&gt;&lt;PRE&gt;Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,000000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query End	0 - MDXQuery
&lt;/PRE&gt;
&lt;P&gt;We eliminated lots of Query Subcube requests, but we still have two, and now each one triggers query of partition. Why there are two of them ? This is easily answered by inspecting the Query Subcube granularity bitmasks. We see that the difference between them is in the Date dimension. First Query Subcube doesn't have granularity on Date attribute, while the second one has. It is clear now, that the first Query Subcube request is triggered by IsEmpty([Measures].[Internet Sales Amount]) check inside Iif - because it happens at the Year granularity, and the second Query Subcube is due to NonEmpty over dates with the filter on the specific year.&lt;/P&gt;
&lt;P&gt;We don't like the fact that we go to the partition twice, essentially for the same data but on different granularity. So we can do a little trick in MDX. Since we will have to go to the Date granularity within a year anyway, let's do it sooner rather than later - and expand granularity already inside Iif check.&amp;nbsp;The resulting MDX will look like following:&lt;/P&gt;&lt;PRE&gt;CREATE
 MEMBER CurrentCube.LastYearIndex AS
  Iif(
   IsEmpty(Aggregate(Descendants([Date ].[Calendar], [Date ].[Calendar].[Date ]), [Measures].[Internet Sales Amount])),
   [Date ].[Calendar Year].PrevMember,
   Rank([Date ].[Calendar Year].CurrentMember, [Date ].[Calendar Year].[Calendar Year])
  )
 SET LastYear AS [Date ].[Calendar Year].[Calendar Year].Item((LastYearIndex, [Date ].[Calendar Year].LastChild)-1)
 SET LastDate AS Tail(NonEmpty([Date ].[Date ].[Date ].MEMBERS, (LastYear, [Measures].[Internet Sales Amount])), 1)
&lt;/PRE&gt;
&lt;P&gt;Let's see what we get in the trace now:&lt;/P&gt;&lt;PRE&gt;Query Begin	0 - MDXQuery	MDX Studio v0.2.6.0
Progress Report Begin	14 - Query	Started reading data from the 'Internet_Sales_2004' partition.
Progress Report End	14 - Query	Finished reading data from the 'Internet_Sales_2004' partition.
Query Subcube	2 - Non-cache data
00000000,000,00000,00,011010000001100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query Subcube	1 - Cache data
00000000,000,00000,00,010000000000100000,000000000000000000,000000000000000000,0000000000000000000000,000000000000000000000,00,10
Query End	0 - MDXQuery
&lt;/PRE&gt;
&lt;P&gt;This is perfect - exactly what we wanted. The second Query Subcube which used to be Non-cache data, now turned into cached one, and there is only one query to only one partition going on.&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;This post was written as a result of &lt;/EM&gt;&lt;A class="" href="http://www.sql.ru/forum/actualthread.aspx?tid=506477"&gt;&lt;EM&gt;discussion&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; on sql.ru forum&lt;/EM&gt;&lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=4093" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Optimizing Count(Filter(...)) expressions in MDX</title><link>http://sqlblog.com/blogs/mosha/archive/2007/11/22/optimizing-count-filter-expressions-in-mdx.aspx</link><pubDate>Thu, 22 Nov 2007 08:23:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:3480</guid><dc:creator>mosha</dc:creator><slash:comments>17</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/3480.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=3480</wfw:commentRss><description>&lt;P&gt;As the readers of my blog know, bulk evaluation mode (called "block computation mode" in Katmai) delivers much better performance in MDX than the cell-by-cell evaluation mode. Therefore the most important optimization technique with MDX in Analysis Services is to rewrite MDX in such&amp;nbsp;a way that makes block computations possible. Easy to say, but not always easy to do. With the release of Katmai's CTP5, Microsoft published the &lt;A href="http://msdn2.microsoft.com/en-us/library/bb934106(SQL.100).aspx"&gt;BOL article&lt;/A&gt; outlining conditions when block computations are and are not possible. Chris Webb picked on this article and mentioned in his &lt;A href="http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1502.entry"&gt;blog&lt;/A&gt;, that "&lt;EM&gt;the list of set functions is a bit limited (where is Filter?)&lt;/EM&gt;". In reply I said that I didn't think that Filter function was that common inside the MDX calculations, but &lt;A href="http://www.artisconsulting.com/Blogs/tabid/94/BlogId/3/Default.aspx"&gt;Greg Galloway&lt;/A&gt; immediatelly came up with a good example, one that involves Count(Filter(...)). This is indeed a common calculation - every time we want to know how many entities are there which satisfy certain condition. (Greg's example was to find out how many physicians performed 10 or more cases during certain time period.) &lt;/P&gt;
&lt;P&gt;Let's build an example using Adventure Works sample cube and see how we can optimize it. In Adventure Works terms, our task would be to find the number of products which had more than 5 orders placed over the Internet. Such calculation can be written as&lt;/P&gt;&lt;PRE&gt;Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] &amp;gt; 5))&lt;/PRE&gt;
&lt;P&gt;Indeed, neither in AS2005 nor in AS2008 the Filter function is optimized to work in the block computation mode, therefore the query involving this calculation will execute in the cell by cell mode: &lt;PRE&gt;WITH MEMBER [Measures].[High Volume Products Count] AS &lt;BR&gt;&amp;nbsp;Count(Filter([Product].[Product].[Product], [Measures].[Internet Order Quantity] &amp;gt; 5))&lt;BR&gt;SELECT [Customer].[Customer Geography].[Country] ON 0&lt;BR&gt;, [Date].[Calendar].[Date].MEMBERS ON 1&lt;BR&gt;FROM [Adventure Works]&lt;BR&gt;WHERE [Measures].[High Volume Products Count]&lt;/PRE&gt;
&lt;P&gt;If we execute this query in &lt;A href="http://www.mosha.com/msolap/mdxstudio.htm"&gt;MDX Studio&lt;/A&gt;, we will get the following stats:&lt;/P&gt;&lt;PRE&gt;Time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 30 sec 781 ms&lt;BR&gt;Calc covers&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 4&lt;BR&gt;Cells calculated : 4217436&lt;BR&gt;Sonar subcubes&amp;nbsp;&amp;nbsp; : 2&lt;BR&gt;SE queries&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 1&lt;BR&gt;Cache hits&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 1&lt;BR&gt;Cache misses&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; : 1&lt;BR&gt;Cache inserts&amp;nbsp;&amp;nbsp;&amp;nbsp; : 1&lt;BR&gt;Cache lookups&amp;nbsp;&amp;nbsp;&amp;nbsp; : 2&lt;BR&gt;Memory Usage KB&amp;nbsp; : 4160 &lt;/PRE&gt;
&lt;P&gt;The best&amp;nbsp;hint&amp;nbsp;here that this query indeed executed in the cell-by-cell mode (beyong the slow execution time) is value of "Cells calculated" perfmon counter. Now,&amp;nbsp;the way most people approach optimization for&amp;nbsp;such MDX is trying to reduce the number of cells to iterate. One way to do it is to eliminate manually all the empty cells from Filter. I.e., if value of [Internet Order Quantity] is NULL for certain product, it is definitely less than 5. With this in mind, one possible rewrite for the query would be&lt;/P&gt;&lt;PRE&gt;WITH MEMBER [Measures].[High Volume Products Count] AS 
 Count(Filter(
  Exists([Product].[Product].[Product],,"Internet Sales")
  ,[Measures].[Internet Order Quantity] &amp;gt; 5))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]
&lt;/PRE&gt;
&lt;P&gt;Here Exists inside Filter eliminates the empty space. Let's run this again in MDX Studio and look at the new stats &lt;PRE&gt;Time             : 12 sec 46 ms
Calc covers      : 954
Cells calculated : 50036
Sonar subcubes   : 7900
SE queries       : 7899
Cache hits       : 7899
Cache misses     : 1
Cache inserts    : 1
Cache lookups    : 7900
Memory Usage KB  : 43936
&lt;/PRE&gt;
&lt;P&gt;The time indeed decreased, from 30 seconds to 12 seconds. This is good, but something doesn't add up. The number of cells went down from 4 million to only 50 thousands, i.e. about 85 times, yet the execution time decreased by only 2 times. The explanation is that putting Exists (or NonEmpty) inside MDX calculation is usually a bad idea - since now for every cell, there is an SE query being sent (it can be seen through "Query Subcube" event in trace too). In our case there were 7900 such queries sent. Because the set inside Exists was always the same and the context was fixed, only 1 such query actually had to go to disk, and other 7899 hit the cache, yet the overhead of issuing SE query is non-trivial, even when it is answered from the cache. 
&lt;P&gt;Conclusion is that even though we get some performance gain from this approach, it doesn't play well with the rest of the system, and only drives us further from the goal of switching to the superior block computation mode. We need to rewrite the calculation in such a way that we eliminate Filter. Fortunately, it is possible to do. Let's recall that Count function returns number of tuples in the set, and Filter return set of tuples which satisfy certain condition. I.e. we are counting how many tuples satisfy a condition. If we convert count to sum, and sum up 1's every time when the condition is met and 0's every time when the condition is not met, we will get the same result. I.e. &lt;PRE&gt;Count(Filter(set, condition)) = Sum(set, Iif(condition, 1, 0))&lt;/PRE&gt;
&lt;P&gt;Using this formula we can rewrite our calculation as following:&lt;/P&gt;&lt;PRE&gt;WITH MEMBER [Measures].[High Volume Products Count] AS 
 Sum(
   [Product].[Product].[Product], 
   Iif([Measures].[Internet Order Quantity] &amp;gt; 5,1,0))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]
&lt;/PRE&gt;
&lt;P&gt;Executing this query yields the following stats: &lt;PRE&gt;Time             : 6 sec 375 ms
Calc covers      : 4
Cells calculated : 6948
Sonar subcubes   : 1
SE queries       : 1
Cache hits       : 1
Cache misses     : 1
Cache inserts    : 1
Cache lookups    : 2
Memory Usage KB  : 0
&lt;/PRE&gt;
&lt;P&gt;This is definitely much better. The time is down to 6 seconds, and we can tell that within every cell the Sum works very efficiently, because we only see 6948 cell calculated (this is the exact number of cell in the resulting cellset). Yet, we are not in true block computation mode yet. We are still running Sum for every cell, instead of computing the entire query in one operation. What prevents us from doing it now ? Now the problem is with Iif function inside the Sum. I have written about &lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2007/01/28/iif_performance.aspx"&gt;Iif function and its interaction with block mode&lt;/A&gt; in the past. If we reread that article, we will see that we are in the scenario where condition inside Iif looks at the cell values and not at the attribute coordinates, and the only thing we can do here is to have one of the branches to return NULL. And it probably makes sense to define our calculated member to return NULL instead of 0 when there are no products which satisfy our condition. After this rewrite we get &lt;PRE&gt;WITH MEMBER [Measures].[High Volume Products Count] AS 
 Sum(
   [Product].[Product].[Product], 
   Iif([Measures].[Internet Order Quantity] &amp;gt; 5,1,NULL))
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]
&lt;/PRE&gt;
&lt;P&gt;This gets us down to 4 seconds, which is yet another improvement, but still doesn't yet reach the best execution plan. Now it is time to use the performance optimization hints. Looking at the expression Iif([Measures].[Internet Order Quantity] &amp;gt; 5,1,NULL) we observe that it is guaranteed to be NULL when [Measures].[Internet Order Quantity] is NULL. Therefore we can define the MDX script flavor of NON_EMPTY_BEHAVIOR for it. Since NEB cannot be defined on subexpressions, we will separate it into special calculated measure. Let's write the following fragment&amp;nbsp;inside MDX Script:&lt;/P&gt;&lt;PRE&gt;CREATE HIDDEN Summator;
[Measures].[Summator] = Iif([Measures].[Internet Order Quantity] &amp;gt; 5,1,NULL);
NON_EMPTY_BEHAVIOR([Measures].[Summator]) = [Measures].[Internet Order Quantity];
&lt;/PRE&gt;
&lt;P&gt;And then use it inside our calculation: &lt;PRE&gt;WITH 
MEMBER [Measures].[High Volume Products Count] AS 
 Sum([Product].[Product].[Product], [Measures].[Summator])
SELECT [Customer].[Customer Geography].[Country] ON 0
, [Date].[Calendar].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Measures].[High Volume Products Count]
&lt;/PRE&gt;
&lt;P&gt;Finally, when we execute this query, it finishes in about 0.2 second (218 milliseconds). This is the performance we expect to observe from the block computation mode. From 32 seconds to 0.2 seconds - improvement of over 160 times ! 
&lt;P&gt;Now the even better news are, that starting with Katmai November CTP5 (which was released couple of days ago, so I finally can publicly speak about it), the trick with explicitly defining NON_EMPTY_BEHAVIOR is not required, the engine recognizes it itself. Therefore, even our previous query (the one which took 4 seconds in AS2005) returns in 0.2 seconds in AS2008. (This improvement in query optimizer in Katmai is just a tiny little tip of the iceberg around improvements in block computation query plans, and I intend to cover this subject in more depth in the upcoming blogs, if I will have enough time for that).&lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=3480" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/katmai/default.aspx">katmai</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Notes about Notes from Microsoft BI Conference</title><link>http://sqlblog.com/blogs/mosha/archive/2007/06/02/notes-about-notes-from-microsoft-bi-conference.aspx</link><pubDate>Sat, 02 Jun 2007 06:16:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:1388</guid><dc:creator>mosha</dc:creator><slash:comments>1</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/1388.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=1388</wfw:commentRss><description>&lt;P&gt;I was reading the blog post "&lt;A href="http://www.ssas-info.com/VidasMatelisBlog/?p=20"&gt;My notes from Microsoft BI Conference&lt;/A&gt;" by Vidas Matelis. It is a&amp;nbsp;collection of notes recorded at different presentations, chalk talks etc. I felt that I agree with some of them, disagree with some, and have no opinion about some. I felt like it would be useful to write down my thoughts about these notes. Of course, when I don't agree with them - it doesn't mean that I am right - I am merely expressing my opinion on a matter, which is subjective.&amp;nbsp;I also apologize upfront if any of my comments sound critical - my intent is not to critique, but to open a discussion,&amp;nbsp;as an old saying goes "The truth is born in dispute". I will be citing the selected notes from Vidas's blog and then add my commentary.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Attribute relationship is important. I have heard that at least 10 times in different sessions.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Indeed. If I were asked what is the single most important advice I could give to the cube designers - it would be to define correct attribute relationships in the dimensions. Note the accent on the word 'correct'. Lately, way too often I stumble upon cubes where people have heard how important attribute relationships are, so that they marked attributes related even if there was no true 1-to-many relationship between them. This could lead to disastrous results. If attributes are not 100% 1-to-many, the numbers in the cube could and will be wrong. They could even be non-deterministic ! Remember, it is very important to define relationships and convert hierarchies to be natural, but only if these relationships truly exist in the data.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;For MOLAP partitions no need to specify slice property, it is detected automatically.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;I don't agree with this statement. It is only true for the single member slices. I.e. if we have daily partitions, it is not necessary to tell Analysis Services what day the MOLAP partition was processed for, because it will detect it automatically. But if the slice consists of more than one member - Analysis Services might not pick it up with great precision. What AS will do is to detect a range of Data IDs per attribute. Since the user has no control over how Data IDs are assigned to members, if partition has slice of only two members, it may turn out that their Data IDs are first and the last in the attribute, and the range will cover the entire space.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;If your partition is at the day or week level, no aggregations&amp;nbsp;are possible for month level. Aggregations cannot cross partitions.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;It is true that aggregations cannot cross partitions. But, of course, it is not true that aggregations are not possible at higher levels. What this note probably meant to say was that it isn't worth to set aggregations at the level above the one where partition is sliced by a single member.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Each KPI formula is defined as hidden calculated measures, except when it is just reference to another measure. For better performance consider creating&amp;nbsp;real calculated measures with properly designed calculations and then use these measures in KPI definitions.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Yes, KPIs may create hidden calculated measures, and not just for the KPI value, but for other properties as well - i.e. Goal, Trend etc. I have written about it here - &lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2005/12/31/cube_init.aspx"&gt;www.sqljunkies.com/WebLog/mosha/archive/2005/12/31/cube_init.aspx&lt;/A&gt;. What is not clear from this note is how creating real calculated measures is going to help performance. The truth is - it won't help by itself. What this note probably meant was that if the calculated measures are created manually in the MDX Script, the cube designer has more control. For example, it is possible to specify performance related properties such as NON_EMPTY_BEHAVIOR, or non-performance properties such as FORMAT_STRING. It is possible to use this calculated measure inside SCOPE statement etc. So it gives flexibility both in functionality and in making performance optimizations.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In multiple sessions I have heard talking about&amp;nbsp;”VisualTotals” problem. Basically best performance can be achieved for natural hierarchy totals - when children’s parent contains total value. When you use any&amp;nbsp;filters, roles security based on username, subselect statements, etc, visual totals are calculated and performance will be affected.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;There is some confusion going on inside this note. For example, what "roles security based on username" has to do with Visual Totals ? What's true is that Visual Totals have different forms - query based with subselects and "Default Visual Mode" connection string property, session based with named sets using VisualTotals function and CREATE SUBCUBE statement, there are Visual Totals which can be defined in the dimension security. But they are not inherently performance destroying. Different forms of Visual Totals have different semantics and different effect on performance and on caching. Putting blank statement that they are bad for performance just isn't right.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In pre SP2 release you were able to increase YTD calculation performance by using fancy logic (Mosha blog example). After SP2 best performance can be achieved by using simple YTD() function.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;I want to clarify this. The comment about "Mosha blog example" probably refers to the following blog post: &lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2006/11/17/rsum_performance.aspx"&gt;www.sqljunkies.com/WebLog/mosha/archive/2006/11/17/rsum_performance.aspx&lt;/A&gt;. But this is exactly where I describe what a great performance SP2 delivers by using simple YTD() function ! I never fancied computing running sum using fancy logic. I got confused with either Richard Tkachuk or Chris Webb, both of whom blogged about fancy ways to compute running sum. I intentionally don't link to their posts, because I believe that the simpler the MDX expression is - the better.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Excel 2007 uses subselects and that means visual totals. But you can run run excel in backward compatibility mode&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;More confusion about Visual Totals. Yes Excel 2007 uses subselects. Yes it means Visual Totals. But even when you run Excel in backward compatibility mode - you still going to get Visual Totals !!! Not through subselects, but through session property "Default Visual Mode". What this note probably meant to say, that when you run Excel in backward compatibility mode, you don't get subselects.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;64bit machines does not run faster than 32bit. Many older 64 bit machines actually run slower than 32bit. That was fixed recently. 64 bit machines just lets you use more memory.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;This is controversial statement. I am sure that in the scenarios that the presenter talked about this is true. However, there are also pieces of the engine code, which are explicitly optimized for 64bit, and&amp;nbsp;where the&amp;nbsp;performance of 64bit is better than 32bit, so it isn't just memory. But YMMV...&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In real life don’t just do full process on cube. Plan how you can&amp;nbsp;improve processing. It is recommended that you&amp;nbsp;submit in separate transactions ProcessData and ProcessIndexes statements instead of ProcessFull.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;I wish there were more explanation behind this statement. Real life is complicated. Perhaps sometimes separating ProcessData and ProcessIndexes is the right thing, especially when the goal is to reduce processing window. But running queries on top of the data which doesn't have indexes could be disastrous for performance. So do the right thing for your setup. Don't just plan how you improve processing, plan how you improve overall system performance, and do what is more important in your scenario (for some people it is processing, for some it is queries, for some it is mix).&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Process partition clears cache for all partitions in affected measure group.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;This statement makes&amp;nbsp;it look like&amp;nbsp;there are separate caches per partition. In reality data caches are per measure group (for SE) and per cube (for FE).&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;If you are using crossjoin, put larger set always as first parameter of crossjoin&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;I was really puzzled by this one. There must be a justification for it, but I cannot think of it. However, I do agree that order of sets in crossjoin does matter a lot for performance, but it isn't the sizes of sets which are important in my opinion. It is complexity of MDX expressions behind them. The reason is very simple - CrossJoin function usually uses nested&amp;nbsp;loop join algorithm. Therefore the sets with more complex MDX expressions should be the outer sets, and the simpler ones should be inner sets. This can be illustrated on the example from Adventure Works. The first query, where complex Filter is the outer parameter of CrossJoin&amp;nbsp;returns instantaneously&lt;/P&gt;&lt;PRE&gt;WITH MEMBER Measures.Test AS 
Count(
 Crossjoin(
  Filter([Customer].[Customer Geography].[Customer], LEFT([Customer].[Customer Geography].CurrentMember.Name,3) = "Mar")
  ,[Promotion].[Promotions].[Promotion]
 )
)
SELECT Test ON 0
FROM [Adventure Works]
&lt;/PRE&gt;
&lt;P&gt;While changing the order of sets in CrossJoin and putting complex Filter as inner parameter like in the following query, executes for long 6 seconds:&lt;/P&gt;&lt;PRE&gt;WITH MEMBER Measures.Test AS 
Count(
 Crossjoin(
  [Promotion].[Promotions].[Promotion]
  ,Filter([Customer].[Customer Geography].[Customer], LEFT([Customer].[Customer Geography].CurrentMember.Name,3) = "Mar")
 )
)
SELECT Test ON 0
FROM [Adventure Works]
&lt;/PRE&gt;
&lt;UL&gt;
&lt;LI&gt;Exist function works on dimensions, not cubes, so they are faster&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;First, Exist&lt;STRONG&gt;s&lt;/STRONG&gt; function can work on either dimensions or measure groups. Second, the argument about "faster" is not clear to me. What is faster than what ?&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Fastest filter example: Exists(Customer.Members, Gender.[Male])&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Here is even faster expression: Descendants( Gender_Customer_Hierarchy.Gender.[Male], Gender_Customer_Hierarchy.Customer, SELF ). I.e. if you have user natural hierarchy from Gender to Customer (it is natural because customer can have only one gender!), then using Descendants will always be better than Exists. Exists, however, is more universal, since it is usually used on top of attribute hierarchies and doesn't depend on the user hierarchies. But morale here is if you did define user hierarchies - don't be afraid to use them, since they will give boost in performance.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;If you have defined MeasureExpression, no aggregations are used.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;That is no aggregation are used to derive the results at higher granularities, but aggregations can be used if the granularity is the same. Measure expressions in this respect behave just like Distinct Count and Many to Many dimensions.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Kernel profiler could be used to show CPU usage per each Stored Procedure. This might help you find slowly performing Stored Procedures.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Another best practice is to integrate stored procedures with Trace through Context.TraceEvent method. This will give a lot of insight into performance of stored procedure. More details are in my "&lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2007/04/19/stored_procs_best_practices.aspx"&gt;Best practices for server ADOMD.NET stored procedures&lt;/A&gt;" blog.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Best performance can be achieved using&amp;nbsp;standard aggregation functions instead of rewriting them (Example LastChild). But some of them require Enterprise Edition of SQL Server.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;LastChild is not such a good example here, because performance differences between semiadditive measure LastChild and carefully written MDX are not that big. Better example is LastNonEmptyChild, where no matter how it is reimplemented in MDX, the built-in semiadditive measure will always be significantly better. But there is also a counterexample. It is possible to implement AverageOfChildren functionality in such a way that it will perform better than built-in semiadditive function. More details are in this blog post: &lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2007/05/27/counting_days_mdx.aspx"&gt;www.sqljunkies.com/WebLog/mosha/archive/2007/05/27/counting_days_mdx.aspx&lt;/A&gt;.&amp;nbsp;(it is not pure MDX solution since it requires creation of auxiliary measure group, but still).&lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=1388" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/ssas/default.aspx">ssas</category></item><item><title>Best practices for server ADOMD.NET stored procedures</title><link>http://sqlblog.com/blogs/mosha/archive/2007/04/19/best-practices-for-server-adomd-net-stored-procedures.aspx</link><pubDate>Thu, 19 Apr 2007 06:26:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:1158</guid><dc:creator>mosha</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/1158.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=1158</wfw:commentRss><description>&lt;P&gt;In this article we will discuss some of the best practices around writing efficient Analysis Services stored procedures using server ADOMD.NET. These days there is plenty of information about writing efficient MDX - in books, whitepapers, blogs, forums etc. Just recently Microsoft released two documents - &lt;A href="http://sqlblog.com/controlpanel/blogs/download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc"&gt;Performance Guide&lt;/A&gt; and &lt;A href="http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/olapdbpssas2005.mspx"&gt;Design Best Practices&lt;/A&gt; (and I have seen another one, still in the works, which goes deeper into MDX internals than anything else previously published). But there is no similar information or resources about how to write stored procedures, which are the essential extensibility mechanism in Analysis Services. There is &lt;A href="http://www.codeplex.com/Wiki/View.aspx?ProjectName=ASStoredProcedures"&gt;Analysis Services Stored Procedures&lt;/A&gt; open source project on CodePlex, which features plenty of source code for different stored procedures, but it is not easy for someone looking at it to extract best practices out of the code. This article's goal is to collect some of such best practices into single concise source.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Always traverse set with iterators and avoid accessing tuples by index or getting count of tuples&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;To demonstrate why this is important, let's pick a simple stored procedure which computes the sum of expression values over a set. It does exactly the same as built-in MDX function Sum, we will use it here because it is simple enough but illustrates the point well.&lt;/P&gt;
&lt;P&gt;We will implement this stored procedure using two methods - IterSum will use tuple iterator and IndexSum one will access tuples by index.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;        public decimal IterSum(Set InputSet, Expression Expr)
        {
            decimal sum = 0;
            foreach ( Tuple t in InputSet.Tuples )
            {
                sum += (decimal)Expr.Calculate(t);
            }

            return sum;
        }

        public decimal IndexSum(Set InputSet, Expression Expr)
        {
            decimal sum = 0;
            int c = InputSet.Tuples.Count;
            for (int i = 0; i &amp;lt; c; i++ )
            {
                Tuple t = InputSet.Tuples[i];
                sum += (decimal)Expr.Calculate(t);
            }

            return sum;
        }

&lt;/PRE&gt;
&lt;P&gt;Now, let's test the performance of these functions. We will start with the following simple queries:&lt;/P&gt;&lt;PRE&gt;with 
set QuerySet as [Customer].[Customer].[Customer].MEMBERS
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(QuerySet, 1)
select SprocSum on 0
from [Adventure Works]

with 
set QuerySet as [Customer].[Customer].[Customer].MEMBERS
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(QuerySet, 1)
select SprocSum on 0
from [Adventure Works]
&lt;/PRE&gt;
&lt;P&gt;We have chosen to use "1" as an expression in order not to spend time in retrieving data from the cube, but focusing purely on the performance of the stored procedure code (there is also another reason for this choice, explained later in the article). Both queries return immediately - just like everybody would expect. So far so good. Let's change queries a little bit and pass set directly instead of using named set:&lt;/P&gt;&lt;PRE&gt;with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IterSum([Customer].[Customer].[Customer].MEMBERS, 1)
select SprocSum on 0
from [Adventure Works]

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum([Customer].[Customer].[Customer].MEMBERS, 1)
select SprocSum on 0
from [Adventure Works]
&lt;/PRE&gt;
&lt;P&gt;Now the first query still returns immediately, but the second one took 29 seconds ! Let's take it even further - we will add Filter(..., true) around the set, which pretty much does nothing semantically.&lt;/P&gt;&lt;PRE&gt;with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IterSum(Filter([Customer].[Customer].[Customer].MEMBERS, true), 1)
select SprocSum on 0
from [Adventure Works]

with 
member measures.SprocSum as ASSP.ASStoredProcs.SetOp.IndexSum(Filter([Customer].[Customer].[Customer].MEMBERS, true), 1)
select SprocSum on 0
from [Adventure Works]
&lt;/PRE&gt;
&lt;P&gt;The first query is still immediate, but the second one now raises to whooping 2 minutes 49 seconds !!!&lt;/P&gt;
&lt;P&gt;What is going on here ? The short explanation is that not all set are equal as far as their internal implementation in Analysis Services goes. Some of them have efficient enumerators while other have less efficient enumerators and yet others have completely inefficient enumerators ! However, all sets have pretty reasonable iterators. To get more detailed explanation about set architecture in AS, I recommend reading Irina Gorbach's chapter 28 from the "&lt;A href="http://www.amazon.com/dp/0672327821?tag=tser-20&amp;amp;camp=14573&amp;amp;creative=327641&amp;amp;linkCode=as1&amp;amp;creativeASIN=0672327821&amp;amp;adid=1RRKKGS4T6STX0CQRS7W&amp;amp;"&gt;Microsoft SQL Server 2005 Analysis Services&lt;/A&gt;" book, the explanation is at pages 513-514 in English edition, and in pages 577-579 in Russian edition (by the way, if you can get Russian edition - I recommend it over the English one - it is more complete, has deeper explanations and fixes most of the errors in the English edition).&lt;/P&gt;
&lt;P&gt;But the lesson is clear - always iterate over set tuples using "foreach ( Tuple t in myset.Tuples )" construct, and avoid both Tuples.Count and Tuples[index] constructs. (There is also another reason - there is a bug with how tuples are created from access by index in tuples collection, and this bug can lead to internal exceptions in many scenarios). But what if you need to access the tuples in random order and/or more than once ? This leads us to the next best practice&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Cache cell values if you need to access them more than once or if you need to get data for all tuples of the set in random order&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The example that we will chose here will be implementation of Order(set, expr, BDESC) function. Sorting algorithms require in average &lt;EM&gt;O(n*log(n))&lt;/EM&gt; accesses to array values for the array of &lt;EM&gt;n&lt;/EM&gt; elements, so clearly we will need to access tuple's data more than once. On top of it all sorting algorithms will touch data in pretty much random order related to their original positions. This is a classic scenario where caching makes sense. The code below demonstrates it.&lt;/P&gt;&lt;PRE&gt;01        private class TupleValue : System.IComparable
02        {
03            private Tuple _Tuple;
04            internal Tuple Tuple
05            {
06                get { return _Tuple; }
07                private set { _Tuple = value; }
08            }
09            private decimal _Value;
10
11            public TupleValue(Tuple t, decimal v)
12            {
13                _Tuple = t;
14                _Value = v;
15            }
16            
17            public int CompareTo(object obj)
18            {
19                if (obj is TupleValue)
20                {
21                    TupleValue tv = (TupleValue)obj;
22                    return tv._Value.CompareTo(_Value);
23                }
24                throw new System.ArgumentException("object is not a TupleValue");
25            }
26        }
27
28       public Set Order(Set InputSet, Expression SortExpression)
29       {
30           List TupleValues = new List();
31
32           int i = 0;
33           foreach (Tuple t in InputSet.Tuples)
34           {
35               TupleValues.Add(new TupleValue(t, (decimal)SortExpression.Calculate(t)));
36               i++;
37           }
38
39           int cTuples = i;
40
41           TupleValues.Sort();
42
43           SetBuilder sb = new SetBuilder();
44
45           for (i = 0; i &amp;lt; cTuples; i++)
46           {
47               sb.Add(TupleValues[i].Tuple);
48           }
49
50           return sb.ToSet();
51       }
&lt;/PRE&gt;
&lt;P&gt;It seems a bit long, but the interesting code is quite short. Line 33 iterates through the set using tuple iterator, and on line 35 populates the cache of TupleValues, where TupleValue is a pair of Tuple and value of expression evaluated at it. Then at line 41 we call .NET built-in Sort method which uses TupleValue's implementation of IComparable interface at lines 19 through 24. Let's check how well this code executes. We will compare our stored procedure with built-in MDX function Order. We will sort a reasonable big set - 166,356 tuples. In order to measure the time spent on the server and not Management Studio trying to render in UI all these tuples, we will wrap Order with Count to return a single value. &lt;PRE&gt;with 
member y as count(Order(
   [Customer].[Customer].[Customer].MEMBERS
  *[Product].[Category].[Category].MEMBERS
  *[Product].[Style].[Style].MEMBERS
, Measures.[Internet Sales Amount], BDESC))
select y on 0
from [Adventure Works]

with 
member y as count(ASSP.ASStoredProcs.SetOp.[Order](
   [Customer].[Customer].[Customer].MEMBERS
  *[Product].[Category].[Category].MEMBERS
  *[Product].[Style].[Style].MEMBERS
, Measures.[Internet Sales Amount]))
select y on 0
from [Adventure Works]
&lt;/PRE&gt;
&lt;P&gt;The results are somewhat surprising. The built-in MDX function Order finishes in 40 seconds. And our stored procedure which does the same thing finishes in only 14 seconds ! We were able to beat internals of AS engine with our stored procedure (the reason we were able to outperform built-in Order is because of performance bug with MDX's Order. Normally stored procedure should never perform better than corresponding MDX function). 
&lt;P&gt;Caching is definitely a widely used technique to improve performance, but there are certain caveats to it. The developer should watch the size of the cache not to overflow beyond system limits. In the above implementation we cache liberally both Tuple object and its value, and this can add up to quite a lot of memory over huge sets. There are, of course, techniques to deal with this issue, but we won't discuss them here, since these techniques are well described in the literature dedicated to .NET programming. 
&lt;P&gt;Now, this stored procedure, even though it was faster than built-in Order, still took a while to execute - 14 seconds. What if user wasn't willing to wait that long. How do we ensure that it can be canceled immediately. This leads us to the next best practice 
&lt;P&gt;&lt;STRONG&gt;Inject liberally "Context.CheckCancelled()" statement in the inner loops which do not call into server ADOMD.NET methods &lt;/STRONG&gt;
&lt;P&gt;Context.CheckCancelled() is a method which raises "Server: The operation has been canceled" exception if the user has issued command on the session. It is very cheap - so cheap, that it is practically free. It can be put inside every inner loop in the stored procedure. However, it doesn't buy anything, if that inner loop already calls some server ADOMD.NET method. For example, the loop at lines 33-37 calls into several ADOMD.NET methods - it calls Next on the tuple iterator and Calculate on the expression object. Since both of these methods check for cancellation anyway, additional call to CheckCancelled is redundant. However, inside the code of List.Sort method, there are no more calls into server ADOMD.NET, since the data is cached. We cannot modify the code of List.Sort, but we still get called back from there into TupleValue.CompareTo method at line 17 - so it is a great candidate to insert call to Context.CheckCancelled. 
&lt;P&gt;But how much time we spent inside Sorting vs. populating the cache with data ? The only way to answer it - is to profile our stored procedure, and this leads us to the next best practice 
&lt;P&gt;&lt;STRONG&gt;Integrate with AS Trace and SQL Profiler through the Context.TraceEvent method&lt;/STRONG&gt; 
&lt;P&gt;The best way to understand where the time is spent inside execution of MDX query is, of course, by inspecting various events in AS Trace through tool such as SQL Profiler. The server object model provides nice integration with trace through the Context.TraceEvent method. This method allows to inject "User defined event" into the trace, and provides for custom event subclass, integer and string data. We can write to trace before and after our two main loops in order to find out how much time we spend in each. So the code with both CheckCancelled and TraceEvents added will look like below: &lt;PRE&gt;01        private class TupleValue : System.IComparable
02        {
03            private Tuple _Tuple;
04            internal Tuple Tuple
05            {
06                get { return _Tuple; }
07                private set { _Tuple = value; }
08            }
09            private decimal _Value;
10
11            public TupleValue(Tuple t, decimal v)
12            {
13                _Tuple = t;
14                _Value = v;
15            }
16            
17            public int CompareTo(object obj)
18            {
19                if (obj is TupleValue)
20                {
21                    Context.CheckCancelled();
22                    TupleValue tv = (TupleValue)obj;
23                    return tv._Value.CompareTo(_Value);
24                }
25                throw new System.ArgumentException("object is not a TupleValue");
26            }
27        }
28
29        public Set Order(Set InputSet, Expression SortExpression)
30        {
31            List&amp;lt;tuplevalue&amp;gt; TupleValues = new List&amp;lt;tuplevalue&amp;gt;();
32
33            Context.TraceEvent(100, 0, "Start getting data");
34
35            int i = 0;
36            foreach (Tuple t in InputSet.Tuples)
37            {
38                TupleValues.Add(new TupleValue(t, (decimal)SortExpression.Calculate(t)));
39                i++;
40            }
41
42            int cTuples = i;
43
44            Context.TraceEvent(100, cTuples, "Finish getting data for " + cTuples.ToString() + " tuples");
45
46            Context.TraceEvent(200, 0, "Start sorting");
47            TupleValues.Sort();
48            Context.TraceEvent(200, 0, "Finish sorting");
49
40            SetBuilder sb = new SetBuilder();
51
52            for (i = 0; i &amp;lt; cTuples; i++)
53            {
54                sb.Add(TupleValues[i].Tuple);
55            }
56
57            return sb.ToSet();
58        }
&lt;/PRE&gt;
&lt;P&gt;At line 21 we added CheckCancelled call, at lines 33 and 44 calls to TraceEvent to track start and finish of populating the cache - writing the number of tuples that went into cache, and at lines 46 and 48 - calls to TraceEvent to track time spent in sorting. The results can be seen in the following screenshot: 
&lt;P&gt;&lt;IMG src="http://www.mosha.com/msolap/images/sproc_trace.PNG"&gt;&lt;/P&gt;
&lt;P&gt;The important column here is CurrentTime - it is the one which allows us to calculate how much time was spent where. As expected - all the time was spent getting the data and populating the cache, and sort itself was instantaneous. Tracing is a great way to get insight into execution, but for performance reasons it should be kept out of the inner loops, and perhaps some of it even disabled in the release version.&lt;/P&gt;
&lt;P&gt;Well, we are down to the last best practice, which is&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use the appropriate ImpersonatioInfo on the assembly for right balance between security and performance&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;ImpersonationInfo property of the assembly tells AS what principal should be impersonated when the stored procedure code is executed. In other words it determines the content of the thread token during stored procedure execution. The security implications of ImpersonationInfo are well understood. But what about performance implications - are there any ? In fact there are. When stored procedure calls into server ADOMD.NET method, the real server code gets execution. This code may read from files, create new threads and do all kinds of things that server needs to do in order to get data or perform the requested operation. Every time server codes interacts with OS, the OS checks the current thread token to determine the access rights. If the current thread token would remain the one under which stored procedure was executing - there is high likelihood that OS related operations will fail, because the user that was impersonated probably don't have permissions to read server data files etc. Therefore, every time there is a call from stored procedure to ADOMD.NET, it first must revert the currently impersonated thread token to the server, and before returning control back to the stored procedure - set it back. These transitions do take their time, although differences in time vary between different impersonation modes, system and network configurations, security settings etc. It is safe to say though, that the only impersonation mode which is free of such transitions is ImpersonateServiceAccount, since it essentially tells AS to use the same thread token both inside server code and inside stored procedure code. While from security point of view ImpersonateServiceAccount seems to be least safe, it must be considered in combination with assembly's PermissionSet. When PermissionSet is set to Safe, the ImpersonationInfo is not very important, because PermissionSet=Safe guarantees that the stored procedure cannot go outside of server's sandbox anyway - it cannot touch file system or network by itself, it cannot interact with OS etc. So unless stored procedure requires more broad PermissionSet, it is usually OK to use ImpersonationInfo=ImpersonateServiceAccount. Certainly this is true with all the examples in this article. But, of course, each case should be considered separately in order to ensure secure environment.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=1158" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/ADOMD.NET/default.aspx">ADOMD.NET</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/stored+procedure/default.aspx">stored procedure</category></item><item><title>Averages, ratios, division by zero and NON_EMPTY_BEHAVIOR</title><link>http://sqlblog.com/blogs/mosha/archive/2007/03/06/averages-ratios-division-by-zero-and-non-empty-behavior.aspx</link><pubDate>Wed, 07 Mar 2007 02:58:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:924</guid><dc:creator>mosha</dc:creator><slash:comments>0</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/924.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=924</wfw:commentRss><description>&lt;P&gt;How to get averages and other ratio based MDX calculations demonstrate great performance ? I thought I covered this subject in great depth before, but I probably wasn't concise enough. Last week I was asked to help with performance to very big and important customer (who shall remain unnamed). When I arrived on site, I found the project staff to be very knowledgeable about Analysis Services and MDX. The&amp;nbsp;model was designed following all the best practices, they have studied the &lt;A href="http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc"&gt;Performance Guide&lt;/A&gt; and implemented all the relevant changes. Yet they couldn't get a very simple MDX calculation of average to perform well in medium size Excel reports. They also studied all of my earlier blog on this subject ("&lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2005/06/30/mdx_is_empty.aspx"&gt;How to check if cell is empty&lt;/A&gt;",&amp;nbsp;"&lt;A href="http://www.sqljunkies.com/WebLog/mosha/archive/2006/11/05/non_empty_behavior.aspx"&gt;Budget Variance and NEB&lt;/A&gt;"&amp;nbsp;and&amp;nbsp;"&lt;A href="http://sqlblog.com/blogs/mosha/archive/2007/01/28/performance-of-iif-function-in-mdx.aspx"&gt;Performance of IIF&lt;/A&gt;"&amp;nbsp;blogs), yet it wasn't working for them. We have found a solution which satisfied both the business needs and delivered great performance, so I thought I should share it, since it is such a common scenario.&lt;/P&gt;
&lt;P&gt;The problem applies to pretty much any kind of ratio, but we will pick the example of average to make it more specific. Let's suppose, that we keep in the data warehouse two measures about the product sales - SalesAmount and SalesCount. Now we want to compute the SalesAverage, so the most straightforward way to do it in MDX Script would be:&lt;/P&gt;&lt;PRE&gt;Create SalesAverage = Measures.SalesAmount / Measures.SalesCount;&lt;/PRE&gt;
&lt;P&gt;The careful designer, however, would&amp;nbsp;be concerned about division by zero, so following all the best practices the more robust expression would seem to be:&lt;/P&gt;&lt;PRE&gt;Create SalesAverage = IIF( Measures.SalesCount = 0, NULL, Measures.SalesAmount / Measures.SalesCount );&lt;/PRE&gt;
&lt;P&gt;Everything seems to be fine so far. But if we will put the SalesAverage measure into the report which crossjoins several attributes using NON EMPTY, we will find out that even though the result of the report is small, it still takes too long. This is, of course, a well known situation, and NON_EMPTY_BEHAVIOR performance hint is the usual way of dealing with it. So everybody's first impulse would be to define NON_EMPTY_BEHAVIOR&amp;nbsp;either to&amp;nbsp;Measures.SalesAmount or to Measures.SalesCount. But is it correct ? &lt;/P&gt;
&lt;P&gt;&lt;I&gt;Before reading any further, it is strongly advised that the reader familiarizes himself with the section of &lt;A href="http://download.microsoft.com/download/8/5/e/85eea4fa-b3bb-4426-97d0-7f7151b2011c/SSAS2005PerfGuide.doc"&gt;Performance Guide&lt;/A&gt; which explains NON_EMPTY_BEHAVIOR taxonomy. &lt;/I&gt;&lt;/P&gt;
&lt;P&gt;The answer to this question is "it depends". It depends on how SalesCount was built and what it was built on.&amp;nbsp; But first let's review what would happen&amp;nbsp;when SalesCount has a value of 0. Then according to our expression, the SalesAverage is NULL. This violates the contract of the second taxonomy of NON_EMPTY_BEHAVIOR which states that calculated measure is NULL &lt;B&gt;if and only if&lt;/B&gt; the measure defined as NON_EMPTY_BEHAVIOR is NULL. Here we violate the "and only if" part of this contract. Indeed, we got SalesAverage to be NULL while the base measure is not NULL - it is 0, which is different from NULL. Trying to force NON_EMPTY_BEHAVIOR here would result in undeterministic wrong results! The irony is that the&amp;nbsp;other taxonomy of NON_EMPTY_BEHAVIOR is not as restrictive and only requires calculation result to be NULL if the expression of NON_EMPTY_BEHAVIOR is NULL. It is certainly my wish and desire that both taxonomies get unified and both&amp;nbsp;not require the "and only if" part. Since this is not a very difficult change, I hope that this improvement will be implemented in the next version of Analysis Services. I would definitely vote for this feature on connect. Anyway, getting&amp;nbsp;back to our problem that we have to solve with the current version. If SalesCount&amp;nbsp;can never be 0, but only NULL, then we do not have a problem. If the SalesCount was defined as a measure with AggregationFunction=Count, counting transactions in the fact table - it&amp;nbsp;seems to be&amp;nbsp;safe. Even if it was defined as AggregationFunction=Sum on top of column in the fact table which never gets value of 0, then it is safe again. Unfortunately, in our case, the SalesCount measure could have values of 0 in the fact table. &lt;/P&gt;
&lt;P&gt;One proposed solution to resolve this was to rewrite the MDX as following&lt;/P&gt;&lt;PRE&gt;Create SalesAverage = IIF( NOT IsEmpty(Measures.SalesCount) AND Measures.SalesCount = 0, 
  0, 
  Measures.SalesAmount / Measures.SalesCount );&lt;/PRE&gt;
&lt;P&gt;This trick guarantees, that when SalesCount is equal to 0 (but not NULL), the ratio translates into 0, therefore it is safe to define NON_EMPTY_BEHAVIOR = Measures.SalesCount. However, the business people didn't accept such modification. They wanted the averages to be NULL and filtered out by NON EMPTY whenever SalesCount was either 0 or NULL. &lt;/P&gt;
&lt;P&gt;Given that requirement, I proposed another solution. &lt;/P&gt;
&lt;P&gt;1. Go to DSV, and insert the named calculation which converted 0's in the fact table for SalesCount column to NULL&lt;/P&gt;
&lt;P&gt;2. Changed the source of the SalesCount measure to the named calculation created as step 1&lt;/P&gt;
&lt;P&gt;3. Changed SalesCount's NullProcessing property to 'Preserve' value&lt;/P&gt;
&lt;P&gt;Making these changes ensured that SalesCount measure will never be 0 in the cube, since all the 0's were translated to NULLs and measure was set up to preserve these NULLs. Now finally it was OK to define NON_EMPTY_BEHAVIOR to be Measures.SalesCount.&lt;/P&gt;
&lt;P&gt;But this is not the end of it. Now we notice that since SalesCount is never zero, we will never get division by zero. SalesCount still can be NULL though, and so we still have to deal with division by NULL. But while NULL behaves similarly to zero in arithmetic computations, they are not identical. The most important difference is that 0/0 results in undefined value #1.IND, but NULL/NULL results in NULL. Since SalesAmount and SalesCount come from the same measure group, when the transaction is missing from the fact table - both measures are NULL together, so we are in the NULL/NULL case. This leaves only scenarios when SalesCount is 0 in the fact table. But when the SalesCount is 0, the SalesAmount is 0 as well, since we can only get zero money if we sold zero items. Therefore doing the same trick in DSV named calculation and NullProcessing=Preserve, we get SalesAmount to be NULL always when SalesCount is NULL.&lt;/P&gt;
&lt;P&gt;Therefore we don't even need to check for the division by zero, since it is always safe. The final MDX Script would be:&lt;/P&gt;&lt;PRE&gt;CREATE MEMBER CurrentCube.Measures.SalesAverage AS 
  Measures.SalesAmount / Measures.SalesCount
  , NON_EMPTY_BEHAVIOR = Measures.SalesCount;&lt;/PRE&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=924" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Performance of IIF function in MDX</title><link>http://sqlblog.com/blogs/mosha/archive/2007/01/28/performance-of-iif-function-in-mdx.aspx</link><pubDate>Mon, 29 Jan 2007 05:20:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:7362</guid><dc:creator>mosha</dc:creator><slash:comments>3</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/7362.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=7362</wfw:commentRss><description>&lt;p&gt;IIF is one of the most popular MDX functions, (this was proved by &lt;a href="http://sqlblog.com/WebLog/mosha/archive/2006/11/29/popular_mdx_functions_as2005.aspx"&gt;these surveys&lt;/a&gt;). Yet, it can cause significant performance degradation, which is often blamed on other parts of the system. Many times it is simple to rewrite the MDX expression to get rid of IIF altogether, and other times it is possible to slightly change the IIF to increase performance. While I and others mentioned this fact several times in blog entries, forum postings etc, people keep using and using it (and using it wrong). What is worse, after seeing CASE operator in samples and BOL, people start using CASE instead of IIF. From the performance point of view, CASE is always worse than IIF, at least IIF is possible to optimize in some case, but never CASE. So I decided to dedicate special blog to go over IIF usage and demonstrate with examples how horrible it can be and how to fix it.&lt;/p&gt;  &lt;p&gt;To make it easier to demonstrate this on standard Adventure Works cube, the examples are somewhat artificial, but it is not hard to imagine how real-life example would fit into the same pattern.&lt;/p&gt;  &lt;p&gt;Let's assume that we want to compute 'Normalized Cost' measure, which is the same as Standard Product Cost measure, except for when there is no promotion, i.e. when Promotion Type is No Discount, it should be Freight Cost plus Standard Product Cost. Using the IIF function, the solution would look like following:&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE &lt;/span&gt;[Normalized Cost] = 
  &lt;span style="color:maroon;"&gt;Iif&lt;/span&gt;( [Promotion].[Promotion Type].&lt;span style="color:maroon;"&gt;CurrentMember &lt;/span&gt;&lt;span style="color:blue;"&gt;IS &lt;/span&gt;[Promotion].[Promotion Type].&amp;amp;[No Discount]
      ,[Measures].[Internet Standard Product Cost]+[Measures].[Internet Freight Cost]
      ,[Measures].[Internet Standard Product Cost]
  );&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;Let's test performance of this approach. To make the performance measurements non-trivial, we will also introduce into the query YTD-like calculation. The query that we are measuring will look the following then:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.[Normalized Cost] &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;Iif&lt;/span&gt;(
  [Promotion].[Promotion Type].&lt;span style="color:maroon;"&gt;CurrentMember &lt;/span&gt;&lt;span style="color:blue;"&gt;IS &lt;/span&gt;[Promotion].[Promotion Type].&amp;amp;[No Discount]
  ,[Measures].[Internet Standard Product Cost]+[Measures].[Internet Freight Cost]
  ,[Measures].[Internet Standard Product Cost])
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Ship Date].[Date].RSum &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;Sum&lt;/span&gt;([Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), &lt;span style="color:blue;"&gt;SOLVE_ORDER&lt;/span&gt;=10
&lt;span style="color:blue;"&gt;SELECT 
 &lt;/span&gt;[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS on &lt;/span&gt;0
 ,[Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;*[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]
&lt;span style="color:blue;"&gt;WHERE &lt;/span&gt;([Ship Date].[Date].RSum, Measures.[Normalized Cost])&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;On my laptop, this query executed for 36 seconds. (Note, that we have included Promotion Type attribute on axis, in order for both formulas to participate in the query, and make it more difficult for query optimizer, although it actually doesn't matter here. The SOLVE_ORDER=10 is needed in order to prevent query optimizer to perform another optimization related to running sum which will mask inefficiencies of IIF. After all, we want to see the case where optimizer's tricks don't work, and IIF performs badly).&lt;/p&gt;

&lt;p&gt;Now, we will use the fact, that if condition of IIF only looks at member coordinates and not at the cell values, it almost always can be rewritten using one or more SCOPE statements. In this simple, but common, case, single assignment is sufficient:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE &lt;/span&gt;[Normalized Cost] = [Measures].[Internet Standard Product Cost];
([Promotion].[Promotion Type].&amp;amp;[No Discount], Measures.[Normalized Cost] =
  [Measures].[Internet Standard Product Cost]+[Measures].[Internet Freight Cost];&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This is the proper way to put it inside MDX Script, but since the previous query was written using query calculated members, and there is no syntax for SCOPE or assignment statements inside query, we can for testing purposes translate SCOPE into WITH CELL CALCULATION clause. Our query will look then the following:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.[Normalized Cost] &lt;span style="color:blue;"&gt;AS &lt;/span&gt;[Measures].[Internet Standard Product Cost]
&lt;span style="color:blue;"&gt;CELL CALCULATION &lt;/span&gt;ScopeEmulator 
  &lt;span style="color:blue;"&gt;FOR &lt;/span&gt;'([Promotion].[Promotion Type].&amp;amp;[No Discount],measures.[Normalized Cost])' 
  &lt;span style="color:blue;"&gt;AS &lt;/span&gt;[Measures].[Internet Freight Cost]+[Measures].[Internet Standard Product Cost]
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Ship Date].[Date].RSum &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;Sum&lt;/span&gt;([Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), &lt;span style="color:blue;"&gt;SOLVE_ORDER&lt;/span&gt;=10
&lt;span style="color:blue;"&gt;SELECT 
 &lt;/span&gt;[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS on &lt;/span&gt;0
 ,[Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;*[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]
&lt;span style="color:blue;"&gt;WHERE &lt;/span&gt;([Ship Date].[Date].RSum, Measures.[Normalized Cost])&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This query is equivalent to the first one, but executes in 0.36 seconds - exactly 100 times faster ! The secret of why the second query is much faster is simple. Since SCOPE statement defines static subcubes on which formulas apply, the query optimizer is able to divide the space described by the query into two parts - one part where one formula applies, and another part, sliced by No Discount, where the second formula applies. Then query optimizer is able to apply bulk evaluation to both subspaces, and the results are computed lighting fast. (for more information about bulk evaluation, read &lt;a href="http://sqlblog.com/WebLog/mosha/archive/2006/11/05/non_empty_behavior.aspx"&gt;this article&lt;/a&gt;). With IIF, no such division is possible, since query optimizer is not smart enough to understand that the condition inside IIF is static one. Therefore, query optimizer is afraid to choose the bulk evaluation plan, because it is afraid to compute too many unneeded cells. As we know, computing results cell by cell is much slower than doing computing all of them in bulk. Now, there are posts floating around newsgroups and forums about using magic &amp;quot;Cache Policy=9&amp;quot; in the connection string. Indeed, if it is used the first query completes under a second. But this is very dangerous path on step on. Cache Policy=9 forces query optimizer to choose bulk evaluation plan always and unconditionally. While in many cases (including this one), it is the best query plan, there are lots of cases when doing bulk evaluation is slower and less scalable. It is best to leave such decisions to the optimizer which one of dozens built-in execution plans should be applied to which subspace and which subexpression. Also, Cache Policy=9 is documented in BOL as reserved for future use, which means it is not officially supported, even though information about it leaked into the forums. It is only supported if it is recommended by the product support professional. It is our hope, that in the next version, query optimizer will become even more intelligent, and will be able to pick better execution plans automatically.&lt;/p&gt;

&lt;p&gt;Now, as we mentioned above, IIF can only be replaced with SCOPE, if the condition considers only attribute coordinates, and not the cell values. So what if the condition needs to look at cell values ? While there is no general technique to battle with it, there are still some simple tips which can be applied. Let's consider the classic IIF of check for division by zero. I have &lt;a href="http://sqlblog.com/WebLog/mosha/archive/2005/06/30/mdx_is_empty.aspx"&gt;written on this subject before&lt;/a&gt;, but it was focused more on correctness aspects rather than on performance aspects.&lt;/p&gt;

&lt;p&gt;Let's say we want to compute ratio of sales to cost. &lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE &lt;/span&gt;Ratio = [Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount];&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;But we are concerned that when Internet Sales Amount is 0, we might get division by zero. In reality, in Adventure Works, Internet Sales Amount is never 0 inside fact table. It can be NULL, but since Internet Standard Product Cost is a measure in the same measure group, and neither one of them is nullable, then we know that whenever Internet Sales Amount is NULL, then Internet Standard Product Cost is NULL as well, and NULL/NULL is defined as NULL in MDX. So in our example there is actually no need to check anything in the first place - because there never will be division by zero ! To test performance of simple division, let's run the following MDX query. It doesn't make much sense to sum up ratios, but it is an easy way to force computation of division over huge set.&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS &lt;/span&gt;[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount]
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.TestPerf &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;SUM&lt;/span&gt;(
 ([Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS
 &lt;/span&gt;,[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS
 &lt;/span&gt;,[Customer].[Customer].[Customer].&lt;span style="color:blue;"&gt;MEMBERS
 &lt;/span&gt;,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS
 &lt;/span&gt;,[Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), Measures.Ratio)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;TestPerf &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This query executed in only 0.1 second, and it is clear that it is because the bulk evaluation query plan was applied. Now, let's say that it was possible for Internet Sales Amount to become 0 for whatever reason (for example, 0's could happen in the fact table). Then, we would need to guard against it. But what should we return in case it could be division by zero ? This really depends on the business requirements now. Sometimes, the requirement is to return &amp;quot;NA&amp;quot; string. So the expression gets coded as following:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;CREATE &lt;/span&gt;Ratio = 
  &lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0
  ,&amp;quot;NA&amp;quot;
  ,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount]);&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;Unfortunately, this is a poor choice both from performance and from functionality point of view. If any MDX expression will ever use Ratio, it runs at risk of performing arithmetic operations between numbers and string &amp;quot;NA&amp;quot;, which in best case will trigger error &amp;quot;Invalid data types used in the operation&amp;quot;, and in worst case may lead to the wrong results. Performance of such expression can degrade significantly. I won't even dare to run the original query here, but even running over reduced set&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS 
  &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0
  ,&amp;quot;NA&amp;quot;
  ,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount])
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.TestPerf &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;SUM&lt;/span&gt;(
  ([Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), Measures.Ratio)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;TestPerf &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;it still finishes only after 64 seconds (and the result is not good - it is empty, because of mixed operations between strings and numbers). So since some of the performance can be explained by operations on the strings, how about using 0 instead of &amp;quot;NA&amp;quot; ?&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS 
  &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0
  ,0
  ,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount])
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.TestPerf &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;SUM&lt;/span&gt;(
  ([Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), Measures.Ratio)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;TestPerf &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This finishes in 35 seconds, which is a little better, but still not good. And if we will use the original query &lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS 
  &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0
  ,0
  ,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount])
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.TestPerf &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;SUM&lt;/span&gt;(
  ([Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[Customer].[Customer].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), Measures.Ratio)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;TestPerf &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;We will get the famous error &lt;em&gt;#Error The expression contains a function that cannot operate on a set with more than 4,294,967,296 tuples.&lt;/em&gt; 

  &lt;br /&gt;This error happens when some operation in MDX requires iteration over set which has more than 4GB tuples. Usually this occurs in the scenario, where &amp;quot;bad&amp;quot; calculation needs to be evaluated over huge set. We notice, that the crossjoin inside SUM is indeed huge - it covers more than 4GB tuples. And because our calculated member uses IIF, the bulk evaluation mode is not applied, and instead query processor has to iterate cell by cell over the entire set. There are more than 4GB tuples to iterate, therefore iteration fails. Fortunately, there is a simple solution for this case. Remember the discussion about sparse and dense spaces that we had in &lt;a href="http://sqlblog.com/WebLog/mosha/archive/2005/12/06/multiplication_perf.aspx"&gt;this article&lt;/a&gt;. Constant expression 0 is dense over any space, but constant expression NULL is sparse over any space. Therefore the simple change to use NULL instead of 0, lets query optimizer to use bulk evaluation mode, since it knows that the &amp;quot;then&amp;quot; branch of IIF is always trivially sparse (always contains no data)&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS 
  &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0
  ,&lt;span style="color:blue;"&gt;NULL
  &lt;/span&gt;,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount])
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;Measures.TestPerf &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;SUM&lt;/span&gt;(
  ([Product].[Subcategory].[Subcategory].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[State-Province].[State-Province].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Customer].[Customer].[Customer].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS
  &lt;/span&gt;,[Ship Date].[Date].[Date].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;), Measures.Ratio)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;TestPerf &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This query now finishes in 0.1 seconds, which is sure indication that bulk evaluation was used. But let's go back to the business requirements. What if business users don't want to see NULL, what if they do want to see &amp;quot;NA&amp;quot; instead. Well, this kind of formatting is best done by FORMAT_STRING logic, which exists exactly for that purpose - format raw cell values into something nice to display to the end user. We can use FORMAT_STRING to convert NULLs ratios into &amp;quot;NA&amp;quot;, as in following example: &lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
MEMBER &lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;AS 
  &lt;/span&gt;&lt;span style="color:maroon;"&gt;IIF&lt;/span&gt;([Measures].[Internet Sales Amount]=0,&lt;span style="color:blue;"&gt;NULL&lt;/span&gt;,[Measures].[Internet Standard Product Cost]/[Measures].[Internet Sales Amount])
  ,&lt;span style="color:blue;"&gt;FORMAT_STRING&lt;/span&gt;='#.#;;;NA'
&lt;span style="color:blue;"&gt;SELECT 
&lt;/span&gt;Measures.Ratio &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0
,[Promotion].[Promotion Type].[Promotion Type].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;In conclusion, my advice is to try to avoid IIF whenever possible, and if it is not possible to avoid, use it in such a way that bulk evaluation is possible. I also recommend to never use CASE clause for performance critical calculations.&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=7362" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item><item><title>Ranking in MDX</title><link>http://sqlblog.com/blogs/mosha/archive/2006/03/14/ranking-in-mdx.aspx</link><pubDate>Tue, 14 Mar 2006 09:23:00 GMT</pubDate><guid isPermaLink="false">21093a07-8b3d-42db-8cbf-3350fcbf5496:7673</guid><dc:creator>mosha</dc:creator><slash:comments>5</slash:comments><comments>http://sqlblog.com/blogs/mosha/comments/7673.aspx</comments><wfw:commentRss>http://sqlblog.com/blogs/mosha/commentrss.aspx?PostID=7673</wfw:commentRss><description>&lt;p&gt;Well, apparently I am one of the industry influencers ! At least this is what Kimball Group thinks, because they sent me their just published &amp;quot;The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset&amp;quot; book by Joy Mundy and Warren Thornthwaite. According to the message, they have sent it to industry influencers so we can voice our opinions, and, well, influence the industry. I started to go through the book, and it looks like another classic from Kimball Group, something that every DBA who does BI on Microsoft platform has to read. This post is not a book review, since I didn’t have a chance to read it deeper. However, one thing caught my attention, and I thought that the issue is common enough to write an article about it. At page 402, where authors talk about building BI applications, there is a MDX expression for calculated member to compute Employee ranks based on their sales. With some minor modifications, the expression looks like following&lt;/p&gt;  &lt;pre class="code"&gt;&lt;span style="color:maroon;"&gt;RANK&lt;/span&gt;(
  [Employee].[Employee].&lt;span style="color:maroon;"&gt;CurrentMember
 &lt;/span&gt;,&lt;span style="color:blue;"&gt;ORDER&lt;/span&gt;(
    [Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS
   &lt;/span&gt;,[Measures].[Reseller Sales Amount]
   ,&lt;span style="color:blue;"&gt;BDESC&lt;/span&gt;))&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;It is not uncommon to see people implementing Rank using such construct, i.e. Rank over Order. Unfortunately, this construct has severe performance problem. Due to dynamic nature of the expression, it requires that the ordering of all employees should be done for every single cell where the calculated member is evaluated. Let’s try to rank all of our employees using this formula: &lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;AS 
 &lt;/span&gt;&lt;span style="color:maroon;"&gt;RANK&lt;/span&gt;([Employee].[Employee].&lt;span style="color:maroon;"&gt;CurrentMember&lt;/span&gt;, &lt;span style="color:blue;"&gt;ORDER&lt;/span&gt;([Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;, [Measures].[Reseller Sales Amount], &lt;span style="color:blue;"&gt;BDESC&lt;/span&gt;))
&lt;span style="color:blue;"&gt;SELECT
&lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0 
,[Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS on &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;It takes about 1 minute and 20 seconds on my machine to finish this query, which is way too long. Let’s try to analyze the complexity of the involved computations. We will denote the cardinality of Employee attribute as &lt;i style="mso-bidi-font-style:normal;"&gt;n&lt;/i&gt;. Since we do ordering over all employees, it will take &lt;i&gt;O(n*log(n))&lt;/i&gt; and since we are doing it for every single employee, the resulting complexity will be &lt;i&gt;O(n&lt;sup&gt;2&lt;/sup&gt;*log(n))&lt;/i&gt;. So let’s try to improve on it. It seems first, that the root of the inefficiency is in the fact that we are doing full sorting, when we only need to find a position of one tuple (employee) in the would be ordered set. Of course, it is possible to implement this operation linearly – by simply scanning the set, and counting the number of elements that were smaller then ours. This looks like a perfect job for the stored procedure. I have already written how MDX can be extended with stored procedures using Server ADOMD.NET when I discussed stored procedures returning sets &lt;a href="http://sqlblog.com/blogs/mosha/archive/2005/03/31/random-sampling-of-mdx-set-using-stored-procedures-with-server-adomd-net.aspx"&gt;here&lt;/a&gt;, so now we have even more practical example – use stored procedure for calculated members.&lt;/p&gt;

&lt;p&gt;Below is the sample implementation of the above mentioned algorithm &lt;/p&gt;

&lt;pre&gt;using Microsoft.AnalysisServices.AdomdServer;

namespace MRank
{
    public class MRank
    {
        public int Rank(Tuple tuple, Set set, Expression exp)
        {
            int rank = 1;
            double valTuple = (double)exp.Calculate(tuple);
            foreach (Tuple t in set.Tuples)
            {
                double valCurrent = (double)exp.Calculate(t);
                if (valCurrent &amp;lt; valTuple)
                    rank++;
            }

            return rank;
        }
    }
}&lt;/pre&gt;

&lt;p&gt;Now we will register this assembly in the database, and send the following query:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;as 
  &lt;/span&gt;mrank.mrank.mrank.[rank](
   [Employee].[Employee].&lt;span style="color:maroon;"&gt;CurrentMember&lt;/span&gt;, [Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;, -[Measures].[Reseller Sales Amount])
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0 
,[Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This query returns in less then 20 seconds, which is 4 times faster then before, but still looks too high. Probably we are running into overhead of stored procedures, since it has to go through managed-native transition, and not just for every invocation of stored procedures, but every time ADOMD.NET calls into server, i.e. on methods such as Expression.Calculate. Luckily, Analysis Services 2005 provides native override of MDX function Rank with 3 parameters, where the third parameter is the expression by which tuple should be ranked, and it implements linear algorithm similar to the one in the stored procedure we just developed. So now we can rewrite our query as&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH MEMBER &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;AS
&lt;/span&gt;&lt;span style="color:maroon;"&gt;RANK&lt;/span&gt;([Employee].[Employee].&lt;span style="color:maroon;"&gt;CurrentMember&lt;/span&gt;, [Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS&lt;/span&gt;, -[Measures].[Reseller Sales Amount])
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0 
,[Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;FROM &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;

&lt;p&gt;This query now returns in about 6 seconds, which is still faster then before. Let’s do complexity analysis again. For every employee we scan all the employees, so we get &lt;i&gt;O(n&lt;sup&gt;2&lt;/sup&gt;)&lt;/i&gt; which is better then previous result of &lt;i&gt;O(n&lt;sup&gt;2&lt;/sup&gt;*log(n))&lt;/i&gt;. There are few things to notice here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;First, the expression of Rank was -[Measures].[Reseller Sales Amount]. The reason we needed to prefix the Reseller Sales Amount measure with unary minus, was because we wanted ranking in descending order, i.e. from higher to lower, and since we were counting the tuples which had smaller value – this would produce ranking in ascending order, therefore we had to flip the expression. Those who read my whitepaper about AS caching, might remember, that since AS doesn’t cache subexpressions, but only full coordinates, using such subexpression in Rank might hurt performance, but actually here it doesn’t, because unary minus is such a trivial operation. &lt;/li&gt;

  &lt;li&gt;Second, perhaps more importantly, our last two queries produced different results from the first query on employees which had exactly same Sales. This is most evident on employees who sold nothing, so their Reseller Sales Amount was NULL. In Rank(Order()) case, such employees were ranked according to their position in the attribute hierarchy, since Order in MDX is deterministic function, and it preserves the original set order when keys are equal. However, Rank with expression, simply counts number of tuples with lesser value, therefore employees with equal values get equal rank. In reality, it usually doesn’t matter which way to go, and it is possible to develop another stored procedure which will take two expressions instead of one, and the second expression will be used when values of first expression are equal. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But probably the most important issue here is that the query still feels too slow at 6 seconds. We would expect this query to return instantaneously. And indeed, there is a query plan which would make it to be instantaneous. It is quite simple actually. Let’s sort the employees first, and then build a hash index to look up employee’s position in the sorted list. The way to express this query plan in MDX, is to create a named set. Named sets in MDX are always static, i.e. they are evaluated in the context in which they were defined (note that static doesn’t mean materialized, for example CrossJoin can be perfectly static, but it is never materialized). And the hash index is built on top of the static set. Therefore our query will become now:&lt;/p&gt;

&lt;pre class="code"&gt;&lt;span style="color:blue;"&gt;WITH 
SET &lt;/span&gt;OrderedEmployees &lt;span style="color:blue;"&gt;AS ORDER&lt;/span&gt;([Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;members&lt;/span&gt;, [Measures].[Reseller Sales Amount], &lt;span style="color:blue;"&gt;BDESC&lt;/span&gt;)
&lt;span style="color:blue;"&gt;MEMBER &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;AS &lt;/span&gt;&lt;span style="color:maroon;"&gt;RANK&lt;/span&gt;([Employee].[Employee].&lt;span style="color:maroon;"&gt;CurrentMember&lt;/span&gt;, OrderedEmployees)
&lt;span style="color:blue;"&gt;SELECT &lt;/span&gt;[Measures].[Employee Rank] &lt;span style="color:blue;"&gt;ON &lt;/span&gt;0 
,[Employee].[Employee].[Employee].&lt;span style="color:blue;"&gt;MEMBERS ON &lt;/span&gt;1
&lt;span style="color:blue;"&gt;from &lt;/span&gt;[Adventure Works]&lt;/pre&gt;
&lt;a href="http://11011.net/software/vspaste"&gt;&lt;/a&gt;This query indeed executes instantly, and since the sorting is done only once, and then followed by one linear scan, the complexity ends up being just &lt;i&gt;O(log(n))&lt;/i&gt;. 

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;There is one additional question left here. It is not always possible to define such a named set, and since this last query is actually equivalent to the first query (as taken from the book), then why isn’t query optimizer chooses query plan for the original query to be identical to the one in the last query. The answer is that in the general case Rank(Order()) is not equivalent to the Rank over static ordered set, precisely because in the first case the Order is dynamic and might return different results for different cells. However, in our example, and quite often in practice, the inner Order sets is what AS query optimizer calls “context independent”, i.e. it always produces the same result for every employee. AS query optimizer has many places where it can deduct that subexpressions are context independent and compute them only once. Unfortunately, this isn’t the case yet for Rank(Order()) constructs, but since the infrastructure is there, we can hope that future versions of AS will implement this optimization, and then this article will become obsolete :)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I just found out that the Chapter 6 &amp;quot;Sorting and Ranking in MDX&amp;quot; from the new MDX Solutions 2nd Edition book was put online by Wiley (probably for promotional reasons). It contains really good complimentary information to this article, so if you found the article useful, I recommend to read the chapter as well: &lt;a href="http://media.wiley.com/product_data/excerpt/80/04717480/0471748080.pdf"&gt;http://media.wiley.com/product_data/excerpt/80/04717480/0471748080.pdf&lt;/a&gt;&lt;/p&gt;&lt;img src="http://sqlblog.com/aggbug.aspx?PostID=7673" width="1" height="1"&gt;</description><category domain="http://sqlblog.com/blogs/mosha/archive/tags/mdx/default.aspx">mdx</category><category domain="http://sqlblog.com/blogs/mosha/archive/tags/performance/default.aspx">performance</category></item></channel></rss>