THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Hierarchical trace profiling with MDX Studio

Profiling and tracing functionality of Analysis Services is very valuable in analyzing query performance. Trace contains wealth of information about what is happening in the server. However, I found out, that this information might not be very easy to digest. Profiler just shows flat list of lines and lines of trace events. However, there is an inherited hierarchical grouping of events. For example, within Non Empty processing, we may have multiple SE queries, each one triggering several partition queries etc. It is difficult to pick up this hierarchical grouping from the trace, partly because design of trace events is somewhat inconsistent. Some events have clear Begin/End boundaries – like QueryBegin/QueryEnd events, others have only one event at the end – like QuerySubcube, yet others in addition to Begin/End pair also have multiple “Current” events in the middle – like SerializeResultsCurrent. And while some important fragments of query execution have special events – like NON EMPTY and Serialize Results, others don’t – like calculate query named sets and axes etc. New release of MDX Studio (version 0.4.7 – can be downloaded here) solves the above mentioned problems as much as possible. It features the “Profile” button (right next to “Execute” and “Debug” buttons), which produces the hierarchical trace view for the query.

Let’s take a look at the following example:

    [Measures].[Reseller Sales Amount]
   ,[Measures].[Internet Sales Amount]
  } ON 0
    ) ON 1
FROM [Adventure Works];

If we profile it in MDX Studio we will get the following trace:

We see in this picture that the query execution consisted of two major steps – NON EMPTY and serialization of results. NON EMPTY took 14.6% of the time, and serialization of results took 85.34% of the time. Within NON EMPTY, there were 8 subcube events, 6 of them coming from cache and 2 requiring query to partitions, and querying of 4 partitions in Reseller Sales measure group took 17.02% of NON EMPTY and 2.5% of the total time, and the same for the Internet Sales measure group (not surprisingly, these numbers are in agreement with perfmon counters reported on the left).

It is well known, that performance of MDX query depends a lot on what data (and if any) is in the server’s cache. Therefore, MDX Studio in Profile mode executes query twice, once on cold caches and once on warm caches, and displays both hierarchical traces and perfmon stats for both. It also computes the ratio of time spent on cold caches and ratio spent on warm caches (sometimes this can serve as first approximation of SE vs. FE split, but more accurate calculation would be available in the next version of MDX Studio). For example here is how it looks for the following query:

SELECT [Measures].MEMBERS ON 0 FROM [Adventure Works]

Cold cache execution

Time              : 671 ms
Calc covers       : 254
Cells calculated  : 3734
Sonar subcubes    : 89
SE queries        : 200
Cache hits        : 146
Cache misses      : 57
Cache inserts     : 21
Cache lookups     : 203
Memory Usage KB   : 5108

Warm cache execution

Time              : 46 ms
Calc covers       : 30
Cells calculated  : 33
Sonar subcubes    : 1
SE queries        : 8
Cache hits        : 8
Cache misses      : 0
Cache inserts     : 0
Cache lookups     : 8
Memory Usage KB   : 0

Percent cold caches    :93.14 %
Percent warm caches    :6.86 %

In order to get accurate timing for the “Cold cache execution”, MDX Studio does couple of things. First, and obvious one, it clears all the caches. But this by itself is not enough. The problem is that ClearCache command also resets precomputed static results of MDX Script. They will be recalculated when next query executes, but we really don’t want to include time for MDX Script execution in the query execution times. Therefore, MDX Studio runs empty query first, just to make sure that MDX Script has executed.

Profiling is an important performance tuning tool, but it also can help to find semantic problems with MDX query. Let’s consider the following example – we want to show top 10 products by their sales. The following query is supposed to provide an answer:

  [Measures].[Internet Sales Amount] ON 0
  ) ON 1
FROM [Adventure Works];

Of course, seasoned MDX practitioner will immediately see the problem with the above query, but for someone less experienced, the problem might not be particularly obvious. Let’s profile this query and look at the hierarchical trace:

Now it is obvious to anybody looking at this picture, that there is something wrong here. We would expect to see querying the Internet_Sales_200X partitions, since we wanted to see products by sales, but why there is querying of Reseller_Sales_200X partitions, we didn’t ask anything in the query about Reseller Sales. Since these Reseller_Sales partitions show up under “Calculate axes” group of events, we know that there is something wrong with MDX defining the axes. Indeed, the expression

TopCount([Product].[Product].[Product].MEMBERS ,10 ,[Measures].CurrentMember)

is the problem here. It uses [Measures].CurrentMember as criteria of TopCount, but in MDX axes are computed independently of each other, so it doesn’t matter that we have [Internet Sales Amount] on columns, the TopCount is computed in the context of the default measure, which we now suspect must be from the Reseller Sales measure group. Indeed, if we start “Debug” mode in the MDX Studio, and navigate to [Measures].CurrentMember expression in the parse tree – we will see that it resolves to [Measures].[Reseller Sales]. Therefore changing query to

  [Measures].[Internet Sales Amount] ON 0
   ,[Measures].[Internet Sales Amount]
  ) ON 1
FROM [Adventure Works];
fixes the problem, which again can be verified in the hierarchical trace window.
Published Sunday, September 21, 2008 6:30 PM by mosha
Filed under:
Anonymous comments are disabled
Privacy Statement