THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

MDX in Katmai: Dynamic named sets

The September CTP of Katmai (speculated to be released during PASS) is expected to be the first Katmai CTP to contain significant improvements in the OLAP engine. The most important one (which is, in my opinion, is also the most important AS Katmai feature) is changes to the query optimizer, which are nicknamed "Block computation mode" in the whitepapers and webcasts. I can't wait for it to be released to start blogging about new execution plans. But while we all wait for this CTP, perhaps it is a good time to cover other changes which were already released in the previous CTPs. June CTP featured four relatively small additions to MDX, and I will cover today the most interesting one  - dynamic named sets. (The other three are - CREATE KPI statement, ability to specify additional properties such as display folder, associated measure group etc inside CREATE MEMBER statement, and ability to update calculated member "on the fly").

Semantic of dynamic named sets

Named sets have been part of MDX from the very first version. Named sets can be defined for the query (using the WITH SET clause), for the session (using CREATE SESSION SET in the session) or globally for the entire cube (using CREATE SET statement inside MDX Script of the cube). Named sets were always static, in a sense that they were calculated only once, in the context where they were defined. It wasn't a problem for query named sets, since their lifetime was lifetime of the query anyway, and it also usually wasn't a problem for session sets, since they were typically created by the client application, for example OWC creates session named sets to be used as axes in the future queries. However, the named sets defined in the cubes had a problem. The main reason to define named sets in the cube was to expose them to the end users. Most modern client tools such as Excel 2007 have UI to let the user use the cube named sets in the query. However, since these sets are computed only once, when the MDX Script is evaluated, their applicability was severely limited. Let's take an example to illustrate why. Suppose we want to find all the products which have negative gross profit margin. Normally, in MDX we could write the following statement:

SELECT 
 [Measures].[Gross Profit Margin] ON 0 
 , NON EMPTY [Product].[Subcategory].[Subcategory] HAVING [Measures].[Gross Profit Margin] < 0 ON 1 
FROM [Adventure Works]

This brings back 3 products - Caps, Jerseys and Touring Frames. Now let's slice by year 2004 and see how profit margins look there:

SELECT 
 [Measures].[Gross Profit Margin] ON 0
 , NON EMPTY [Product].[Subcategory].[Subcategory] HAVING [Measures].[Gross Profit Margin] < 0
ON 1 
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2004]

The new result is only Jerseys and Road Frames. Apparently, Caps and Touring Frames made recovery in 2004. 

However, the end users are not expected to master writing MDX queries, even the simple ones above. The end users want to work in UI environment such as Excel. Therefore the cube designer can put the named set defining products with negative profit margin inside the MDX Script of the cube:

CREATE SET [Adventure Works].NegativeMarginProducts AS 
  Filter([Product].[Subcategory].[Subcategory], [Measures].[Gross Profit Margin] < 0)

Now, this set will appear in Excel 2007 under the Product dimension, and can be dropped into rows with single mouse click. Excel will generate the query like the following:

SELECT [Measures].[Gross Profit Margin] ON 0
, NON EMPTY NegativeMarginProducts ON 1 
FROM [Adventure Works]

This will work fine, the results are good. However, when the user slices by year 2004, the following query is generated

SELECT [Measures].[Gross Profit Margin] ON 0
, NON EMPTY NegativeMarginProducts ON 1 
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2004]

And the result is no longer good. It still shows Caps, Jerseys and Touring Frames, even though Caps and Touring Frames have positive margins in 2004. The reason is simple - the named set NegativeMarginProducts was evaluated only once and in the context of [Date].[All Periods] - therefore the content of this named set remains the same regardless of how the current query is sliced.

And this is exactly the problem that Katmai's dynamic sets are designed to solve. Dynamic sets are not calculated once. They are calculated before each query, and, very important, in the context of that's query WHERE clause and subselects. In other words, using the MDX notation, we can say that using the following statement in the MDX Script

CREATE DYNAMIC SET <cube>.DynSet AS <expr>

and issuing queries

SELECT Function1(DynSet) FROM <cube>
SELECT Function2(DynSet) FROM <cube>

is functionally equivalent to issuing the following queries

WITH SET DynSet SELECT Functions1(DynSet) FROM <cube>
WITH SET DynSet SELECT Functions2(DynSet) FROM <cube>

However, dynamic sets are smarter than regular query named sets. Dynamic sets won't be evaluated at all if the query doesn't reference them directly or indirectly. Therefore, the performance of queries which don't need dynamic sets won't be compromised.

So in our scenario, we can write the following in the MDX Script.

CREATE DYNAMIC SET [Adventure Works].NegativeMarginProducts AS 
  Filter([Product].[Subcategory].[Subcategory], [Measures].[Gross Profit Margin] < 0)

And all the queries will now work as expected !

Using dynamic sets for performance optimizations

While the dynamic sets were designed to address the functionality limitations of the static sets, they also can be used as very efficient performance optimization technique. Let's use as an example the problem of doing efficient ranking by value in MDX. I have covered this subject in a dedicated blog post: http://sqljunkies.com/WebLog/mosha/archive/2006/03/14/mdx_ranking.aspx. The conclusion was, that the most efficient way was to define query named set and then use RANK over it. Below is the best MDX query:

WITH 
SET OrderedEmployees AS ORDER([Employee].[Employee].[Employee].members, [Measures].[Reseller Sales Amount], BDESC)
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0 
,[Employee].[Employee].[Employee].MEMBERS ON 1
from [Adventure Works]

However, this technique does not always works. It can only be used when the user can write his own MDX query. One of the comments (from Ed) wonders exactly about this:

"Hi, Defining the set first definitely speeds this up. However, we are finding that doesn't work when delivering results via Excel and using filters. Because of the way Excel represents filters in MDX, the filter context is not applied to the set and thus the rank is always based on the unfiltered set. Any ideas on how to speed up RANK in this context?"

Well, I didn't have a good answer for this question back then, but thanks to Katmai's dynamic sets, there is a good answer now. Remember, that from the functionality point of view, defining dynamic set is like having a query named set for every query. Therefore, we could simply define OrderedEmployees as dynamic set

CREATE DYNAMIC SET [Adventure Works].OrderedEmployees AS 
  ORDER([Employee].[Employee].[Employee].members, [Measures].[Reseller Sales Amount], BDESC)

And we can use it inside calculated members 

WITH 
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0 
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]

If there is a slice on a query now, it will be respected - note how the ranks change.

WITH 
MEMBER [Measures].[Employee Rank] AS RANK([Employee].[Employee].CurrentMember, OrderedEmployees)
SELECT [Measures].[Employee Rank] ON 0 
,[Employee].[Employee].[Employee].MEMBERS ON 1
FROM [Adventure Works]
WHERE [Ship Date].[Calendar].[Month].&[2002]&[1]

Using dynamic sets to detect subselects

I am a bit reluctant to write this section, mostly because the functionality of subselects may change in later Katmai CTPs. However, it didn't change yet in the current CTPs. If it will change, this section may become obsolete. But the current functionality is that unlike WHERE clause, subselects do not change the current coordinate. Therefore calculated members which rely on the current coordinate but not at the cell value in the coordinate, fail to work correctly in the presence of subselects. This problem is well known to the ReportBuilder users, since it uses subselects. Consequently, unlike WHERE clause, subselects do not affect the result of EXISTING operator. Therefore calculations designed to work correctly with multiselect by means of using EXISTING operator, do not work when multiselect is implemented through subselects. This is a problem for Excel 2007 users. (Now I have to make myself clear, that I don't consider the solution below a clean solution to multiselect. A while ago I said that I was looking into clean and elegant solution for multiselect, and I had in mind something completely different - but I still cannot really talk about it.). As an example, I will use the simple problem of counting days. Again this problem has much better solution, but for simplicity I will use the MDX approach.

The classic approach in MDX for counting days is the following expression:

Count(Descendants([Ship Date].[Calendar].CurrentMember,[Ship Date].[Calendar].[Date],SELF))

It is simple enough, and it has few problems - it doesn't work with subselects and it doesn't work with multiselect in WHERE. The multiselect in WHERE can be solved by using

Count(EXISTING [Ship Date].[Calendar].[Date])

But subselects still won't work. The following query returns 1158 instead of 31:

WITH MEMBER Measures.NumberOfDays AS Count(Descendants([Ship Date].[Calendar].CurrentMember,[Ship Date].[Calendar].[Date],SELF))
SELECT NumberOfDays ON 0
FROM 
(SELECT [Ship Date].[Calendar].[Month].&[2002]&[1]  ON 0
FROM
[Adventure Works])

The problem is that subselects are almost stealth. They cannot be detected with CurrentMember or EXISTING. However, they are not completely stealth. Subselects perform autoexist with query axes and query named sets. Remember how dynamic named sets are like adding query named set to every query. Therefore, we can define the following dynamic named set in the MDX Script:

CREATE DYNAMIC SET [Adventure Works].Days AS [Ship Date].[Calendar].[Date]

Now we are guarantee, that this set will be autoexisted with whatever subselects and WHERE we might have for the query. The following query returns correct result of 31:

WITH MEMBER Measures.NumberOfDays AS Count(Days)
SELECT NumberOfDays ON 0
FROM 
(SELECT [Ship Date].[Calendar].[Month].&[2002]&[1] ON 0
FROM
[Adventure Works])

And, of course, multiselect is no longer a problem even under subselects:

WITH MEMBER Measures.NumberOfDays AS Count(Days)
SELECT NumberOfDays ON 0
FROM 
(SELECT {[Ship Date].[Calendar].[Month].&[2002]&[1],[Ship Date].[Calendar].[Month].&[2002]&[2]} ON 0
FROM
[Adventure Works])

Conclusion

While they can be considered to be a small addition to MDX, dynamic named sets are very useful feature, which can be applied beyond the things it was originally designed for. It can also be used for improving performance and for solving some of the MDX semantic issues.

References

The official documentation on dynamic sets is part of CTP BOL under MDX Language Reference -> MDX Statement Reference -> MDX Data Definition Statements -> CREATE SET Statement (MDX). If you have it installed follow this path: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10md_6mdxassl/html/eff51eeb-5e7e-4706-b861-c57b6f3f89f0.htm

For convenience, I copied the content of BOL inline below:

Syntax

 
CREATE [SESSION] [ STATIC | DYNAMIC ] [HIDDEN] SET 
   CURRENTCUBE | Cube_Name
            .Set_Name AS 'Set_Expression'
      [,Property_Name = Property_Value, ...n]

Arguments

Cube_Name

A valid string expression that provides the name of the cube.

Set_Name

A valid string expression that provides the name for the named set being created.

Set_Expression

A valid Multidimensional Expressions (MDX) expression that returns a set.

Property_Name

A valid string that provides the name of a set property.

Property_Value

A valid scalar expression that defines the set property's value.

Remarks

A named set is a set of dimension members (or an expression that defines a set) that you create to use again. For example, a named set makes it possible to define a set of dimension members that consists of the set of the top ten stores by sales. This set can be defined statically, or by means of a function like TopCount. This named set can then be used wherever the set of the top 10 stores is needed.

The CREATE SET statement creates a named set that remains available throughout the session, and therefore, can be used in multiple queries in a session. For more information, see Creating Session-Scoped Calculated Members (MDX).

You can also define a named set for use by a single query. To define such a set, you use the WITH clause in the SELECT statement. For more information about the WITH clause, see Creating Query-Scoped Named Sets (MDX).

The Set_Expression clause can contain any function that supports MDX syntax. Sets created with the CREATE SET statement that do not specify the SESSION clause have session scope. Use the WITH clause to create a set with query scope.

Specifying a cube other than the cube that is currently connected causes an error. Therefore, you should use CURRENTCUBE in place of a cube name to denote the current cube.

Scope

A user-defined set can occur within one of the scopes listed in the following table.

Query scope

The visibility and lifetime of the set is limited to the query. The set is defined in an individual query. Query scope overrides session scope. For more information, see Creating Query-Scoped Named Sets (MDX).

Session scope

The visibility and lifetime of the set is limited to the session in which it is created. (The lifetime is less than the session duration if a DROP SET statement is issued on the set.) The CREATE SET statement creates a set with session scope. Use the WITH clause to create a set with query scope.

Example

The following example creates a set called Core Products. The SELECT query then demonstrates calling the newly created set. The CREATE SET statement must be executed before the SELECT query can be executed - they cannot be executed in the same batch.

Copy Code
CREATE SET [Adventure Works].[Core Products] AS '{[Product].[Category].[Bikes]}'
    
SELECT [Core Products] ON 0
  FROM [Adventure Works]

Set Evaluation

Set evaluation can be defined to occur differently; it can be defined to occur only once at set creation or can be defined to occur every time the set is used.

STATIC

Indicates that the set is evaluated only once at the time the CREATE SET statement is evaluated.

DYNAMIC

Indicates that the set is to be evaluated every time it is used in a query.

Set Visibility

The set can be either visible or not to other users who query the cube.

HIDDEN

Specifies that the set is not visible to users who query the cube.

Standard Properties

Each set has a set of default properties. When a client application is connected to Microsoft Analysis Services, the default properties are either supported, or available to be supported, as the administrator chooses.

Property identifier Meaning

CAPTION

A string that the client application uses as the caption for the set.

DISPLAY_FOLDER

A string that identifies the path of the display folder that the client application uses to show the set. The folder level separator is defined by the client application. For the tools and clients supplied by Analysis Services, the backslash (\) is the level separator. To provide multiple display folders for a defined set, use a semicolon (;) to separate the folders.

 

Published Saturday, August 25, 2007 10:16 PM by mosha
Filed under: ,
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement