THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Microsoft OLAP by Mosha Pasumansky

Querying dimensions in MDX

In this article we will discuss and evaluate different methods Analysis Services 2005 provides for accessing dimension members. Requirements, flexibility and performance tradeoffs will be discussed.

In the OLAP system, the primary object for storing data is the cube. MDX as a language was designed in the first place for accessing cubes. Yet, dimensions are important for accessing data as well. There are plenty of scenarios where dimension rather then cube should be queried

  • Hierarchy browser - all client apps have those
  • UI for setting more sophisticated filters and/or axes then traditional drill up/down navigation
  • Reports and analysis on dimensions (i.e. "how many products weight more then certain threshold", "show all orders placed in last two weeks" etc)
  • UI for dimension writeback
  • etc

There is an old argument in the OLAP world - whether dimension members (more precisely attribute members) represent metadata or data. From one hand attribute members are really just coordinates in the multidimensional space to navigate to cells which contain the data. And without doubt, members of the Measures dimension are considered to be metadata by everybody. Some OLAP products treat all dimension members as metadata, for example in older versions of Essbase, the cube designer was expected to enter all dimension members by hand in the outline editor (I don't know whether it is true with the Essbase 7 'Ukraine'). If the designer wanted to automatically load prepared list of members from flat file, it had to be done with additional HIS tool, so clearly this scenario was treated as not mainline. Probably this approach works fine with small models typical for financial and planning cubes. But in sales and marketing applications, the cardinality of dimension can easily reach hundreds of thousands up to tens of millions members. This is why Analysis Services treats attribute members as data, which is loaded from the database. Only instead of fact table, dimension members are loaded from dimension table (of course, degenerate dimensions are loaded from fact table as well). Therefore, most of the methods for accessing dimension members are very similar to the accessing cube data, i.e. through MDX.

MDSCHEMA_MEMBERS schema rowset

Using schema rowsets to retrieve members pays tribute to the school of thought that dimension members are metadata, since schema rowsets are traditional way (both in SQL and in OLAP) to access metadata. Certainly, XMLA spec clearly thinks so, as it only has two methods - Discover for metadata and Execute for data. While AS offers MDSHEMA_MEMBERS schema rowset, I cannot think of a good reason why would application choose to use it for getting information about the members rather then inertia and, perhaps, some convenience. This method suffers from the following drawbacks:

  • Not flexible. The schema rowset format is rigid with fixed number of columns (with exception of dynamic columns for member properties in some scenarios). It is not possible to request less columns, and it is not possible to ask for more - what you get is what you get. The fact that it is not possible to ask for less is also potentially performance problem, because all the columns are present both in XMLA on the wire (but at least there compression can help), and in the memory - whether the schema rowset request is done through OLEDB or ADOMD.NET
  • Not flexible again. It is not possible to request arbitrary set of members. The TREE_OP restriction allows a little flexibility, but at the end of the day you can either ask for all members in the hierarchy/level or single member or children of single member or ancestors of single member.
  • No windowing support. Unlike MDX resultsets, the rowsets cannot be easily constrained - you are at mercy of set of builtin schema restrictions. I.e. if the result is really big, there is no way to get first 1000 rows, and then next 1000 rows etc. This can be very serious performance problem - imagine what would happen if user accidentally expanded in UI member which has millions of children in that hierarchy
  • Limited ability for engine to optimize. Since this is not MDX query, the MDX query optimizer cannot be fully applied (although internally code for MDSCHEMA_MEMBERS tried to hook up into lower level constructs that query optimizer operates with).
  • More complicated programming model. This is a subjective one, but personally I would rather deal with single way of accessing all the data then different ways for dimensions and for cubes

MDX to dimension cube

Promoting the line of thought that dimension members are data, Analysis Services 2005 automatically exposes every database dimension as dimension cube. Such cubes can be detected in MDSCHEMA_CUBES when the CUBE_SOURCE restriction is set to 2 (DIMENSION) - the default value is 1 (CUBE). The names of the cubes are the names of dimensions prefixed with the $ sign. I.e. for dimension Product there is $Product cube, for Customer there is $Customer cube etc. Dimension cube contains two cube dimensions - the database dimension it was built from and the Measures dimension. There is a single measure with the name of the dimension - but this measure has always value NULL. The only reason this measure is there, is because AS doesn't support cubes without measures, so it creates a fake one, but it doesn't affect performance in any way. Now, since dimension cube is a cube, it can be queried with full power of MDX. Few examples:

Return first 100 products

SELECT Head([Product].[Product].[Product].Members, 100) ON 0 FROM $Product

Get list of all Road Bikes together with their weights and colors

We can do this either using DIMENSION PROPERTIES clause of axis specification in SELECT statement

SELECT [Product].[Product Categories].[Subcategory].[Road Bikes].Children 
  DIMENSION PROPERTIES [Product].[Product Categories].[Product].[Weight], [Product].[Product Categories].[Product].[Color] ON 0
FROM $Product

Alternatively, since we have full power of MDX, we could use query calculated measures. The traditional, AS2000 style MDX would look like

WITH
  MEMBER Measures.ProductWeight AS [Product].[Product Categories].Properties("Weight")
  MEMBER Measures.ProductColor  AS [Product].[Product Categories].Properties("Color")
SELECT 
{ Measures.ProductWeight, Measures.ProductColor } ON 0
,[Product].[Product Categories].[Subcategory].[Road Bikes].Children ON 1
FROM $Product

However, as I already wrote in the past,  use of .Properties should be avoided as much as possible. Instead, we can take the advantage of the fact that in AS2005 attribute relationships are integrated deep inside MDX. I.e. when coordinate on the related attribute changes, all the relating attributes change automatically as well to reflect it. Therefore, even though Weight and Color attributes are not included in the Product Categories hierarchy, since they relate with Product attribute - when the coordinate on Product changes - so do coordinates on Weight and Color. This is very fundamental feature of UDM - it allows to achieve the claimed goal, that hierarchies in UDM are nothing but navigational paths. It is not important which hierarchy is used to navigate to the cell, what is important is attribute relationships. This feature is called sometimes "strong hierarchies" to allude to the fact that attribute relationships form an invisible grid of imaginary hierarchies which truly define coordinate transformations. With this knowledge, we can rewrite the previous query to become

WITH
  MEMBER Measures.ProductWeight AS [Product].[Weight].MEMBER_KEY
  MEMBER Measures.ProductColor  AS [Product].[Color].MEMBER_KEY
SELECT 
{ Measures.ProductWeight, Measures.ProductColor } ON 0
,[Product].[Product Categories].[Subcategory].[Road Bikes].Children ON 1
FROM $Product

Produce set of products which weight more then 400 pounds

SELECT Exists([Product].[Product].[Product].MEMBERS, [Product].[Weight].[Weight].[400] : NULL) ON 0
FROM $Product
or to get the same list but with weights 
SELECT CrossJoin([Product].[Product].[Product].MEMBERS, [Product].[Weight].[Weight].[400] : NULL) ON 0
FROM $Product

Get the count of products in certain [category|color|weight|style|etc...]

Getting count of members on the key attribute is often quite helpful, up to the point that we seriously considered to have the builtin measure inside dimension cube to return this count automatically, but ultimately decided against it, since it was so simple to do it through MDX. Here is an example breaking counts of products by color:

WITH MEMBER Measures.ProductCount AS (Existing [Product].[Product].Members).Count
SELECT ProductCount ON 0
,[Product].[Color].Members ON 1
FROM $Product

Through this simple query we learn, that there are only 2 Grey products out of 607, but there are 134 Black products. (255 products marked as NA)

So querying the dimension cube through MDX seems to be ideal solution, but it has its own drawbacks:

  • Dimension cubes are exposed only to database administrator.
  • Dimension cubes expose database dimensions, which could differ from cube dimension because cube dimensions can have different dimension security applied to them in different cubes. Also, cube dimension has a notion of granularity attribute per measure group, which might be important to reflect in UI
  • Querying dimension cubes doesn't allow to join multiple dimensions, since dimension cube contains a single database dimension
  • The predefined named sets are not available in the dimension cube, since they are defined inside MDX Scripts of the real cubes

Querying cube dimension

Techniques for querying the dimension inside context of the cube seem to be very similar to the techniques for querying inside the dimension cube. However, there is an important difference. The straightforward MDX statement like

SELECT Product.Category.Members ON 0 FROM [Adventure Works]

which would've been fine inside dimension cube, is not so good for accessing dimension members. The reason is that the real cube has real measures, which have real values. Even though the measures are not explicitly mentioned inside the query, the default measure is still set as a current measure coordinate. Therefore the above query fetches not only the required dimension members, but also cell values, which can be very expensive depending on the size of the cube and complexity of calculations within. So we need a way to tell AS not to bring cell values, just the axes. There are several ways of achieving this. One is to work with cell ranges and specify BeginRange and EndRange XMLA command properties in such a way that they define empty range. However, this requires some programming inside the application. It is preferable to do it at the level of MDX statement itself. Most often I've seen people doing something like following:

WITH MEMBER Measures.Nothing AS NULL 
SELECT Product.Category.Members ON 0 FROM [Adventure Works] WHERE Nothing

This works, but it isn't clean solution. Even though all returned cells will have value NULL, this solution still forces query engine to evaluate calculation (although a trivial one) for every member on the axis. When there are a lot of members, this can add up. Then there is an option of eliminating all cell properties but one - cell ordinal, i.e.

SELECT Product.Category.Members ON 0 FROM [Adventure Works] CELL PROPERTIES Cell_Ordinal

I guess this works, since neither Value nor Formatted_Value properties are ever computed, but I still don't think that this is the cleanest solution, since the cells are still present in the cellset, even though they are completely stripped of content. I prefer different solution. The cells in the resultset are defined at intersection of coordinates on all axes. Therefore, if one of the axes is empty - the intersection is empty - and there are no cells at all ! So the trick is to define one of the axes as empty set, as following:

SELECT {} ON 0
,Product.Category.Members ON 1 
FROM [Adventure Works]

In fact, inspection of queries sent from ADOMD.NET object model on certain member operations reveal that it uses this very method.

Search in the dimension

We briefly touched on the access methods for dimension members, but the really interesting discussion is how to form MDX queries to perform sophisticated searches inside attribute members, and do it in the most optimal way. This is a big subject, and I will cover it in the next chapter.

Published Wednesday, October 11, 2006 5:34 PM by mosha
Filed under:
Anonymous comments are disabled
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement