THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Microsoft OLAP by Mosha Pasumansky

  • Parent-Child Dimension Table Naturalizer

    Parent-Child dimensions is an important feature of Analysis Services. Parent-Child dimension allow flexibility in the dimension modeling for scenarios such as Bill Of Materials, Chart of Accounts, Employee Organization Structure and others. However, this flexibility comes with the cost. Overuse of parent-child can cause performance problems. Additionally, there are some semantic quirks which can cause problems with calculations – such as parent-child not decoding related attributes like the normal dimensions do. So in some cases it is advisable to convert parent-child hierarchy to the regular multi-level hierarchy. This process of conversion is not a trivial one and it includes several steps.

    Jon Burchel, Senior Support Escalation Engineer in Microsoft, looked deeper into this problem, and came up with the tool called “Analysis Services Dimension Table Naturalizer” (or PCDimNaturalizer), which automates conversion of parent-child dimensions to regular ones. He put the project on www.codeplex.com – meaning that all the source code is available as well.

    Unlike most open source projects (at least on codeplex), this one is extensively documented (I guess Jon, being a support engineer, really understands the value of good documentation).

    The PCDimNaturalizer can be used from command line, with UI and it also exposes object model allowing it to be embedded in the .NET application or SSIS package. I am sure this tool will be useful for many Analysis Services practicioners.

    PCDimNaturalizer project at codeplex: http://www.codeplex.com/PCDimNaturalize


  • Intellisense in MDX Studio

    With Visual Studio and other modern IDEs, Intellisense became a must have tool in the developer’s toolset, boosting productivity. MDX developers have asked for it, but full, context-dependant intellisense wasn’t available. Until now. MDX Studio now features comprehensive support for intellisense. The first version with usable intellisense support is 0.4.4. (As usual, please visit MDX Studio forum to leave feedback and to find out about new releases).

    The following screenshots show some (but certainly not all!) of the MDX Studio intellisense capabilities:

     

    User starts typing new MDX statement

    It picks the “SELECT” keyword from the first two letters “se”. User can either keep typing, or hit Space or Enter to do auto-complete this keyword.

     

    The auto-completion is aware of the cube metadata when MDX Studio is connected to the server. So when user typed “D”, the auto-completion list featured MDX functions starting with “D”, MDX keywords starting with “D” but also appropriate metadata objects, in this case – dimensions, starting with “D”.

    Once “Descendants” function is chosen, the tooltip shows what arguments it takes

    The auto-completion list is context-aware, since [Date] is dimension name, it now offers all of the hierarchies of the Date dimension.

     

    The auto-completion is also aware of MDX type system. Now it understands that [Date].[Calendar] is hierarchy, therefore it offers MDX methods and properties which can be applied to the hierarchy, as well as levels of this hierarchy.

    Intellisense is also location-aware. In this case, the user just typed the “FROM” keyword, so auto-completion list offers list of cubes in the current database which can be used in the FROM clause.

    There are more features in MDX Studio Intellisense, like parenthesis matching etc. Hopefully this will make MDX developers more productive.


  • SQL PASS Summit 2008 and MS BI Conference 2008

    SQL PASS Summit 2008 and Microsoft Business Intelligence Conference 2008 are two major conferences targeting very similar audiences. And this year both conferences are happening in the same location very close to each other (there is just one month separating them). Thus, many SQL BI professionals are forced to choose which one to attend. And this time the choice is not an easy one. Here is a comparison table:

      SQL PASS Summit 2008 Microsoft Business Intelligence Conference 2008

    Keynotes

    Ted Kummert, Tom Casey, David J. DeWitt

    http://summit2008.sqlpass.org/keynotes.html

    Kurt DelBene, Stephen Elop, Ted Kummert

    http://www.msbiconference.com/pages/speakers.aspx

    Tracks

    • Application Development
    • Business Intelligence
    • Database Application
    • Professional Development
    • Microsoft BI Platform and Infrastructure
    • Microsoft BI Clients and Applications
    • Deployment and Best Practices
    • Customer and Industry Solutions
    • Business Value of Business Intelligence
    • Partner Training Track

    Sessions

    Over 130 technical sessions

    http://summit2008.sqlpass.org/program-sessions.html

    65 breakout sessions

    http://www.msbiconference.com/pages/tracksandsessions.aspx

    Extras

    14 preconference full day seminars

    Chalk talks

    SSAS specific sessions

    • Optimizing Dimension Designs in SSAS , Dave Fackler
    • Creating an SSIS, SSAS and SSRS monitoring solution with SSIS, SSAS and SSRS , Chris Webb
    • Do It Right: Best Practices for Analysis Services 2005 and 2008, Craig Utley
    • Troubleshooting MDX Query Performance , Stacia Misner
    • Deep dive into MDX (precon) - Mosha Pasumansky
    • Avoiding Common SQL Server Analysis Services Mistakes, Craig Utley
    • Designing High Performance Cubes in SQL Server 2008 Analysis Services, T.K. Anand
    • SQL Server Analysis Services Performance Monitoring, Carl Rabeler
    • Manageability and Scalability Improvements in Microsoft SQL Server 2008 Analysis Services, T.K. Anand
    • Optimizing Query Performance in Microsoft SQL Server 2008 Analysis Services, T.K. Anand
    Dates November 18-21, 2008 October 6-8, 2008
    Location Washington State Convention & Trade Center,
    Seattle WA.
    Washington State Convention & Trade Center,
    Seattle WA.

    Last year MS BI Conference was criticized for lack of technical sessions, but this year the list looks pretty strong, and totally comparable to the PASS’s one. In fact, comparing the SSAS specific sessions, it’s easy to notice that there is overlap. Craig Utley’s session is probably going to be the same. Each conference also features dedicated session on MDX performance. But there is plenty of different content, of course. Each conference has its own extras. PASS packs two days of preconference seminars, which are full day sessions. MS BI Conference, on the other hand, offers so called “chalk talks”, which were very popular last year. This time they are supposedly better organized – i.e. held in the real rooms with not only whiteboard, but computer and projector as well (the list of chalk talks is supposed to be published by next week).

    Overall not an easy choice. Luckily for me, I don’t have to choose between the two – since both are in Seattle – I am going to attend both of them ! Actually, I have speaking engagements in both of them. In SQL PASS, I have “Deep dive to MDX” preconference seminar, (the same one I was supposed to do last year, but it got canceled). And in MS BI Conference, I will be hosting the chalk talk “Got MDX problems ? MDX Studio to the rescue !”.

    So make your decisions, the registrations are open !


  • Product Volatility: Optimizing MDX with MDX Studio

    In this article we will show how to take typical, everyday MDX and optimize it by simply following advices of MDX Studio Analyzer.

    As a starting point, I will take “Product Volatility” example as published by Thomas Ivarson in his blog. It is a perfect example, because Thomas took a real business problem, and implemented it using very clean and straightforward MDX, just like I expect many MDX practitioners do. This MDX can be optimized, but some of these optimizations might not be obvious. Running MDX Studio Analyzer and following its advices takes the guesswork out of equation.

    Here is the MDX that computes total number of products, number of products with sales increased from last month and number of products with sales decreased from last month:

    WITH 
      MEMBER Measures.TotNumberOfProducts AS 
        Count
        (
          Descendants
          (
            [Product].[Product Categories].CurrentMember
           ,[Product].[Product Categories].[Product]
          )
        ) 
      MEMBER Measures.NumberOfSoldProductsIncreasing AS 
        Count
        (
          Filter
          (
            NonEmpty
            (
              Descendants
              (
                [Product].[Product Categories].CurrentMember
               ,[Product].[Product Categories].[Product]
              )
             ,[Measures].[Internet Sales Amount]
            )
           ,
                (
                  [Date].[Calendar].CurrentMember
                 ,[Measures].[Internet Sales Amount]
                )
              - 
                (
                  [Date].[Calendar].PrevMember
                 ,[Measures].[Internet Sales Amount]
                )
            > 0
          )
        ) 
      MEMBER Measures.NumberOfSoldProductsDecreasing AS 
        Count
        (
          Filter
          (
            NonEmpty
            (
              Descendants
              (
                [Product].[Product Categories].CurrentMember
               ,[Product].[Product Categories].[Product]
              )
             ,[Measures].[Internet Sales Amount]
            )
           ,
                (
                  [Date].[Calendar].CurrentMember
                 ,[Measures].[Internet Sales Amount]
                )
              - 
                (
                  [Date].[Calendar].PrevMember
                 ,[Measures].[Internet Sales Amount]
                )
            < 0
          )
        ) 
    SELECT
      {
        [Measures].[TotNumberOfProducts]
       ,[Measures].[NumberOfSoldProductsIncreasing]
       ,[Measures].[NumberOfSoldProductsDecreasing]
      } ON 0
     ,[Date].[Calendar].[Month] ON 1
    FROM [Adventure Works]

    Executing this query with hot cache in MDX Studio produces the following perfmon counter statistics:

    Time             : 359 ms
    Calc covers      : 85
    Cells calculated : 7672
    Sonar subcubes   : 79
    SE queries       : 76
    Cache hits       : 76
    Cache misses     : 3
    Cache inserts    : 3
    Cache lookups    : 79

    The time looks good, but only because both the cube and the query are small. The number of SE queries should raise a red flag here. With only 114 cells in the result, why there were 76 SE queries ? Obviously, on more serious cubes, this will become a major performance issue. So, let’s run MDX Studio Analyzer and see what it has to say about this query. We get back the following list:

    Line Col Message Link
    3 6 If you are trying to count number of members in current selection - consider introducing special measure group for this dimension More Info
    7 41 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    12 6 Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) More Info
    20 45 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    27 34 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    32 34 MDX function 'PrevMember' may raise an error or produce non-desired result when user applies multiselect More Info
    39 6 Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) More Info
    47 45 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    53 14 Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache  

    Let’s go over this list and fix issue by issue.

    Line 3, Col 6: If you are trying to count number of members in current selection - consider introducing special measure group for this dimension

    The link points to “Counting days in MDX” blog. While we count here Products and not Days, the same technique applies here. These types of questions are best done not in MDX, but by introducing measure group, which only contains Products dimension, and single measure [Number Of Products] of type Count.

    Line 7, Col 41: MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect

    Now, this is no longer a problem because of how we solved the previous warning ! Indeed, if we wanted to see total number of products across Bikes and Accessories categories, we would’ve added the following WHERE clause:

    WHERE {[Product].[Category].&[1],[Product].[Category].&[4]}

    It fails with the original query, but works fine with Count measure [Number of Products]

    Lines 12/39, Col 6: Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL)

    This is a common construct, and as shown in the link to the “Optimizing Count(Filter(…)) expressions in MDX”, it performs much better if rewritten to use Sum. But before we do this rewrite, let’s take a look at the next warning:

    Line 53, Col 14: Same expression was used before at Line 26 Column 14. Consider eliminating common subexpressions for better performance and to take advantage of cache

    If we click on this message, the expression in question is the following:

                (
                  [Date].[Calendar].CurrentMember
                 ,[Measures].[Internet Sales Amount]
                )
              - 
                (
                  [Date].[Calendar].PrevMember
                 ,[Measures].[Internet Sales Amount]
                )

    Indeed, this expression represents Sales Growth, so we can isolate it into separate calculated member. But before we write the exact expression for the calculated member, let’s look deeper into the original expression. It features Count(Filter(NonEmpty(…))) construct, i.e. only products which had sales in current month are considered for increasing/decreasing calculation. I.e. even if product had sales in the previous month, but not in the current month, it won’t be counted in the number of products with decreasing sales. In order to repeat the same logic, the expression would be

      [Sales Growth] =
        IIF
        (
          IsEmpty([Measures].[Internet Sales Amount])
         ,NULL
         ,
            [Measures].[Internet Sales Amount]
          - 
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )
        ) 

    For more details why IIF(cond, NULL, exp) is good for performance, read “Performance of IIF function in MDX” blog. Now, putting all of together so far, we will get the following query:

    WITH 
      MEMBER [Sales Growth] AS 
        IIF
        (
          IsEmpty([Measures].[Internet Sales Amount])
         ,NULL
         ,
            [Measures].[Internet Sales Amount]
          - 
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )
        ) 
      MEMBER PositiveGrowth AS 
        IIF
        (
          [Measures].[Sales Growth] > 0
         ,1
         ,NULL
        ) 
      MEMBER Measures.NumberOfSoldProductsIncreasing AS 
        Sum
        (
          Descendants
          (
            [Product].[Product Categories].CurrentMember
           ,[Product].[Product Categories].[Product]
          )
         ,[Measures].[PositiveGrowth]
        ) 
      MEMBER NegativeGrowth AS 
        IIF
        (
          [Measures].[Sales Growth] < 0
         ,1
         ,NULL
        ) 
      MEMBER Measures.NumberOfSoldProductsDecreasing AS 
        Sum
        (
          Descendants
          (
            [Product].[Product Categories].CurrentMember
           ,[Product].[Product Categories].[Product]
          )
         ,[Measures].[NegativeGrowth]
        ) 
    SELECT
      {
        [Measures].[Number Of Products]
       ,[Measures].[NumberOfSoldProductsIncreasing]
       ,[Measures].[NumberOfSoldProductsDecreasing]
      } ON 0
     ,[Date].[Calendar].[Month] ON 1
    FROM [Adventure Works];

    When executed in MDX Studio, we get the following perfmon statistics:

    Time             : 171 ms
    Calc covers      : 15
    Cells calculated : 1993
    Sonar subcubes   : 6
    SE queries       : 3
    Cache hits       : 7
    Cache misses     : 5
    Cache inserts    : 5
    Cache lookups    : 12

    This is now much better than before, there are only 3 SE queries. All of the performance warnings have been addressed, but we still have warnings about CurrentMember not working well with multiselect. Indeed, when we put multiselect on product categories on Bikes and Accessories, we get errors for number of increasing and decreasing products. Links to “Writing multiselect friendly MDX calculations” and “Multiselect friendly MDX for calculations looking at current coordinate” blogs are helpful to understand what the problem is. Unfortunately, there is no universal solution, and each case needs to be evaluated differently.

    In this scenario, we can use apply to Sum(Descendants(…), exp) very similar technique that we applied to Count(Descendants(…)). The idea is not to use Descendants function, which forces us to use CurrentMember, which in turn fails on multiselect, but to put the expression “exp” directly on the Product attribute. In order to do that, we need to create fake measure in the measure group - “Num Products Increased”. It is bound to NULL expression, so this measure will always be NULL, and therefore won’t take any space in storage. But since this measure will have aggregation function Sum, it will properly aggregate when assigned expressions inside MDX Script.

    We will use MDX Studio to emulate MDX Script too (MDX Studio has a feature, where it automatically uses cube selected in the dropdown list to resolve current cube used in script statements).

    Here is the MDX:

    CREATE 
      [Sales Growth] = 
        IIF
        (
          IsEmpty([Measures].[Internet Sales Amount])
         ,NULL
         ,
            [Measures].[Internet Sales Amount]
          - 
            (
              [Date].[Calendar].PrevMember
             ,[Measures].[Internet Sales Amount]
            )
        );
    (
      [Product].[Product Categories].[Product]
     ,[Measures].[Num Products Increased]
    ) = 
      IIF
      (
        [Measures].[Sales Growth] > 0
       ,1
       ,NULL
      );
    SELECT
      {
        [Measures].[Number Of Products]
       ,[Measures].[Num Products Increased]
      } ON 0
     ,[Date].[Calendar].[Month] ON 1
    FROM [Adventure Works]
    WHERE 
      {
        [Product].[Category].&[1]
       ,[Product].[Category].&[4]
      };

    It’s performance characteristics are even slightly better than the query before:

    Time             : 31 ms
    Calc covers      : 6
    Cells calculated : 76
    Sonar subcubes   : 1
    SE queries       : 1
    Cache hits       : 2
    Cache misses     : 0
    Cache inserts    : 0
    Cache lookups    : 2

    Not to mention the fact that it works just fine with multiselect on products ! This is a great example, where fixing MDX to work properly in multiselect scenarios, has the side effect of improving performance.

    We still have warnings about PrevMember with respect to multiselect on the Time dimension. It is more difficult problem, since the definition of “Sales Growth” when there is multiselect on the Time dimension is ambiguous.


  • Microsoft buys DATAllegro

    Official pressrelease: http://www.datallegro.com/pr/7_24_08_microsoft_acquisition.asp

    Curt Monash coverage: http://www.dbms2.com/category/products-and-vendors/datallegro/

    This is a huge win for Microsoft Data Warehousing (and therefore BI) strategy !


  • Analyze MDX with MDX Studio

    Always adhering to the MDX best practices could be a difficult quest to follow. While there is plenty of information out there, it is scattered across book, whitepapers, blogs, BOL, forum posts etc. And this information could be difficult to digest, and sometimes not clear how to apply in the specific scenarios. On top of that, not all the information out there is reliable. Some of it is outdated, and what used to help in one version of AS, could be bad practice in newer version. All this makes practitioner’s life very difficult. But it doesn’t have to be this way. For long time, I wanted to automate many of the checks for MDX best practices, and today I am happy to announce that this functionality is now part of MDX Studio. Starting with version 0.4.0, MDX Studio features the “Analyze” button. It goes through the MDX, analyzes it, and flags issues. To illustrate this better, let’s consider the following example:

    WITH 
     MEMBER
      Measures.WorkingDays AS
        Count(
          Filter(
            Descendants(
              [Date].[Calendar].CurrentMember
              ,[Date].[Calendar].[Date]
              ,SELF)
           ,  [Date].[Day of Week].CurrentMember.Name <> "1"
          )
        )
     MEMBER
      Measures.SalesPerWorkingDay AS 
      '
        IIF(
         [Measures].[WorkingDays] IS NULL OR [Measures].[WorkingDays] = 0
         ,NULL
         ,[Measures].[Internet Sales Amount]/[Measures].[WorkingDays])
       '
    SELECT [Measures].[SalesPerWorkingDay]  ON 0
    , [Date].[Calendar].[Month].MEMBERS ON 1
    FROM [Adventure Works]

    In this example we are looking at sales per working day, where working day is defined as anything but Sunday. Let’s run Analyzer over this query and see what it will find out:

    Line Column Message Link
    7 30 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    10 33 MDX function 'CurrentMember' may raise an error or produce non-desired result when user applies multiselect More Info
    10 47 Use IS operator to compare objects instead of comparing them by name More Info
    4 6 Consider rewriting Count(Filter(set, cond)) construct as Sum(set, summator) with summator IIF(cond, 1, NULL) More Info
    3 7 If you want to avoid division by zero - use 'value = 0' check, if you want to check whether cell is empty - use IsEmpty function, if you want to check that MDX member or tuple exists - use 'obj IS NULL' check More Info
    2 6 Don't use single quotes for definitions of calculation Measures.SalesPerWorkingDay More Info

     

     

     

     

     

     

     

     

     

    It is easier to navigate through the messages in the desktop edition of MDX Studio, where clicking on the message highlights the relevant portion of the MDX. Also there messages can be sorted, hidden (on right-click context menu) or copy/pasted to Excel. Messages are followed by the link to more information how the particular rule can be applied.

    The warning above are relatively simple, but MDX Studio Analyzer is capable of doing more sophisticated checks. Example of such check is detection of common subexpressions. For the following query

    WITH 
     MEMBER
      Measures.SalesPerWorkingDay AS 
      '
        IIF(
         Count(
          Filter(
            Descendants(
              [Date].[Calendar].CurrentMember
              ,[Date].[Calendar].[Date]
              ,SELF)
           ,  [Date].[Day of Week].CurrentMember.Name <> "1"
          )
        ) = 0
         ,NULL
         ,[Measures].[Internet Sales Amount]
          /
           Count(
             Filter(
               Descendants(
                 [Date].[Calendar].CurrentMember
                 ,[Date].[Calendar].[Date]
                 ,SELF)
              ,  [Date].[Day of Week].CurrentMember.Name <> "1"
             )
           )
        )
       '
    SELECT [Measures].[SalesPerWorkingDay]  ON 0
    , [Date].[Calendar].[Month].MEMBERS ON 1
    FROM [Adventure Works]

    MDX Studio Analyzer will report that the exactly same Count(Filter(…)) expression was used twice:

    Line Column Message
    15 9 Same expression was used before at Line 3 Column 7. Consider eliminating common subexpressions for better performance and to take advantage of cache

     

    The difficult choice that I faced was – for which version of Analysis Services to tailor the messages – AS2005 or AS2008. While some of the advices would stay the same, many others are different, and in some cases are contradictory. For example, in AS2005, CASE operator was never optimized, so at certain scenarios it was better to use nested IIFs, but in AS2008, CASE is as optimized as IIF, so it is preferable to nested IIFs. Another example is NON_EMPTY_BEHAVIOR property, which often times is essential to get good performance in AS2005, but usually is not needed in AS2008, and is better to be removed. With AS2005 everybody recommends replacing IIFs with SCOPE whenever possible, but it is not needed with AS2008 etc. At the end I decided to go with AS2008 only – it just feels as a better investment for the future. AS2008 should be released almost any day now, and I expect much faster migration to it from AS2005, then what we saw with migration from AS2000 to AS2005. The reason is simple – migration to AS2008 from AS2005 is very smooth – all the fundamental concepts stay the same, both client and management object models stay code compatible etc.

    One last thing to mention: Best practices are exactly what they are – best practices. It means, that most of the time using them you will be better off. However, they are absolute rules to be followed always. Sometimes there are exceptions from the rules. AS is a complex, feature-rich product, and MDX engine is probably the most sophisticated piece in it. Sometimes features interact in such a way, that some best practices might result in worse results. For example, one of the advices that MDX Studio will give is when it sees Filter(CrossJoin(…)), it will recommend using CrossJoin(Filter(…)) instead. It is common sense to reduce the size of the set before crossjoin’ing. But not only this is not always possible, but also there are rare cases where it would result in worse performance. So while you are encouraged to follow the best practices, always use them as a guide, and measure before and after applying the rules to make sure things do improve.

    MDX Studio 0.4.0 is released with desktop and online versions simultaneously.

    Desktop version can be downloaded from: http://cid-74f04d1ea28ece4e.skydrive.live.com/browse.aspx/MDXStudio/v0.4.0

    Online version can be accessed at: http://mdx.mosha.com

    As usual – please post your comments, suggestions, bug reports etc at MDX Studio forum.


  • The simpler MDX is – the better !

    Whenever I am asked “What is the single most important advice to write efficient MDX”, I always give same answer: The simpler MDX you write – typically the better performance you will get. The reasoning behind this is straightforward: If MDX is simple, it will be easier for the engine to understand and therefore to optimize it. I know that we all tend to think that we can do all kinds of tricks and outsmart engine, and sometimes it is true, but the engine becomes smarter and smarter with every release, and especially in AS2008. So the tricks which could’ve helped before, actually could be hurting more than helping.

    And now, it seems that I am not the only one saying that ! Mark Whitehorn (my coauthor of “Fast Track to MDX” book) together with Keith Burns have just published a technical whitepaper “Best Practices for Data Warehousing with SQL Server 2008”. This seems to be a whitepaper focused specifically on the improvements in SQL 2008, and it includes section named “Write simpler MDX without worrying about performance”. Nice title, and in addition it has explanation of block computation mode.

    I have really only couple of minor comments about it:

    - Block computation mode was not introduced in AS2008, it existed in AS2005 (I have plenty of examples in my blogs). AS2008 though, extended applicability of block computation mode very significantly.

    - Description of calculated member says that it is “running total of two consecutive years”, but the expression is not for running total, but rather for regular total, and also there is nothing specific about years in it. It will work with Month, Quarter, Day etc just as well.

    - I didn’t really understand the following statement “In most hierarchical structures we know that if data is present for one cell in 2003, it will be there for all cells in 2003. The trip to see if there is data for the previous period only needs to be carried out once”. I think this statement is really misleading. If data is present for one cell in 2003 – there is absolutely no guarantee that it will be present for all (even for any other) cells in 2003. The diagrams just above this statement show that in fact most of the cells don’t have data. What I think authors tried to say was that MDX function PrevMember doesn’t need to be called for every cell, because [2004].PrevMember will always result in [2003], and AS2008 is smart enough to make this evaluation only once, and then compute the entire subcube of 2003 for non-empty cells only

    - The example uses VISIBLE=2 property on calculated member. I wonder why…


  • MDX Studio Online – Format and Parse MDX

    MDX Formatting functionality debuted in MDX Studio 0.2.8, and judging by the amount of feedback I got on it – it became the most popular feature of MDX Studio ever ! Couple of people mentioned that for SQL formatting/beautifying needs, they would go online to sites like www.sqlhere.com, which often do reasonable job for MDX as well. So this got me thinking that I could put MDX formatting functionality online as well. With little refactoring of MDX Studio code I was able to hack my first ever ASP.NET application, which became MDX Studio Online Edition. This was a significant milestone, so I promoted version number to 0.3, and also promoted status from Alpha to Beta.

    The MDX Studio Online can be accessed at http://mdx.mosha.com

    In addition to “Format MDX” button, it also features “Parse” button, which does exactly the same as desktop version – it shows the MDX parse tree. Just like in desktop version, it can Format and Parse both queries (i.e. SELECT statements) and MDX Scripts (i.e. CREATE MEMBER/SET, SCOPE, assignments etc). The biggest known feature which is lacking from MDX Studio formatting – is support for comments. Currently it just ignores the comments instead of carrying them forward into formatted statements. I still didn’t figured out how to do it, but this issue tops my todo list.

    Update: Big thanks to Nick Medveditskov who also has written his own MDX Formatter application – he told me the trick he used to preserve comments in formatted MDX, and this functionality is now implemented in MDX Studio deployed online.

    Being an online application – you can use it right away, without any installation, so it can be especially handful for consultants working at customer site, where software installations are not allowed. Another great benefit of it being online application, is that everybody is always on the latest code, so all the hotfixes and version upgrades go live for everybody immediately (well the downside is that if I break something, then everybody is broken right away and there is no rollback, but this would never happen to me, of course, would it ?)

    I would like to thanks several people who did smoke testing for this application before it went public and provided lots of very valuable feedback and advice, people from all around the world – Vladimir Stepa (Germany), Vidas Matelis (Canada), Greg Galloway (US), Darren Gosbell (Australia), Ajit Singh (India) and Chris Webb (UK). After all of them gave green light to the current version, I feel OK now to release it to public.

    As usual please send your feedback about the tool to the MDX Studio forum at http://www.ssas-info.com/forum/MDXStudio


  • Analysis Services protocol – official documentation

    One of my most popular blog posts was “Analysis Services 2005 protocol - XMLA over TCP/IP”. Previously this information wasn’t available from anywhere else. But things changed today. Microsoft now has engaged in the “Open Specification program”. Citing from the web page:

    Microsoft is providing open connections to its high-volume products—Windows Vista (including the .NET Framework), Windows Server 2008, SQL Server 2008, Office 2007, Exchange Server 2007, and Office SharePoint Server 2007—as well as additional information so that software developers, business partners and competitors can better interact with these Microsoft products or invent new solutions for customers.

    As a developer, you now have full access to information about protocols, binary file formats, and other specifications for these products that can be used to create solutions.

    (Yes, these include Office file format specifications!)

    Since Analysis Services is part of SQL Server, the protocols for accessing Analysis Services are listed under “Microsoft SQL Server Protocols”. The most interesting one is

    [MS-SSAS9]: SQL Server Analysis Services Protocol Specification

    which documents Analysis Services 2005 protocol (and 2008 as well, since protocol didn’t change between these versions).

    There is also

    [MS-SSAS8]: SQL Server Analysis Services Protocol Specification

    which documents Analysis Services 2000 protocol. However, this one would be much harder to go through, since AS2000 had “smart client” architecture, where client code was doing lots of work – MDX parsing, evaluation, caching etc – therefore the protocol between client and server is at much lower level than MDX queries and it is not easy to make sense out of it.


  • MDX Formatting with MDX Studio

    Darren Gosbell put up a blog post about MDX Formatting, where he describes how he prefers to format MDX to look nice on the screen. So I thought – why to leave to a human something that machine can do. I decided to implement these rules (with some modifications) into MDX Studio. It became MDX Studio 0.2.8 release, available immediately for download at http://cid-74f04d1ea28ece4e.skydrive.live.com/browse.aspx/MDXStudio/v0.2.8

    I only had couple of hours to work on it, so it is rough on the edges, and I will appreciate comments in the MDX Studio forum at http://www.ssas-info.com/forum/MDXStudio. But basic staff like calculated members, named sets, subselects etc should work. (One known exception is set aliases).

    The functionality is accessible through MDX –> Format menu. Here is example from Darren’s blog

    with member measures.ptd as 'sum(periodstodate([Date].[Calendar].[Month],
    [Date].[Calendar].currentmember),[Measures].[Sales Amount] )',format_string = "currency"
    select   {[Measures].[Measures].[Sales Amount] ,measures.ptd} on 0,
    {[Date].[Calendar].[Month].&[2003]&[2].children} on 1 from [Adventure Works] where 
    ([Product].[Product Categories].[Category].&[1])
    

    And here is how it looks after autoformatting:

    WITH
      MEMBER measures.ptd AS 
        sum(
          periodstodate(
            [Date].[Calendar].[Month],
            [Date].[Calendar].currentmember),
          [Measures].[Sales Amount]) 
        
        , format_string = "currency"
    SELECT
      {
        [Measures].[Measures].[Sales Amount],
        measures.ptd
      } ON 0,
      [Date].[Calendar].[Month].&[2003]&[2].children ON 1
    FROM [Adventure Works]
    WHERE 
      [Product].[Product Categories].[Category].&[1]
    

  • No OLAP Market shares for 2007

    Every year around March or April I am looking forward the OLAP Report site to see the OLAP market share numbers. Computing these numbers is not a simple task as Nigel Pendse explains on http://www.olapreport.com/market.htm, and this kind of analysis really is not available anywhere else. This year again, I was waiting for the 2007 results to see by how much Microsoft increased its market share. March and April passed, then May passed, we are well into June, and the results didn’t show up. So I contacted Nigel, and here is what he told me (publishing with permission):

    Mosha,

    Unfortunately, I don't think it's possible to calculate accurate OLAP market shares any more. Applix, Business Objects, Cognos and Hyperion were all acquired in the last year, and therefore no longer publish figures, so it's no longer possible to work out their OLAP revenues. Furthermore, in the year of acquisition, their figures are particularly distorted (sales people work very hard to close deals just before the acquisition, because they don't expect to keep the commission afterwards, so sales are high the quarter before the deals close, and weak the following quarter).

    Of, course, the disruption doesn't help sales in those companies, so I don't doubt that Microsoft increased its lead, its just that I don't have any good way any longer of calculating it. Generally, the new owners of those companies are not prepared to disclose product revenues.

    Regards

    Nigel

    So after the wave of last year consolidations, it is no longer possible to have accurate estimates. But even without precise numbers, how the future looks like ? (emphasis is mine)

    I may try again in the future, but at the moment, I can't think of a reliable way to calculate OLAP market shares. I think it's also less interesting than in the past, with fewer significant vendors remaining.

    Microsoft is clearly going to stay #1.

    Oracle+Hyperion will probably stay at #2, but its share is unlikely to rise.

    IBM (Cognos+Applix) will stay at number 3; its share may rise a bit, particularly if IBM gives TM1 the promotion that Applix could not afford to do, but I think IBM will stay in #3.

    SAP+BOBJ will stay in the next position.

    MicroStrategy may gain share, but is unlikely to overtake SAP+BOBJ.

    No other vendor comes close, so the top five aren't likely to change position any time soon.

    Anyway, the picture below shows how the market share trends looked like until 2006 (source: OLAP Report)

    Calculating the market shares for consolidated companies based on 2006 positions

      2006 Market share
    Microsoft 31.6%
    Oracle+Hyperion 21.7%
    Cognos+Applix=IBM 16.5%
    SAP+Business Objects 13.1%
    Microstrategy 7.3%

    Update: Turns out the numbers in this table are not accurate, because I forgot to account that Business Objects had acquired Cartesis and that SAP had acquired OutlookSoft. Anyway, Nigel has updated his OLAP market share page with deeper analysis of 2007 consolidation effect on market shares, and he has more accurate numbers.


  • Counting distinct values in MDX

    How many elements in the set have distinct values ? This sounds like a simple question. Apparently, this is well known problem in Excel, and there is a classic solution for it. There are plenty of sites in Internet (for example here and here) which give the following solution:

    =SUM(1/COUNTIF(A1:A6,A1:A6)))

    (note that this formula needs to be entered with Ctrl-Shift-Enter to indicate that this is array formula, or it won’t give right result – one of the Excel quirks)

    When someone showed me this solution, I just stared at it and I couldn’t understand how it worked. Only rewriting it in MDX clarified what was going on. Let’s take the following example based on Adventure Works – we want to count how different letters the product names start from (just wanted to pick something for which Product dimension didn’t have dedicated attribute, because otherwise it would’ve been too simple). The MDX which performs the same algorithm as Excel formula above in these conditions will look like following:

    with 
      member measures.CountDistinctLetters as 
       Sum([Product].[Product].[Product].MEMBERS AS AllProducts,
           1/Filter([Product].[Product].[Product].MEMBERS, 
              VBA!Left(AllProducts.Current.Name,1) = VBA!Left([Product].[Product].CurrentMember.Name,1)
             ).Count)
    
    select CountDistinctLetters on 0
    from [Adventure Works]
    

    It is more verbose, but probably it is not simpler to understand to normal developer than the Excel’s formula. In the nutshell the idea is following: For every element in the set, we compute how many other elements produce the same value (this is COUNTIF in Excel, and Filter(…).Count in MDX), and then dividing 1 by this number we get fraction of current element. The summing up all these fractions will give the desired number.

    (Unfortunately there is no easy way to debug such formulas. The MDX debugger inside MDX Studio is not capable of dealing with such expressions. I actually spent quite a time thinking about how to build decent environment for debugging MDX, and after few failed starts I had what I consider as breakthrough idea. Early experiments were very encouraging, and if this idea works, I will blog soon about it. It radically different from all other MDX debugging attempts.)

    It is clear that performance of such solution is horrible. For every element in the set, we go again and again over all elements in the set just to find out how many of them have same value. The algorithmic complexity of this nested loop approach is O(n^2). Indeed the query above executes for 12 seconds. It is possible to slightly improve it, by forcing caching the results of VBA!Left through

    with 
      member Prefix as VBA!Left([Product].[Product].CurrentMember.Name,1)
      member measures.CountDistinctLetters as 
       Sum([Product].[Product].[Product].MEMBERS AS AllProducts,
           1/Filter([Product].[Product].[Product].MEMBERS, 
              (AllProducts.Current,Prefix) = Prefix
             ).Count)
    
    select CountDistinctLetters on 0
    from [Adventure Works]

    and reducing time to 11 seconds, but obviously, different, more sane approach is needed here.

    Another idea is instead of doing nested-loop algorithm, sort the elements of the set, and then scan sorted set once counting number of different elements. The only technical difficulty with that would be - while scaning sorted set, we need to compare current element with the previous one. Getting current element is easy, but how to get previous one ? There is no "Previous" function in MDX, there is only PrevMember, but it works on hierarchical ordering of the level, and here we have arbitrary set sorted by arbitrary criteria. The solution is to use CurrentOrdinal function, which returns the index of the tuple being iterated in the set, and then using CurrentOrdinal-1