THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Alberto Ferrari

PowerPivot, Parent/Child and Unary Operators

Following my last post about parent/child hierarchies in PowerPivot, I worked a bit more to implement a very useful feature of Parent/Child hierarchies in SSAS which is obviously missing in PowerPivot, i.e. unary operators. A unary operator is simply the aggregation function that needs to be used to aggregate values of children over their parent.

Unary operators are very useful in accountings where you might have incomes and expenses in the same hierarchy and, at the total level, you want to subtract expenses from incomes, in order to get a meaningful total. Clearly, you might obtain a good result adding a sign in front of the values in the fact table but, doing so, you miss the opportunity to create more than one hierarchy over the same data where values might get different signs in different hierarchies. For example, I might have a hierarchy where expenses are subtracted from the total and another one, with a different structure, where the expenses need to be summed with other values. Thus, I suppose that signing the value in the fact table is not a viable solution and I want to put the signs into the hierarchy.

Implementing unary operators with the four mathematical operations is not easy at all. It can be done, but with a quite big effort and some severe limitations that I am not going to describe here. Moreover, in my personal experience I have never used neither multiplication nor division in any hierarchy, I have always used sum and subtraction only. Thus, I am going to focus on these two basic operations only, leaving the four operations to a next post, if somebody asks me to. Sorriso

Let us start with the table structure for the demo. We have an Accounts table, which holds the hierarchy, and a Transactions table with the transactions. All amounts in the transactions table are positive numbers.

CREATE TABLE Accounts (
    AccountId         INT, 
    Account           VARCHAR (100),
    ParentAccountId   INT,
    Aggregator        VARCHAR(1)
)

CREATE TABLE Transactions (
    AccountId    INT,
    Description  VARCHAR (100),
    Amount       MONEY
)

Having the two tables, we can now fill them with some data:

INSERT INTO Accounts (AccountId, Account, ParentAccountId, Aggregator)
VALUES  
        ( 1, 'Final Total',       NULL, ' '), -- Grand Total
        ( 2, 'Revenues',             1, '+'), --   ADD Total Revenues to Grand Total
        ( 3, 'Courses Provided',     2, '+'), --     SUM Courses to Revenues
        ( 4, 'Instructors paid',     3, '-'), --       SUBTRACT paid instructors from Courses
        ( 5, 'Cathering',            3, '-'), --     SUBTRACT cathering from Courses
        ( 6, 'Attendees rates',      3, '+'), --       SUM attendee rates to Courses
        ( 7, 'Consultancy',          2, '+'), --     SUM Consultancy to Revenues
        ( 8, 'Expenses',             1, '-'), --   SUBTRACT Expenses to Total
        ( 9, 'Travel',               8, '+'), --     SUM travel to Expenses
        (10, 'Travel refund',        9, '-'), --       SUBTRACT travels refund to Travels
        (11, 'Travels paid',         9, '+'), --       SUM travels paid to Expenses
        (12, 'Courses Taken',        8, '+'); --     SUM courses taken to Expenses

INSERT INTO Transactions VALUES 
    ( 4, 'Marco Russo',         200),
    ( 4, 'Rob Collie',          500),
    ( 5, 'Sandwiches',          300),
    ( 6, 'Course in Seattle',   800),
    ( 6, 'Course in Boston',   1200),
    ( 7, 'Work in Microsoft',   400),
    ( 7, 'Work in Apple',       500),
    ( 7, 'Work in SQL Blog',    300),
    (10, 'Travel to Seattle',    80),
    (11, 'Travel to Boston',    150),
    (11, 'Travel to London',    190),
    (12, 'SSAS Maestros',      1000);

Take a look at the Accounts table where I provided some descriptions for the operations. You can see, for example, that account number 4 (Instructors Paid), even if it belongs to Revenues, needs to be subtracted from there because it is, in reality, an expense. The same applies for account number 10 (Travel Refund) which belongs to an expense (Travel) but must be subtracted from there. Moreover, since Expenses needs to be subtracted from the grand total, it happens that Travel Refund needs to summed to the grand total and subtracted from expenses. As you can see, an account can change its sign during the hierarchy navigation, which is a perfectly normal and desired behavior. Nevertheless, we need a way to handle this.

You should already know, from my previous post, that the hierarchy need to be flattened in order to work with PowerPivot. I am not going to bore you again with the flattening technique, here I am focusing on the unary operators only.

The idea is that we need to compute the sign that each account should have at the different levels of the hierarchy. Let us take, as an example, the account number 10: Travel Refund. It is a leaf node in the hierarchy at level 4. Let us look at how we should aggregate its value at the various levels:

  • Level 4 (itself) a minus to show that its value will be subtracted from Travel
  • Level 3 (Travel) a minus since its value will be subtracted from Travel
  • At level (Expenses) it its sign becomes a plus, since is is aggregated into expenses which value is then subtracted from the grand total. Having traversed two minus, its sign becomes now a plus.
  • At level 1 (Final Total) it remains a plus since the grand total does not change the sign of the last aggregation (which was Expenses)

You see that at each different level of the hierarchy an account might be summed or subtracted, depending on the number of minus signs that have been encountered during the “path” from the leaf node to the final total. With this basic idea in mind, it is now clear that we need to compute the “sign path” of each node and, from there, compute the sign of the account at the different levels of the hierarchy.

This is easily accomplished with the next query:

WITH AggPaths as (
    SELECT 
        AccountID   = A.accountID, 
        Account     = CAST (A.Account AS VARCHAR (100)),
        NodeDepth   = 1,
        AggPath     = CAST (a.aggregator AS VARCHAR (100)) 
    FROM 
        Accounts a where ParentAccountId IS NULL
    UNION ALL
    SELECT 
        AccountID   = Children.accountID, 
        Account     = CAST (REPLICATE (' ', Parent.NodeDepth * 4) + Children.Aggregator + ' ' + Children.Account AS VARCHAR (100)),
        NodeDepth   = Parent.NodeDepth + 1,
        AggPath     = CAST (Parent.AggPath + Children.aggregator AS VARCHAR (100)) 
    FROM Accounts Children
        INNER JOIN AggPaths Parent on Children.ParentAccountId = Parent.AccountID)
SELECT 
    AggPaths.*, 
    SignAtLevel1 = CASE WHEN NodeDepth < 1 THEN NULL WHEN dbo.COUNTMINUS (SUBSTRING (AggPath, 1, 4)) % 2 = 0 THEN 1 ELSE -1 END,
    SignAtLevel2 = CASE WHEN NodeDepth < 2 THEN NULL WHEN dbo.COUNTMINUS (SUBSTRING (AggPath, 2, 4)) % 2 = 0 THEN 1 ELSE -1 END,
    SignAtLevel3 = CASE WHEN NodeDepth < 3 THEN NULL WHEN dbo.COUNTMINUS (SUBSTRING (AggPath, 3, 4)) % 2 = 0 THEN 1 ELSE -1 END,
    SignAtLevel4 = CASE WHEN NodeDepth < 4 THEN NULL WHEN dbo.COUNTMINUS (SUBSTRING (AggPath, 4, 4)) % 2 = 0 THEN 1 ELSE -1 END
FROM AggPaths
ORDER BY AccountID

I have used a function, CountMinus, which counts the number of minus in a string and whose definition is simply:

CREATE FUNCTION CountMinus (@Param VARCHAR (100)) 
RETURNS INT AS
BEGIN
    RETURN LEN (@Param) - LEN (REPLACE (@Param, '-', ''));
END

The result of the query is this:

image

You can see that the AggPath column contains the “sign path” of each account and the various SignAtLevel contain the sign that this account should have when aggregated at the various levels. At the root level I added a star to avoid having a blank, which would return incorrect values for SUBSTRING.

Using the query and the original table, JOINED together, you can easily load the data inside PowerPivot and start some analysis. Now, to show the values in a PivotTable, you can decide two different methods:

  • Use, at each level, the sign that the value should have when aggregated with its parent
  • Always use the sign that the node will have when shown at level 1

The difference between the two visualizations is evident in the next figure:

image

In “Value” I used the first technique. You can see that “Travel refund” has a minus sign, indicating that it will be subtracted from expenses. The same account, on the other measure, has a plus sign, indicating that the value will be summed to the final total. Both the information are correct, it all depends on how the user is used to look at numbers. I personally prefer the visualization of “FinalValue”, since my brain recognizes it as correct, while I refuse to understand the other one. Nevertheless, it is a matter of taste, I have met plenty of accounting managers who strongly prefer the other visualization.

Now, how do we compute the two formulas? It is pretty easy, indeed. It is enough to note that, in the filter context of each cell, we are browsing the hierarchy at a defined and well known level. We simply need to sum all the amounts with a plus (in that level) and subtract all the amounts with a minus (again, in that level). Thus, the formula for “FinalValue” is:

=IF ([ShowRow] && [CurrentLevel] > 0,
        CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = +1) 
      - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = -1),
    BLANK ()
)

Where I make use of a couple of measures [ShowRow] and [CurrentLevel] which return true/false depending on whether the row should be visible or not and [CurrentLevel] which computes the current level of visualization. See my previous post to learn how they can be easily defined. Apart from those two measures, the formula is simply a subtraction of two CALCULATE which impose the correct filtering on the accounts, separating them into two distinct groups: the ones that need to be summed and the ones that need to be subtracted. It might be worth noting that the CALCULATE operates in a pre-existent filter context where all and only the children of the current node are visible and the new filter added to the context does not interfere with that selection, since it operates on a different column.

The formula for “Value” is very similar, just a bit tedious to write:

=IF ([ShowRow],
    IF ([CurrentLevel] = 4, 
        CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel4] = +1) 
      - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel4] = -1),
    IF ([CurrentLevel] = 3, 
        CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel3] = +1) 
      - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel3] = -1),
    IF ([CurrentLevel] = 2, 
        CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel2] = +1) 
      - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel2] = -1),
    IF ([CurrentLevel] = 1, 
        CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = +1) 
      - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = -1),
    BLANK ()
)))))

Here we perform the same computation but, instead of always using SignAtLevel1, we use the correct sign for each level, resulting in different signs at different levels. As I said before, it is a matter of personal taste.

Please note that I preferred to use CALCULATE with a filter context for +/- where I could have used a more intuitive SUMX over the accounts, multiplying the resulting value for the correct sign, for performance reasons. In a big hierarchy with a big fact table SUM with CALCULATE should perform much better than SUMX, requiring less iterations over the fact table to gather the final result.

This very same technique can be used to compute a hierarchy with multiplications and divisions only. It cannot, however, handle the four operations together because, in that case, we will need to follow precedence order during computations, something that we can ignore with any set of operators that obey to the commutative law, as we are doing here.

The biggest difference between unary operator as defined in this post and the same in SSAS is that in SSAS the aggregation over the hierarchy with unary operators is always applied to the final result. Thus, once you define this hierarchy in SSAS and ask for the total amount, the result is computed accordingly to the hierarchy, while in PowerPivot this is not the case: the result is computed with the signs only when the hierarchy is displayed, otherwise SUM is used. I personally prefer the PowerPivot method since, if you have more than one hierarchy with unary operators, SSAS becomes terribly slow, due to the need to perform very complex computations. In PowerPivot you are free to define as many hierarchies as you want and always pay the CPU price to aggregate over one of them.

Clearly, if you want to handle more than one hierarchy, it is strongly advisable to use a separate table for each hierarchy, otherwise the formulas would become a real pain to write if they need to take care of more than one hierarchy in the same table.

This is just another example of what PowerPivot can do for your reporting needs. If you want to touch the real Karma of PowerPivot, don’t miss the opportunity to follow one of the workshops I and Marco Russo are bringing all over Europe in the next few months.

You can find more info on www.powerpivotworkshop.com. Hope to see you there!

Published Tuesday, March 01, 2011 9:00 AM by AlbertoFerrari
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

merry said:

What do you mean with a "ShowRow" ?

December 22, 2012 7:50 PM
 

Allan said:

This blog is very useful...Thanks,could you explain how to write [ShowRow] and [CurrentLevel] in DAX?

March 30, 2013 10:49 AM
 

Alejandro said:

Thanks for your great posts! I really enjoyed following the examples and learning on the way. I tried also to figure out if the unary operator part could be made a bit simpler. My try is at http://tomasamot.wordpress.com/2013/04/09/powerpivot-parentchild-and-unary-operators-a-possible-variation/

thanks again!

April 9, 2013 9:25 PM
 

Luis Simoes said:

What if you just want the value of the currentmember and not a result operation over it’s children?

Basically it’s the representation of having the full hierarchy using ~ unary operator.

How do you achieve that?

September 23, 2013 10:25 AM
 

Cynthia said:

Alberto, thanks for this great post. I was able to follow this and create rollups for Accounts dimension.

But now I have other dimensions like Products, Scenario and others that have its own levels and unary operations. How can I handle that?

With the formula for FinalValue you explained, i was able to get the values for Accounts. But for others for eg, if i need to add Scenario into the picture would this be right

=IF ([ShowRow] && [CurrentLevel] > 0,

       CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = +1, Scenario[SignAtLevel1]=+1)

     - CALCULATE (SUM (Transactions[Amount]), Accounts[SignAtLevel1] = -1, Scenario[SignAtLevel1]=-1),

   BLANK ()

)

IF you get time, please let me know

October 10, 2013 12:41 PM
 

Abc123 said:

Similar (maybe) question to the last poster. I'm finding the result of my calculation is constantly a zero as the a - b parts of the formula result in the same number.

I've narrowed the problem down to this. For some reason I get the same number no matter what number I insert in place of "= -1)":

=IF ([CurrentLevel] = 1,

       CALCULATE(SUM([Amount])/1000000,'tblProductRollup', Accounts[SignAtLvl1] = -1) ,0)

Any ideas why this is the case?

December 30, 2013 11:19 AM
 

AlbertoFerrari said:

@Cynthia, @Abc... it is nearly impossible to answer you without looking at the complete model. Unfortunately, comments here are not very good to provide consultancy...

December 30, 2013 12:30 PM
 

Arti said:

Hi Alberto! May we get a helping hand from your side?

The main task we stack with is to group numbers by some certain level of hierarchy, by immediate children of a node that user picks up in SSRS-filter.

For example: http://blogs.microsoft.co.il/photos/barbaro/images/128411/364x423.aspx

- if user chooses “09-311”, then there should be returned two records “96-311” and “96-312” and aggregated values for them _collected from all their descendants up to leafs_. In MDX this worked automagically, but in DAX it’s some sort of pain (Is it possible to query hierarchies? How does Excel do it?). It’s necessary first to define the list of members we need to put into the final set and then run through subtrees aggregating values.

The next query works 29 seconds:

EVALUATE

CALCULATETABLE(

SUMMARIZE(

FILTER('GL Account', 'GL Account'[Derived GL Account Code]="100092-201411")

,'GL Account'[Derived GL Account Code]

,"Budget YTD"

,CALCULATE([Budget $ (YTD)], ALL('GL Account'), PATHCONTAINS('GL Account'[Path], EARLIER('GL Account'[Derived GL Account Code])))

,"Counter"

,CALCULATE(countrows('GL Account'), ALL('GL Account'), PATHCONTAINS('GL Account'[Path], EARLIER('GL Account'[Derived GL Account Code])))

)

)

It returns 1 record, field “Counter” shows that it runs across 196 records of the “GL Account” table (which contains 6,5 millions records). Adding another calculation adds 30-40 seconds to it’s execution.

But if to put the GL Account Derived Code directly into subqueries then the script work optimised way - only 7 seconds:

EVALUATE

CALCULATETABLE(

SUMMARIZE(

FILTER('GL Account', 'GL Account'[Derived GL Account Code]="100092-201411")

,'GL Account'[Derived GL Account Code]

,"Budget YTD"

,CALCULATE([Budget $ (YTD)], ALL('GL Account'), PATHCONTAINS('GL Account'[Path], "100092-201411"))

,"Counter"

,CALCULATE(countrows('GL Account'), ALL('GL Account'), PATHCONTAINS('GL Account'[Path], "100092-201411"))

)

)

So the questions are the next: 1) Have you ever seen a sample realisation of hierarchy slicing at some particular node? (I didn’t find) 2) Is there any way to make the first script work as fast as the second?

Thank you in advance,

Arti, Sydney.

November 9, 2014 11:37 PM

Leave a Comment

(required) 
(required) 
Submit

About AlbertoFerrari

Alberto Ferrari is a Business Intelligence consultant. He his interests lie in two main areas: BI development lifecycle methodologies and performance tuning of ETL and SQL code. His main activities are with SSIS and SSAS for the banking, manufacturing and statistical sectors. He is also a speaker in international conferences like European PASS Conference and PASS Summit.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement