THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

How to relate tables in DAX without using relationships

PowerPivot supports only one type of relationship between two tables, which is the one-to-many relationship. You can define that a column in a table (the “many” side) corresponds to a lookup table through a column which is an identity column there (the “one” side). DAX provides support to follow this relationship through functions such as RELATED and RELATEDTABLE.

As I already wrote in this blog, many-to-many relationships are not directly supported by DAX and we can work-around that by writing more or less complex DAX expressions. However, in that case the many-to-many relationship was still partially supported by two existing one-to-many relationships. Sometimes there cases where the relationship would require the creation of new tables that we do not have and it might be useful to implement the same concept of relationship by using only DAX and nothing else. The scenario that I will show will be the cost simulation by using several couriers. But before that, let’s learn how to use DAX to get the same RELATED function behavior without having an underlying relationship in the model.

IMPORTANT NOTE: in a following post we will see how we can optimize performance of the shipment cost simulation sample with a model based on relationships between tables. However, the sample in this post can be easier to implement for an end user. With medium data sets (below million of rows) performance are still good.

Simulate a table relationship in DAX

We have an Orders table and for each Order there is a Country where the order will be shipped. The Weight column is the parameter necessary to calculate the shipment cost (it could be the weight in kg, for example).

OrderNumber

Country

Weight

SO43697

Canada

35.50

SO43701

Australia

52.00

Then we have a PriceList table with shipment cost for each Country.

Country

Cost

Australia

3.50

Canada

0.50

France

2.50

Germany

2.60

Italy

2.70

USA

0.40

If we had the relationship between these two tables, the calculated column for each order would be:

'Orders'[ShipmentCost] = 'Orders'[Weight] * RELATED('PriceList'[Cost] )

If instead than a calculated column we would like defining a calculated measure, it will be:

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] * RELATED( 'PriceList'[Cost] ) )

But what if we don’t have a relationship? Imagine we don’t have a RELATED function. Well, the idea is that we have to filter the row containing the same destination country of the order we are calculating. So, we’d like to replace the expression

RELATED( 'PriceList'[Cost] )

with something like

CALCULATE(
        VALUES( 'PriceList'[Cost] ),
        'PriceList'[Country] = 'Orders'[Country] )

We wrote VALUES( 'PriceList'[Cost] ) because we expect a single value will be returned and VALUES avoid the syntax error that we would have got writing only 'PriceList'[Cost] in the first parameter of CALCULATE. However, the whole expression is not going to work. This is the error:

The expression contains multiple columns, but only a single column can be used in a Boolean expression that is used as a table filter expression.

I have to say that this error is not very clear. The problem is that 'PriceList'[Country] = 'Orders'[Country] does not give an indication of what is the table that we want to filter (remember that a filter expression in CALCULATE always has its effects on a table, see this previous post). What we want to do is to filter rows of the PriceList table, so that our VALUES function will return only the row of the country we have in our Order. We can use a FILTER function instead, returning only the rows from PriceList table which satisfy the condition:

CALCULATE(
        VALUES( 'PriceList'[Cost] ),
        FILTER( 'PriceList', 
                'PriceList'[Country] = 'Orders'[Country] 
        )
)

Replacing the original RELATED call with the CALCULATE expression above, we can define the calculated measure ShipmentCost in this way (in a packed form):

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] * 
        CALCULATE( VALUES( 'PriceList'[Cost] ),
                   FILTER( 'PriceList', 
                           'PriceList'[Country] = 'Orders'[Country] ) ) )

The resulting PivotTable will show this calculation.

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

SO43701

52

182

Australia

52

182

Grand Total

87.5

199.75

Table relationship through multiple columns in DAX

Now that we know how to simulate the relationship in DAX, we can also use multiple columns for defining a relationship. For example, imagine we have Country and Zone columns in the Orders table.

OrderNumber

Country

Zone

Weight

SO43697

Canada

1

35.50

SO43701

Australia

2

52.00

Our PriceList now contains different prices for each Country and Zone.

Country

Zone

Cost

Australia

1

2.90

Australia

2

3.20

Australia

3

3.50

Canada

1

0.50

Canada

2

0.55

Canada

3

0.60

We can add another condition to the FILTER on PriceList table, using two boolean expression in an AND condition: we want to have the same Country and the same Zone. The following formula makes the thing working:

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] *
        CALCULATE( VALUES( 'PriceList'[Cost] ),
                   FILTER( 'PriceList', 
                           'PriceList'[Country] = 'Orders'[Country]
                           &&
                           'PriceList'[Zone] = 'Orders'[Zone] ) ) )

The resulting PivotTable will show the calculation of ShipmentCost for each order considering the price based on Country and Zone.

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

1

35.5

17.75

SO43701

52

166.4

Australia

52

166.4

2

52

166.4

Grand Total

87.5

184.15

Shipment cost simulation for different Couriers

At this point we want to simulate the cost using different Couriers. Using the same Orders table we have seen before, we add a Courier column to the PriceList table.

Courier

Country

Zone

Cost

Blu Express

Australia

1

2.90

Blu Express

Australia

2

3.20

Blu Express

Australia

3

3.50

Blu Express

Canada

1

0.50

Blu Express

Canada

2

0.55

Blu Express

Canada

3

0.60

SpeedyMail

Australia

1

3.00

SpeedyMail

Australia

2

3.15

SpeedyMail

Australia

3

3.45

SpeedyMail

Canada

1

0.45

SpeedyMail

Canada

2

0.50

SpeedyMail

Canada

3

0.55

Now have to use the Courier attribute in the resulting PivotTable, otherwise there will be an error in the ShipmentCost calculation because there are more prices for each Country/Zone combination.

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

#VALUE!

Having an error for the Grand Total row is probably right. We want to compare different Couriers, simulating a complete calculation for each order using different price lists. Looking at aggregated data, without the order granularity, we will find the most convenient Courier. Data can also diced by Country if we can make local agreements. However, we might want to have a reference value for the Grand Total row and, in general, a valid value for ShipmentCost even when a Courier is not selected. In these cases, we can replace the VALUES( 'PriceList'[Cost] ) expression in the CALCULATE statement with AVERAGE( 'PriceList'[Cost] ), which will use the average price of the selected Couriers (we can always use slicers) in the final total row.

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

181.9625

Final considerations

The scenarios we have described are useful only when the use of relationships is not possible. This can be the case of a particular complex condition to relate two tables, or simply it could be hard or time wasting to generate the necessary intermediate tables to define the relationships between tables containing real data.

From a performance point of view, it is always better to define a model that solve the problem leveraging relationships, using DAX just to make calculation but not to filter data. Using relationships, filtering is implicit in DAX formulas and this helps the engine to produce results faster. However, having a complete toolset to solve real business problems helps you to choose the fastest path to create a working PowerPivot model.

UPDATE: this post seems to qualify for T-SQL Tuesday post!

Published Tuesday, February 09, 2010 4:44 PM by Marco Russo (SQLBI)
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Maybe you should tag this as a T-SQL Tuesday post? It's certainly on topic...

http://msmvps.com/blogs/robfarley/archive/2010/02/02/invitation-for-t-sql-tuesday-003-relationships.aspx

February 9, 2010 10:41 AM
 

Marco Russo (SQLBI) said:

Good idea - I updated the post! :)

February 9, 2010 10:53 AM
 

Rob Farley said:

(Reposted from my msmvps.com blog ) Lots of blog posts for this month, for the first T-SQL Tuesday to

February 20, 2010 1:32 AM
 

Paul J said:

Brilliant. Just what I was looking for.

November 4, 2011 4:40 PM
 

S said:

Is there a way of joining a table on itself using the RELATED function without importing the same table twice?

December 5, 2011 9:16 AM
 

Marco Russo (SQLBI) said:

Not in current version (Denali/SQL2012), unfortunately

December 5, 2011 1:07 PM
 

nikhil said:

Hi Marco,

I am trying to pass multiple parameter values in my DAX query dataset but not able to do so. Can you help me with that please?

March 5, 2012 2:33 PM
 

nikhil said:

Hi Marco,

I am trying to pass multiple parameter values in my DAX query dataset to generate a report, but not able to do so. Can you help me with that please?

Thanks

Nikhil

March 5, 2012 2:34 PM
 

Marco Russo (SQLBI) said:

March 5, 2012 3:27 PM
 

Karla said:

Hi Marco,

I have a question.. I have two tables... with the same  headers... and I want to add the second to the firt... is it posible?

TABLE 1.

EVALUATE(SUMMARIZE(CALCULATETABLE('FACT_T'

               ,'Date'[Date]>=DATEVALUE("2012-01-01")

               ,'Date'[Date]<=DATEVALUE("2012-09-26")

,'Agent'[Agent Name]<>"")

,'Agent'[Agent Name]

,'Date'[Date]

,'Time'[Time]

,"Metrics", CALCULATE(IF('Metrics'[Available Time]>0, "Available Time")

     ,'Team'[Team Name] = "TEAM1")

,"Agent Time",(CALCULATE('Metrics'[Available Time]

     ,'Agent'[AgentId] = 132669))/10

,"Team Time",(CALCULATE('Metrics'[Available Time]

    ,'Team'[Team Name] = "TEAM1"))/10

))

TABLA 2.

EVALUATE(SUMMARIZE(CALCULATETABLE('FACT_T'

               ,'Date'[Date]>=DATEVALUE("2012-01-01")

               ,'Date'[Date]<=DATEVALUE("2012-09-26")

,'Agent'[Agent Name]<>"")

,'Agent'[Agent Name]

,'Date'[Date]

,'Time'[Time]

,"Metrics", CALCULATE(IF('Metrics'[Unavailable Time]>0, "Unavailable Time")

     ,'Team'[Team Name] = "TEAM1")  

,"Agent Time",(CALCULATE('Metrics'[Unavailable Time]

     ,'Agent'[AgentId] = 132669))/10

,"Team Time",(CALCULATE('Metrics'[Unavailable Time]

    ,'Team'[Team Name] = "TEAM1"))/10

))

November 19, 2013 10:08 AM
 

Marco Russo (SQLBI) said:

Karla,

you cannot do the UNION of two tables. However, I don't understand in this case how do you discriminate the rows with Available and Unavailable measuores. Wouldn't be more clear by using different columns in the same query? Or adding a column with a different value for available and unavailable measures?

In the latter case, you should create the rows (i.e. with CROSSJOIN between SUMMARIZE and a ALL ( table[column] ) of a table containing the two values) and then with ADDCOLUMNS you add the columns you need, using an IF statement to choose between Available and Unavailable measures.

November 19, 2013 5:15 PM
 

Lawrence said:

Hi Marco,

Outstanding post! It is on my bookmark. :-)

I am studying DAX queries and I am having a hard time to join two many to many tables (http://www.sqlbi.com/articles/optimize-many-to-many-calculation-in-dax-with-summarize-and-cross-table-filtering) in the same query. DAX works perfectly when we have just a single many to many. Account Team Role is the bridge table.

EVALUATE(

SUMMARIZE(

'Account Team Role'

, [Account Id]

, 'Accounts'[Account Name]

, 'Account Team'[Name]

)

)

However, when I need two of them together, I have trouble. I was able to do that using the following query but that is REALY slow. :-(

EVALUATE(

FILTER(

CROSSJOIN(

SUMMARIZE(

'Account Team Role'

, [Account Id]

, 'Accounts'[Account Name]

, 'Account Team'[Name]

)

,

SUMMARIZE(

'Account x Account Territory'

, [Account Id]

, 'Account Territory'[Account Territory Name]

)

)

, 'Account Team Role'[Account Id] = 'Account x Account Territory'[Account Id]

)

)

I think it is slow because it has to crossjoin everything and filter later. Is not there a smart way to get that info? I can do that easily using MDX.

SELECT NON EMPTY { [Measures].[# of Accounts] } ON COLUMNS

, NON EMPTY { (

[Accounts].[Account ID].[Account ID].ALLMEMBERS

* [Accounts].[Account Name].[Account Name].ALLMEMBERS

* [Account Team].[Name].[Name].ALLMEMBERS

* [Account Territory].[Account Territory Name].[Account Territory Name].ALLMEMBERS

) } ON ROWS

FROM [Opportunity]

Could you please give some directions or tips to I continue to research?

January 5, 2014 7:10 AM
 

Marco Russo (SQLBI) said:

Lawrence,

this is an interesting case.

If you have an Accounts[Account Id] column that is a primary key for Accounts table, then you probably have a M2M relationship between Accounts and Team Role using 'Account Team Role' and another M2M relationship between Accounts and Territories using 'Account x Account Territory'.

If this is the case, you might try to write the query in this way:

EVALUATE

SUMMARIZE (

   FILTER (

       CROSSJOIN (

           'Account Team Role',

           'Account x Account Territory'

       ),

       'Account Team Role'[Account Id] = 'Account x Account Territory'[Account Id]

   )

   [Account Id],

   'Accounts'[Account Name],

   'Account Team'[Name],

   'Account Territory'[Account Territory Name]

)

Let me know if this works and if is faster than the other.

However, if your MDX query works, I'm wondering about the data model, probably it's not the same.

I'd like to remove the FILTER at all from the DAX query, but I'd like to know more about the data model.

Marco

January 5, 2014 12:57 PM
 

Marco Russo (SQLBI) said:

Lawrend,

if you have time, please try also this version and send me (marco at SQLBI dot com) the profiler trace. I think this second query could be faster because it should scale better on many cores, but I don't have enough data in order to get significative numbers (I hope you have bigger tables!).

EVALUATE

SUMMARIZE (

   GENERATE (

       Accounts,

       GENERATE (

           RELATEDTABLE ( 'Account Team Role' ),

           RELATEDTABLE ( 'Account x Account Territory' )

       )

   ),

  [Account Id],

  'Accounts'[Account Name],

  'Account Team'[Name],

  'Account Territory'[Account Territory Name]

)

January 20, 2014 11:36 AM
 

Stuart said:

Hi Marco

We're jumping in the deep end in applying tabular models to our corporate reporting and struggling immensely with the application of filters. This article is by far the closest example we’ve found of what we need to achieve, but with one minor change.

Imagine the ‘Orders’ table with ‘OrderNumber’, ‘Country’, and ‘Weight’ fields, and the ‘PriceList’ table with ‘Country’, ‘Cost’, and an extra field of ‘Rate’. Imagine now that we are doing a ‘what-if’ scenario to see what happens if we apply new rates (rate group ‘1’, rate group ‘2’, etc.), and we want to produce a PowerPivot where we can filter on ‘Rate’ and see what the impact will be.

How do we need to change this part of the formula (because everything we try is failing to change when we change the filter on ‘rate’):

=

CALCULATE (

   VALUES ( 'PriceList'[Cost] ),

   FILTER (

       'PriceList',

       'PriceList'[Country] = 'Orders'[Country]

   )

)

Thanks

January 24, 2014 9:54 AM
 

Marco Russo (SQLBI) said:

Stuart,

you can write a *measure* (not a calculated column) such as:

=

SUMX (

   VALUES ( Order[Country] ),

   CALCULATE (  

       IF (

           HASONEVALUE ( PriceList[Cost] ),

           VALUES ( PriceList[Cost] )

       ),

       PriceList[Country] = VALUES ( Order[Country] )

   )

)

In our PowerPivot Workshop Full (http://www.powerpivotworkshop.com/) we have an exercise about simulation that is very similar to your scenario.

Marco

January 24, 2014 10:42 AM
 

Stuart said:

Thanks for the response Marco - both really helpful (in giving an approach and limitations) and disappointing at the same time (because only being able to apply it to a measure will defeat our aim).

We're looking to find the latest event in a user-selected date range and apply measures to it (giving another fake example for simplicity; one client, multiple sales - we want to return information on the latest sales figures for a client within a date range). The problem is that we need to apply the same filter across multiple tables (in a complex model of over 30 fact and dimension tables - though many of them are date dimensions). Though we could do this by getting the end user to apply the same date range to each table in turn, this feels like a very cumbersome approach.

We were hoping that we could add two unjoined tables to the model called ReportStartDate and ReportEndDate. We then apply filters to these two tables only when running a report. These two values are then used in calculated columns to return the appropriate surrogate key of the dimension associated with that date, and a join between this calculated column and the dimension is used to return the full details of the record we need.

As you can see, this only works if we can apply a filter to the unjoined table and use that in a calculated column, which your response implies may not be possible.

I'm aware that this question is possibly taking us somewhat closer to consultancy than a quick query, but how would you suggest we might approach finding a solution to this? For example, is a similar scenario covered in any of your books, blog posts, etc.?

Thanks

January 26, 2014 7:12 AM
 

Marco Russo (SQLBI) said:

Stuart,

you have to realize that calculated columns are calculated at "refresh" time, so there is no way you can apply any filter (slicer or similar) to a calculated column.

I see two possible approaches:

1) Use a measure in tables created on-the-fly (see ADDCOLUMNS and similar functions), but the resulting DAX could be very complex and potentially slow.

2) Use linked tables (it only works in PowerPivot) and push the filter as something you write in the linked table in Excel. I used this approach for many simulation models. User has to adapt to it, but refresh time is reasonable.

Both examples are in books we published and in articles on the web, but at the same time I think they are not taiolred for your specific needs, just demonstration of the technology you might use for your purposes.

The of course you can look at consulting services (http://www.sqlbi.com/consulting/remote) if you need a proof of concept for your specific case.

Thanks,

Marco

January 28, 2014 10:54 AM
 

Dev said:

Hi Marco,

I have two dimensions and one fact table in my model.

there is relation between dimension(lets say DIM) and fact table  ..and also relation is there between calendar dimension & fact table but there is no relation between DIM & calendar dimension.

NOW,

I need to calculate sum(days field from a dimension lets say DIM) in fact table

Measure in fact table:=CALCULATE(SUM(Dim[Days]))

This is not giving me the correct values,Can you help with this please?

April 17, 2014 1:25 AM
 

Marco Russo (SQLBI) said:

I am not sure about your question, but if you want to sum the [Days] value according to the selection made in other dimensions, you should use the many-to-many pattern in this way:

Measure in fact table:=CALCULATE ( SUM( Dim[Days] ), FactTable )

Look for more details in this paper: http://www.sqlbi.com/articles/many2many

April 17, 2014 1:33 AM
 

Dev said:

That worked, Thanks.

April 17, 2014 2:28 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement