THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server

How to relate tables in DAX without using relationships

PowerPivot supports only one type of relationship between two tables, which is the one-to-many relationship. You can define that a column in a table (the “many” side) corresponds to a lookup table through a column which is an identity column there (the “one” side). DAX provides support to follow this relationship through functions such as RELATED and RELATEDTABLE.

As I already wrote in this blog, many-to-many relationships are not directly supported by DAX and we can work-around that by writing more or less complex DAX expressions. However, in that case the many-to-many relationship was still partially supported by two existing one-to-many relationships. Sometimes there cases where the relationship would require the creation of new tables that we do not have and it might be useful to implement the same concept of relationship by using only DAX and nothing else. The scenario that I will show will be the cost simulation by using several couriers. But before that, let’s learn how to use DAX to get the same RELATED function behavior without having an underlying relationship in the model.

IMPORTANT NOTE: in a following post we will see how we can optimize performance of the shipment cost simulation sample with a model based on relationships between tables. However, the sample in this post can be easier to implement for an end user. With medium data sets (below million of rows) performance are still good.

Simulate a table relationship in DAX

We have an Orders table and for each Order there is a Country where the order will be shipped. The Weight column is the parameter necessary to calculate the shipment cost (it could be the weight in kg, for example).

OrderNumber

Country

Weight

SO43697

Canada

35.50

SO43701

Australia

52.00

Then we have a PriceList table with shipment cost for each Country.

Country

Cost

Australia

3.50

Canada

0.50

France

2.50

Germany

2.60

Italy

2.70

USA

0.40

If we had the relationship between these two tables, the calculated column for each order would be:

'Orders'[ShipmentCost] = 'Orders'[Weight] * RELATED('PriceList'[Cost] )

If instead than a calculated column we would like defining a calculated measure, it will be:

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] * RELATED( 'PriceList'[Cost] ) )

But what if we don’t have a relationship? Imagine we don’t have a RELATED function. Well, the idea is that we have to filter the row containing the same destination country of the order we are calculating. So, we’d like to replace the expression

RELATED( 'PriceList'[Cost] )

with something like

CALCULATE(
        VALUES( 'PriceList'[Cost] ),
        'PriceList'[Country] = 'Orders'[Country] )

We wrote VALUES( 'PriceList'[Cost] ) because we expect a single value will be returned and VALUES avoid the syntax error that we would have got writing only 'PriceList'[Cost] in the first parameter of CALCULATE. However, the whole expression is not going to work. This is the error:

The expression contains multiple columns, but only a single column can be used in a Boolean expression that is used as a table filter expression.

I have to say that this error is not very clear. The problem is that 'PriceList'[Country] = 'Orders'[Country] does not give an indication of what is the table that we want to filter (remember that a filter expression in CALCULATE always has its effects on a table, see this previous post). What we want to do is to filter rows of the PriceList table, so that our VALUES function will return only the row of the country we have in our Order. We can use a FILTER function instead, returning only the rows from PriceList table which satisfy the condition:

CALCULATE(
        VALUES( 'PriceList'[Cost] ),
        FILTER( 'PriceList', 
                'PriceList'[Country] = 'Orders'[Country] 
        )
)

Replacing the original RELATED call with the CALCULATE expression above, we can define the calculated measure ShipmentCost in this way (in a packed form):

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] * 
        CALCULATE( VALUES( 'PriceList'[Cost] ),
                   FILTER( 'PriceList', 
                           'PriceList'[Country] = 'Orders'[Country] ) ) )

The resulting PivotTable will show this calculation.

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

SO43701

52

182

Australia

52

182

Grand Total

87.5

199.75

Table relationship through multiple columns in DAX

Now that we know how to simulate the relationship in DAX, we can also use multiple columns for defining a relationship. For example, imagine we have Country and Zone columns in the Orders table.

OrderNumber

Country

Zone

Weight

SO43697

Canada

1

35.50

SO43701

Australia

2

52.00

Our PriceList now contains different prices for each Country and Zone.

Country

Zone

Cost

Australia

1

2.90

Australia

2

3.20

Australia

3

3.50

Canada

1

0.50

Canada

2

0.55

Canada

3

0.60

We can add another condition to the FILTER on PriceList table, using two boolean expression in an AND condition: we want to have the same Country and the same Zone. The following formula makes the thing working:

'Orders'[ShipmentCost] = SUMX( 'Orders', 'Orders' [Weight] *
        CALCULATE( VALUES( 'PriceList'[Cost] ),
                   FILTER( 'PriceList', 
                           'PriceList'[Country] = 'Orders'[Country]
                           &&
                           'PriceList'[Zone] = 'Orders'[Zone] ) ) )

The resulting PivotTable will show the calculation of ShipmentCost for each order considering the price based on Country and Zone.

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

1

35.5

17.75

SO43701

52

166.4

Australia

52

166.4

2

52

166.4

Grand Total

87.5

184.15

Shipment cost simulation for different Couriers

At this point we want to simulate the cost using different Couriers. Using the same Orders table we have seen before, we add a Courier column to the PriceList table.

Courier

Country

Zone

Cost

Blu Express

Australia

1

2.90

Blu Express

Australia

2

3.20

Blu Express

Australia

3

3.50

Blu Express

Canada

1

0.50

Blu Express

Canada

2

0.55

Blu Express

Canada

3

0.60

SpeedyMail

Australia

1

3.00

SpeedyMail

Australia

2

3.15

SpeedyMail

Australia

3

3.45

SpeedyMail

Canada

1

0.45

SpeedyMail

Canada

2

0.50

SpeedyMail

Canada

3

0.55

Now have to use the Courier attribute in the resulting PivotTable, otherwise there will be an error in the ShipmentCost calculation because there are more prices for each Country/Zone combination.

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

#VALUE!

Having an error for the Grand Total row is probably right. We want to compare different Couriers, simulating a complete calculation for each order using different price lists. Looking at aggregated data, without the order granularity, we will find the most convenient Courier. Data can also diced by Country if we can make local agreements. However, we might want to have a reference value for the Grand Total row and, in general, a valid value for ShipmentCost even when a Courier is not selected. In these cases, we can replace the VALUES( 'PriceList'[Cost] ) expression in the CALCULATE statement with AVERAGE( 'PriceList'[Cost] ), which will use the average price of the selected Couriers (we can always use slicers) in the final total row.

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

181.9625

Final considerations

The scenarios we have described are useful only when the use of relationships is not possible. This can be the case of a particular complex condition to relate two tables, or simply it could be hard or time wasting to generate the necessary intermediate tables to define the relationships between tables containing real data.

From a performance point of view, it is always better to define a model that solve the problem leveraging relationships, using DAX just to make calculation but not to filter data. Using relationships, filtering is implicit in DAX formulas and this helps the engine to produce results faster. However, having a complete toolset to solve real business problems helps you to choose the fastest path to create a working PowerPivot model.

UPDATE: this post seems to qualify for T-SQL Tuesday post!


Published Tuesday, February 09, 2010 4:44 PM by Marco Russo (SQLBI)
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Adam Machanic said:

Maybe you should tag this as a T-SQL Tuesday post? It's certainly on topic...

http://msmvps.com/blogs/robfarley/archive/2010/02/02/invitation-for-t-sql-tuesday-003-relationships.aspx

February 9, 2010 10:41 AM
 

Marco Russo (SQLBI) said:

Good idea - I updated the post! :)

February 9, 2010 10:53 AM
 

Twitter Trackbacks for SQLBI - Marco Russo : How to relate tables in DAX without using relationships [sqlblog.com] on Topsy.com said:

February 9, 2010 4:17 PM
 

Rob Farley said:

Feb 13 2010 15:45(Reposted from my msmvps.com blog ) Lots of blog posts for this month, for the first

February 20, 2010 1:30 AM
 

Rob Farley said:

(Reposted from my msmvps.com blog ) Lots of blog posts for this month, for the first T-SQL Tuesday to

February 20, 2010 1:32 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement