THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Calculate New, Returning, Lost, and Recovered Customers in #dax

Calculating the number of new and returning customers is a recurring question. I would say this is a “classical” Business Intelligence problem, very common in marketing department. I worked on these problems with many customers, with small and large datasets, and I wrote a DAX Pattern “New and Returning Customers” showing how to calculate:

  • New Customers: customers who never made any purchase
  • Returning Customers: customers who bought something in the past
  • Lost Customers: customers who bought something but did not buy in the last N days days
  • Recovered Customers: previously “lost customers” who made a new purchase

This is not a brand new topic, you can find many other blog posts on this topic (Chris Webb, Javier Guillén, Gerhard Brueckl, David Hager, Rob Collie), so my goal was to show very generic formulas that were generally the best solution in term of performance. This make the formula less readable, such as the following:

[Returning Customers] :=
COUNTROWS (
    CALCULATETABLE (
        VALUES ( <customer_key_column> ),
        VALUES ( <customer_key_column> ),
        FILTER (
            ALL ( <date_column> ),
            <date_column> < MIN ( <date_column> )
        )
    )
)

As you see, using CALCULATETABLE ( VALUES ( table[column] ), VALUES ( table[column] ), … ) seems a useless thing. Why counting the rows returned by VALUES and passing it also as a filter argument? This is a not so intuitive behavior of CALCULATE. The first argument is an expression that will be evaluated in a modified filter context. The third argument is a FILTER on the date column, which extends the range of dates considered, including all the past sales transactions. At this point, the first VALUES would return any customers who made a purchase in the past, but the second argument will only considered those that made a purchase in the current selection of time. The final result is an AND condition between two sets of customers (the intersection of the two sets), which is faster than trying to calculate the number of past transactions of each customer who made a purchase in the current selection of time, filtering only those that results in zero transactions.

In general, I prefer using more readable DAX formulas, also in DAX patterns, optimizing them only when necessary. But in this case the performance might be important (visible to the user) also with a few thousands of customers. As usual, any feedback on the New and Returning Customers pattern will be very welcome!

Published Thursday, June 26, 2014 10:13 AM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ronald said:

Nice one. I wrote some t-sql the do the same, maybe also interesting to take a look at:

http://www.sqlblog.nl/2014/08/calculate-lost-customers-using-t-sql/

Can do the same trick for new/recurring customers.

grts. Ronald

August 20, 2014 5:32 AM
 

Marco Russo (SQLBI) said:

Hi Ronald, thanks for the link!

August 20, 2014 7:14 AM
 

Oxenskiold said:

That's a rather elegant way to show how tables can filter tables and the intersection is the result. In general dissecting the DAX patterns from your web site is a formidable way to figure out how DAX works in detail.

I think it's important to realize though that the above  measure only works if the date table you are using  is marked as a Date table in the data model and you use the column chosen as the unique identifier for the data table as the <date_column> in the ALL(<date_column>) in the FILTER(). The reason is that when you use this particular marked column from the marked data table as a filter parameter in CALCULATE it overwrites all columns from the date table that existed in the filter context when entering the CALCULATE (it seems). In this case [year] and [Monthname]. This is contrary to the usual rule where a single column used as a filter parameter only overwrites the same column not all columns from the same table.

So if you were to remove the marking of the date table, the first VALUES() would be evaluated with the [year] and [monthname] still active in the filter context. (They would not be overwritten) and then there would be no intersection between the dates coming from the FILTER and the dates from the first VALUES(). (the dates coming from the FILTER() would be all the dates before the dates from the first VALUES()). So the result would be empty, that is no rows are returned.

Of course if you wanted the measure to work whether the date table is marked or not you could just use the full date table as a parameter in the FILTER or as a direct parameter in the CALCULATE. Like this

FILTER (

   ALL ( <date_table> ),

   <date_column> < MIN ( <date_column> )

)

OR

[Returning Customers] :=

COUNTROWS (

   CALCULATETABLE (

       VALUES ( <customer_key_column> ),

       VALUES ( <customer_key_column> ),

       ALL ( <date_table> ),

       FILTER (

           ALL ( <date_column> ),

           <date_column> < MIN ( <date_column> )

       )

   )

)

That way the [year] and [month] columns from the date table are overwritten no matter what.

September 8, 2014 10:21 AM
 

Marco Russo (SQLBI) said:

Thanks for this comment. I know this difference, but having a Date table in a data model that is not marked as a Date Table is not a good idea - it works if you only use these formulas, but you might want to add other measures later, and/or someone else might change the data model. There are no side effects in adopting the ALL ( <date_table> ) filter argument to make the formula working also in case "Mark as Date Table" is not set.

I just don't completely understand your comment "So if you were to remove the marking of the date table, the first VALUES() would be evaluated with the [year] and [monthname] still active in the filter context."

This is the case if you are using another column to join fact and date table, but if the date column is used for the relationship, the formula works even if the Date table is not marked as Date Table.

September 8, 2014 10:35 AM
 

Oxenskiold said:

Oh I see your point thanks for clearing that up.

September 9, 2014 3:36 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement