THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Distinct Count Measure in PowerPivot using DAX

UPDATE: PowerPivot for Excel 2010 in SQL Server 2012 and Excel 2013 support the DISTINCTCOUNT aggregation in DAX. This blog post is relevant only if you use PowerPivot for Excel 2010 in SQL Server 2008 R2.

A PivotTable based on PowerPivot data doesn’t offer a native Distinct Count aggregation. To calculate it, you need to create a calculated measure in the PivotTable. For example, if we have a simple Sales table from AdventureWorks loaded in PowerPivot and projected into a PivotTable like the following picture.

Distinct-01

To create a new measure, we need to right-click on the field pane the menu “Add New Measure…”

clip_image004 

At this point we name the new measure Distinct Customers and write the DAX formula that calculates the number of rows returned by the DISTINCT function, which returns a list of the distinct values contained in a column (we choose EmailAddress in this example, but if we had a customer key we would have used it instead).

clip_image006

The DAX formula is:

COUNTROWS( DISTINCT( Table[Column] ) )

Where Table is the name of the PowerPivot Table containing the column and Column is the name of the column containing the values we want to count as distinct. Having defined this calculated measure, it is computed in the context defined by the PivotTable query. For example, we can look at the number of distinct customers that have bought each product, comparing that number with the number of sales for each product (which is always a number greater than or equal to the number of distinct customers).

Distinct-04

I’d like to add a personal note. It would have been better if a shorter syntax would have been available, and much better if PivotTable would have offered the Distinct Count as a possible option in the Summarize By options that are available for each native measure (current available options are Sum, Count, Min, Max and Average).

Distinct-05

 

 

 

 

Moreover, even a more direct DAX function like a DISTINCTCOUNT( [column] ) would have been easier and intuitive for “that guy”!

Published Saturday, December 26, 2009 2:02 PM by Marco Russo (SQLBI)
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Russell A said:

Thanks very much. I'm curious, if I have blank rows in my column data, is there a way to exclude these as a distinct value?

I've tried =countrows(distinct(allnoblankrow(tbl[clm]))) but no avail :)

March 7, 2011 5:16 PM
 

Marco Russo said:

March 8, 2011 4:33 AM
 

Serah said:

Can we use CountRows(Distinct()) for a column from aquery in Power Pivot.

Instead of selecting from table, i have columns from a query.

March 31, 2011 9:29 AM
 

Marco Russo (SQLBI) said:

Serah, what do you mean?

You can write = COUNTROWS( DISTINCT( Table[Column] ) ) but I don't understand what do you mean with "Instead of selecting from table, i have columns from a query."

Marco

March 31, 2011 9:45 AM
 

OtterMBA said:

I have a question on count(distint()) or distinctcount() - I use Denali for Excel.

I have a single table that I create a linked table into PowerPivot. On the table I have the following columns [sales staff],[customer],[year]. I have been able to do a distinctcount for 2009,2010, and 2011 just fine. ie.."=calculate(distinctcount([customer]),table1[year]=2009)"

I need to count all of the customers that we did business with in 2009 and 2010 but not 2011, 2011 only, 2010 only, etc. and every iteration of the years. Once I figure out the formula I know I can manipulate for the others.

Curretly I build a pivot table by salesperson,and customer by distinctcount by year. Then use an if formula in an empty cell like if(and(c1=1,d1=1,e1=0),1,0) to figure out customers for 2009 and 2010 only. Then I total all the 1s for each salesperson. Is there a way to create a measure in powerpivot to do this?

January 11, 2012 2:17 PM
 

Marco Russo (SQLBI) said:

Do you know the possible combination in advance, or do you want to select a list of "DO" and a list of "DO NOT" and having a dynamica calculation based on that?

In the latter, take a look at this scenario in this case, because is very similar to the requirement.

http://sqlblog.com/blogs/alberto_ferrari/archive/2011/05/19/powerpivot-basket-analysis-and-the-hidden-many-to-many.aspx

January 11, 2012 5:14 PM
 

Johannes said:

Is there a difference between countrows(distinct()) and distinctcount() ?

February 20, 2012 9:10 AM
 

Marco Russo (SQLBI) said:

Johannes, no differences - actually DISTINCTCOUNT is more clear from a semantic point of view, but internally COUNTROWS(DISTINCT()) is computed instead, so they are identical from a computational point of view.

February 20, 2012 6:01 PM
 

Johannes said:

Hey Marco, forgot to check back until now - thanks for clarifying that!

March 26, 2012 11:44 AM
 

Pathik Bhatt said:

Its been sometime since, I have been trying to get my head around this. I have imported information on close to 1.5 million customer orders in a powerpivot table. Customer orders generate different delivery packets. Sometimes for the same customer orders there are several packets generated. So the data looks like:

Order,Package

e1,p1

e1,p1

e1,p2

e1,p2

e1,p3

e2,p5

e2,p6 etc..

What I want to do is calculate in adjacent field a count of all unique packages that the customer order has generated. In the above example that would be:

Order,Package

e1,p1,3

e1,p1,3

e1,p2,3

e1,p2,3

e1,p3,3

e2,p5,x …

I just learnt about powerpivot and am not able to understand how this could be achieved. I need this field for calculation purpose. Many Thanks!

March 31, 2012 9:53 AM
 

Marco Russo (SQLBI) said:

Sorry for big delay in answering, I missed your comment!

Assuming the table you used as an example is called Orders, the solution is the following one (in PowerPivot 2012)

= CALCULATE(DISTINCTCOUNT(Orders[Package]),ALL(Orders[Package]))

In PowerPivot v1 you can obtain the same result by writing:

= CALCULATE(COUNTROWS(DISTINCT(Orders[Package])),ALL(Orders[Package]))

May 27, 2012 5:07 PM
 

HK said:

Thanks for posting this. It saved me a lot of time on a project.

January 23, 2013 11:23 AM
 

Aviad said:

I have that tableuser number med

1 22

1 22

1 22

2 33

2 33

2 33

1 232

3 44

3 42

3 42

4 50

4 50

4 50

5 98

6 464

7 321

12 9

8 8

15 51

16 52

32 97

3 41

what formula can count number of distinct meds per user ? (i.e user 1 = 2 meds)

Thanks!

Aviad

July 22, 2013 5:37 PM
 

Marco Russo (SQLBI) said:

You can use this formula in a calculated column:

=CALCULATE ( DISTINCTCOUNT ( tableuser[med] ), ALL ( tableuser[med] ) )

Or this in a measure:

MC := DISTINCTCOUNT ( tableuser[med] )

July 22, 2013 7:06 PM
 

Calderon said:

Very useful. Thank you for the formula.

August 22, 2013 9:48 AM
 

Kara said:

Hi I have table with following fields.  I have to find average selling price by SKU  and also count of Distinct SKU by customer and operating unit.  Please help.

operating_unit  customer SKU   Selling price

USA             ABCD     P1     100

USA             PQR      P1     80

USA             LMN      P1     20

USA             ABCD     P2     30

UK              XYZ      P1     50

UK              XYZ      P2     100

September 8, 2013 1:21 AM
 

Marco Russo (SQLBI) said:

Kara,

the average selling price can be obtained by using

AVERAGE ( Table[Selling Price )

putting the SKU on the rows of the PivotTable.

The same is true for getting the distinct count of SKU, create a measure with:

DISTINCTCOUNT ( Table[SKU] )

and put customer and operating unit on the rows of the PivotTable.

Marco

September 8, 2013 4:17 AM
 

smita said:

unable to use distinctcount option.

September 16, 2013 2:55 AM
 

Marco Russo (SQLBI) said:

smita, do you have the latest version of PowerPivot for Excel? The first version didn't have DISTINCTCOUNT function in DAX.

September 16, 2013 3:08 AM
 

Jamal said:

I have a table of customers and their related invoices. I want to get when the first invoice was issued (month & year).

September 22, 2013 6:26 AM
 

Marco Russo (SQLBI) said:

Jamal,

the easiest way is creating a calculated column in Customers table with the formula:

= CALCULATE ( MIN ( Invoices[Date] ) )

So that you get the minimum date of the invoices of each customer.

Marco

September 22, 2013 1:27 PM
 

MrChuckles said:

Thank you! I found your site as the top hit on Google and you really helped me. Your explanation of counting distinct values was something I had been trying to teach myself through trial and error to no avail. Your example made it a two minute process. I appreciate you dude!

November 24, 2013 10:58 AM
 

Chris Gilbert said:

I have a powerpivot table with dates on rows showing the distinctcount of orders for the given date.  I would like to display on each row the average  and max number of orders over all of the dates shown, so that I can display an average and max line on the pivot chart.

The problem I have is how to get the max of the distinctcount() or the maxx( ) of the in-memory distinct() table values.

Can you hel me?

December 11, 2013 9:47 PM
 

Marco Russo (SQLBI) said:

Chris,

you should use CALCULATE with the same measure over a different period.

For example, if you have:

[Orders] := DISTINCTCOUNT ( Orders[OrderID] )

then you can write:

[Monthly Orders] :=

IF (

   HASONEVALUE ( Calendar[Month] ),

   CALCULATE (

       [Orders],

       PARALLELPERIOD ( Calendar[Date], 0, MONTH )

   )

)

December 12, 2013 1:12 AM
 

Chris Gilbert said:

Thanks for the reply.

I found that this [Monthly Order] measure just returns the total of all the DistinctCount, not the max of all those entries.

Any ideas?

Chris

December 12, 2013 9:04 AM
 

Marco Russo (SQLBI) said:

Sorry for the misunderstanding.

Can you show an example (with numbers) of what you want on each row? There are at least 3-4 possible interpretations of your request.

Marco

December 12, 2013 9:15 AM
 

Chris Gilbert said:

All of this just to be able to place an AvgOrders and MaxOrders dashed line across a pivot chart!  

Row Labels NumOrders Average Max

8/1/2002 9 9.1 13

8/2/2002 8 9.1 13

8/3/2002 6 9.1 13

8/4/2002 9 9.1 13

8/5/2002 7 9.1 13

8/6/2002 12 9.1 13

8/7/2002 11 9.1 13

8/8/2002 8 9.1 13

8/9/2002 8 9.1 13

8/10/2002 4 9.1 13

8/11/2002 5 9.1 13

8/12/2002 9 9.1 13

8/13/2002 12 9.1 13

8/14/2002 10 9.1 13

8/15/2002 12 9.1 13

8/16/2002 5 9.1 13

8/17/2002 10 9.1 13

8/18/2002 8 9.1 13

8/19/2002 11 9.1 13

8/20/2002 12 9.1 13

8/21/2002 10 9.1 13

8/22/2002 10 9.1 13

8/23/2002 5 9.1 13

8/24/2002 11 9.1 13

8/25/2002 11 9.1 13

8/26/2002 13 9.1 13

8/27/2002 11 9.1 13

8/28/2002 8 9.1 13

8/29/2002 6 9.1 13

8/30/2002 8 9.1 13

8/31/2002 12 9.1 13

Grand Total 281

December 12, 2013 9:20 AM
 

Marco Russo (SQLBI) said:

Ok so you might use:

[Avg Orders] := AVERAGEX ( Calendar[Date], [Orders] )

[Max Orders] := MAXX ( Calendar[Date], [Orders] )

[Monthly Avg Orders] :=

CALCULATE (

  [Avg Orders],

  PARALLELPERIOD ( Calendar[Date], 0, MONTH )

)

[Monthly Max Orders] :=

CALCULATE (

  [Max Orders],

  PARALLELPERIOD ( Calendar[Date], 0, MONTH )

)

December 12, 2013 9:33 AM
 

Chris Gilbert said:

Both the [Avg Orders] and [Max Orders] measures return an error...

The value for column 'Date' in table 'Calendar' cannot be determined in the current context. Check that all columns referenced in the calculation expression exist, and that there are no circular dependencies. This can also occur when the formula for a measure refers directly to a column without performing any aggregation--such as sum, average, or count--on that column. The column does not have a single value; it has many values, one for each row of the table, and no row has been specified.

December 12, 2013 9:46 AM
 

Chris Gilbert said:

Got it!  The reference to "Calendar[Date]" in both [Avg Orders] and [Max Orders] just needed to the the table name "Calendar".

Thanks for your help.  I hadn't even heard of the PARALLELPERIOD() function, so I've got some investigation to do.

Chris

December 12, 2013 9:53 AM
 

Chris Gilbert said:

That works for when I only have one month of dates on the PivotTable Rows.

If I expand the number of dates to a full year, then I get an Average that changes at every transition to the next month.

If I change the [Monthly Avg Orders] to use YEAR in the PARALLELPERIOD function, then I get the correct average for the year, but switching back (via slicer) to just one month of data still gives me the yearly average for any month that I select.

I tried changing MONTH to DAY, which the context sensitive help for the PARALLELPERIOD states is a valid parameter, but the {Check Formula} feature rejects DAY as invalid.

I'll keep after it.

December 12, 2013 10:08 AM
 

Marco Russo (SQLBI) said:

Chris,

you might want to implement some IF statement using HASONEVALUE in order to check whether you have selections at year or month level.

I am not sure to be able to help more here, not having the full data model and all the details of your specific issue.

I hope you understand - if you want direct consultancy you can see our offer here: http://www.sqlbi.com/consulting

Marco

December 13, 2013 1:39 AM
 

Chris Gilbert said:

Thanks, Marco

I understand, and appreciate all the help.  And, hopefully, others can also benefit from this discussion, as well.

I just found it "interesting" that it was so "non-intuitive" on how to get an AVG or MAX of the values resulting from the DISTINCTCOUNT function... which was one of the most obvious things I've ever had to explain to new users of PowerPivot.

December 13, 2013 6:14 AM
 

Marco Russo (SQLBI) said:

Chris,

there are technical explanations (PivotTable in Excel use MDX, internal Language of PowerPivot is DAX, ...) but I agree with you that the final result is that the mix makes some common operations not enough intuitive. A "visual" max/avg on anything you see in the cells of a PivotTable would make life easier to some users.

Marco

December 13, 2013 6:23 AM
 

Rene Ortiz said:

I have a table with the following fields and need to find a count of distinct for IssueType1, IssueType2, and IssueType3.  I'm using PowerPivot w/Excel 2010

TeamMemberDept IssueType1 IssueType2 IssueType3

Dept1 Due Date Change Improper Call Conduct Late Fees

Dept1 Payment Processing Late Fees Payment Channels

Dept3 Service Member Issue Deferment Payment Channels

Dept3 Excessive Calls Improper Call Conduct Late Fees

Dept4 Due Date Change Deferment Simple Interest

Dept5 Credit Bureau Reporting Improper Call Conduct Improper Call Conduct

Dept2 Credit Bureau Reporting Improper Call Conduct Deferment

Dept6 Excessive Calls Reference/Nearby Calls Billing Statement

Dept7 Excessive Calls Improper Call Conduct Payment Processing

Dept1 Excessive Calls Payment Arrangement Improper Call Conduct

Dept2 Verification / Probing Question Process Deferment Improper Call Conduct

April 9, 2014 3:11 PM
 

Rene Ortiz said:

One more thing to add to my question above...some fields contain blanks, so I would like to avoid including any blank values!

Thanks

April 9, 2014 3:49 PM
 

Marco Russo (SQLBI) said:

Rene,

you can obtain the result with:

= CALCULATE (

   DISTINCTCOUNT(Table[IssueType1]),

   NOT(ISBLANK(Table[IssueType1]))

)

Marco

April 10, 2014 12:22 AM
 

Rene Ortiz said:

Thanks Marco for the formula, but I get "This is not a valid DAX expression

Here is the actual formula I'm using: =CALCULATE(DISTINCTCOUNT('Table_ComplaintsProd'[IssueType1]),NOT(ISBLANK('Table_ComplaintsProd'[IssueType1])))

April 10, 2014 10:05 AM
 

Rene Ortiz said:

I also want to be able to get one count for distinct values from IssueType1, IssueType2 and IssueType3, if that is possible?

Thanks for your time!

April 10, 2014 10:25 AM
 

Marco Russo (SQLBI) said:

Create a calculated column that has something like

= Table[IssueType1) & "|" & Table[IssueType2) & "|" Table[IssueType3)

and do the DISTINCTCOUNT on such a column.

Another technique is doing that at Query time, but it might affect performance and memory requirement in a very bad way.

Marco

April 11, 2014 1:13 AM
 

Richi said:

Hi Marco,

How can one calculate a sum based on distinct fields?  Say for example the field in this case is "ID":

ID  Amt

1   5.00

2  10.00

2  10.00

3  15.00

4  20.00

SUM AMT calculate field should equal 10.00 for ID 2.

Please advise.. Thanks..

April 14, 2014 12:25 AM
 

Marco Russo (SQLBI) said:

Use this:

SUMX (

   SUMMARIZE (

       Table,

       Table[ID],

       Table[Amt]

   ),

   Table[Amt]

)

April 14, 2014 12:36 AM
 

Richi said:

Thanks Marco!  Works for me!

April 14, 2014 9:25 AM
 

Lydia said:

Hi Marco,

I'm trying to get a calculated column that has calculates the average number of days in stock per SKU...

I tried using a similar approach with using

=AVERAGEX(SUMMARIZE(Sold,Sold[IC unique],Sold[Time in Stock]),Sold[Time in Stock])

This is just giving me a total average for all SKUs. Is it possible to do it per SKU?

Thanks for your time!

July 2, 2014 3:38 PM
 

Marco Russo (SQLBI) said:

If the SUK is a column of the same table, you could use this:

=

AVERAGEX (

   SUMMARIZE (

       FILTER ( Sold, Sold[SKU] = EARLIER ( Sold[SKU] ) ),

       Sold[IC unique],

       Sold[Time in Stock]

   ),

   Sold[Time in Stock]

)

July 2, 2014 3:46 PM
 

Amber said:

hi marco,

I am trying to find a net sales over time per sku based on the order of the invoice numbers.  The negative charges are credits. Here is how my table is set up:

invoice_no     amount charged     SKU

1              100                A

2              -100               A

3              150                B

4              -100               B

5              200                A

6              0                  A

I want to have the net amount charged for sku A to be 200 and sku B to be 50.  I have tried

=sumx(filter('invoices', 'invoices'[sku]=EARLIER('invoices'[sku]) && 'invoices'[invoice_no]>=EARLIER('invoices'[invoice_no])),[Amount Charged])

but this is giving me the wrong answer.  Any suggestions?

July 11, 2014 3:01 PM
 

Marco Russo (SQLBI) said:

Amber, can you describe me why A should be 200? I see 100, -100, 200, 0, which totals should be 300 and not 200.

Marco

July 13, 2014 4:20 AM
 

Amber said:

Marco,

A should equal 200.  100 + (-100) + 200 + 0 = 200.

Amber

July 16, 2014 12:25 PM
 

Marco Russo (SQLBI) said:

Amber, you are right, sorry for my confusion. So, why this is not good?

SUM ( Invoices[Amount Charged] )

July 16, 2014 12:30 PM
 

Amber said:

I need it to be over time. For example I want it to look like...

invoice_no     amount charged     SKU     Net Over Time

1              100                A       100

2              -100               A       0

3              150                B       150

4              -100               B       50

5              200                A       200

6              0                  A       200

I've tried changing my formula a little bit, but it still isn't performing correctly.

=sumx(summarize(filter('invoices', 'invoices'[sku]=EARLIER('invoices'[sku]) && 'invoices'[invoice_no]<=EARLIER('invoices'[invoice_no]), 'invoices'[SKU], [Amount Charged]),[Amount Charged])

July 16, 2014 12:37 PM
 

Marco Russo (SQLBI) said:

It seems you want to create a calculated column - in that case this should be the formula to use:

=

SUMX (

   FILTER (

       'invoices',

       'invoices'[sku] = EARLIER ( 'invoices'[sku] )

           && 'invoices'[invoice_no] <= EARLIER ( 'invoices'[invoice_no] )

   ),

   'invoices'[Amount Charged]

)

July 16, 2014 12:49 PM
 

Amber said:

Thank you!

July 16, 2014 12:52 PM
 

San kar said:

Marco, New to Power Pivot. Want to calculate the ranking of the code

I have a 2 tables a table1

name, code

A, D1

B, D2

C, D1

E, D3

F, D2

G, D3

H, D3

table2  where I want to calculate row counts from table1 and rank the codes based on the count for each code as shown here

code, Count, rank

D1, 2, 2

D2, 2, 2

D3, 3, 1

Thanks

August 22, 2014 3:51 PM
 

Marco Russo (SQLBI) said:

Assuming you have a relationship between table1 and table2, you can use:

Count = CALCULATE ( COUNTROWS ( table1 ) )

Rank = RANKX ( ALL ( table2 ), table2[Count] )

August 23, 2014 1:42 AM
 

San Kar said:

Thank you.

August 25, 2014 11:55 AM
 

Sagar said:

Hi,

For excluding blanks from Calculation I tried these two DAX statements but these are working very slow in reports.,

:= CALCULATE (

  DISTINCTCOUNT(Table[IssueType1]),

  NOT(ISBLANK(Table[IssueType1]))

)

:= CALCULATE (

  DISTINCTCOUNT(Table[IssueType1]),

 Table[IssueType1] <> Blank())

)

And tried this ,

:= CALCULATE (

  DISTINCTCOUNT(Table[IssueType1]),

ALLNOBLANKROW( Table[IssueType1]))

)

But this is fast but not excluding Blanks.

Do we have any better DAX statement which will vomit Blanks and give good performance as well?

Thanks for your help.

September 29, 2014 11:41 PM
 

Marco Russo (SQLBI) said:

How many unique values do you have in IssueType1 column?

If there are thousands or more, you should create a calculated column "IsIssueType1Blank" with the formula ISBLANK(Table[IssueType1]) and then transform the measure in:

:= CALCULATE (

 DISTINCTCOUNT(Table[IssueType1]),

 NOT(Table[IsIssueType1Blank])

)

September 30, 2014 12:55 AM
 

Sagar said:

Yes, it has thousands of unique values.

And using this method it's working good.

Thank you...

September 30, 2014 3:54 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement