THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Median calculation in DAX

I recently discussed with Darren Gosbell about how to calculate the Median in DAX.

The problem is to make the calculation using a measure, so that you can use it in a dynamic way into a PivotTable. We have a People table with three columns: Gender, Customer and Age (you can see the table in the screenshot below). The Median formula can be defined in this way:

MINX( FILTER( VALUES( People[Age] ),
              CALCULATE( COUNTROWS( People ),
                         People[Age] <= EARLIER( People[Age] ) ) 
              > COUNTROWS( People ) / 2 ),
      People[Age] )

The MINX try to find the minimum Age value which has been filtered by the FILTER condition.
The FILTER enumerates all the ages and, for each, CALCULATE the number of people that have an age lower than or equal to the “current” one (here the concept of current is returned by EARLIER, which refers to the row context outside the CALCULATE). The FILTER returns only those ages for which the previous CALCULATE is at least half of the whole population.

I don’t like very much that MINX also calculates the same value for values of Ages that are higher to the lower number that can be found. But this is a logic that applies to an algorithm that iterates data in a single thread. I don’t see any reason why the MINX couldn’t be executed in parallel, and I’d like to know if this is what really happens under the cover – more info if I will discover something.

Below the screenshot of the example and the PivotTable with the calculated Median.

image

Published Tuesday, July 20, 2010 1:54 PM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

David Warner said:

Hi Marco, thank you for a excellent clear discussion of this solution.

Have you encountered any problems when the OLAP query runs with this calculation? Not sure if my problem relates to my dataset being larger than your example?

I get the following error:

============================

Error Message:

============================

Exception from HRESULT: 0x800A03EC

----------------------------

----------------------------

Could not add the field "Median" to the PivotTable because the formula is invalid.

============================

Call Stack:

============================

Server stack trace:

Exception rethrown at [0]:

  at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)

  at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)

  at Microsoft.Office.Interop.Excel.PivotTable.AddDataField(Object Field, Object Caption, Object Function)

  at Microsoft.AnalysisServices.Modeler.FieldList.ExcelInterOpUtil.AddToDataFields(ICalculatedMember calculatedMember, Int32 positionIndex, Boolean isSpecialColumnBasedNamedSetPresent)

----------------------------

  at Microsoft.AnalysisServices.Modeler.FieldList.ExcelInterOpUtil.AddToDataFields(ICalculatedMember calculatedMember, Int32 positionIndex, Boolean isSpecialColumnBasedNamedSetPresent)

  at Microsoft.AnalysisServices.Modeler.FieldList.GeminiPivot.SetFieldOrientation(FieldLocation location, IGeminiColumn column, Int32 positionIndex)

----------------------------

  at Microsoft.AnalysisServices.Modeler.FieldList.GeminiPivot.SetFieldOrientation(FieldLocation location, IGeminiColumn column, Int32 positionIndex)

  at Microsoft.AnalysisServices.Modeler.FieldList.GeminiPivot.AddField(IGeminiColumn column, Int32 index)

  at Microsoft.AnalysisServices.Modeler.FieldList.GeminiPivot.AddField(IGeminiColumn column)

  at Microsoft.AnalysisServices.Modeler.FieldList.FieldListControl.fieldsTreeView_AfterCheck(Object sender, TreeViewEventArgs e)

============================

March 31, 2011 7:12 PM
 

Marco Russo (SQLBI) said:

How many rows and how many different values do you have in the column you are trying to calculate?

Are you using 32-bit or 64-bit version of Excel?

The error seems not related to memory, in reality - can you try to create a smaller data set to make a test?

Marco

April 1, 2011 6:45 AM
 

David Warner said:

Full answer: I have 3 tables, table A has 17,000 rows, table B represents links to table C and has 197,000 rows. Table B contains the measures. Table C has 618 rows.

I have a Report filter on table A, for which for my testing I have selected 210 rows.

I have a Slicer on table C, which I've been selecting up to 128 rows.

I'm using 32-bit version of Excel

Having closed the file, reopened and run on new day with absolutely no changes its gone from never working to always working. That said the Median Value calculation is quite slow, so I wonder if the above error represents a timeout?

Thanks,

David.

April 5, 2011 2:16 AM
 

Marco Russo (SQLBI) said:

You should take a look at VM Size of the Excel process when you run the query - it probably goes very high and in certain condition crashes. Restarting Excel you have less memory pressure, probably.

I don't think it's a timeout problem. It would be nice to investigate in order to improve performance. Do you have a workbook that you can share (=no sensitive data inside)?

Marco

April 5, 2011 5:34 AM
 

Javier Guillen said:

Hi Marco

Thanks for this post.  I am having two issues trying to implement it though:

-  My series of values in which to calculate the Median is a measure, not a calculated column.  Hence I cannot use VALUES or EARLIER.  

- The results do not match excel in many scenarios, as the DAX expression does not use interpolation.  Example, Median over values 1 & 2 only should yield 1.5, not 2

Do you know if there is a way to get the expression to work with measures, and somehow include interpolation?

Thanks!

May 29, 2011 12:28 AM
 

Esben Thomassen said:

Hi Marco.

I have tried your median calculation.

It works. But the problem is that I have several blank fields and the formula calculates the blanks as 0 (zero). Thus the median is skewed. In stead I want to calculate the median on all the non-blank fields.

Do you have any solution to filter the blanks?

Thanks in advance.

October 28, 2011 8:43 AM
 

Marco Russo (SQLBI) said:

Just use COUNTA instead of COUNTROWS.

COUNTA requires a column to be specified - use the column that contains the value that can be blank - COUNTA will count only the non-blank rows

October 28, 2011 8:53 AM
 

Esben Thomassen said:

Thank you - works like a charm. :)

But I am curious to know if it is possible to let it calculate the average value between two numbers if I have an even set of fields?

(I hope you know what I mean).

Thanx

October 31, 2011 6:10 AM
 

Marco Russo (SQLBI) said:

Esben, sorry I'm not sure about what you mean - can you explain it with more details?

October 31, 2011 6:30 AM
 

Esben Thomassen said:

I think it is easier if I give you an example:

Let's say I have these values:

1

2

3

4

5

Then the median is 3.

But if the values are:

1

2

3

4

Then the median is between 2 and 3. Then it would be 2.5.

Is it possible to let it print out 2.5 as result?

Thank you.

October 31, 2011 8:13 AM
 

Marco Russo (SQLBI) said:

You might check whether the COUNTROWS( People ) is even and in such a case you might want to calculate the average between two numbers.

A not so efficient formula could be this one:

=AVERAGEX( FILTER( VALUES( People[Age] ),

             CALCULATE( COUNTROWS( People ),

                        People[Age] <= EARLIER( People[Age] ) )  

             >= COUNTROWS( People ) / 2

            &&

             CALCULATE( COUNTROWS( People ),

                        People[Age] <= EARLIER( People[Age] ) )  

             <= COUNTROWS( People ) / 2 + 1 ),

     People[Age] )

October 31, 2011 9:04 AM
 

Esben Thomassen said:

Well, the syntax seems correct. But it doesn't return any value. The field is blank...?

October 31, 2011 10:19 AM
 

Marco Russo (SQLBI) said:

How many values (rows) do you have in the table?

October 31, 2011 11:25 AM
 

Esben Thomassen said:

I have 730 rows.

November 1, 2011 4:05 AM
 

Marco Russo (SQLBI) said:

Oh I see the problem now.

My formula worked only if there are no duplicated values in Age. In case of duplicated values, there should be more controls over that. Sorry I have no time to further investigate on that now - I'm writing a book and I'm late with the next delivery!

Update comments here if you find the solution.

Marco

November 1, 2011 5:29 PM
 

Esben Thomassen said:

Thank you anyways for all your help.

I will just try to figure it out from here. :)

November 4, 2011 5:52 AM
 

DAX noob said:

Hi,

I want to do the following:-

Annual Average EmployeeCount=([Employee Count for jan]+feb+..+jun)/6

i.e the denominator should dynamically change based on the number of months i.e if the current month is november the denominator should be 11.Do i need to first take monthly counts measure for the numerator ?? how do i go about it???

December 13, 2013 1:47 AM
 

Marco Russo (SQLBI) said:

If you write the formula by hand, you should know how many months you have used in the numerator.

A dynamic calculation could be the following (assuming you are selecting months):

[Annual average] :=

SUMX ( VALUES ( Calendar[Month] ), [Employee Count] )

/ COUNTROWS ( VALUES ( Calendar[Month] ) )

December 13, 2013 4:25 AM
 

DAX noob said:

Thanks Marco, but i am not selecting the months.Based on all the dates in the column ,is there any date function that can figure out how many months are present for the particular year which i could use as the denominator in my case??

December 13, 2013 5:18 AM
 

Marco Russo (SQLBI) said:

It depends. If you are referencing to an Excel PivotTable, you have to consider whether Excel is hiding rows included within the original query (i.e. days without sales should be automatically removed, even if they appear in the Calendar table)

If you are using Reporting Services, you should have the direct control in your query.

I don't know your exact situation, but in general you can use VALUES ( ) to get the member currently active in the current filter context, and you might use ALLSELECTED to get those active in the grand total of the pivottable when you evaluate a specific cell.

There are many variables here and it's hard to provide you a formula without the exact scenario.

Maybe you want this:

CALCULATE (

   COUNTROWS ( VALUES ( Calendar[Month] ) ),

   ALLSELECTED ()

)

Or this:

CALCULATE (

   COUNTROWS (

       FILTER (

           VALUES ( Calendar[Month] ),

           NOT ISBLANK ( [Employee Count] )

       )

   ),

   ALLSELECTED ()

)

Or many other variations....

December 13, 2013 5:53 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement