THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

DISTINCT COUNT measures and NULL value

After several years of using Analysis Services and SQL Server 2000, today I stumbled in this problem.

I have a DISTINCT COUNT measure that have to measure how many different documents are present in a certain aggregation. Sometimes valid cells has no documents and the document field is NULL on the fact table (in reality it's a view that generate this degenerate dimension, anyway this doesn't really matter). The problem is that the NULL value is counted as a valid different value... so when you see 2 in the cube it could be 1 or 2 if you want to ignore NULL values.

Solution is not so light. Following this kb article (only available in korean language, I suppose - no english translation available but keywords are still in latin character so I guessed right...) I've found that a solution is to define two other measure in the cube with "regular" measure (as a best practice it's best to put distinct count measures each in a different cube, for performance reasons - then merge all measures in a virtual cube):

  • Rows_number - defined as SUM of expression "1"
  • NotNullRows_number - defined as COUNT of expression "field", where field is the same field you use as DISTINCT COUNT expression

In the cube with DISTINCT COUNT measure I defined only this measure:

  • Documents_null - defined as DISTINCT COUNT of expression "field", where field is the same as the COUNT measure

Then in the virtual cube you put all together creating a single calculated measure that evaluate the right number:

  • Documents - defined as calculated measure with expression "IIF( Rows_number > NotNullRows_number, Documents_null - 1, Documents_null )"

It works, but you are wasting three times the space originally needed for the DISTINCT COUNT measure.

Distinct count measures are a really bad beast in Analysis Services 2000. Unfortunately it seems that even the Yukon release will suffer from some actual limitations, like the performance penalty caused by an ORDER BY (with the field you used as DISTINCT COUNT measure) in the SELECT sent by Analysis Services to the source database when you have a DISTINCT COUNT measure. Try it on a (virtual) cube with several DISTINCT COUNT measures and a large fact table... a lot of RDBMS resource are needed while processing this cube.

UPDATE: thanks to a valuable feedback, it has to be noted that Analysis Services handles a NULL value like a 0 value in a DISTINCT COUNT measure. In my real-world scenario I had a DISTINCT COUNT measure on a dimension key that's an int identity, so it never has 0 as a valid value. Anyway, you have to take care of this behaviour if you want to use this technique.

Published Tuesday, November 16, 2004 4:18 AM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement