THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

String Comparison in #DAX

In DAX you don’t have the LIKE operator and you have to use SEARCH instead. However, performance are not very good and it is better to use LEFT and RIGHT if you just need to compare the initial (or ending) match of a string.

I just wrote an article about string comparison in DAX. During my exploration of LIKE replacement functions in DAX, I have found that documentation of SEARCH for PowerPivot v1 is wrong – in DAX the SEARCH function is always case-insensitive, whereas FIND is always case-sensitive. Moreover, in PowerPivot v1 you had to use IFERROR to catch the string not found condition. This was a big issue in performance and in PowerPivot v2 (and in BISM Tabular) the new SEARCH and FIND functions have a fourth parameter that specify the value that should be returned when a match is not found. Using FIND seems to be 10% faster than SEARCH, just because it is case-insensitive.

I'd like to thank Marius Dumitru for the feedback he provided me in this analysis. Here are a few best practices he also suggest:

  • Always avoid IFERROR and ISERROR (everywhere, not just for string functions).
  • Use LEFT instead of FIND/SEARCH for expressing “starts with” conditional expressions on strings
  • Use FIND instead of SEARCH if your comparison is (or can be) case-sensitive.
  • When FIND or SEARCH are needed, use their new flavors taking a 4th parameter (added in SQL11)

If you are curious to see a few performance numbers, at the end of the post I includes the timing for a few query over a Tabular model based on an IISLog table. I included a best-practice table conversion in my article. If you think that a regular expression function would be useful in DAX, you can vote this suggestion on Connect.

-- Each DAX query is preceded by a comment with total execution time

-- The original condition over IISLog[cIp] column is this one in SQL:

--   WHERE cIp NOT LIKE '217.57.131.11%'

--     AND cIp NOT LIKE '11.22.33.3%'

--     AND cIp NOT LIKE '127.0.0.1'

--     AND cIp NOT LIKE '192.168.%'

--

-- Other conditions tested in a few queries are:

--     AND csUriStem LIKE '%SQLBI%Methodology%at%work%pdf'

--     AND csUserAgent NOT LIKE '%bot%'

 

-- 28 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUserAgent]),

            SEARCH( "bot", IISLog[csUserAgent], 1, 0 ) = 0

    ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( "SQLBI*Methodology*at*work*pdf", IISLog[csUriStem], 1, 0 ) <> 0

    ),

    FILTER( ALL(IISLog[cIp]),

      [cIp] <> "127.0.0.1"

      && LEFT( [cIp], 12 ) <> "11.22.33.3"

      && LEFT( [cIp], 8 ) <> "192.168."

      && LEFT( [cIp], 13 ) <> "217.57.131.11"

    )

)

 

 

-- 15 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUserAgent]),

            SEARCH( "bot", IISLog[csUserAgent], 1, 0 ) = 0

    )

)

 

-- 11 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( "SQLBI*Methodology*at*work*pdf", IISLog[csUriStem], 1, 0 ) <> 0

    )

)

 

 

-- 9 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( "SQLBI", IISLog[csUriStem], 1, 0 ) <> 0

    )

)

 

 

-- 54 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            IFERROR( SEARCH( "217.57.131.11", IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( "11.22.33.3", IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( "127.0.0.1", IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( "192.168.", IISLog[cIp] ), -1 ) < 0

    )

)

 

-- 16 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            SEARCH( "217.57.131.11", IISLog[cIp], 1, 0 ) = 0

            && SEARCH( "11.22.33.3", IISLog[cIp], 1, 0 ) = 0

            && SEARCH( "127.0.0.1", IISLog[cIp], 1, 0 ) = 0

            && SEARCH( "192.168.", IISLog[cIp], 1, 0 ) = 0

    )

)

 

 

-- 14 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            FIND( "217.57.131.11", IISLog[cIp], 1, 0 ) = 0

            && FIND( "11.22.33.3", IISLog[cIp], 1, 0 ) = 0

            && FIND( "127.0.0.1", IISLog[cIp], 1, 0 ) = 0

            && FIND( "192.168.", IISLog[cIp], 1, 0 ) = 0

    )

)

 

 

-- 12 seconds

EVALUATE ROW( "result",

  COUNTROWS(

    FILTER( DISTINCT(IISLog[cIp]),

            [cIp] <> "127.0.0.1"

      && SEARCH( "11.22.33.3", [cIp], 1, 0 ) <> 1

      && SEARCH( "192.168.", [cIp], 1, 0 ) <> 1

      && SEARCH( "217.57.131.11", [cIp], 1, 0 ) <> 1

    )

  )

)

 

 

-- 12 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            [cIp] <> "127.0.0.1"

      && SEARCH( "11.22.33.3", [cIp], 1, 0 ) <> 1

      && SEARCH( "192.168.", [cIp], 1, 0 ) <> 1

      && SEARCH( "217.57.131.11", [cIp], 1, 0 ) <> 1

    )

)

 

-- 2 seconds

EVALUATE

CALCULATETABLE(

    ROW( "Rows", DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

      [cIp] <> "127.0.0.1"

      && LEFT( [cIp], 12 ) <> "11.22.33.3"

      && LEFT( [cIp], 8 ) <> "192.168."

      && LEFT( [cIp], 13 ) <> "217.57.131.11"

    )

)

 

 

Published Friday, December 30, 2011 11:57 AM by Marco Russo (SQLBI)
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Yommy said:

Hi Marco,

Is that possible to use Search function to search a list of text strings in a table and compared each text of these text string against a list of text in another table?

January 23, 2013 5:31 AM
 

Marco Russo (SQLBI) said:

You have to create two nested loops, i.e.:

FILTER( VALUES( Target[Name] ), COUNTROWS( FILTER( ALL( SearchNames[Name] ), SEARCH( SearchNames[Name], Target[Name], 1, 0 ) <> 0 ) ) > 0 )

I'm sure you might have performance issues... :)

January 23, 2013 8:10 AM
 

Tommy said:

Dear Marco,

Thanks for your help.

Just to further clarify the exact scenario of the issue i face:

i have two unrelated tables:

1) Table A (approx. 50 rows) - Table A [Column A1] contains the full text description to be compared.

2) Table B (approx 5 rows) - Table B [Column B1] constains the list of key strings to be search from Table A [Column A1].

If search result is positive, extract the text description from Table B [Column B2] and insert into Table A [Column A2].

Appreciate you could advise how your suggested formula above to be implemented for my scenario.

Regards,

Tommy  

January 23, 2013 9:42 PM
 

Tommy said:

i tried to modified your suggested solution above and derived the following calculated column formula:

=calculate(if(countrows(values('Table B'[Colmn B2]))>1,"More than 1 classification found",values('Table B'[Colmn B2])),filter(all('Table B'),countrows(filter(values('Table A'[Colmn A1]),countrows(filter(all('Table B'[Colmm B1]),search('Table B'[Colmn B1],'Table A'[Colmn A1],1,0)<>0))>0))>0))

But i have no luck :(

The resulted calculated column is all "More than 1 classification found".

Any advice?

Regards,

Tommy

January 23, 2013 11:47 PM
 

Marco Russo (SQLBI) said:

Tommy,

assuming you are creating the calculated column in TableA, you already have a filter context in TableA so you only need to iterate over TableB in order to get the description you need.

Let me know if the following works:

=

calculate(

if(

countrows(values('Table B'[Colmn B2]))>1,

"More than 1 classification found",

values('Table B'[Colmn B2])

),

filter(

all('Table B'[Colmm B1]),

search('Table B'[Colmn B1],'Table A'[Colmn A1],1,0)<>0

)

)

January 24, 2013 6:21 AM
 

Tommy said:

Hurray :) It Finally Works ! Great great thanks !

Marco, i would like to understand abit more in detail about how the 'evualation' context works for the above scenario.

Since both of the table are not related for the scenario above, i need to use FILTER function to 'bring' in the Table B into the 'evaluation' context of calculated column in Table A.

In the example above, we use All (Table B [Column B1] to 'bring' in only 1 column, which is B1, of Table B, instead of the whole Table B.  So, my question is: how would the expression evaluated under the outer CALCULATE function manage to 'get' the Column B2 of Table B from the filter context that it is evaluating since the whole Table B is not brought into the context?

Regards,

Tommy

January 24, 2013 9:14 PM
 

Marco Russo (SQLBI) said:

Evaluation context is composed by row context and filter context.

In the previous formula I Always use row context in filter conditions and never use filter context.

A detailed explanation of that is included in our books (both PowerPivot and BISM Tabular) and available also in this video (subject to subscription): http://projectbotticelli.com/knowledge/dax-evaluation-context-data-analysis-expressions-video-tutorial

A shorter introduction is available on TechNet here:

http://technet.microsoft.com/en-us/library/gg413423.aspx

January 25, 2013 7:41 AM
 

Tommy said:

Hi Marco,

I thought that when use CALCULATE function on calculated column, which you use it in your previous formula, will convert the row context to filer context?

I still don't understand that in the example above, we use All (Table B [Column B1] to 'bring' in only 1 column, which is B1, of Table B, instead of the whole Table B to the evaluation context of CALCULATE function.  So, my question is: how would the expression evaluated under the outer CALCULATE function manage to 'get' the Column B2 of Table B from the evaluating filter context since the whole Table B is not brought into the context?

January 25, 2013 6:08 PM
 

Marco Russo (SQLBI) said:

The ALL( 'Table B'[Colmn B1] ) returns a table with just one column.

The result of the filter still contain a few values from that column, which is passed as an argument of CALCULATE.

Because the filter context is modified with such values, there is a side effect (called cross-filter) also on other columns of the 'Table B' table. So when you call VALUES( 'Table B'[Colmn B2] ) in the expression evaluated by CALCULATE, because the filter context has a filter on Colmn B1, only values in B2 that belongs to row filtered by Colmn B1 will be returned.

Remember that evaluation order of CALCULATE is all filters (from second argument onwards) and after these filters have been applied to the filter context, then the first argument is evaluated and its result returned by CALCULATE.

January 28, 2013 11:33 AM
 

Marshall said:

Hi Marco,

This post is giving me some ideas to solve a DAX problem I am working on. I am trying to FIND a string that is no longer there!!

The better explanation is; I have Monhtly Batched data, and when a record (these are Tasks from a Task List) is Closed, it will longer make it to my export of data for the month. So I need to compare last months data to Find the item ID and somehow see if it is not present in this months data. If it is not present, I am trying to use a Calculated column to have a value of 'Closed', and if it is present, to just be Blank().

I have been playing with an IF and FIND functions with no success so far. Any ideas?

Thanks in advance for any ideas.

Marshall

June 2, 2014 11:40 AM
 

Marco Russo (SQLBI) said:

If you create a calculated column, the simplest way is to iterate the table using FILTER counting the rows in the result. Something like (adding the controls over table to consider only the period you need):

= IF ( COUNTROWS ( FILTER ( table, FIND( "yourword", table[column], 1, 0 ) <> 0 ) ) = 0, "Closed", BLANK() )

June 2, 2014 2:07 PM
 

Marshall said:

Thanks Marco,

I'll play around with your suggetion and see what I can make of it.

In my workbook it looks like this, but gives me an error related to the Filter function.

=IF ( COUNTROWS ( FILTER ( MasterData,FIND([Action Identifier], MasterData[Action Identifier], 1, 0 ) <> 0)MasterData,MasterData[Batch MonthYear]=[Last Month]) = 0, "Closed", BLANK() )

MasterData is the name of the table.

Action Identifier is the column of unique ID (the target of my FIND, so it is not a partial text phrase I am trying to find but the whole cell value)

And  the BatchMonth and LastMonth is the way I am trying to tell it to look for items from last month (based on whatever Today is).

Thanks again, and thank you for all the great content on your site.

Marshall

June 2, 2014 7:33 PM
 

Marco Russo (SQLBI) said:

THere is a missing comma in the formula

<> 0)******MasterData,

Please specify which error you get

June 3, 2014 1:16 AM
 

Marshall said:

Marco,

Thanks again for the direction and assistance. I was having some other issues with my PP workbook so I abandoned the approach of using the calculated column but did find another solution which I won't boar you with.

I did realize there was a comma missing, but the FILTER function was not being recognized when I put the comma in so that is why I had left it out. Of course that led to other problems.

I will be watching and reading for more of your insights - keep up the good work.

Marshall

June 3, 2014 7:01 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement