THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

Counting occurrences of a substring within a string

I have absolutely no idea why anyone wants to do this, but I keep answering the same question in forums: "How do I count the occurrences of a substring [note: usually comma] within a string?"

 

In an effort to thwart carpal tunnel syndrome, I have created the Ultimate Substring Occurrence Counting UDF.

... And here it is:

 

CREATE FUNCTION dbo.GetSubstringCount
(
@InputString TEXT,
@SubString VARCHAR(200),
@NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN
(
SELECT COUNT(*)
FROM dbo.Numbers N
WHERE
SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) = 0
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1)) = 0
)
END

First note: You need (regular readers, you guessed it) a numbers table.

Okay, so what's it do? Simply put, it returns the number of times @SubString appears within @InputString. But wait! -- Act now and you will receive an additional bonus feature at no extra cost! Can you feel the love?

The @NoisePattern parameter allows the user to put the UDF into "exact match" mode.

For instance, let's say you have a big string containing some text about automobile manufacturers, and for some reason (again, I have no clue why people need this functionality -- fill me in if you do!) you want to count the number of occurrences of the word "auto", but not the number of occurrences of other forms of the word, e.g. "automobile" or .... some word that ends in "auto" (if such a word exists).

By specifying a pattern for @NoisePattern of characters that shouldn't be adjacent to your word, you're telling the UDF that any other characters are safe. Leaving the parameter empty means that all occurrences of the substring will be counted. Examples:

 

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- Returns 2

SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '')
-- Also returns 2

SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '%[a-z]%')
-- Only returns 1 -- The exact match must not have adjacent alphabetic characters


Published Wednesday, July 12, 2006 10:22 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

beckham said:

Good one..

May 7, 2007 12:59 AM
 

Jereme Guenther said:

I am looking for this functionality, however I would prefer to do it without a Numbers table.

My need is that of a search engine sproc.  I need to be able to weight the returned records and one of the items in the weighting system is to check how many times the search pattern occurs in the record.

June 17, 2008 11:12 AM
 

Pickles said:

I need this too. I have a column that basically contains sql and I need to pull out field_ids from it, but each row has a different number of field_ids in the sql. Is there a different(better) solution?

July 16, 2008 6:28 PM
 

Adam Machanic said:

I don't know why you would want to do it without the numbers table.  It makes the solution an order of magnitude faster.

July 23, 2008 2:30 PM
 

Craig Hathaway said:

Hi Adam, I have tried this code but even the included examples do not work!

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')

-- returns 0   <-- should be 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '')

-- returns 0   <-- should be 2

SELECT dbo.GetSubstringCount('Autos are fun.  I like to drive my auto.', 'auto', '%[a-z]%')

-- returns 1   <-- should be 1

Is there a SQL setting / version dependency that O could be missing?

Craig

February 22, 2009 9:06 PM
 

Adam Machanic said:

Hi Craig,

Thanks for pointing this out.  It is indeed a version issue.  I wrote this back in the bad old days of SQL Server 2000, and apparently there was a change to the way PATINDEX works between 2000 and 2005. I just tested:

SELECT PATINDEX('', ' ')

SQL Server 2000 returns 0, whereas SQL Server 2005 returns 1.  This is breaking the third predicate in the WHERE clause within the function, which checks to see if the target string is prepended by anything that matches the input pattern.

I'll have to think about how to fix this, but as a temporary workaround if you don't want to use exact-match mode, you could pass in some character that you know can't possibly exist in the target string:

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', CHAR(255))

--Returns 2, even in SQL Server 2005 or 2008

In the meantime, I'm wondering if this "new" behavior makes sense?  I'm not sure, but I'm leaning towards SQL Server 2000's answer.  An empty pattern shouldn't, in my opinion, match on anything at all...

February 23, 2009 11:18 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and a member of the INETA North American Speakers Bureau.

This Blog

Syndication

News

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement