THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

Counting occurrences of a substring within a string

Originally posted here.

 


I have absolutely no idea why anyone wants to do this, but I keep answering the same question in forums: "How do I count the occurrences of a substring [note: usually comma] within a string?"

 

In an effort to thwart carpal tunnel syndrome, I have created the Ultimate Substring Occurrence Counting UDF.

... And here it is:

 

CREATE FUNCTION dbo.GetSubstringCount
(
@InputString TEXT,
@SubString VARCHAR(200),
@NoisePattern VARCHAR(20)
)
RETURNS INT
WITH SCHEMABINDING
AS
BEGIN
RETURN
(
SELECT COUNT(*)
FROM dbo.Numbers N
WHERE
SUBSTRING(@InputString, N.Number, LEN(@SubString)) = @SubString
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number + LEN(@SubString), 1)) = 0
AND PATINDEX(@NoisePattern, SUBSTRING(@InputString, N.Number - 1, 1)) = 0
)
END

First note: You need (regular readers, you guessed it) a numbers table.

Okay, so what's it do? Simply put, it returns the number of times @SubString appears within @InputString. But wait! -- Act now and you will receive an additional bonus feature at no extra cost! Can you feel the love?

The @NoisePattern parameter allows the user to put the UDF into "exact match" mode.

For instance, let's say you have a big string containing some text about automobile manufacturers, and for some reason (again, I have no clue why people need this functionality -- fill me in if you do!) you want to count the number of occurrences of the word "auto", but not the number of occurrences of other forms of the word, e.g. "automobile" or .... some word that ends in "auto" (if such a word exists).

By specifying a pattern for @NoisePattern of characters that shouldn't be adjacent to your word, you're telling the UDF that any other characters are safe. Leaving the parameter empty means that all occurrences of the substring will be counted. Examples:

 

SELECT dbo.GetSubstringCount('We like the autos, the autos that go boom.', 'auto', '')
-- Returns 2

SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '')
-- Also returns 2

SELECT dbo.GetSubstringCount('Autos are fun. I like to drive my auto.', 'auto', '%[a-z]%')
-- Only returns 1 -- The exact match must not have adjacent alphabetic characters


Published Wednesday, July 12, 2006 10:22 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

beckham said:

Good one..

May 7, 2007 12:59 AM
 

Jereme Guenther said:

I am looking for this functionality, however I would prefer to do it without a Numbers table.

My need is that of a search engine sproc.  I need to be able to weight the returned records and one of the items in the weighting system is to check how many times the search pattern occurs in the record.

June 17, 2008 11:12 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified IT Professional (MCITP).
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement