THE SQL Server Blog Spot on the Web

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

Pattern-based replacement UDF

Originally posted here.


As a personal challenge, I decided to write a UDF that will work just like T-SQL's REPLACE() function, but using patterns as input.

The first question: How does REPLACE() handle overlapping patterns?

 

SELECT REPLACE('babab', 'bab', 'c')

--------------------------------------------------
cab

(1 row(s) affected)


SELECT REPLACE('bababab', 'bab', 'c')

--------------------------------------------------
cac

(1 row(s) affected)

It appears that SQL Server parses the input string from left to right, replacing the first instance of the search string, and then continues parsing to the right of that match, so overlapping occurrences are not replaced.
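A couple of extra probes, not from the original tests, make the same point with a simpler string. This is only a sketch of the observed behavior using the built-in REPLACE; the input values are made up for illustration.

```sql
-- Each match is consumed before the scan resumes, so matches never overlap.
SELECT REPLACE('aaa', 'aa', 'b')   -- 'aa' at positions 1-2 is replaced; the lone 'a' remains: ba
SELECT REPLACE('aaaa', 'aa', 'b')  -- two non-overlapping matches: bb

-- The replacement text itself is not rescanned for new matches.
SELECT REPLACE('ab', 'a', 'aa')    -- aab
```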

Next question: How to do the replacement on a pattern? As it turns out, this is somewhat trickier than I initially thought. A replacement requires a starting point -- easy to find using PATINDEX -- and an end point. But there is no built-in function for finding the last character of a pattern match. So you'll see that the UDF loops character-by-character, testing PATINDEX against progressively longer substrings, in order to find the end of the match. This matters in situations like:

 

SELECT dbo.PatternReplace('baaa', 'ba%', 'c')

-- We know that the match starts at character 1... but where does it end?
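To make the end-of-match problem concrete, here is a minimal sketch of the prefix-growing idea on its own, outside the UDF. The variable names @s, @p, and @n are made up for this illustration. It tests each prefix of the input against the bare pattern (no surrounding % wildcards), so PATINDEX returns 1 only when the whole prefix matches and 0 otherwise.

```sql
-- Illustrative only: @s, @p, and @n are names invented for this sketch.
DECLARE @s VARCHAR(20), @p VARCHAR(20), @n INT
SET @s = 'baaa'
SET @p = 'ba%'
SET @n = 1

-- Walk the prefixes 'b', 'ba', 'baa', 'baaa' and show which ones match.
WHILE @n <= LEN(@s)
BEGIN
    SELECT @n AS PrefixLength,
           SUBSTRING(@s, 1, @n) AS Prefix,
           PATINDEX(@p, SUBSTRING(@s, 1, @n)) AS PatIndexResult
    SET @n = @n + 1
END
```

The one-character prefix 'b' should report 0, and every longer prefix should report 1, so for this input the match runs all the way to the end of the string.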

Anyway, enough background, here's the code:

 

CREATE FUNCTION dbo.PatternReplace
(
    @InputString VARCHAR(4000),
    @Pattern VARCHAR(100),
    @ReplaceText VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
    DECLARE @Result VARCHAR(4000) SET @Result = ''
    -- First character in a match
    DECLARE @First INT
    -- Next character to start search on
    DECLARE @Next INT SET @Next = 1
    -- Length of the total string -- 8001 if @InputString is NULL
    DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
    -- End of a pattern
    DECLARE @EndPattern INT

    WHILE (@Next <= @Len)
    BEGIN
        SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
        IF COALESCE(@First, 0) = 0 --no match - return
        BEGIN
            SET @Result = @Result +
                CASE --return NULL, just like REPLACE, if inputs are NULL
                    WHEN @InputString IS NULL
                        OR @Pattern IS NULL
                        OR @ReplaceText IS NULL THEN NULL
                    ELSE SUBSTRING(@InputString, @Next, @Len)
                END
            BREAK
        END
        ELSE
        BEGIN
            -- Concatenate characters before the match to the result
            SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
            SET @Next = @Next + @First - 1

            SET @EndPattern = 1
            -- Find start of end pattern range
            WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
                SET @EndPattern = @EndPattern + 1
            -- Find end of pattern range
            WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
                AND @Len >= (@Next + @EndPattern - 1)
                SET @EndPattern = @EndPattern + 1

            --Either at the end of the pattern or @Next + @EndPattern = @Len
            SET @Result = @Result + @ReplaceText
            SET @Next = @Next + @EndPattern - 1
        END
    END
    RETURN(@Result)
END

... And here's how you run it, with some sample outputs showing that it does, indeed, appear to work:

 

SELECT dbo.PatternReplace('babab', 'bab', 'c')

--------------------------------------------------
cab

(1 row(s) affected)


SELECT dbo.PatternReplace('babab', 'b_b', 'c')

--------------------------------------------------
cab

(1 row(s) affected)


SELECT dbo.PatternReplace('bababe', 'b%b', 'c')


--------------------------------------------------
cabe

(1 row(s) affected)
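Since the pattern is handed straight to PATINDEX, bracketed character classes should work as well. The calls below are a small sketch that assumes the function has been created as dbo.PatternReplace exactly as listed; the input strings are made up, and the expected values in the comments come from tracing the loop by hand rather than from the original tests.

```sql
-- Strip digits: each single digit is matched and replaced separately.
SELECT dbo.PatternReplace('a1b22c', '[0-9]', '')   -- expected: abc

-- The end-of-match loop stops as soon as a longer prefix fails to match:
-- 'bab' matches 'b%b' but 'baba' does not, so only 'bab' is consumed.
SELECT dbo.PatternReplace('babab', 'b%b', 'c')     -- expected: cab, not c
```

The second call shows that a % in the middle of a pattern does not behave greedily here: the match ends at the first point where extending it by one character breaks the pattern.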

Hopefully this will help someone, somewhere. I haven't found any use for it yet :)

Thanks to Steve Kass for posting some single-character replacement code which I based this UDF on.

 


Update, January 10, 2005: Thanks to Frank Kalis, I've tracked down some problems with the original UDF. The version posted here has been fixed and should now behave identically to the T-SQL REPLACE function when NULLs or non-pattern-based arguments are passed in. Each of the following pairs should return the same value (and does, at this point!):

 

SELECT dbo.PatternReplace(NULL, '', 'abc')
SELECT REPLACE(NULL, '', 'abc')

SELECT dbo.PatternReplace('abc', '', NULL)
SELECT REPLACE('abc', '', NULL)

SELECT dbo.PatternReplace('abc', NULL, '')
SELECT REPLACE('abc', NULL, '')

SELECT dbo.PatternReplace('abc', 'b', '')
SELECT REPLACE('abc', 'b', '')

SELECT dbo.PatternReplace('adc', 'b', '')
SELECT REPLACE('adc', 'b', '')

Published Wednesday, July 12, 2006 10:07 PM by Adam Machanic


About Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified IT Professional (MCITP).