THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

Pattern-based split string

Originally posted here.

 


"hickymanz" asked in the SQL Server Central forums for a method of counting unique words in a text column. Wayne Lawton recommended using a string split function, which was a good idea, but not quite adequate for the job in my opinion.

Typical string split functions, like this one that I wrote can handle only a single delimiter, e.g. a comma. But in the case of splitting for unique words you want all sorts of possible delimiters -- punctuation marks, white space including spaces, possibly numerics, etc.

I believe I've seen other, similar requests for getting unique words before, so I decided to solve the problem. I modified that split string function to accept a pattern of valid characters for the second argument. Anything NOT in the pattern will be treated as a delimiter. I figured that would be a bit more flexible than the other way around, so that people don't have to type in (or figure out) every single possible white space and/or punctuation character.

Anyway, I think the modification is pretty straightforward if you understand the previous function, so check it out and if you have any questions feel free to ask. Here is the function:

 

CREATE FUNCTION dbo.SplitStringPattern
(
@List TEXT,
@Pattern VARCHAR(50)
)
RETURNS @ReturnTbl TABLE (OutParam VARCHAR(20))
WITH SCHEMABINDING
AS
BEGIN
DECLARE @LeftSplit VARCHAR(7998)
DECLARE @SplitStart INT SET @SplitStart = 0
DECLARE @SplitEnd INT
SET @SplitEnd = 7998

SET @Pattern = '%' + '[^' + RIGHT(@Pattern, LEN(@Pattern) - 1) + '%'

DECLARE @Delimiter CHAR(1)
SELECT @Delimiter = CHAR(MAX(Number))
FROM dbo.Numbers
WHERE CHAR(Number) LIKE @Pattern

SELECT @SplitEnd = MAX(Number)
FROM dbo.Numbers
WHERE (SUBSTRING(@List, Number, 1) LIKE @Pattern
OR Number = DATALENGTH(@List) + 1)
AND Number BETWEEN @SplitStart AND @SplitEnd

WHILE @SplitStart < DATALENGTH(@List) - 1
BEGIN
SET @LeftSplit = @Delimiter + SUBSTRING(@List, @SplitStart, @SplitEnd - @SplitStart) + @Delimiter

INSERT @ReturnTbl (OutParam)
SELECT LTRIM(RTRIM(SUBSTRING(@LeftSplit, Number + 1,
PATINDEX(@Pattern, SUBSTRING(@LeftSplit, Number + 1, LEN(@LeftSplit))) - 1)))
AS Value
FROM dbo.Numbers
WHERE Number <= LEN(@LeftSplit) - 1
AND SUBSTRING(@LeftSplit, Number, 1) LIKE @Pattern
AND SUBSTRING(@LeftSplit, Number + 1,
PATINDEX(@Pattern, SUBSTRING(@LeftSplit, Number + 1, LEN(@LeftSplit))) - 1) <> ''

SET @SplitStart = @SplitEnd + 1
SET @SplitEnd = @SplitEnd + 7998

SELECT @SplitEnd = MAX(Number) + @SplitStart
FROM dbo.Numbers
WHERE (SUBSTRING(@List, Number + @SplitStart, 1) LIKE @Pattern
OR Number+@SplitStart = DATALENGTH(@List) + 1)
AND Number BETWEEN 1 AND @SplitEnd - @SplitStart
END

RETURN
END

A note on the pattern for input: The pattern should be single-character based, and must be delimited with [ ]. Also, you must type in the actual characters in the pattern. I do not recommend using ranges; ranges cause strange side-effects because they tend to include (depending on collation) characters that you wouldn't expect to show up (e.g. characters with umlauts over them). So unless that's what you really want, don't use ranges.

Here's a usage example:

SELECT OutParam
FROM dbo.SplitStringPattern('This is a test.', '[abcdefghijklmnopqrstuvwxyz]')


OutParam
-------------
This
is
a
test

Enjoy!



Published Wednesday, July 12, 2006 10:31 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Chris Law said:

Hi Adam,

I have two tables within a SQL database. The 1st table has an identified column and column which lists one of more email identifers for a second table,

e.g.

ID     Email

--     ----------

1      AS1 AS11

2      AS2 AS3 AS4 AS5

3      AS6 AS7

The second table has a column which has an email identifier and another column which lists one email address for that particular identifier, e.g.

ID      EmailAddress

---     ------------------

AS1      abcstu@emc.com

AS2      abcstu2@emc.com

AS3      abcstu3@emc.com

AS4      abcstu4@em.com

AS5      abcstu5@emc.com

AS6      abcstu6@emc.com

AS7      abcstu7@emc.com

AS11     abcstu8@emc.com

I need to create a stored procedure or function that:

1. Selects an Email from the first table, based on a valid ID,

2. Splits the Email field of the first table (using the space separator) so that there is an array of Emails and then,

3. Selects the relevant EmailAddress value from the second table, based on a valid Email stored in the array

Is there any way that this can be done directly within SQL Server using a stored procedure/function without having to use cursors?

Many Thanks,

Chris Law

probetatester@yahoo.com

September 4, 2007 7:26 PM
 

Adam Machanic said:

Hi Chris,

Sure: Use a standard SplitString function (such as the one posted here: http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/splitting-a-string-of-unlimited-length.aspx), and then use the following query:

SELECT j.EmailAddress
FROM
(
SELECT x.OutParam
FROM YourIdentifierTable i
CROSS APPLY dbo.SplitString(i.Email) x
WHERE i.ID = 123
) y
JOIN YourEmailAddressTable j ON y.OutParam = j.ID

September 10, 2007 3:12 PM
 

Asghar said:

Hi Adam this is a great function but I have got string with such as ("hello","sdfsfs",....)

My problem is how can I pass two parremeters such as , and "".

and how can I insert the values into a table

Thanks

November 28, 2007 11:43 PM
 

Adam Machanic said:

Asghar:

I think you need the Tokenizer (or perhaps a slight modification of it):

http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/tokenize-udf.aspx

December 4, 2007 10:32 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified IT Professional (MCITP).
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement