THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

Rowset string concatenation: Which method is best?

Originally posted here.

Yeah, yeah, yeah, let's get this out of the way right from the start: Don't concatenate rows into delimited strings in SQL Server. Do it client side.

Except if you really have to create delimited strings in SQL Server. In which case you should read on.

There was a little discussion on SQLTeam about the best way to concatenate. I recommended a scalar UDF solution, whereas Rob Volk recommended a solution involving a temp table.

I mentioned my dislike for the temp table solution for a couple of reasons. First of all, it relies on a clustered index for ordering. That will probably work in this example, but is not guaranteed to always work and relying on indexes rather than ORDER BY for ordering is definitely not a habit I want anyone to get into. The clustered index as it was described in Rob's example also has another problem that I didn't even notice until I was writing this entry. But I'll get to that in a moment. The second reason I dislike the temp table is that I felt it would be less efficient than the scalar UDF.

Rob didn't agree about the efficiency. And so I set out to prove him wrong...

We'll use the Authors table in Pubs. I want a comma-delimited list, per state, of the last name of each author who lives there.

First, the scalar UDF:

 

USE pubs
GO

CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Output VARCHAR(8000)
SET @Output = ''

SELECT @Output = CASE @Output
WHEN '' THEN au_lname
ELSE @Output + ', ' + au_lname
END
FROM Authors
WHERE State = @State
ORDER BY au_lname

RETURN @Output
END
GO

To find the list I want:

SELECT DISTINCT State, dbo.ConcatAuthors(State)
FROM Authors
ORDER BY State

... And the adaptation of Rob's temp table method... I did change two things due to problems I discovered during testing. One, I've altered the au_lname column to VARCHAR(8000); the column in the Authors table is VARCHAR(40), not large enough for all of the California authors. What if we were dealing with a much larger dataset? Second, I added an IDENTITY column, and I'm clustering on that instead of the actual data to get the ordering. I'm doing so because of the VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had enough data to exceed that length, this method would fail.

 

CREATE TABLE #AuthorConcat
(
State CHAR(2) NOT NULL,
au_lname VARCHAR(8000) NOT NULL,
Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY
)

INSERT #AuthorConcat
(
State,
au_lname
)
SELECT
State,
au_lname
FROM Authors
ORDER BY
State,
au_lname

DECLARE @Authors VARCHAR(8000)
SET @Authors = ''
DECLARE @State CHAR(2)
SET @State = ''

UPDATE #AuthorConcat
SET @Authors = au_lname = CASE
WHEN @State = State THEN @Authors + ', ' + au_lname
ELSE au_lname END,
@State = State

SELECT State, MAX(au_lname)
FROM #AuthorConcat
GROUP BY State

Clever, but more complex and harder to read than the scalar UDF version. Output is identical, but that's not why we're here. Which one is more efficient?

Drumroll, please...

Results were tabulated using STATISTICS IO, STATISTICS TIME, and Query Analyzer's Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE were run before each test.

Scalar UDF Method
Total cost: 0.0492
Total Scan count: 1
Total Logical reads: 2
Total Physical reads: 2
Total time: 25 ms

Temp Table Method
Total cost: 0.2131
Total Scan count: 4
Total Logical reads: 9
Total Physical reads: 2
Total time: 88 ms

So in conclusion, neither method is incredibly taxing with the tiny Pubs dataset, but I think I have proven that the UDF is far more efficient.

 


Update, February 28, 2005: Modified the adapation of Rob Volk's method to use a CREATE TABLE instead of SELECT INTO, as the latter is not necessarily guaranteed to insert rows in the right order for the sake of this example. Thanks to "PW" on SQLServerCentral for pointing this problem out. Note that this changed the total costs very slightly -- for the better -- but the UDF still performs better by quite a large margin.

Published Wednesday, July 12, 2006 10:15 PM by Adam Machanic

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ulises said:

There is yet another approach to test, that performance wise seems to provide its advantages:

DECLARE @EmployeeList varchar(100)

SELECT @EmployeeList = COALESCE(@EmployeeList + ', ', '') +

  CAST(Emp_UniqueID AS varchar(5))

FROM SalesCallsEmployees

SELECT @EmployeeList

I saw it on this website: http://www.sqlteam.com/item.asp?ItemID=2368

March 9, 2007 10:27 AM
 

Adam Machanic said:

Hi Ulises,

That should be exactly the same as the CASE expression from a perf POV.

March 18, 2007 12:16 PM
 

Peter Larsson said:

What about this SQL Server 2005 attempt?

-- Prepare sample data

DECLARE @Sample TABLE (ID INT, Code VARCHAR(3))

INSERT @Sample

SELECT 290780, 'LT' UNION ALL

SELECT 290780, 'AY' UNION ALL

SELECT 290781, 'ILS' UNION ALL

SELECT 290780, 'AY'

-- Show the expected output

SELECT DISTINCT s1.ID,

STUFF((SELECT DISTINCT TOP 100 PERCENT ',' + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID ORDER BY ',' + s2.CODE FOR XML PATH('')), 1, 1, '') AS CODES

FROM @Sample AS s1

ORDER BY s1.ID

SELECT DISTINCT s1.ID,

STUFF((SELECT TOP 100 PERCENT ',' + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID ORDER BY ',' + s2.CODE FOR XML PATH('')), 1, 1, '') AS CODES

FROM @Sample AS s1

ORDER BY s1.ID

SELECT DISTINCT s1.ID,

STUFF((SELECT ',' + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID FOR XML PATH('')), 1, 1, '') AS CODES

FROM @Sample AS s1

ORDER BY s1.ID

March 28, 2007 3:54 AM
 

Adam Machanic said:

Hi Peter,

FOR XML PATH will definitely outperform virtually all other methods.  Plus, it's documented/supported.  In 2005, it's absolutely the best choice!

May 22, 2007 2:07 PM
 

Oscar said:

You made my day Adam, Thank you VERY much, i was thinking on ussing cursors to obtain each variable field to obtain the ones to concatenate linked to those fields, i didn't code anything because you kept me of doing it, your solution is eficient, fast and easy, i dont have a lot of experience developing applications, i am doing a distribution service management application on asp nowdays, and i really understood your explanation. What book would u suggest me to learn the sql things that they didnt teach me at school, which is the best method to learn sql, besides constant reading?.

Thanks again Adam, and best Regards.  

November 10, 2007 1:15 AM
 

Anonymous said:

With SQL 2005 another possibility exists.  Custom Aggregate functions using the CLR.  The Microsoft documentation even uses concatenation as their example.

January 4, 2008 11:12 AM
 

Ashish said:

Hi Adam

        I have been struggling with the below error while trying to run the SSIS from the one of our PRO server.

Error: 2008-09-11 15:14:08.96

  Code: 0xC004801F

  Source: Data Flow Task Data Flow Task (DTS.Pipeline)

  Description: The component metadata for "component "DataReader Source" (157)" could not be upgraded to the newer version of the component. The PerformUpgrade method failed.

End Error

I am unable to understand how to remove this error, can you please help me on this?

September 23, 2008 8:00 AM

Leave a Comment

(required) 
(optional)
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based independent database consultant, writer, and speaker. He has been involved in dozens of SQL Server implementations for both high-availability OLTP and large-scale data warehouse applications, and has optimized data access layer performance for several data-intensive applications. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "Expert SQL Server 2005 Development" (Apress, 2007) and "Inside SQL Server 2005: Query Tuning and Optimization" (Microsoft Press, 2007). Adam regularly speaks at user groups, community events, and conferences on a variety of SQL Server and .NET-related topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified IT Professional (MCITP).

This Blog

Syndication

News

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement