Home Uncategorized Rowset string concatenation: Which method is best?

Rowset string concatenation: Which method is best?

November 10, 2004

899

Yeah, yeah, yeah, let’s get this out of the way right from the start: Don’t concatenate rows into delimited strings in SQL Server. Do it client side.

Except if you really have to create delimited strings in SQL Server. In which case you should read on.

There was a little discussion on SQLTeam about the best way to concatenate. I recommended a scalar UDF solution, whereas Rob Volk recommended a solution involving a temp table.

I mentioned my dislike for the temp table solution for a couple of reasons. First of all, it relies on a clustered index for ordering. That will probably work in this example, but is not guaranteed to always work and relying on indexes rather than ORDER BY for ordering is definitely not a habit I want anyone to get into. The clustered index as it was described in Rob’s example also has another problem that I didn’t even notice until I was writing this entry. But I’ll get to that in a moment. The second reason I dislike the temp table is that I felt it would be less efficient than the scalar UDF.

Rob didn’t agree about the efficiency. And so I set out to prove him wrong…

We’ll use the Authors table in Pubs. I want a comma-delimited list, per state, of the last name of each author who lives there.

First, the scalar UDF:

USE pubs
GO

CREATE FUNCTION dbo.ConcatAuthors(@State CHAR(2))
RETURNS VARCHAR(8000)
AS
BEGIN
	DECLARE @Output VARCHAR(8000)
	SET @Output = ''

	SELECT @Output =	CASE @Output 
				WHEN '' THEN au_lname 
				ELSE @Output + ', ' + au_lname 
				END
	FROM Authors
	WHERE State = @State
	ORDER BY au_lname

	RETURN @Output
END
GO

To find the list I want:

SELECT DISTINCT State, dbo.ConcatAuthors(State)
FROM Authors
ORDER BY State

… And the adaptation of Rob’s temp table method… I did change two things due to problems I discovered during testing. One, I’ve altered the au_lname column to VARCHAR(8000); the column in the Authors table is VARCHAR(40), not large enough for all of the California authors. What if we were dealing with a much larger dataset? Second, I added an IDENTITY column, and I’m clustering on that instead of the actual data to get the ordering. I’m doing so because of the VARCHAR(8000). Index rows can be a maximum of 900 bytes, so if we had enough data to exceed that length, this method would fail.

CREATE TABLE #AuthorConcat
(
	State CHAR(2) NOT NULL,
	au_lname VARCHAR(8000) NOT NULL,
	Ident INT IDENTITY(1,1) NOT NULL PRIMARY KEY
)

INSERT #AuthorConcat 
(
	State,
	au_lname
)
SELECT
	State, 
	au_lname
FROM Authors
ORDER BY 
	State, 
	au_lname

DECLARE @Authors VARCHAR(8000)
SET @Authors = ''
DECLARE @State CHAR(2)
SET @State = ''

UPDATE #AuthorConcat
SET @Authors = au_lname =	CASE 
				WHEN @State = State THEN @Authors + ', ' + au_lname 
				ELSE au_lname END,
	@State = State

SELECT State, MAX(au_lname) 
FROM #AuthorConcat
GROUP BY State

Clever, but more complex and harder to read than the scalar UDF version. Output is identical, but that’s not why we’re here. Which one is more efficient?

Drumroll, please…

Results were tabulated using STATISTICS IO, STATISTICS TIME, and Query Analyzer’s Show Execution Plan. DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE were run before each test.

Scalar UDF Method
Total cost: 0.0492
Total Scan count: 1
Total Logical reads: 2
Total Physical reads: 2
Total time: 25 ms

Temp Table Method
Total cost: 0.2131
Total Scan count: 4
Total Logical reads: 9
Total Physical reads: 2
Total time: 88 ms

So in conclusion, neither method is incredibly taxing with the tiny Pubs dataset, but I think I have proven that the UDF is far more efficient.

Update, February 28, 2005: Modified the adapation of Rob Volk’s method to use a CREATE TABLE instead of SELECT INTO, as the latter is not necessarily guaranteed to insert rows in the right order for the sake of this example. Thanks to “PW” on SQLServerCentral for pointing this problem out. Note that this changed the total costs very slightly — for the better — but the UDF still performs better by quite a large margin.

22 COMMENTS

Ulises March 9, 2007 At 6:27 pm

There is yet another approach to test, that performance wise seems to provide its advantages:
DECLARE @EmployeeList varchar(100)
SELECT @EmployeeList = COALESCE(@EmployeeList + ‘, ‘, ”) +
CAST(Emp_UniqueID AS varchar(5))
FROM SalesCallsEmployees
SELECT @EmployeeList
I saw it on this website: http://www.sqlteam.com/item.asp?ItemID=2368
Adam Machanic March 18, 2007 At 7:16 pm

Hi Ulises,
That should be exactly the same as the CASE expression from a perf POV.
Peter Larsson March 28, 2007 At 10:54 am

What about this SQL Server 2005 attempt?
— Prepare sample data
DECLARE @Sample TABLE (ID INT, Code VARCHAR(3))
INSERT @Sample
SELECT 290780, ‘LT’ UNION ALL
SELECT 290780, ‘AY’ UNION ALL
SELECT 290781, ‘ILS’ UNION ALL
SELECT 290780, ‘AY’
— Show the expected output
SELECT DISTINCT s1.ID,
STUFF((SELECT DISTINCT TOP 100 PERCENT ‘,’ + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID ORDER BY ‘,’ + s2.CODE FOR XML PATH(”)), 1, 1, ”) AS CODES
FROM @Sample AS s1
ORDER BY s1.ID
SELECT DISTINCT s1.ID,
STUFF((SELECT TOP 100 PERCENT ‘,’ + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID ORDER BY ‘,’ + s2.CODE FOR XML PATH(”)), 1, 1, ”) AS CODES
FROM @Sample AS s1
ORDER BY s1.ID
SELECT DISTINCT s1.ID,
STUFF((SELECT ‘,’ + s2.CODE FROM @Sample AS s2 WHERE s2.ID = s1.ID FOR XML PATH(”)), 1, 1, ”) AS CODES
FROM @Sample AS s1
ORDER BY s1.ID
Adam Machanic May 22, 2007 At 9:07 pm

Hi Peter,
FOR XML PATH will definitely outperform virtually all other methods. Plus, it’s documented/supported. In 2005, it’s absolutely the best choice!
Oscar November 10, 2007 At 9:15 am

You made my day Adam, Thank you VERY much, i was thinking on ussing cursors to obtain each variable field to obtain the ones to concatenate linked to those fields, i didn’t code anything because you kept me of doing it, your solution is eficient, fast and easy, i dont have a lot of experience developing applications, i am doing a distribution service management application on asp nowdays, and i really understood your explanation. What book would u suggest me to learn the sql things that they didnt teach me at school, which is the best method to learn sql, besides constant reading?.
Thanks again Adam, and best Regards.
Anonymous January 4, 2008 At 7:12 pm

With SQL 2005 another possibility exists. Custom Aggregate functions using the CLR. The Microsoft documentation even uses concatenation as their example.
Ashish September 23, 2008 At 3:00 pm

Hi Adam
I have been struggling with the below error while trying to run the SSIS from the one of our PRO server.
Error: 2008-09-11 15:14:08.96
Code: 0xC004801F
Source: Data Flow Task Data Flow Task (DTS.Pipeline)
Description: The component metadata for "component "DataReader Source" (157)" could not be upgraded to the newer version of the component. The PerformUpgrade method failed.
End Error
I am unable to understand how to remove this error, can you please help me on this?
Chi October 30, 2008 At 11:19 pm

The FOR XML Path version can only concatenate up to 256 characters long. Any way to resolve this limitation?
Adam Machanic November 3, 2008 At 6:07 pm

Hi Chi,
That is not correct. I regularly use it to form CLOB strings, and it can (in theory — I hope not in practice) go all the way to 2 GB. If you’re hitting a 256 character wall, it’s either a data type issue (VARCHAR(256)), or your UI that’s making it look like you’ve hit a limit when you really have not.
Michael C February 2, 2009 At 12:16 pm

"And so I set out to prove him wrong…"
Interesting, I would have thought you would set out to find the best solution…..
CERU February 12, 2009 At 7:04 pm

FOR XML PATH will convert some of the XML special characters – like & < > – to something else… Any way to get around that?
Adam Machanic February 13, 2009 At 11:39 pm

CERU: Absolutely:
—
DECLARE @t TABLE (x CHAR(1))
INSERT @t
SELECT ‘&’
UNION ALL
SELECT ‘<‘
UNION ALL
SELECT ‘>’
SELECT
(
SELECT
(
SELECT x AS [text()]
FROM @t
FOR XML PATH(”), TYPE
) AS r
FOR XML RAW, TYPE
).value(‘(/row/r)[1]’, ‘varchar(max)’)
—
CERU February 19, 2009 At 5:44 pm

Great!.. Thanks
I read somewhere that FOR XML will be removed in future version.. that is a shame since I just found how to do this.. Thanks for your help!
Adam Machanic February 19, 2009 At 6:40 pm

Hi CERU,
FOR XML EXPLICIT is deprecated, but FOR XML is not and I don’t expect it to be removed anytime in the foreseeable future. XML is an absolutely vital technology at this point, and the FOR XML syntax — especially the FOR XML PATH variant — is one of the key elements of SQL Server’s XML support. Since it’s not currently on the deprecation path it has to stay around for at least four more major versions of the product, so it should be with us for at least 10 more years at a minimum. And I certainly don’t expect it to be deprecated in SQL11, so we can expect it to be around for a lot longer than that. So get comfortable with it… It’s here to stay.
RBArryYoung February 23, 2009 At 9:01 pm

Adam:
Here’s the FOR XML fix that I have been using, it seems simpler and appears to work just as well. Have I missed anything?
–=====
select (
SELECT n + ‘,’
FROM (
SELECT ‘a<b’ AS n
UNION ALL
SELECT ‘b>a’
UNION ALL
SELECT ‘b&a’
UNION ALL
SELECT ‘b
a’) r
FOR XML PATH(”), TYPE
).value(‘.[1]’,’varchar(max)’)
–=====
Adam Machanic February 23, 2009 At 10:25 pm

Looks good to me, RBarryYoung. Did you come up with that, or is there someone we can credit? I asked a group of MVPs how to solve the problem and the best anyone came up with is the version I posted to CERU (credit goes to Tony Rogerson for coming up with it). The one you posted is certainly better and when I use it I would love to know to whom I should attribute it.
RBArryYoung March 1, 2009 At 10:19 pm

I came up with it after I saw the prior version that you posted here and over at Ward Pond’s blog also.
Ali Delshad May 8, 2009 At 10:22 am

Hey man
Thank you very much 🙂
You make my work too much easier
JRK March 30, 2010 At 6:27 pm

RBArryYoung…
Nice one!.
Kevin May 24, 2011 At 1:02 pm

Hello Everyone. I tried this lines of query on 2005 and it works just fine. A particular client needs it on version 2000 but it keeps giving syntax error near ‘For’. Any help? here is my code:
update tblLabResult
set Specimen = (select distinct stuff((Select ‘/’ + sp.[Name]
from dbo.tblLabSpecimenUsed as su inner join dbo.tblLabSpecimen as sp
on su.SpecimenID = sp.SpecimenID
where su.ResultID = @NResultID
order by su.SpecimenID
for xml path(”)),1,1,”) as userlist
from tblLabSpecimenUsed
)
where ResultID = @NResultID
sai January 18, 2012 At 8:07 am

FOR XML is not an option when you are working with characters like 0x0018 or ox0016 etc. it throws an exception saying that XML cannot serialize since it has those values. i don’t have a clue at this point of time what to do with that. performance wise it will eliminate the need for while loop and i tested it with millions of data and it just works like a charm. can you find out a way for that and also how to add coleasce for the column. thanks
Adam Machanic January 18, 2012 At 7:30 pm

sai: I use REPLACE to get rid of those characters. If you peek at the code for sp_whoisactive you’ll see how I’ve done it. It’s ugly but it works.
Not sure what you mean about COALESCE. Can you be more specific?

Comments are closed.

Rowset string concatenation: Which method is best?

22 COMMENTS

Popular Posts

Looking Forward 100 Months (T-SQL Tuesday #100): The Roundup

T-SQL Tuesday #200 (v1.0): Last of the DBAs

Invitation: T-SQL Tuesday #100 – Looking Forward 100 Months

The Career Influence of Usenet Strangers (T-SQL Tuesday #096)

Playing the Third-Party Recruiter Game (T-SQL Tuesday #093)

The SQL Hall of Shame

The Guru (T-SQL Tuesday #089)

Solving the Net Changes Problem with Temporal Tables (T-SQL Tuesday #087)