Coalesce is not the answer to string concatentation in T-SQL

I’ve seen many posts over the years about using the COALESCE function to get string concatenation working in T-SQL. This is one of the examples here (borrowed from Readifarian Marc Ridey).

DECLARE @categories varchar(200)
SET @categories = NULL

SELECT @categories = COALESCE(@categories + ‘,’,”) + Name
FROM Production.ProductCategory

SELECT @categories

This query can be quite effective, but care needs to be taken, and the use of COALESCE should be properly understood. COALESCE is the version of ISNULL which can take more than two parameters. It returns the first thing in the list of parameters which is not null. So really it has nothing to do with concatenation, and the following piece of code is exactly the same – without using COALESCE:

DECLARE @categories varchar(200)
SET @categories = ”

SELECT @categories = @categories + ‘,’ + Name
FROM Production.ProductCategory

SELECT @categories

But the unordered nature of databases makes this unreliable. The whole reason why T-SQL doesn’t (yet) have a concatenate function is that this is an aggregate for which the order of elements is important. Using this variable-assignment method of string concatenation, you may actually find that the answer that gets returned doesn’t have all the values in it, particularly if you want the substrings put in a particular order. Consider the following, which on my machine only returns ‘,Accessories’, when I wanted it to return ‘,Bikes,Clothing,Components,Accessories’:

DECLARE @categories varchar(200)
SET @categories = NULL

SELECT @categories = COALESCE(@categories + ‘,’,”) + Name
FROM Production.ProductCategory
ORDER BY LEN(Name)

SELECT @categories

Far better is to use a method which does take order into consideration, and which has been included in SQL2005 specifically for the purpose of string concatenation – FOR XML PATH(”)

SELECT ‘,’ + Name
FROM Production.ProductCategory
ORDER BY LEN(Name)
FOR XML PATH(”) 

In the post I made recently comparing GROUP BY and DISTINCT when using subqueries, I demonstrated the use of FOR XML PATH(”). Have a look at this and you’ll see how it works in a subquery. The ‘STUFF’ function is only there to remove the leading comma.

USE tempdb;
GO
CREATE TABLE t1 (id INT, NAME VARCHAR(MAX));
INSERT t1 values (1,’Jamie’);
INSERT t1 values (1,’Joe’);
INSERT t1 values (1,’John’);
INSERT t1 values (2,’Sai’);
INSERT t1 values (2,’Sam’);
GO

select
    id,
    stuff((
        select ‘,’ + t.[name]
        from t1 t
        where t.id = t1.id
        order by t.[name]
        for xml path(”)

    ),1,1,”) as name_csv
from t1
group by id
;

FOR XML PATH is one of the only situations in which you can use ORDER BY in a subquery. The other is TOP. And when you use an unnamed column and FOR XML PATH(”), you will get a straight concatenation, with no XML tags. This does mean that the strings will be HTML Encoded, so if you’re concatenating strings which may have the < character (etc), then you should maybe fix that up afterwards, but either way, this is still the best way of concatenating strings in SQL Server 2005.

16 thoughts on “Coalesce is not the answer to string concatentation in T-SQL

  1. The first two queries are NOT identical. I think a simple run would prove that. The second one has an extra “,” prepended. Arg. I agree that there isn’t much point to using coalesce specifically, but it certainly makes life easier.

    To do it that way you would have to:
    set @v = ”
    select @v = @v + (case @v when ” then ” else ‘,’ end) + t.[name]

    If you rely on the concat null yields null, then you might end up with something like this to avoid the “coalesce”, if you find that function distasteful for this purpose (and there is no reason for that besides preference, though isnull is clearer)

    set @v = null
    set @v = isnull(@v+’,’, ”) + t.[name]

    Also, for those of you who are stuck with sql 2000 (like me!) you can do this simple workaround: A sub query with a top + orderby, like this:

    select
    @v = isnull(@v+’,’,”) + x.[name]
    from
    ( select top 100 percent [name],[other fields]
    from dbo.table
    order by [name] desc
    ) x

    Not quite as nice as we would like, but I can’t say that FOR XML PATH(“”) looks any cleaner (it’s worse, to me) or easier to understand.

    I am sure there is some reason why we shouldn’t do that too, but I seem to find an “official” reason not to do _whatever it is_ that I try to do in tsql beyond a simple select. However, if that method is seriously harmful, I would be interested in knowing why.

  2. Thanks for your comments, Andrew.

    The problem is that the database makes no guarantee that it won’t simplify your query to the point of giving you the wrong result.

    Using TOP 100 PERECENT in the derived table doesn’t work in SQL 2005, because the system understands that you want the whole dataset – so it ignores the ORDER BY in the derived table.

    And yes, the first two queries are slightly different, but this can be resolved very easily with STUFF. FOR XML PATH doesn’t look any clearer, I agree – but it does guarantee the order of the concatenation – and in my book, the predictable results will always win out.

  3. Absolutely EXCELLENT acticle! Thank you for the reference to STUFF, been looking for a way to do this for ages.

  4. Your solution is not correct…
    If you add any of <> ‘ to your string name…
    You will not return the correct answer..
    Just run the following…

    Jorge

    ===

    CREATE TABLE t1 (id INT, name VARCHAR(MAX));
    INSERT t1 values (1,’Jamie’);
    INSERT t1 values (1,’Joe’);
    INSERT t1 values (1,’John’);
    INSERT t1 values (2,’Sai’);
    INSERT t1 values (2,’Sam’);
    INSERT t1 values (2,’Can you handle < Check it out'); INSERT t1 values (2,'What about > Check it out’);
    GO

    select
    id,
    stuff((
    select ‘,’ + t.[name]
    from t1 t
    where t.id = t1.id
    order by t.[name]
    for xml path(”)
    ),1,1,”) as name_csv
    from t1
    group by id
    ;

  5. Yes, this is the one drawback for using FOR XML PATH(”) for string concatenation. You can always use an HTMLDecode-style function to put them back – something like:

    replace(replace(replace(@str,'<‘,’< '),'>','>‘),’&’,’&’)

    (making sure that & is handled last so that you don’t htmldecode too much stuff)

    It’s still very useful though – but I’m definitely looking forward the day when

    Rob

  6. You don’t want to use replace since you don’t know exactly what was on the original value and you may also miss some characters that need bve replaced.

    Here is what I found to fix it. The solution is to get the value out of the XML that it is produced, using XPATH query. I just cast the value to VARCHAR…and you are done.

    …and here is the sample
    Thanks

    Jorge

    –DROP TABLE t1;
    CREATE TABLE t1 (id INT, name VARCHAR(MAX));

    INSERT t1 values (1,’Jamie’);
    INSERT t1 values (1,’Joe’);
    INSERT t1 values (1,’John’);
    INSERT t1 values (2,’Sai’);
    INSERT t1 values (2,’Sam’);
    INSERT t1 values (2,’Can you handle < Check it out'); INSERT t1 values (2,'What about > Check it out’);

    GO

    select
    id,
    stuff((
    select ‘,’ + t.[name] as name
    from t1 t
    where t.id = t1.id
    order by t.[name]
    for xml path(”), TYPE).value(‘name[1]’, ‘VARCHAR(MAX)’)
    ,1,1,”) as name_csv
    from t1
    group by id;

  7. For those of use with DBAs that will not allow the use of XMLDATA or any other XML option within the database (I’m one of them obviously), if you need to have an order for the list, use a table variable to store the ordered list, then make list from the table variable. Example:

    declare @stuff table
    (
    RowId int identity(1,1),
    ItemName varchar(30)
    )
    DECLARE @ItemLst varchar(2000)
    select @ItemLst = NULL — hate using set

    insert into @stuff(ItemName)
    select ItemName
    from Item
    order by ItemName

    SELECT @ItemLst = COALESCE(@ItemLst + ‘,’,”) + ItemName
    FROM @stuff

    As I said, I’m not allowed to use any XML options in the database, otherwise I’d have used those options.

    Have fun.

    Bad wine is for driving away bad relations – French Proverb, “I drank what?”

    GPS
    (yes these are my real initials…)

  8. GPS,

    Better to:
    select @ItemLst = ”

    …and then not use COALESCE, or maybe use it on the new ItemName.

    Like:

    SELECT @ItemLst = @ItemLst + ISNULL(‘,’ + ItemName,”)

    And then use STUFF to remove the leading comma like SELECT @ItemLst = STUFF(@ItemLst,1,1,”)

    Otherwise you’re testing for NULL way too often.

    But this is still a lot of work compared to using FOR XML PATH(”)

    Rob

  9. Oh, and GPS… there’s still no absolute guarantee that the data will be returned in the correct order out of the table variable.

    Rob

  10. I’m used to using coalesce and I’ve run into stuff in a database. I thought I’d do some research and I’ve landed here. It seems like the primary argument against coalesce is null checking?

    Anyway, stuff looks ok, except it only deals in varchar. I’ve got nvarchars, this seems like a deal breaker. Also, the manual manipulation to restore the XML escaped characters seems as though it would torpedo any performance benefit over coalesce…

    Is the null check optimization a YAGNI pre-optimization, or am I missing something here? Also, is there an NVARCHAR variation to stuff?

    Thanks,
    Ben

  11. Hi Ben,

    The thing is that the concatenation isn’t happening because of COALESCE, because COALESCE is just a generalised form of ISNULL.

    I should update the post to include a useful way of decoding the other characters.

    Rob

Comments are closed.