I often use queries like:
SELECT *
FROM ThisTable
OUTER APPLY (SELECT (SELECT SomeField + ' ' AS [data()]
FROM SomeTable
WHERE SomeTable.ID = ThisTable.ID
FOR XML PATH ('')) AS ConcatenatedSomeField) A
I often want to get multiple concatenated concatenated fields from this table, instead of just one. I could logically do this:
SELECT *
FROM ThisTable
OUTER APPLY (SELECT (SELECT SomeField + ' ' AS [data()]
FROM SomeTable
WHERE SomeTable.ID = ThisTable.ID
FOR XML PATH ('')) AS ConcatenatedSomeField) A
OUTER APPLY (SELECT (SELECT SomeField2 + ' ' AS [data()]
FROM SomeTable
WHERE SomeTable.ID = ThisTable.ID
FOR XML PATH ('')) AS ConcatenatedSomeField2) B
OUTER APPLY (SELECT (SELECT SomeField3 + ' ' AS [data()]
FROM SomeTable
WHERE SomeTable.ID = ThisTable.ID
FOR XML PATH ('')) AS ConcatenatedSomeField3) C
But it looks crappy and error prone when anything needs to be updated; also SomeTable is often a long list of joined tables so it could also have performance implications getting the same tables over and over.
Is there a better way to do this?
Thanks.
You can create a CLR User-Defined Aggregate Function that does the concatenation for you.
Your code would then look like this instead.
You could do something like this. Instead of immediately sending the XML value to a string, this query uses the TYPE keyword to return an xml type object which can then be queried. The three query functions search the xml object for all instances of the Somefield element and return a new xml object containing just those values. Then the value function strips out the xml tags surrounding the values and passes them into a varchar(max)
This is the same answer as I gave here: https://dba.stackexchange.com/questions/125771/multiple-column-concatenation/
The OP of that question referenced the answer given here. You can see below that sometimes the simplest answer can be the best. If SomeTable is multiple tables then I would go ahead and put it into a CTE to avoid having the same complex code multiple times.
I ran a few tests using a little over 6 mil rows. With an index on the ID column.
Here is what I came up with.
Your initial query:
This one ran for ~23 minutes.
I ran this version which is the version I first learned. In some ways it seems like it should take longer but it doesn't.
This version ran in just over 2 minutes.