Flatten association table to multi-value column?

2019-02-15 17:19发布

问题:

I have a table with just product ID's and category ID's (products can be in more than one category). How can I flatten the category ID's into a product column so I end us with this:

id | name | desc | categories
1 | test1 | lorem | 1,3,4,23
2 | test2 | ipsom | 4,6,24

It is like I need to loop into a separate table for the categories column. How can I do this or is there a better way?

回答1:

There's no in-built way to do it in MSSQL.

Simulating group_concat MySQL function in Microsoft SQL Server 2005? has a good description of how to go about implementing a workaround.



回答2:

I created an CLR aggregate function that takes a varchar column and returns all its values separated by commas. In other words, it joins several strings into a comma-separated list. I am sure its performance is way better than any T-Sql trick.

As any aggregate function, it can be used in combination with group by. For example:

SELECT id, name, desc, JoinStrings(CONVERT(VARCHAR(20), category_id))
FROM product p
INNER JOIN category_products c ON p.category_id = c.category_id
GROUP BY id, name, desc

Here's the C# code to create the CLR assembly into Sql Server 2008:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;


[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates=false, IsInvariantToOrder=false, IsInvariantToNulls=true, MaxByteSize=-1)]
public struct JoinStrings : IBinarySerialize
{
    private char[] sb;
    private int pos;
    public void Init()
    {
        sb = new char[512000];
        pos = 0;
    }

    public void Accumulate(SqlString Value)
    {
        if (Value.IsNull) return;
        char[] src = Value.ToString().ToCharArray();
        Array.Copy(src, 0, sb, pos, src.Length);
        pos += src.Length;
        sb[pos] = ',';
        pos++;
    }

    public void Merge(JoinStrings Group)
    {
        Accumulate(Group.Terminate());
    }

    public SqlString Terminate()
    {
        if (pos <= 0) 
            return new SqlString();
        else
            return new SqlString(new String(sb, 0, pos-1));
    }

    public void Read(System.IO.BinaryReader r)
    {
        this.Init();
        pos = r.ReadInt32();
        r.Read(sb, 0, pos);
    }

    public void Write(System.IO.BinaryWriter w)
    {
        w.Write(pos);
        w.Write(sb, 0, pos);
    }
}

Here's the code to create the function (although deploying from Visual Studio should do it automatically):

CREATE AGGREGATE [dbo].[JoinStrings]
(@s [nvarchar](4000))
RETURNS[nvarchar](max)
EXTERNAL NAME [YouAssemblyName].[JoinStrings]


回答3:

I would suggest using a Recursive CTE. I believe that it would be something like this:

select productid, categoryid, 
    row_number() over (partition by id order by categoryid) as rownum
into #tabletorecurse
from TABLENAME

with finaloutput as
(
    select productid as id, name, desc, categoryid as categories, rownum
    from #tabletorecurse
        join PRODUCTTABLE
            on PRODUCTTABLE.id = #tabletorecurse.productid
    where rownum = 1

    union all

    select tr.id, tr.name, tr.desc, 
        finaloutput.categories + ', ' + tr.categoryid, tr.rownum
    from #tabletorecurse as tr
        join finaloutput 
            on finaloutput.rownum + 1 = tr.rownum 
                and finaloutput.id = tr.productid
)
select id, name, desc, categories
from finaloutput
    join 
    (
        select max(rownum) as maxrow, id
        from finaloutput
        group by id 
    ) as maxvalues
       on maxvalues.id = finaloutput.id 
           and maxvalues.maxrow = finaloutput.rownum


回答4:

Use a function.
This does a lookup to text so you will need to adapt.
The COALESCE is just to put a ,.
This is from a large scale production application - it works and it fast.
Function was questioned by JustinPony as function is slow
I am hitting some tables of million of records but only returning 100 rows.
The function is only applied to the hundred rows.

usage:

select top 5 sID, ( select [dbo].[JoinMVEnum](docSVsys.sID, '140') ) as [Flag Issue]
from docSVsys

function

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[JoinMVText]
(
   @sID int,
   @fieldID tinyint
)
RETURNS VARCHAR(MAX)
AS 
BEGIN
    DECLARE @MVtextList varchar(max)
    SELECT @MVtextList = COALESCE(@MVtextList + '; ', '') + docMVtext.value
    FROM docMVtext with (nolock) 
    WHERE docMVtext.sID = @sID and fieldID = @fieldID
    RETURN @MVtextList
END

GO