CLR: Multi Param Aggregate, Argument not in Final

2019-06-23 04:17发布

问题:

Why is my delimiter not appearing in the final output? It's initialized to be a comma, but I only get ~5 white spaces between each attribute using:

  SELECT [article_id]
         , dbo.GROUP_CONCAT(0, t.tag_name, ',') AS col
    FROM [AdventureWorks].[dbo].[ARTICLE_TAG_XREF] atx
    JOIN [AdventureWorks].[dbo].[TAGS] t ON t.tag_id = atx.tag_id
GROUP BY article_id

The bit for DISTINCT works fine, but it operates within the Accumulate scope...

Output:

article_id  |  col
-------------------------------------------------
1           |  a         a         b         c         

Update: The excess space between values is because the column as defined as NCHAR(10), so 10 characters would appear in the output. Silly mistake on my part...

Solution


With Martin Smith's help about working with the Write(BinaryWriter w) method, this update works for me:

public void Write(BinaryWriter w)
{
    w.Write(list.Count);
    for (int i = 0; i < list.Count; i++ )
    {
        if (i < list.Count - 1)
        {
            w.Write(list[i].ToString() + delimiter);
        }
        else 
        {
            w.Write(list[i].ToString());
        }
    }
}

The Question:


Why does the above solve my problem? And why wouldn't it let me use more than one w.write call inside the FOR loop?

C# Code:


using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Xml.Serialization;
using System.Xml;
using System.IO;
using System.Collections;
using System.Text;

[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined, MaxByteSize = 8000)]
public struct GROUP_CONCAT : IBinarySerialize
{
    ArrayList list;
    string delimiter;

    public void Init()
    {
        list = new ArrayList();
        delimiter = ",";
    }

    public void Accumulate(SqlBoolean isDistinct, SqlString Value, SqlString separator)
    {
        delimiter = (separator.IsNull) ? "," : separator.Value ;

        if (!Value.IsNull)
        {
            if (isDistinct)
            {
                if (!list.Contains(Value.Value))
                {
                    list.Add(Value.Value);
                }
            }
            else
            {
                list.Add(Value.Value);
            }            
        }
    }

    public void Merge(GROUP_CONCAT Group)
    {
        list.AddRange(Group.list);
    }

    public SqlString Terminate()
    {
        string[] strings = new string[list.Count];

        for (int i = 0; i < list.Count; i++)
        {
            strings[i] = list[i].ToString();
        }

        return new SqlString(string.Join(delimiter, strings));
    }

    #region IBinarySerialize Members

    public void Read(BinaryReader r)
    {
        int itemCount = r.ReadInt32();
        list = new ArrayList(itemCount);

        for (int i = 0; i < itemCount; i++)
        {
            this.list.Add(r.ReadString());
        }
    }

    public void Write(BinaryWriter w)
    {
        w.Write(list.Count);
        foreach (string s in list)
        {
            w.Write(s);
        }
    }
    #endregion
}

回答1:

The problem here is that you do not serialize delimiter. Add:

w.Write(delimiter)

as a first line in your Write method and

delimiter = r.ReadString();

as a first line in your Read method.

Regarding your questions to suggested work-around:

Why does the above solve my problem?

It does not. It merely worked with your test scenario.

And why wouldn't it let me use more than one w.write call inside the FOR loop?

Write method needs to be compatible with Read method. If you write two strings and read only one then it is not going to work. The idea here is that your object may be removed from the memory and then loaded. This is what Write and Read are supposed to do. In your case - this indeed was happening and you were not able to keep the object value.



回答2:

The answer given by @agsamek is correct but not complete. The query processor may instantiate multiple aggregators, e.g. for parallel computations, and the one that will eventually hold all data after successive calls of Merge() may be assigned an empty recordset, i.e. its Accumulate() method may be never called:

var concat1 = new GROUP_CONCAT();
concat1.Init();
results = getPartialResults(1); // no records returned
foreach (var result in results)
    concat1.Accumulate(result[0], delimiter); // never called
...
var concat2 = new GROUP_CONCAT();
concat2.Init();
results = getPartialResults(2);
foreach (var result in results)
    concat2.Accumulate(result[0], delimiter);
...
concat1.Merge(concat2);
...
result = concat1.Terminate();

In this scenario, concat1's private field delimiter used in Terminate() remains what it is by default in Init() but not what you pass in SQL. Luckily or not, your test SQL uses the same delimiter value as in Init(), so you can't reveal the difference.

I'm not sure if this is a bug or if it has been fixed in later versions (I stumbled on it in SQL Server 2008 R2). My workaround was to make use of the other group that is passed in Merge():

public void Merge(GROUP_CONCAT Group)
{
    if (Group.list.Count != 0) // Group's Accumulate() has been called at least once
    {

        if (list.Count == 0) // this Accumulate() has not been called
            delimiter = Group.delimiter;

        list.AddRange(Group.list);
    }
}

P.S. I would use StringBuilder instead of ArrayList.