SQLCLR custom aggregate with multiple parameters

2019-03-06 11:00发布

问题:

I have trouble understanding of how CLR User-Defined Aggregates work.

I have to create some custom CLR aggregates with multiple parameters. The point is to get the value of the first parameter depending on the second.

For example, I have the following values in my table, and I need the oldest employee Name for each Type:

    Type   |   Name   |   Age   
--------------------------------
Manager    | emp 1    |   35    
Manager    | emp 2    |   42    
Developer  | emp 3    |   36    
Developer  | emp 4    |   45    
Developer  | emp 5    |   22    

So I would like to write a query like this to get the result by using my assembly:

Select      Type, dbo.fOldestEmployee(Name, Age) AS [Name]
From        xxx
Group By    Type

This would respond :

    Type   |   Name   
----------------------
Manager    | emp 2     
Developer  | emp 4    

It look like it's possible with a CLR User-Defined Aggregate, but I have difficulty finding a concrete example of this kind of implementation.

For the moment I have this. I create a class to collect the datas, but how can I sort (or do other thing) to them?

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Collections;
using System.IO;

[Serializable]
[SqlUserDefinedAggregate(
    Format.UserDefined,
    IsInvariantToOrder = false, // order changes the result
    IsInvariantToNulls = false, // nulls change the result
    IsInvariantToDuplicates = false, // duplicates change the result
    MaxByteSize = -1)]
public struct sOlder
{
    private List<MyData> _datas;

    public void Init()
    {
        _datas = new List<MyData>();
    }

    public void Accumulate(SqlString valueField, SqlInt32 ValueInt)
    {
        if (!valueField.IsNull && !ValueInt.IsNull)
        {
            _datas.Add(new MyData
            {
                ValField = valueField.Value,
                ValInt = ValueInt.Value
            });
        }
    }

    public void Merge (sOlder Group)
    {
        _datas.AddRange(Group._datas);
    }

    public SqlString Terminate ()
    {
        //...
    }

    public class MyData
    {
        public String ValField { get; set; }
        public Int32 ValInt { get; set; }
    }
}

Any ideas ?

回答1:

There's no need to store a list of all the records - you only need to store the details of the oldest record you've seen so far.

Something like this should work:

[Serializable]
[SqlUserDefinedAggregate(
    Format.UserDefined,
    IsInvariantToOrder = true,
    IsInvariantToNulls = true,
    IsInvariantToDuplicates = true,
    MaxByteSize = -1)]
public struct sOlder : IBinarySerialize
{
    private struct MyData
    {
        public string Name { get; set; }
        public int? Age { get; set; }

        public int CompareTo(MyData other)
        {
            if (Age == null) return other.Age == null ? 0 : -1;
            if (other.Age == null) return 1;
            return Age.Value.CompareTo(other.Age.Value);
        }

        public static bool operator <(MyData left, MyData right)
        {
            return left.CompareTo(right) == -1;
        }

        public static bool operator >(MyData left, MyData right)
        {
            return left.CompareTo(right) == 1;
        }
    }

    private MyData _eldestPerson;

    public void Init()
    {
        _eldestPerson = default(MyData);
    }

    public void Accumulate(SqlString name, SqlInt32 age)
    {
        if (!name.IsNull && !age.IsNull)
        {
            var currentPerson = new MyData
            {
                Name = name.Value,
                Age = age.Value
            };

            if (currentPerson > _eldestPerson)
            {
                _eldestPerson = currentPerson;
            }
        }
    }

    public void Merge (sOlder other)
    {
        if (other._eldestPerson > _eldestPerson)
        {
            _eldestPerson = other._eldestPerson;
        }
    }

    public SqlString Terminate()
    {
        return _eldestPerson.Name;
    }

    public void Write(BinaryWriter writer)
    {
        if (_eldestPerson.Age.HasValue)
        {
            writer.Write(true);
            writer.Write(_eldestPerson.Age.Value);
            writer.Write(_eldestPerson.Name);
        }
        else
        {
            writer.Write(false);
        }
    }

    public void Read(BinaryReader reader)
    {
        if (reader.ReadBoolean())
        {
            _eldestPerson.Age = reader.ReadInt32();
            _eldestPerson.Name = reader.ReadString();
        }
        else
        {
            _eldestPerson = default(MyData);
        }
    }
}


回答2:

If you are looking for an implementation of your specific request, then @Richard's answer looks to be correct (though, you might still need to implement the Read and Write methods for using a custom type -- Format.UserDefined).

However, it seems from the comments on the question that this is more of a general question of when to do processing of whatever information you are collecting. In that case:

  • The Accumulate method is called for every row in a particular GROUP. This is the entry point.

  • The Merge method is called when parallelism is being used. SQL Server uses this method to combine the information from various threads. Depending on the type of algorithm you are doing, here you might: combine the current and incoming information, decide to keep the current info or the incoming info (as is being done in @Richard's implementation), recalculate the current info based on the new incoming info.

  • The Terminate method is called at the end of each particular GROUP. Here is where you would do the final calculation / logic and then return the expected result.

This information, and more, can be found on the MSDN page for Requirements for CLR User-Defined Aggregates.