Topological Sorting using LINQ

2019-01-18 03:33发布

站内文章 / 前沿技术

42 0

该账号已被封号

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a list of items that have a partial order relation, i. e, the list can be considered a partially ordered set. I want to sort this list in the same way as in this question. As correctly answered there, this is known as topological sorting.

There's a reasonably simple known algorithm to solve the problem. I want a LINQ-like implementation of it.

I already tried to use OrderBy extension method, but I'm quite sure it's not able to make topological sorting. The problem is that the IComparer<TKey> interface is not able to represent a partial order. This happens because the Compare method can return basically 3 kinds of values: zero, negative, and positive, meaning are-equal, is-less-than, and is-greater-then, respectively. A working solution would only be possible if there were a way to return are-unrelated.

From my biased point of view, the answer I'm looking for might be composed by an IPartialOrderComparer<T> interface and an extension method like this:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IPartialOrderComparer<TKey> comparer
);

How would this be implemented? How does the IPartialOrderComparer<T> interface would look like? Would you recommend a different approach? I'm eager to see it. Maybe there's a nicer way to represent the partial order, I don't know.

回答1:

I would suggest using the same IComparer interface, but writing the extension method so as to interpret 0 as not related. In a partial ordering, if elements a and b are equal their order doesn't matter, like-wise if they are unrelated - you only have to order them with respect to elements with which they have defined relationships.

Here's an example that does a partial ordering of even and odd integers:

namespace PartialOrdering
{
    public static class Enumerable
    {
        public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IComparer<TKey> comparer)
        {
            List<TSource> list = new List<TSource>(source);
            while (list.Count > 0)
            {
                TSource minimum = default(TSource);
                TKey minimumKey = default(TKey);
                foreach (TSource s in list)
                {
                    TKey k = keySelector(s);
                    minimum = s;
                    minimumKey = k;
                    break;
                }
                foreach (TSource s in list)
                {
                    TKey k = keySelector(s);
                    if (comparer.Compare(k, minimumKey) < 0)
                    {
                        minimum = s;
                        minimumKey = k;
                    }
                }
                yield return minimum;
                list.Remove(minimum);
            }
            yield break;
        }

    }
    public class EvenOddPartialOrdering : IComparer<int>
    {
        public int Compare(int a, int b)
        {
            if (a % 2 != b % 2)
                return 0;
            else if (a < b)
                return -1;
            else if (a > b)
                return 1;
            else return 0; //equal
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            IEnumerable<Int32> integers = new List<int> { 8, 4, 5, 7, 10, 3 };
            integers = integers.PartialOrderBy<Int32, Int32>(new Func<Int32, Int32>(delegate(int i) { return i; }), new EvenOddPartialOrdering());
        }
    }
}

Result: 4, 8, 3, 5, 7, 10

回答2:

This is my optimized and refurbished version of tehMick's answer.

One change I made was replacing the real list of values left to yield for a logical list. For this, I have two size arrays of the same size. One has all the values, and the others contains flags telling if each value has been yielded. This way, I avoid the cost of having to resize a real List<Key>.

One other change is that I'm reading all keys only one time at the start of the iteration. For reasons I can't recall now (maybe it' was just my instinct) I don't like the idea of calling the keySelector function several times.

The last touches was parameter validation, and an extra overload that uses an implicit key comparer. I hope the code is readable enough. Check it out.

public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)
{
    return PartialOrderBy(source, keySelector, null);
}

public static IEnumerable<TSource> PartialOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer)
{
    if (source == null) throw new ArgumentNullException("source");
    if (keySelector == null) throw new ArgumentNullException("keySelector");
    if (comparer == null) comparer = (IComparer<TKey>)Comparer<TKey>.Default;

    return PartialOrderByIterator(source, keySelector, comparer);
}

private static IEnumerable<TSource> PartialOrderByIterator<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer)
{
    var values = source.ToArray();
    var keys = values.Select(keySelector).ToArray();
    int count = values.Length;
    var notYieldedIndexes = System.Linq.Enumerable.Range(0, count).ToArray();
    int valuesToGo = count;

    while (valuesToGo > 0)
    {
        //Start with first value not yielded yet
        int minIndex = notYieldedIndexes.First( i => i >= 0);

        //Find minimum value amongst the values not yielded yet
        for (int i=0; i<count; i++)
        if (notYieldedIndexes[i] >= 0)
        if (comparer.Compare(keys[i], keys[minIndex]) < 0) {
            minIndex = i;
        }

        //Yield minimum value and mark it as yielded
        yield return values[minIndex];
        notYieldedIndexes[minIndex] = -1;
        valuesToGo--;
    }
}

回答3:

Well, I'm not sure that this way of handling it is the best way, but I could be wrong.

The typical way to handle topological sorting is by using a graph, and for each iteration remove all nodes which doesn't have an inbound connection, and simultaneously remove all connections outbound from those nodes. The removed nodes are the output of the iteration. Repeat until you cannot remove any more nodes.

However, in order to get those connections in the first place, with your method, you would need:

A method (your comparer) that could say "this before that" but also "no information for these two"
Iterate all combinations, creating connections for all combinations that the comparer returns an ordering for.

In other words, the method would probably defined like this:

public Int32? Compare(TKey a, TKey b) { ... }

and then return null when it doesn't have a conclusive answer for two keys.

The problem I'm thinking about is the "iterate all combinations" part. Perhaps there's a better way to handle this, but I'm not seeing it.

回答4:

I believe that Lasse V. Karlsen's answer is on the right track, but I don't like hiding of the Compare method (or a separate interface for that matter which doesn't extend from IComparable<T>).

Instead, I'd rather see something like this:

public interface IPartialOrderComparer<T> : IComparer<T>
{
    int? InstancesAreComparable(T x, T y);
}

This way, you still have an implementation of IComparer<T> which can be used in other places that require IComparer<T>.

However, it also requires you to indicate the relationship of the instances of T to each other with the return value in the following way (similar to IComparable<T>):

null - instances are not comparable to each other.
0 - the instances are comparable to each other.
0 - x is a comparable key, but y is not.
< 0 - y is a comparable key, but x is not.

Of course, you won't get partial ordering when passing an implementation of this to anything that expects an IComparable<T> (and it should be noted that Lasse V. Karlsen's answer doesn't solve this either) as what you need can't be represented in a simple Compare method which takes two instances of T and returns an int.

To finish the solution, you have to provide a custom OrderBy (as well as ThenBy, OrderByDescending and ThenByDescending as well) extension method which will accept the new instance parameter (as you have already indicated). The implementation would look something like this:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(      
    this IEnumerable<TSource> source,      
    Func<TSource, TKey> keySelector,      
    IPartialOrderComparer<TKey> comparer)
{
    return Enumerable.OrderBy(source, keySelector,
        new PartialOrderComparer(comparer);
}

internal class PartialOrderComparer<T> : IComparer<T>
{
    internal PartialOrderComparer(IPartialOrderComparer<T> 
        partialOrderComparer)
    {
        this.partialOrderComparer = partialOrderComparer;
    }

    private readonly IPartialOrderComparer<T> partialOrderComparer;

    public int Compare(T x, T y)
    {
        // See if the items are comparable.
        int? comparable = partialOrderComparable.
            InstancesAreComparable(x, y);

        // If they are not comparable (null), then return
        // 0, they are equal and it doesn't matter
        // what order they are returned in.
        // Remember that this only to determine the
        // values in relation to each other, so it's
        // ok to say they are equal.
        if (comparable == null) return 0;

        // If the value is 0, they are comparable, return
        // the result of that.
        if (comparable.Value == 0) return partialOrderComparer.Compare(x, y);

        // One or the other is uncomparable.
        // Return the negative of the value.
        // If comparable is negative, then y is comparable
        // and x is not.  Therefore, x should be greater than y (think
        // of it in terms of coming later in the list after
        // the ordered elements).
        return -comparable.Value;            
    }
}

回答5:

Interface to define partial order relationship:

interface IPartialComparer<T> {
    int? Compare(T x, T y);
}

Compare should return -1 if x < y, 0 if x = y, 1 if y < x and null if x and y are not comparable.

Our goal is to return an ordering of the elements in the partial order that respects the enumeration. That is, we seek a sequence e_1, e_2, e_3, ..., e_n of the elements in the partial order such that if i <= j and e_i is comparable to e_j then e_i <= e_j. I'll do this using a depth-first search.

Class that implements topological sort using depth-first search:

class TopologicalSorter {
    class DepthFirstSearch<TElement, TKey> {
        readonly IEnumerable<TElement> _elements;
        readonly Func<TElement, TKey> _selector;
        readonly IPartialComparer<TKey> _comparer;
        HashSet<TElement> _visited;
        Dictionary<TElement, TKey> _keys;
        List<TElement> _sorted;

        public DepthFirstSearch(
            IEnumerable<TElement> elements,
            Func<TElement, TKey> selector,
            IPartialComparer<TKey> comparer
        ) {
            _elements = elements;
            _selector = selector;
            _comparer = comparer;
            var referenceComparer = new ReferenceEqualityComparer<TElement>();
            _visited = new HashSet<TElement>(referenceComparer);
            _keys = elements.ToDictionary(
                e => e,
                e => _selector(e), 
                referenceComparer
            );
            _sorted = new List<TElement>();
        }

        public IEnumerable<TElement> VisitAll() {
            foreach (var element in _elements) {
                Visit(element);
            }
            return _sorted;
        }

        void Visit(TElement element) {
            if (!_visited.Contains(element)) {
                _visited.Add(element);
                var predecessors = _elements.Where(
                    e => _comparer.Compare(_keys[e], _keys[element]) < 0
                );
                foreach (var e in predecessors) {
                    Visit(e);
                }
                _sorted.Add(element);
            }
        }
    }

    public IEnumerable<TElement> ToplogicalSort<TElement, TKey>(
        IEnumerable<TElement> elements,
        Func<TElement, TKey> selector, IPartialComparer<TKey> comparer
    ) {
        var search = new DepthFirstSearch<TElement, TKey>(
            elements,
            selector,
            comparer
        );
        return search.VisitAll();
    }
}

Helper class needed for marking nodes as visited while doing depth-first search:

class ReferenceEqualityComparer<T> : IEqualityComparer<T> {
    public bool Equals(T x, T y) {
        return Object.ReferenceEquals(x, y);
    }

    public int GetHashCode(T obj) {
        return obj.GetHashCode();
    }
}

I make no claim that this is the best implementation of the algorithm but I believe that it is a correct implementation. Further, I did not return an IOrderedEnumerable as you requested but that is easy to do once we are at this point.

The algorithm works by doing a depth-first search through the elements adding an element e to the linear ordering (represented by _sorted in the algorithm) if we have already added all the predecessors of e have already been added to the ordering. Hence, for each element e, if we haven't already visited it, visit its predecessors and then add e. Thus, this is the core of the algorithm:

public void Visit(TElement element) {
    // if we haven't already visited the element
    if (!_visited.Contains(element)) {
        // mark it as visited
        _visited.Add(element);
        var predecessors = _elements.Where(
            e => _comparer.Compare(_keys[e], _keys[element]) < 0
        );
        // visit its predecessors
        foreach (var e in predecessors) {
            Visit(e);
        }
        // add it to the ordering
        // at this point we are certain that
        // its predecessors are already in the ordering
        _sorted.Add(element);
    }
}

As an example, consider the partial-ordering defined on subsets of {1, 2, 3} where X < Y if X is a subset of Y. I implement this as follows:

public class SetComparer : IPartialComparer<HashSet<int>> {
    public int? Compare(HashSet<int> x, HashSet<int> y) {
        bool xSubsety = x.All(i => y.Contains(i));
        bool ySubsetx = y.All(i => x.Contains(i));
        if (xSubsety) {
            if (ySubsetx) {
                return 0;
            }
            return -1;
        }
        if (ySubsetx) {
            return 1;
        }
        return null;
    }
}

Then with sets defined as the list of subsets of {1, 2, 3}

List<HashSet<int>> sets = new List<HashSet<int>>() {
    new HashSet<int>(new List<int>() {}),
    new HashSet<int>(new List<int>() { 1, 2, 3 }),
    new HashSet<int>(new List<int>() { 2 }),
    new HashSet<int>(new List<int>() { 2, 3}),
    new HashSet<int>(new List<int>() { 3 }),
    new HashSet<int>(new List<int>() { 1, 3 }),
    new HashSet<int>(new List<int>() { 1, 2 }),
    new HashSet<int>(new List<int>() { 1 })
};
TopologicalSorter s = new TopologicalSorter();
var sorted = s.ToplogicalSort(sets, set => set, new SetComparer());

This results in the ordering:

{}, {2}, {3}, {2, 3}, {1}, {1, 3}, {1, 2}, {1, 2, 3}

which respects the partial order.

That was a lot of fun. Thanks.

回答6:

thanks very much to everybody, starting from Eric Mickelsen's answer I've came up with my version as I prefer to use null values to indicate no relation as Lasse V. Karlsen said.

public static IEnumerable<TSource> PartialOrderBy<TSource>(
        this IEnumerable<TSource> source,            
        IPartialEqualityComparer<TSource> comparer)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (comparer == null) throw new ArgumentNullException("comparer");


        var set = new HashSet<TSource>(source);
        while (!set.IsEmpty())
        {
            TSource minimum = set.First();                

            foreach (TSource s in set)
            {                    
                var comparison = comparer.Compare(s, minimum);
                if (!comparison.HasValue) continue;
                if (comparison.Value <= 0)
                {
                    minimum = s;                        
                }
            }
            yield return minimum;
            set.Remove(minimum);
        }
    }

public static IEnumerable<TSource> PartialOrderBy<TSource>(
       this IEnumerable<TSource> source,
       Func<TSource, TSource, int?> comparer)
    {
        return PartialOrderBy(source, new PartialEqualityComparer<TSource>(comparer));
    }

then I have the following interface for the comparer

public interface IPartialEqualityComparer<T>
{
    int? Compare(T x, T y);
}

and this helper class

internal class PartialEqualityComparer<TSource> : IPartialEqualityComparer<TSource>
{
    private Func<TSource, TSource, int?> comparer;

    public PartialEqualityComparer(Func<TSource, TSource, int?> comparer)
    {
        this.comparer = comparer;
    }

    public int? Compare(TSource x, TSource y)
    {
        return comparer(x,y);
    }
}

that allows to beautify the usage a bit so my tests can looks like the following

 var data = new int[] { 8,7,6,5,4,3,2,1,0 };
 var partiallyOrdered = data.PartialOrderBy((x, y) =>
     {
        if (x % 2 == 0 && y % 2 != 0) return null;
        return x.CompareTo(y);
     });