Python: sorting a dependency list

Posted 2019-06-26 09:58

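I'm trying to work out whether my problem can be solved with the built-in sorted() function or whether I need to do it myself. Doing it the old-school way with cmp would have been relatively easy.

My data set looks like this: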

x = [
('business', Set('fleet','address'))
('device', Set('business','model','status','pack'))
('txn', Set('device','business','operator'))
....

The sorting rule should basically be: for all values of N and Y where Y > N, x[N][0] is not in x[Y][1].
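
To make the rule concrete, here is a minimal sketch of a checker that tests it exactly as written. The function name `is_dependency_ordered` is made up for illustration, and the data uses Python's built-in `set` rather than the `Set` type shown above.

```python
def is_dependency_ordered(pairs):
    """Return True if no entry's name appears in a later entry's dependency set.

    This is the rule as literally stated: for all n < y,
    pairs[n][0] is not in pairs[y][1].
    """
    for n, (name, _deps) in enumerate(pairs):
        for _later_name, later_deps in pairs[n + 1:]:
            if name in later_deps:
                return False
    return True


original_order = [
    ('business', {'fleet', 'address'}),
    ('device', {'business', 'model', 'status', 'pack'}),
    ('txn', {'device', 'business', 'operator'}),
]
print(is_dependency_ordered(original_order))        # False: 'business' is in device's set
print(is_dependency_ordered(original_order[::-1]))  # True
```

Read literally, the rule places dependents before their dependencies. The answers emit dependencies first, which is the same ordering reversed.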

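Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to make this Python 3 safe.

So, can this be done using some lambda magic and the key argument?

-==UPDATE==-

Thanks Eli and Winston! I didn't really think that using key would work, and if it could, I suspected it would be a shoehorned solution, which isn't ideal.

Because my problem concerns database table dependencies, I had to make a minor addition to Eli's code to remove an item from its own dependency list (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?).

My solution: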

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]        
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted)  # drop self-dependency and already-emitted names; multiple arguments need Python 2.6+
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

Answer 1:

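What you want is called a topological sort. While it's possible to implement one using the built-in sort(), it's rather awkward, and it's better to implement a topological sort directly in Python.

Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that's hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which the elements will be examined. I'm (pretty) sure that any solution which does use sort() is going to end up redundantly calculating some information on each call to the key/cmp function in order to get around the statelessness issue.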
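
To illustrate the "marked nodes" point, here is a minimal sketch of a depth-first topological sort. It is not the code from this answer; the function name and the dict-based input format are assumptions chosen for illustration. The `done` and `visiting` sets are the running state that a stateless key function cannot carry between calls.

```python
def dfs_topological_sort(graph):
    """Return names with dependencies first.

    graph: dict mapping each name to an iterable of names it depends on.
    Names that appear only as dependencies are treated as having none.
    """
    done = set()      # permanently marked: already emitted
    visiting = set()  # temporarily marked: on the current DFS path
    order = []

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError("cyclic dependency detected at %r" % (node,))
        visiting.add(node)
        for dep in graph.get(node, ()):
            visit(dep)
        visiting.discard(node)
        done.add(node)
        order.append(node)

    for node in graph:
        visit(node)
    return order


graph = {
    'txn': ['device', 'business', 'operator'],
    'device': ['business', 'model', 'status', 'pack'],
    'business': ['fleet', 'address'],
}
order = dfs_topological_sort(graph)
print(order)
```

Because the sketch recurses, very deep dependency chains could hit Python's recursion limit; the iterative versions in the answers avoid that.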

The following is the algorithm I've been using (to sort some JavaScript library dependencies):

Edit: reworked this greatly, based on Winston Ewert's solution.

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place       
    emitted = []        
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry) 
            else: # no more deps? time to emit
                yield name 
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

Side note: it is possible to shoehorn a cmp() function into key=xxx, as outlined in this Python bug tracker message.
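
As a small illustration of that kind of conversion, the standard library provides `functools.cmp_to_key` (available in Python 2.7 and 3.2+). This may not be the exact recipe from the linked message, and `compare_by_length` is a made-up comparison used only for the example.

```python
from functools import cmp_to_key


def compare_by_length(a, b):
    """Old-style comparison: negative, zero, or positive."""
    return len(a) - len(b)


names = ['business', 'txn', 'device']
by_length = sorted(names, key=cmp_to_key(compare_by_length))
print(by_length)  # ['txn', 'device', 'business']
```

This only wraps a comparison function as a key. It does not remove the statelessness problem described above, so it is not by itself a way to sort dependencies.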



Answer 2:

I do a topological sort like this:

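class TopologicalSortFailure(Exception):
    """Raised when no item can be emitted (cyclic or missing dependency).

    Assumed definition: the original snippet raises this name without defining it.
    """

def topological_sort(items):
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if dependencies.issubset(provided):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items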

I think it's a little more straightforward than Eli's version; I don't know about efficiency.



Answer 3:

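Looking past the bad formatting and this strange Set type... (I've kept them as tuples and delimited the list items correctly...) ...and using the networkx library to make things convenient...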

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)

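# Edges point from an item to its dependencies, so dependents come first here;
# reverse the list to get dependencies first.
print(list(nx.topological_sort(g)))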
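
For readers who would rather avoid a third-party dependency, a similar result can be sketched with the standard library's `graphlib` module. This is a swapped-in alternative, not part of the original answer, and it assumes Python 3.9 or later, where `graphlib.TopologicalSorter` is available.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

x = [
    ('business', ('fleet', 'address')),
    ('device', ('business', 'model', 'status', 'pack')),
    ('txn', ('device', 'business', 'operator')),
]

# TopologicalSorter takes a mapping of node -> nodes that must come before it,
# so each name maps directly to its dependencies.
sorter = TopologicalSorter({name: set(deps) for name, deps in x})
order = list(sorter.static_order())  # dependencies come before dependents
print(order)
```

`static_order()` raises `graphlib.CycleError` if the dependencies contain a cycle. The relative order of unrelated names is not something this sketch relies on.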


Answer 4:

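This is Winston's suggestion, with a docstring and one small tweak: dependencies.issubset(provided) is replaced with provided.issuperset(dependencies). This change lets you pass the dependencies in each input pair as an arbitrary iterable rather than necessarily a set.

My use case involves a dict whose keys are the item strings, with the value for each key being a list of the names of the items on which that key depends. Once I've established that the dict is non-empty, I can pass its iteritems() to the modified algorithm.

Thanks again to Winston.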

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items
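
The dict-based use case described above might look like the following on Python 3, where `iteritems()` no longer exists and `items()` is used instead. This is a sketch under assumptions: the `TopologicalSortFailure` class is a hypothetical definition (the answers raise it without defining it) and the `depends_on` data is invented for the example.

```python
class TopologicalSortFailure(Exception):
    """Hypothetical definition; the answers raise this name without defining it."""


def topological_sort(items):
    """Yield items from (item, dependencies) pairs, dependencies first."""
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items


# Invented data: each key depends on the names in its list.
depends_on = {
    'txn': ['device', 'business'],
    'device': ['business'],
    'business': [],
}
result = list(topological_sort(depends_on.items()))
print(result)  # ['business', 'device', 'txn']
```

On Python 3 the `items()` view of an empty dict is falsy, so an empty dict simply yields nothing. Note also that every dependency must itself appear as an item: with the question's data, names such as 'fleet' are never provided, so this version would raise `TopologicalSortFailure`.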


Source: Python: sorting a dependency list