Dividing connected nodes into groups of three

Posted 2019-04-10 23:21

Question:

I have a collection of objects (I'll call them nodes) that are connected to each other. Every node is connected to at least one other node, and the entire collection forms a single connected component (no outliers). The number of nodes is a multiple of three.

What I want to do is divide the nodes into groups of three connected nodes (with no overlapping between groups).


Every node has a unique ID. The nodes are stored in a document in which every line is the ID of a node, followed by the IDs of the nodes it is connected to. It looks something like this:

30:240
40:210/240/230/60
50:80/220/230/70/210/190/200
60:240/40/230/220
70:90/200/50/220
80:50/170/190/210
90:200/70/180
100:110/160/190/170/200/180
110:100/180
160:170/100
170:80/160/190/100
180:90/200/110/100
190:50/80/200/170/100
200:90/70/100/50/190/180
210:50/80/40/230
220:50/230/70/60
230:50/210/60/40/220
240:40/30/60

I've made some images to help visualize it:

I want to divide my nodes into groups of three, but the nodes in each group must be connected. There are many possible ways to group the nodes; one possibility is shown in the following image:

However, I am struggling to come up with an algorithm that can ensure that every node ends up in a group of three connected nodes (with no overlap). To avoid overlap, I "claim" nodes while making groups; this is my way of checking whether a node already belongs to another group.

I have come up with the following code:

# Parse the input file into {node ID: [connected node IDs]}.
nodes = {}
with open("input.txt") as f:
    for line in f:
        split = line.split(":")
        node = int(split[0])
        nodes[node] = []
        for connection in split[1].split("/"):
            nodes[node].append(int(connection.strip()))

# Greedily build groups, claiming nodes so they cannot be reused.
groups = {}
claimed = []
for node in nodes:
    if node in claimed:
        continue
    connections = nodes[node]
    if len(connections) > 1:
        for i in range(len(connections)):
            # stop once this node's group has two other members
            if node in groups and len(groups[node]) == 2:
                break
            if connections[i] not in claimed:
                if node not in groups:
                    groups[node] = []
                groups[node].append(connections[i])
                claimed.append(connections[i])
                if node not in claimed:
                    claimed.append(node)

# Write each group as "leader:member/member".
with open("output.txt", "w") as f:
    for group in groups:
        line = str(group) + ":"
        for node in groups[group]:
            line += str(node) + "/"
        f.write(line[:-1] + "\n")

However, this produces the following:

160:170/100
80:190
230:50/210
70:90/200
110:180
240:40/30
220:60

Which can be visualized like this:

As you can see, there are four groups of three (in blue) and three groups of two (in orange). This happens because I claim nodes while making the groups, so some nodes no longer have two unclaimed connected nodes left to form a group with.


So my question is: How can I have my nodes claim connected nodes in such a way that other nodes can still form their own groups of three?

Or, more generally: how can I divide connected nodes into groups of three connected nodes with no overlap? Every node needs to be in a group of three.

My question is more about the algorithm than the code. I am also open to an entirely different algorithm.

Answer 1:

data.py

node_string = """
30:240
40:210/240/230/60
50:80/220/230/70/210/190/200
60:240/40/230/220
70:90/200/50/220
80:50/170/190/210
90:200/70/180
100:110/160/190/170/200/180
110:100/180
160:170/100
170:80/160/190/100
180:90/200/110/100
190:50/80/200/170/100
200:90/70/100/50/190/180
210:50/80/40/230
220:50/230/70/60
230:50/210/60/40/220
240:40/30/60
        """

node_string_1 = """
(238,140,141):(208,134,191)/(117,119,222)/(229,134,102)/(167,234,229)
(176,120,113):(150,201,147)/(111,237,177)
(113,171,164):(237,128,198)/(159,172,151)/(202,171,225)/(240,168,211)/(111,218,139)
(208,134,191):(238,140,141)/(229,134,102)
(131,150,110):(212,209,198)/(155,133,217)/(168,140,187)/(173,204,120)/(147,210,106)/(207,149,101)/(228,172,108)
(229,134,102):(208,134,191)/(238,140,141)/(167,234,229)
(212,209,198):(109,129,191)/(155,133,217)/(147,210,106)/(131,150,110)
(105,199,125):(150,201,147)/(175,118,238)/(240,168,211)
(175,118,238):(105,199,125)/(240,168,211)
(114,178,129):(164,233,166)/(227,123,219)/(135,156,161)/(224,183,104)/(228,172,108)
(147,210,106):(212,209,198)/(173,204,120)/(131,150,110)/(143,147,114)
(222,128,188):(135,156,161)/(237,128,198)/(159,172,151)/(111,237,177)/(150,201,147)
(166,218,221):(207,149,101)/(143,114,198)/(159,172,151)/(202,171,225)
(143,147,114):(173,204,120)/(107,184,168)/(147,210,106)
(111,218,139):(132,138,234)/(202,171,225)/(113,171,164)
(117,119,222):(109,129,191)/(155,133,217)/(238,140,141)/(167,234,229)
(107,184,168):(164,233,166)/(173,204,120)/(143,147,114)/(227,123,219)
(224,183,104):(164,233,166)/(114,178,129)/(111,237,177)/(135,156,161)
(228,172,108):(227,123,219)/(173,204,120)/(135,156,161)/(159,172,151)/(207,149,101)/(131,150,110)/(114,178,129)
(227,123,219):(173,204,120)/(164,233,166)/(107,184,168)/(228,172,108)/(114,178,129)
(168,140,187):(155,133,217)/(207,149,101)/(176,204,213)/(143,114,198)/(131,150,110)/(159,166,127)
(207,149,101):(168,140,187)/(159,172,151)/(166,218,221)/(143,114,198)/(131,150,110)/(228,172,108)
(159,172,151):(207,149,101)/(135,156,161)/(237,128,198)/(202,171,225)/(113,171,164)/(166,218,221)/(222,128,188)/(228,172,108)
(143,114,198):(168,140,187)/(207,149,101)/(202,171,225)/(166,218,221)/(184,163,168)/(159,166,127)
(164,233,166):(227,123,219)/(114,178,129)/(107,184,168)/(224,183,104)
(150,201,147):(176,120,113)/(240,168,211)/(237,128,198)/(105,199,125)/(111,237,177)/(222,128,188)
(111,237,177):(135,156,161)/(224,183,104)/(222,128,188)/(176,120,113)/(150,201,147)
(159,166,127):(184,163,168)/(168,140,187)/(176,204,213)/(143,114,198)
(155,133,217):(212,209,198)/(168,140,187)/(167,234,229)/(176,204,213)/(109,129,191)/(117,119,222)/(131,150,110)
(109,129,191):(212,209,198)/(117,119,222)/(155,133,217)
(132,138,234):(184,163,168)/(202,171,225)/(111,218,139)
(240,168,211):(150,201,147)/(237,128,198)/(105,199,125)/(175,118,238)/(113,171,164)
(167,234,229):(155,133,217)/(117,119,222)/(238,140,141)/(176,204,213)/(229,134,102)
(173,204,120):(227,123,219)/(147,210,106)/(143,147,114)/(107,184,168)/(131,150,110)/(228,172,108)
(135,156,161):(114,178,129)/(224,183,104)/(159,172,151)/(111,237,177)/(222,128,188)/(228,172,108)
(237,128,198):(150,201,147)/(159,172,151)/(222,128,188)/(240,168,211)/(113,171,164)
(176,204,213):(155,133,217)/(168,140,187)/(159,166,127)/(167,234,229)
(202,171,225):(132,138,234)/(184,163,168)/(159,172,151)/(113,171,164)/(166,218,221)/(143,114,198)/(111,218,139)
(184,163,168):(143,114,198)/(132,138,234)/(159,166,127)/(202,171,225)
"""

node_grouper.py

import itertools
import math
from collections import deque
from ast import literal_eval as totuple 

import data

class NodeGrouper:

    finished = False

    steps = 0

    nodes = {}
    groups = {}
    weights = {}

    best_group = []
    returned_results_cache = []

    # Parse a node name.
    # Works with two different conventions:
    # - node names that are numbers
    # - node names that are tuples
    def parse_node(self, string):
        try:
            return totuple(string)
        except (ValueError, SyntaxError):
            return int(string)

    # Get a dictionary from the source string.
    # receives node_string (str): the node string passed in settings
    # returns {160: [100, 170], 240: [30, 40, 60], ... }
    def parse_string(self, node_string):
        nodes = {}
        for line in node_string.split():
            split = line.split(":")
            node = self.parse_node(split[0])
            if node not in nodes:
                nodes[node] = set()
            for connection in split[1].split("/"):
                connection = self.parse_node(connection.strip())
                if connection not in nodes:
                    nodes[connection] = set()
                nodes[node].add(connection)
                nodes[connection].add(node)
        self.nodes = nodes

        for node in self.nodes:
            self.nodes[node] = sorted(self.nodes[node])
        return nodes


    # Get all possible combinations.
    # Build a tuple with every possible 3 node connection.
    # receives nodes (dict) as created by self.parse_string()
    # returns ((90, 180, 200), (80, 100, 190), (60, 70, 220) ... )
    def possible_combinations(self, nodes):
        possible_links = {}
        for node, links in nodes.items():
            link_possibilities = itertools.combinations(links, self.nodes_per_group - 1)
            for link_possibility in link_possibilities:
                if node not in possible_links:
                    possible_links[node] = []
                possible_links[node] += [link_possibility]
        groups = set()
        for node, possible_groups in possible_links.items():
            for possible_group in possible_groups:
                groups.add(tuple(sorted(possible_group + (node,))))
        return tuple(groups)



    # Handle status messages (if settings['log_status'] == True).
    def log_status(self):
        # log the error margin and how many steps were taken so far
        if self.settings['log_status']:
            print('Error margin (unused nodes): %s' % (len(self.nodes) - (len(self.best_group) * 3)))
            print('%s steps' % self.steps)
            print(self.format_output(self.best_group))


    # Create an output string in the format of the Stack Overflow ticket.
    def format_output(self, group_combination):

        out_dict = {}

        for group in group_combination:
            for node in group:
                links = set([n for n in group if n != node])

                if set(links).issubset(set(self.nodes[node])):
                    if node not in out_dict:
                        out_dict[node] = set()
                    out_dict[node] = links
                    break

        output = ''
        for node, links in out_dict.items():
            output += '%s:' % str(node)
            output += '/'.join([str(link) for link in links])
            output += '\n'

        return output



    # Sort key: prefer groups containing nodes that are harder to match.
    def sort_possible_groups(self, queried_group):
        gid = self.groups.index(queried_group) + 1

        for node in queried_group:
            if node in self.preferred_nodes:
                self.weights[gid - 1] -= 3

        return self.weights[gid - 1]


    def initial_weights(self):
        weights = []
        groups_with_this_node = {}
        for group_id, group in enumerate(self.groups):
            ordinarity = 0
            for node in group:
                if node not in groups_with_this_node:
                    groups_with_this_node[node] = sum(x.count(node) for x in self.groups)
                ordinarity += groups_with_this_node[node] * 100
            weights += [ordinarity]
        return weights


    # A candidate group can be selected only if it shares no node with
    # any group already chosen for the current result.
    def selected_group_available(self, group_1, group_pool):
        for group_2 in group_pool:
            for node in group_1:

                self.steps += 1

                if self.settings['log_status']:

                    # log status each 500000 steps, if specified in the settings
                    if self.steps % self.settings['log_period'] == 0:
                        self.log_status()

                if node in group_2:
                    return False

        return True

    # Build one candidate result: start from group i and greedily add
    # every other group that does not overlap with the selection so far.
    def get_results(self, i=None):

        group_length = len(self.groups)
        group_length_iterator = range(group_length)

        groups = deque(sorted(self.groups, key=self.sort_possible_groups))

        output = [groups[i]]
        for j in group_length_iterator:

            if i == j:
                continue

            selected_group = tuple(groups[j])

            if self.selected_group_available(selected_group, output):
                output += [selected_group]


        match = sorted(tuple(output))

        if len(output) > len(self.best_group):
            self.best_group = output


        if len(output) >= self.maximum_groups or \
            (
                self.settings['maximum_steps'] > 0 and \
                self.steps > self.settings['maximum_steps']
            ):
            self.finished = True


        if match not in self.returned_results_cache:
            self.returned_results_cache += [match]
            self.preferred_nodes = [n for n in self.nodes]
            for g in match:
                for n in g:
                    if n in self.preferred_nodes:
                        self.preferred_nodes.remove(n)

        return len(output)




    def __init__(self, **settings):

        # save settings in a property
        self.settings = dict(settings)

        self.nodes_per_group = 3
        self.nodes = self.parse_string(self.settings['node_string'])
        self.preferred_nodes = [node for node in self.nodes]
        self.groups = self.possible_combinations(self.nodes)        


        self.groupable_nodes = list(set(sum(self.groups, ())))
        self.groupable_nodes_total = len(self.groupable_nodes)

        self.maximum_groups = int(math.floor(self.groupable_nodes_total / self.nodes_per_group))

        self.weights = self.initial_weights()


    def calculate(self):

        for i in range(len(self.groups)):
            self.get_results(i)
            if self.finished:
                break


if __name__ == "__main__":

    ng = NodeGrouper(

        # 0 is for unlimited
        # (try until all nodes are used)
        maximum_steps = 10000000,

        # Log number of iterations
        log_status = False,
        # Log each x iterations
        log_period = 5000000,

        node_string = data.node_string,

    )

    ng.calculate()

    print(ng.format_output(ng.best_group))

It handled this example in 1,285,000,000 steps:



Answer 2:

I'm not sure if it works in every edge case, but if you can find a path through the graph that passes through every node exactly once (a Hamiltonian path), you can simply split that path into consecutive sections of three. Depending on the size of the graph, you could potentially find such a path with a simple DFS.

EDIT: For this to work you first need to trim off all tails (paths of no return), but that just means you will need to count off the nodes slightly differently when grouping the ones adjacent to a tail.
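
A minimal sketch of this idea, assuming the nodes dict from the question (node ID -> list of neighbour IDs). The find_hamiltonian_path helper below is hypothetical, a naive backtracking DFS that is exponential in the worst case, so this is only a starting point for small graphs:

def find_hamiltonian_path(nodes):
    # Naive backtracking DFS for a path that visits every node exactly
    # once. Worst-case exponential; only viable for small graphs.
    def extend(path, visited):
        if len(path) == len(nodes):
            return path
        for neighbour in nodes[path[-1]]:
            if neighbour not in visited:
                result = extend(path + [neighbour], visited | {neighbour})
                if result:
                    return result
        return None

    for start in nodes:
        path = extend([start], {start})
        if path:
            return path
    return None

path = find_hamiltonian_path(nodes)
if path:
    # Consecutive path nodes share an edge, so each chunk of three
    # consecutive nodes is itself connected (as a chain).
    groups = [path[i:i + 3] for i in range(0, len(path), 3)]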



Answer 3:

Create adjacency lists such as the following, based on the connected nodes:

90 -> 180, 200, 70
70 -> 90, 200, 220
180 -> 90, 110, 100, 200
110 -> 180, 100
100 -> 110, 180, 200, 190, 160, 170

Navigation should be as follows.

  1. Go to 90 and assign it to group 1. Mark 90 as done.
  2. Go to 180, as it is the first connected node. Assign it to group 1 and mark 180 as done.
  3. Go to 180's list.
  4. The first entry is 90, which is already marked, so skip it.
  5. Go to the next node in the same list. It is 110. Assign it to group 1 and mark 110 as done.
  6. Go to 110's list.

Increment the group number once a group is full, and so on.

Once the condition (number of groups * 3) == total number of nodes is satisfied, break out of the loop. A rough sketch of this walk follows.
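
A sketch of the walk described above, assuming the nodes dict from the question (node ID -> list of neighbour IDs). Note that this greedy walk can stall just like the asker's approach when every neighbour of the current node is already marked, so it is a starting point rather than a guaranteed solution:

def greedy_walk_groups(nodes):
    # Walk the adjacency lists: always step to the first unmarked
    # connected node, closing a group after every third node.
    marked = set()
    groups = []
    leftovers = []  # partial groups where the walk dead-ended
    for start in nodes:
        if start in marked:
            continue
        current = []
        node = start
        while node is not None:
            marked.add(node)
            current.append(node)
            if len(current) == 3:
                groups.append(current)
                current = []
            # First connected node not yet marked (None = dead end).
            node = next((n for n in nodes[node] if n not in marked), None)
        if current:
            leftovers.append(current)
        if len(groups) * 3 == len(nodes):
            break
    return groups, leftovers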



Answer 4:

The bigger question is how many nodes you will have in your search. If the number of nodes stays low, you can always iterate over all possible combinations and validate each candidate solution.

But if you have several hundred nodes, this becomes impossible. Then you should look into AI-style search code that does the work for you.
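
A sketch of the exhaustive approach, assuming the nodes dict from the question with symmetric adjacency (each connection listed in both directions, as in the question's input, with integer node IDs): fix the smallest remaining node, try every connected triple containing it, and backtrack over the rest. This is only viable for small graphs:

import itertools

def connected_triple(a, b, c, nodes):
    # Three nodes form a connected subgraph iff at least two of the
    # three possible edges between them exist.
    edges = sum(1 for x, y in ((a, b), (a, c), (b, c)) if y in nodes[x])
    return edges >= 2

def partition_into_triples(remaining, nodes):
    # Exhaustive backtracking: fix the smallest remaining node, try
    # every connected triple containing it, recurse on what is left.
    if not remaining:
        return []
    first = min(remaining)
    rest = remaining - {first}
    for b, c in itertools.combinations(sorted(rest), 2):
        if connected_triple(first, b, c, nodes):
            tail = partition_into_triples(rest - {b, c}, nodes)
            if tail is not None:
                return [(first, b, c)] + tail
    return None  # no valid partition from this state

groups = partition_into_triples(set(nodes), nodes)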