Assigning Group ID to components in networkx

Posted 2019-07-16 07:22

Question:

I have a graph whose nodes are hotel "parentid" values and their "phone_search" values. The aim is to connect all parentids that share a phone_search, transitively. For example, if parentid A has phone_search 1,2; B has 2,3; C has 3,4; D has 5,6; and E has 6,7, then A, B and C form one cluster, and D and E form another.

This is my code to build the network:

from pymongo import MongoClient  # To import client for MongoDB
import networkx as nx
import pickle

G = nx.Graph()

#Defining variables
hotels = []
phones = []
allResult = []
finalResult = []

#dictNx = {}

# Initializing MongoDB client
client = MongoClient()

# Connection
db = client.hotel
collection = db.hotelData

for post in collection.find():
    hotels.append(post)

for hotel in hotels:
    try:
        # phone_search is normally a "|"-separated string of numbers
        phones = hotel["phone_search"].split("|")
    except AttributeError:
        # Not a string (no .split), so treat the value as a single phone
        phones = [hotel["phone_search"]]
    for phone in phones:
        if phone != '':  # skip empty entries
            G.add_edge(hotel["parentid"], phone)

# nx.write_gml(G,"export.gml")
with open('/home/justdial/newHotel/graph.txt', 'wb') as f:  # pickle needs binary mode
    pickle.dump(G, f)

What I want to do: I want to assign a group ID to each component and store it into a dictionary so that I can access them with ease every time directly from the dictionary.

Example: Gid 1 will contain the parentids and phone_searches that are in the same cluster, Gid 2 will contain the nodes of another cluster, and so on.

One more question: is looking up nodes in a dictionary by group ID faster than performing a BFS on the networkx graph?

Answer 1:

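What you want is a list of nodes per connected component (the graph-theory term, rather than cluster), which is fairly straightforward. You need connected_component_subgraphs(). Note that this function was removed in networkx 2.4; on later versions, nx.connected_components(G) yields the node set of each component directly.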

G = nx.caveman_graph(3, 4)  # generate example with 3 components of four members each
components = nx.connected_component_subgraphs(G)

comp_dict = {idx: comp.nodes() for idx, comp in enumerate(components)}
print(comp_dict)
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
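On newer networkx releases, where connected_component_subgraphs() is no longer available, the same dictionary can be built from nx.connected_components(), which yields one set of nodes per component. A minimal sketch, assuming networkx 2.4 or later; the sorted() call is only there to turn each set into a list with a stable order:

```python
import networkx as nx

# Same example graph: 3 components (cliques) of 4 nodes each
G = nx.caveman_graph(3, 4)

# connected_components() yields a set of nodes for each component
comp_dict = {
    idx: sorted(nodes)
    for idx, nodes in enumerate(nx.connected_components(G))
}
print(comp_dict)
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
```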

In case you want the component IDs as node attributes:

attr = {n: comp_id for comp_id, nodes in comp_dict.items() for n in nodes}

nx.set_node_attributes(G, "component", attr)  # argument order of networkx 1.x
print(G.nodes(data=True))
# [(0, {'component': 0}), (1, {'component': 0}), (2, {'component': 0}), (3, {'component': 0}), (4, {'component': 1}), (5, {'component': 1}), (6, {'component': 1}), (7, {'component': 1}), (8, {'component': 2}), (9, {'component': 2}), (10, {'component': 2}), (11, {'component': 2})]


Answer 2:

I am posting this as an answer because I lack the reputation to comment.

The set_node_attributes function changed the order of its arguments between v1.x and v2.0 to allow more options for loading attributes. The order is now (G, values, name) instead of (G, name, values).

With keyword arguments, the order does not matter:

nx.set_node_attributes(G, name='component', values=attr)
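Putting the pieces together for the example in the question, here is a minimal sketch, assuming networkx 2.x or later. The hotels dictionary and the "phone:" prefix are illustrative assumptions and not part of the original code; the prefix only keeps phone nodes from colliding with parentid nodes.

```python
import networkx as nx

# Hypothetical data mirroring the question's example:
# parentid -> "|"-separated phone_search string
hotels = {"A": "1|2", "B": "2|3", "C": "3|4", "D": "5|6", "E": "6|7"}

G = nx.Graph()
for parentid, phone_search in hotels.items():
    for phone in phone_search.split("|"):
        if phone:  # skip empty entries
            # Prefix is an assumption, used to tell phones apart from parentids
            G.add_edge(parentid, "phone:" + phone)

# Group ID -> sorted members of that connected component
groups = {
    gid: sorted(nodes)
    for gid, nodes in enumerate(nx.connected_components(G), start=1)
}

# Node -> group ID, also stored on the graph using the keyword form
gid_of = {n: gid for gid, nodes in groups.items() for n in nodes}
nx.set_node_attributes(G, name="gid", values=gid_of)

print(groups[1])
# ['A', 'B', 'C', 'phone:1', 'phone:2', 'phone:3', 'phone:4']
print(gid_of["D"], G.nodes["E"]["gid"])
# 2 2
```

Once these dictionaries are built, groups[gid] and gid_of[node] are plain dictionary lookups, whereas a BFS has to walk the whole component on every call. The trade-off is that the dictionaries must be rebuilt whenever the graph changes.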