I have a graph whose nodes are the "parentid" values of hotels and the "phone_search" values stored with them. My main aim in building this graph was to connect all "parentid"s that share a "phone_search" (recursively). For example, if parentid A has phone_search 1,2; B has 2,3; C has 3,4; D has 5,6; and E has 6,7, then A, B, and C will be grouped in one cluster and D and E in another.
This is my code to build the network:
from pymongo import MongoClient  # To import client for MongoDB
import networkx as nx
import pickle

G = nx.Graph()

# Defining variables
hotels = []

# Initializing MongoDB client
client = MongoClient()
# Connection
db = client.hotel
collection = db.hotelData

for post in collection.find():
    hotels.append(post)

for hotel in hotels:
    try:
        # phone_search is expected to be a "|"-separated string
        phones = hotel["phone_search"].split("|")
    except AttributeError:
        # Not a string (no .split), so treat it as a single value
        phones = [hotel["phone_search"]]
    for phone in phones:
        # Skip empty entries
        if phone != '':
            G.add_edge(hotel["parentid"], phone)

# nx.write_gml(G, "export.gml")
# Open in binary mode for pickle (required on Python 3)
with open('/home/justdial/newHotel/graph.txt', 'wb') as f:
    pickle.dump(G, f)
What I want to do: assign a group ID to each component and store it in a dictionary, so that I can look up any group directly from the dictionary.
Example: Gid 1 will contain the parentids and phone_searches that are in the same cluster. Similarly, Gid 2 will contain the nodes from another cluster, and so on.
One more question: is accessing the nodes from a dictionary by group ID faster than performing a BFS on the networkx graph?
I am posting this as an answer because I lack the reputation to comment.
The "set_node_attributes" function changed the order of its arguments between v1.x and v2.0 to allow more options for loading attributes. The order in v2.0 is (G, values, name) instead of (G, name, values).
If you use keyword arguments, the order does not matter:
nx.set_node_attributes(G, name='component', values=attr)
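For context, "values" is expected to be a dict keyed by node. Below is a minimal sketch on a toy graph, assuming NetworkX 2.x or later; the node names and the contents of attr are made up for illustration.

```python
import networkx as nx

# Toy graph: two hotels sharing a phone (illustrative names only)
G = nx.Graph()
G.add_edges_from([("A", "p1"), ("A", "p2"), ("B", "p2")])

# Hypothetical attribute values: node -> component ID
attr = {"A": 0, "B": 0, "p1": 0, "p2": 0}

# With keyword arguments, the positional order change between versions does not matter
nx.set_node_attributes(G, name='component', values=attr)

# Read the attribute back for a single node
component_of_A = G.nodes["A"]["component"]
```

The attribute can also be read back for all nodes at once with nx.get_node_attributes(G, 'component').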
What you basically want is a list of nodes grouped by their component (not cluster), which is fairly straightforward. You need connected_component_subgraphs(). In case you want the component IDs as node attributes:
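A minimal sketch of this idea, using the toy data from the question. Note that connected_component_subgraphs() was removed in NetworkX 2.4, so the sketch uses nx.connected_components() instead, which yields one set of nodes per component. The names groups and gid_of, and the "p" prefix on phone nodes, are illustrative assumptions rather than part of the original code.

```python
import networkx as nx

# Toy graph from the question: each parentid is linked to its phones.
# Phones carry a "p" prefix here so they cannot collide with parentids.
G = nx.Graph()
G.add_edges_from([
    ("A", "p1"), ("A", "p2"), ("B", "p2"), ("B", "p3"),
    ("C", "p3"), ("C", "p4"), ("D", "p5"), ("D", "p6"),
    ("E", "p6"), ("E", "p7"),
])

# Group ID -> set of nodes in that connected component
groups = {
    gid: nodes
    for gid, nodes in enumerate(nx.connected_components(G), start=1)
}

# Node -> group ID, also stored on the graph as a node attribute
gid_of = {node: gid for gid, nodes in groups.items() for node in nodes}
nx.set_node_attributes(G, values=gid_of, name='component')
```

Once built, groups[gid] and gid_of[node] are plain dictionary lookups, so they avoid re-traversing the component with a BFS on every query. The dictionaries have to be rebuilt if edges are added to the graph later.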