I need to find the degree of every protein in the input file, which is shown below:
A B
a b
c d
a c
c b
I have used networkx to get the nodes. How do I create the edges using my input file on the created nodes?
Code:
import pandas as pd
df = pd.read_csv('protein.txt',sep='\t', index_col =0)
df = df.reset_index()
df.columns = ['a', 'b']
distinct = pd.concat([df['a'], df['b']]).unique()
import networkx as nx
G=nx.Graph()
nodes= []
for i in distinct:
    node=G.add_node(1)
    nodes.append(node)
From the networkx documentation: use add_edge in the loop, or collect the edges first and then use add_edges_from:
>>> G = nx.Graph() # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> e = (1,2)
>>> G.add_edge(1, 2) # explicit two-node form
>>> G.add_edge(*e) # single edge as tuple of two nodes
>>> G.add_edges_from( [(1,2)] ) # add edges from iterable container
Then G.degree() gives you the degree of each node.
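Applied to the sample input from the question, this could look like the minimal sketch below. The edge list is typed in by hand here (an assumption made to keep the example self-contained); in practice it would come from the rows of the input file. Note that add_edges_from creates any missing nodes itself, so a separate add_node loop is not needed.

```python
import networkx as nx

# Edges copied from the sample input in the question (the "A B" header row is excluded).
edges = [("a", "b"), ("c", "d"), ("a", "c"), ("c", "b")]

G = nx.Graph()
G.add_edges_from(edges)  # endpoints are added as nodes automatically

# G.degree() is a view of (node, degree) pairs; dict() makes it easy to look up.
degrees = dict(G.degree())
print(degrees)
```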
First, the function read_csv was used incorrectly to read the input file. The columns are separated by spaces, not tabs, so sep should be '\s+' instead of '\t'. Also, there is no index column in the input file, so the parameter index_col should not be set to 0.
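The effect of the separator can be checked with a small sketch. It assumes the file content is exactly the sample from the question and reads it from an in-memory string (the name sample is illustrative) instead of protein.txt.

```python
import io
import pandas as pd

# Sample input from the question, with columns separated by single spaces.
sample = "A B\na b\nc d\na c\nc b\n"

# With a tab separator no split happens: each line ends up in one column.
df_tab = pd.read_csv(io.StringIO(sample), sep='\t')

# With a whitespace regex the two columns are parsed as intended.
df_ws = pd.read_csv(io.StringIO(sample), sep=r'\s+')

print(df_tab.columns.tolist())
print(df_ws.columns.tolist())
```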
After having correctly read the input file into a DataFrame, we can convert it to a networkx graph using the function from_pandas_edgelist.
import networkx as nx
import pandas as pd
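# Columns are whitespace-separated; the raw string keeps '\s' from being treated as a string escape
df = pd.read_csv('protein.txt', sep=r'\s+')
# Each row becomes an edge between the proteins in columns 'A' and 'B'
g = nx.from_pandas_edgelist(df, 'A', 'B')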
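Since the question asks for the degree of every protein, the last step is to query the graph. The following is a sketch under the assumption that the input matches the sample in the question; it reads from an in-memory string so it runs without protein.txt, and the variable name degree_by_protein is illustrative.

```python
import io
import networkx as nx
import pandas as pd

# Stand-in for protein.txt, using the sample rows from the question.
sample = "A B\na b\nc d\na c\nc b\n"

df = pd.read_csv(io.StringIO(sample), sep=r'\s+')
g = nx.from_pandas_edgelist(df, 'A', 'B')

# Map each protein to the number of distinct proteins it interacts with.
degree_by_protein = dict(g.degree())
for protein, degree in sorted(degree_by_protein.items()):
    print(protein, degree)
```

Because nx.Graph is undirected and keeps at most one edge per pair, repeated rows such as "a b" and "b a" would count once; a MultiGraph could be passed via create_using if repeated interactions should be counted separately.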