How to group a pandas dataframe which has a list o

Posted 2019-08-12 02:26

I have a pandas dataframe which has results of record similarity. For example, rowid 123 is similar to rowid 512 and rowid 123 is similar to 681. Technically, all three rows are similar. How can I group similar rows?

Note that my data contains both orderings of a pair, for example (123,512) and (512,123).

import pandas as pd
df = pd.DataFrame({'A': [123,123,512,412,412,536], 'B': [512,681,123,536,919,412]})
df

A   B
123 512
123 681
512 123
412 536
412 919
536 412

Expected Output

Group1  123
Group1  512
Group1  681
Group2  412
Group2  536
Group2  919

1 answer
SAY GOODBYE
#2 · 2019-08-12 03:18

You could use networkx: treat each (A, B) pair as an edge and take the connected components of the resulting graph.

In [750]: import networkx as nx

In [751]: G = nx.from_pandas_edgelist(df, 'A', 'B')  # Create the graph (networkx 2.x replaced from_pandas_dataframe with from_pandas_edgelist)

In [752]: Gcc = nx.connected_components(G)

In [753]: pd.DataFrame([{'id': i, 'group': 'group%s' % (g+1)}
     ...:               for g, ids in enumerate(Gcc) for i in ids])
Out[753]:
    group   id
0  group1  512
1  group1  681
2  group1  123
3  group2  536
4  group2  412
5  group2  919
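
If you would rather not depend on networkx, the same connected-components idea can be sketched in plain Python with a union-find (disjoint-set) structure. This is a different technique from the answer above, not a rewrite of it, and `group_pairs` is a hypothetical helper name chosen for illustration. It takes any iterable of (A, B) pairs; with the DataFrame from the question that could be `zip(df['A'], df['B'])`.

```python
def group_pairs(pairs):
    """Map each id to a group label using union-find (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the two sets for every similar pair; reversed duplicates such as
    # (512, 123) after (123, 512) are harmless because the roots already match.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the groups in order of first appearance of their root.
    labels = {}
    groups = {}
    for node in parent:
        root = find(node)
        groups[node] = "Group%d" % labels.setdefault(root, len(labels) + 1)
    return groups


# The same pairs as columns A and B in the question.
pairs = [(123, 512), (123, 681), (512, 123), (412, 536), (412, 919), (536, 412)]
for rowid, group in group_pairs(pairs).items():
    print(group, rowid)
```

Group numbers here follow the order in which ids first appear in the pairs, so 123, 512 and 681 end up in Group1 and 412, 536 and 919 in Group2, matching the expected output in the question.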