I've a dataset that denotes the list of authors of many technical reports. Each report can be authored by one or multiple people:
a = [
['John', 'Mark', 'Jennifer'],
['John'],
['Joe', 'Mark'],
['John', 'Anna', 'Jennifer'],
['Jennifer', 'John', 'Mark']
]
I've to find the most frequent pairs, that is, people that had most collaborations in the past:
['John', 'Jennifer'] - 3 times
['John', 'Mark'] - 2 times
['Mark', 'Jennifer'] - 2 times
etc...
How to do this in Python?
You can use a set comprehension to build the set of all names, then use a list comprehension to count how often each pair of names appears together in your sublists.
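The snippet for this answer did not survive, so here is a minimal sketch of what the approach could look like. The names `all_names` and `pair_counts` are my own, and using `itertools.combinations` to enumerate the candidate pairs is an assumption about how the pairs are generated.

```python
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

# Set comprehension: every distinct author name across all reports.
all_names = {name for report in a for name in report}

# List comprehension: for each possible pair, count the reports
# that contain both names (True counts as 1 in sum()).
pair_counts = [
    (pair, sum(set(pair).issubset(report) for report in a))
    for pair in combinations(sorted(all_names), 2)
]

# Most frequent pairs first.
pair_counts.sort(key=lambda item: item[1], reverse=True)

for pair, count in pair_counts[:3]:
    print(list(pair), '-', count, 'times')
```

Note that this checks every possible pair of names against every report, so it also lists pairs that never collaborated (with a count of 0) and gets slow when there are many authors.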
Use a collections.Counter dict with itertools.combinations.
most_common() will return the pairings in order from most common to least. If you want only the first n most common, just pass n: d.most_common(n)
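A minimal sketch of the Counter approach, using the data from the question. Sorting each author list before pairing is my addition (an assumption about what you want), so that ('John', 'Mark') and ('Mark', 'John') are counted as the same pair; `d` is the Counter referred to above.

```python
from collections import Counter
from itertools import combinations

a = [
    ['John', 'Mark', 'Jennifer'],
    ['John'],
    ['Joe', 'Mark'],
    ['John', 'Anna', 'Jennifer'],
    ['Jennifer', 'John', 'Mark'],
]

d = Counter()
for authors in a:
    # set() drops accidental duplicate names within one report;
    # sorted() gives every pair a canonical order.
    # Single-author reports produce no pairs at all.
    d.update(combinations(sorted(set(authors)), 2))

# The three most frequent pairs with their counts.
for pair, count in d.most_common(3):
    print(list(pair), '-', count, 'times')
```

Unlike checking every possible pair of names, this only touches pairs that actually occur in some report, so pairs with zero collaborations never appear in `d`.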