How to achieve an inner join in pandas [duplicate]

Posted 2020-03-31 05:34

I need to do the equivalent of an inner join in Python.

I have 2 data sets which come from separate sources but share a common key.

Let's say (for the sake of argument) that they look like this:

person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]

person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]

How best can I join these two sets of data? I have something like this:

joins = []
for like in person_likes:
    for acc in person_accounts:
        if like['person_id'] == acc['person_id']:
            join = {}
            join.update(like)
            join.update(acc)
            joins.append(join)

print(joins)

This appears to work fine (I haven't tested it extensively), and at first glance looks like the best we can do, but I wonder whether there is a known algorithm which is more performant, and also whether there is a more idiomatic or Pythonic way of doing this?

import pandas as pd

accounts = pd.DataFrame(person_accounts)
likes = pd.DataFrame(person_likes)
pd.merge(accounts, likes, on='person_id')

#                                  blogs person_id       food  pastimes
# 0  [swimming digest, cooking puddings]         1  ice_cream  swimming
# 1                     [learn flamenca]         2     paella     banjo
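If you would rather stay in plain Python, one common approach is a hash join: index one list by the key, then look up each row of the other list in that index. The sketch below is only an illustration, not a tested library routine. The helper name inner_join is made up for this example, and it assumes every row contains the key. It should do roughly one pass over each list plus the size of the output, instead of comparing every pair of rows as the nested loops do.

```python
from collections import defaultdict

person_likes = [{'person_id': '1', 'food': 'ice_cream', 'pastimes': 'swimming'},
                {'person_id': '2', 'food': 'paella', 'pastimes': 'banjo'}]

person_accounts = [{'person_id': '1', 'blogs': ['swimming digest', 'cooking puddings']},
                   {'person_id': '2', 'blogs': ['learn flamenca']}]


def inner_join(left, right, key):
    """Hypothetical helper: hash join of two lists of dicts on `key`.

    Assumes every row in both lists contains `key`.
    """
    # Build phase: group the right-hand rows by their key value.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe phase: for each left-hand row, merge it with every matching
    # right-hand row. Rows with no match are dropped (inner join).
    # On a column-name clash the right-hand value wins, which matches the
    # update() order used in the question.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joins = inner_join(person_likes, person_accounts, 'person_id')
print(joins)
```

Duplicate keys on either side produce one output row per matching pair, which is the usual inner-join behaviour. If the keys are known to be unique, a plain dict keyed on person_id would do instead of the defaultdict.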