How can a pandas merge preserve order?

Posted 2020-02-09 06:10

I have two DataFrames in pandas and am trying to merge them, but pandas keeps changing the row order. I've tried setting indexes and resetting them, but no matter what I do, I can't get the output rows back in the original order. Is there a trick? Note that the loans start out in the order 'a,b,c', but after the merge they are 'a,c,b'.

import pandas
loans = [  'a',  'b', 'c' ]
states = [  'OR',  'CA', 'OR' ]
x = pandas.DataFrame({ 'loan' : loans, 'state' : states })
y = pandas.DataFrame({ 'state' : [ 'CA', 'OR' ], 'value' : [ 1, 2]})
z = x.merge(y, how='left', on='state')

But now the order is no longer the original 'a,b,c'. Any ideas? I'm using pandas version 11.

Tags: python pandas
5 answers
神经病院院长
Reply #2 · 2020-02-09 06:42

The fastest way I've found to merge and restore order, provided you are doing a "left" merge, is to add the original order as a column in the left DataFrame before merging, then use that column to restore the order afterwards:

import pandas
loans = [  'a',  'b', 'c' ]
states = [  'OR',  'CA', 'OR' ]
x = pandas.DataFrame({ 'loan' : loans, 'state' : states })
y = pandas.DataFrame({ 'state' : [ 'CA', 'OR' ], 'value' : [ 1, 2]})

import numpy as np
x["Order"] = np.arange(len(x))

z = x.merge(y, how='left', on='state').set_index("Order").loc[np.arange(len(x)), :]

This method is faster than sorting. Here it is as a function:

def mergeLeftInOrder(x, y, on=None):
    x = x.copy()
    x["Order"] = np.arange(len(x))
    z = x.merge(y, how='left', on=on).set_index("Order").loc[np.arange(len(x)), :]
    return z
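For comparison, here is a minimal sketch of the same order-column idea, written for a recent pandas release. It is not the answer's exact method: it restores the order with a stable sort instead of an index lookup, and the function name `merge_left_keep_order` and the helper column `_order` are hypothetical names chosen for illustration. The right-hand frame deliberately has a duplicated key to show that the left rows still come back in their original sequence.

```python
import numpy as np
import pandas as pd


def merge_left_keep_order(left, right, on):
    """Left-merge `right` onto `left`, returning rows in `left`'s row order."""
    tagged = left.copy()
    # Hypothetical helper column recording each row's original position.
    tagged["_order"] = np.arange(len(tagged))
    merged = tagged.merge(right, how="left", on=on)
    # A stable sort keeps rows that share a position in their merged order.
    merged = merged.sort_values("_order", kind="stable")
    return merged.drop(columns="_order").reset_index(drop=True)


x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
# "OR" appears twice here, so each "OR" loan matches two rows.
y = pd.DataFrame({"state": ["CA", "OR", "OR"], "value": [1, 2, 3]})

z = merge_left_keep_order(x, y, on="state")
print(z)
```

Each left row may expand into several output rows, but the loans still appear in the order a, b, c.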
贪生不怕死
Reply #3 · 2020-02-09 06:47

Use pd.merge_ordered(), documentation here.

For your example,

import pandas as pd
z = pd.merge_ordered(x, y, how='left', on='state')

EDIT: Just wanted to point out that the default behavior of this function is an outer merge, unlike the more common .merge(), which defaults to an inner merge.
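The difference in defaults is easy to see with keys that appear on only one side. The sketch below assumes a pandas release that provides `merge_ordered`; the extra states 'WA' and 'NY' are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# 'WA' exists only in x and 'NY' only in y (illustrative values).
x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "WA"]})
y = pd.DataFrame({"state": ["CA", "OR", "NY"], "value": [1, 2, 3]})

# DataFrame.merge defaults to how='inner': only keys present in both frames.
inner = x.merge(y, on="state")

# merge_ordered defaults to how='outer': keys from either frame are kept.
outer = pd.merge_ordered(x, y, on="state")

print(len(inner), len(outer))
```

Passing `how` explicitly to both functions avoids relying on either default.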

▲ chillily
Reply #4 · 2020-02-09 06:48

I might have a much simpler solution:

df_z = df_x.join(df_y.set_index('state'), on='state')

Hope it helps
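To check this against the frames from the question, here is a minimal sketch that uses the question's x and y in place of df_x and df_y. It assumes a reasonably recent pandas release, where DataFrame.join performs a left join by default and keeps the calling frame's rows in place.

```python
import pandas as pd

x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})

# Look up each row's 'state' in y's index; x's rows stay where they are.
z = x.join(y.set_index("state"), on="state")

print(z)
```

This relies on 'state' being unique in y. With duplicate keys in y, a left row would expand into several output rows.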

你好瞎i
Reply #5 · 2020-02-09 06:50

Hopefully someone will provide a better answer, but in case no one does, this will definitely work, so…

Zeroth, I'm assuming you don't want to just end up sorted on loan, but to preserve whatever original order was in x, which may or may not have anything to do with the order of the loan column. (Otherwise, the problem is easier, and less interesting.)

First, you're asking it to sort based on the join keys. As the docs explain, that's the default when you don't pass a sort argument.


Second, if you don't sort based on the join keys, the rows will end up grouped together, such that two rows that merged from the same source row end up next to each other, which means you're still going to get a, c, b.

You can work around this by getting the rows grouped together in the order they appear in the original x by just merging again with x (on either side, it doesn't really matter), or by reindexing based on x if you prefer. Like this:

x.merge(x.merge(y, how='left', on='state', sort=False))

Alternatively, you can cram an x-index in there with reset_index, then just sort on that, like this:

x.reset_index().merge(y, how='left', on='state', sort=False).sort_values('index')

Either way obviously seems a bit wasteful, and clumsy… so, as I said, hopefully there's a better answer that I'm just not seeing at the moment. But if not, that works.
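As a rough illustration of the reset_index approach on a recent pandas release, the sketch below deliberately passes sort=True so that the merge scrambles the rows, then restores them from the saved positions. It assumes x has the default 0..n-1 index, so the helper column produced by reset_index is named 'index' and sorting on it reproduces the original order.

```python
import pandas as pd

x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})

# reset_index() copies x's default index into a column named "index".
# sort=True orders the result by the join key, losing the original order.
merged = x.reset_index().merge(y, how="left", on="state", sort=True)

# Sort on the saved positions to restore the order, then drop the helper.
restored = (
    merged.sort_values("index", kind="stable")
    .drop(columns="index")
    .reset_index(drop=True)
)

print(restored)
```

If x carries a non-default index that is not already in increasing order, sorting on it would not reproduce the original row order, and an explicit position column would be the safer choice.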

祖国的老花朵
Reply #6 · 2020-02-09 06:57

Pandas v0.8.0 introduced a new merge function that takes order into consideration, ordered_merge (renamed merge_ordered in later releases), so your solution is now as simple as:

z = pandas.ordered_merge(x, y, on='state')