Resample pandas DataFrame and merge strings in a column

Posted 2019-05-23 01:43

I want to resample a pandas DataFrame and apply a different function to each column. The problem is that I cannot properly process a column of strings: I would like to apply a function that joins the strings with a delimiter such as " - ". Here is a data example:

import pandas as pd
import numpy as np
idx = pd.date_range('2017-01-31', '2017-02-03')
data=list([[1,10,"ok"],[2,20,"merge"],[3,30,"us"]])
dates=pd.DatetimeIndex(['2017-01-31','2017-02-03','2017-02-03'])
d=pd.DataFrame(data, index=dates, columns=list('ABC'))

            A   B          C
2017-01-31  1  10         ok
2017-02-03  2  20      merge
2017-02-03  3  30         us 

Resampling the numeric columns A and B with sum and mean aggregators works. Column C, however, only partly works with sum: the strings are concatenated, but the column ends up in second position, which might mean that something fails.

d.resample('D').agg({'A': sum, 'B': np.mean, 'C': sum})

              A               C     B
2017-01-31  1.0              ok  10.0
2017-02-01  NaN               0   NaN
2017-02-02  NaN               0   NaN
2017-02-03  5.0        merge us  25.0

I would like to get this:

...
2017-02-03  5.0      merge - us  25.0

I tried using lambda in different ways but without success (not shown).

If I may ask a second, related question: I can do some post-processing for this, but how do I fill missing cells in different columns with zeros or ""?

1 answer
贪生不怕死
#2 · 2019-05-23 02:47

Your agg function for column 'C' should be a join:

d.resample('D').agg({'A': sum, 'B': np.mean, 'C': ' - '.join})

              A     B           C
2017-01-31  1.0  10.0          ok
2017-02-01  NaN   NaN            
2017-02-02  NaN   NaN            
2017-02-03  5.0  25.0  merge - us
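
For the second question, filling the gaps per column, here is a minimal sketch rather than a definitive recipe. It assumes a reasonably recent pandas, where fillna accepts a dict mapping column names to fill values. The helper join_with_dash is a hypothetical name introduced only for this example; it does the same thing as ' - '.join. The string aliases 'sum' and 'mean' stand in for sum and np.mean, since some recent pandas versions warn when those callables are passed to agg. Note also that on recent versions the sum over an empty day may already come back as 0 instead of NaN, so the fill for A may be a no-op.

```python
import pandas as pd

# Same data as in the question.
dates = pd.DatetimeIndex(['2017-01-31', '2017-02-03', '2017-02-03'])
d = pd.DataFrame([[1, 10, "ok"], [2, 20, "merge"], [3, 30, "us"]],
                 index=dates, columns=list('ABC'))

def join_with_dash(values):
    # Hypothetical helper: joins the strings that fall into one daily bin.
    return ' - '.join(values)

# One aggregator per column, as in the answer above.
result = d.resample('D').agg({'A': 'sum', 'B': 'mean', 'C': join_with_dash})

# Per-column fill values for days that had no rows:
# zeros for the numeric columns, an empty string for the text column.
filled = result.fillna({'A': 0, 'B': 0, 'C': ''})
print(filled)
```

Passing a dict to fillna keeps the fill value of each column independent, so the numeric columns stay numeric and the text column stays text.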