I have data in the following general format that I would like to resample to 30 day time series windows:
'customer_id','transaction_dt','product','price','units'
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43
I would like the 30 day windows to start on 2014-01-01 and end on 12-31-2018. It is NOT guaranteed that every customer will have records in every window. If a customer has multiple transactions in a window, then it takes the weighted average of the price, sums the units, and concat the product names to create one record per customer per window.
What I have so far is something like this:
wa = lambda x:np.average(x, weights=df.loc[x.index, 'units'])
con = lambda x: '/'.join(x))
agg_funcs = {'customer_id':'first',
'product':'con',
'price':'wa',
'transaction_dt':'first',
'units':'sum'}
df_window = df.groupby(['customer_id', pd.Grouper(freq='30D')]).agg(agg_funcs)
df_window_final = df_window.unstack('customer_id', fill_value=0)
If anyone knows some better ways to approach this problem (particularly with an in-place and/or vectorized method), I would appreciate it. Ideally, I would also like to add the window start and stop dates as columns to the rows as well.
The final output would look like this ideally:
'customer_id','transaction_dt','product','price','units','window_start_dt','window_end_dt'
1,2004-01-02,thing1/thing2,(weighted average price),(total units),(window_start_dt),(window_end_dt)
2,2004-01-29,thing2,(weighted average price),(total units),(window_start_dt),(window_end_dt)
3,2017-07-15,thing3,(weighted average price),(total units),(window_start_dt),(window_end_dt)
3,2016-05-12,thing3,(weighted average price),(total units),(window_start_dt),(window_end_dt)
4,2012-02-23,thing2,(weighted average price),(total units),(window_start_dt),(window_end_dt)
4,2009-10-10,thing1,(weighted average price),(total units),(window_start_dt),(window_end_dt)
4,2014-04-04,thing2,(weighted average price),(total units),(window_start_dt),(window_end_dt)
5,2008-07-09,thing2,(weighted average price),(total units),(window_start_dt),(window_end_dt)