I have time series data, i.e. by date (YYYY-MM-DD), returns, pnl, # of trades:
date returns pnl no_trades
1998-01-01 0.01 0.05 5
1998-01-02 -0.04 0.12 2
...
2010-12-31 0.05 0.25 3
Now I would like to show horizontal bar charts with a) the average of the returns b) sum of the pnls
by:
1) year, i.e. 1998, 1999, ..., 2010
2) quarter across all years, i.e. Q1 (YYYY-01-01 to YYYY-03-31), Q2, .., Q4
Additionally, the sum of # of trades per 1) and 2) should denote a number next to each of the horizontal bars.
So in my opinion there needs to be two separate steps:
1) Get the data in the right format
2) Feed the data to the plot and then with overlay of multiple plots.
Sample data:
start = datetime(1998, 1, 1)
end = datetime(2001, 12, 31)
dates = pd.date_range(start, end, freq = 'D')
df = pd.DataFrame(np.random.randn(len(dates), 3), index = dates,
columns = ['returns', 'pnl', 'no_trades'])
So that could be two horizontal bar charts for year and quarter each:
1) one for returns: bar chart, number in the middle of the bar, sum of no_trades at the end of the bar
2) one for pnl: bar chart, number in the middle of the bar, sum of no_trades at the end of the bar
Plus a dotted line vertical line across the going across the bars showing the average returns and pnl.
I could do it in excel (which in fact is adding columns with the respective view and then pivot chart it), but would prefer an "automatized" way with the possibility to reproduce (or understand how it's done) via python.
edit: as discussed in below comment, this is how far I've got; however, I am not sure whether this is the most the fastest approach with regards to 1). I am currently working on 2).
df_ret_year = df[['date', 'returns']].groupby(df['date'].dt.year).mean()
df_ret_quarter = df[['date', 'returns']].groupby(df['date'].dt.quarter).mean()
df_pnl_year = df[['date', 'pnl']].groupby(df['date'].dt.year).sum()
df_pnl_quarter = df[['date', 'pnl']].groupby(df['date'].dt.quarter).sum()
df_trades_year = df[['date', 'pnl']].groupby(df['date'].dt.year).sum()
df_trades_quarter = df[['date', 'pnl']].groupby(df['date'].dt.quarter).sum()
Create the DataFrame with a MultiIndex - (year,quarter)
Then you can group by year, quarter or year and quarter: