I would like to calculate the mean and standard deviation of a timedelta column, grouped by bank, from the two-column dataframe shown below. When I run the code (also shown below) I get the following error:
pandas.core.base.DataError: No numeric types to aggregate
My dataframe:
bank                         diff
Bank of Japan                0 days 00:00:57.416000
Reserve Bank of Australia    0 days 00:00:21.452000
Reserve Bank of New Zealand  55 days 12:39:32.269000
U.S. Federal Reserve         8 days 13:27:11.387000
My code:
means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
You need to convert the timedelta to a numeric value, e.g. int64, via values. This is the most accurate option, because nanoseconds are the numeric representation of a timedelta:
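import numpy as np
import pandas as pd
# View the timedeltas as integer nanoseconds
dropped['new'] = dropped['diff'].values.astype(np.int64)
means = dropped.groupby('bank').mean()
# Convert the aggregated nanoseconds back to timedelta
means['new'] = pd.to_timedelta(means['new'])
std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])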
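For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same round trip. The sample rows are made up (two per bank, so that the standard deviation is defined), and selecting the `new` column before aggregating is my own choice rather than part of the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real values
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["10s", "20s", "1min", "3min"]),
})

# Force nanosecond resolution, then view the values as integer nanoseconds
dropped["new"] = dropped["diff"].values.astype("timedelta64[ns]").astype(np.int64)

# Aggregate only the numeric column, then convert the results back
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean(), unit="ns")
std = pd.to_timedelta(grouped.std(), unit="ns")  # sample std (ddof=1)

print(means)
print(std)
```

Passing `unit="ns"` explicitly is just to make the assumption about the integer representation visible.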
Another solution is to convert the values to seconds with total_seconds, but that is less accurate:
dropped['new'] = dropped['diff'].dt.total_seconds()
means = dropped.groupby('bank').mean()
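If you go the seconds route and still want timedeltas at the end, the result can be converted back with `pd.to_timedelta(..., unit="s")`. A small sketch with made-up sample data (the values and the `means_seconds` name are only for illustration):

```python
import pandas as pd

# Hypothetical sample data, not the asker's real values
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["10s", "20s", "1min", "3min"]),
})

# Float seconds; precision is limited by float64 rather than integer nanoseconds
dropped["new"] = dropped["diff"].dt.total_seconds()

means_seconds = dropped.groupby("bank")["new"].mean()

# Convert the averaged seconds back to timedelta
means = pd.to_timedelta(means_seconds, unit="s")
print(means)
```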
There is no need to convert the timedelta back and forth. NumPy and pandas can handle it for you seamlessly, with a faster run time. Using your dropped DataFrame:
import numpy as np
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
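One thing worth checking with this approach: np.std defaults to ddof=0 (population standard deviation), whereas pandas std() defaults to ddof=1 (sample standard deviation), so the numbers may not match the other answers. A rough sketch with made-up data, calling Series.std with an explicit ddof so the choice is visible:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real values
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["10s", "20s", "1min", "3min"]),
})

grouped = dropped.groupby("bank")["diff"]

# Series.std works on timedelta data and returns a Timedelta per group
sample_std = grouped.apply(lambda s: s.std(ddof=1))      # pandas default
population_std = grouped.apply(lambda s: s.std(ddof=0))  # np.std default ddof

print(sample_std)
print(population_std)
```

With only a handful of rows per bank the two can differ noticeably, so it is probably worth picking one deliberately.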
Pandas mean() and other aggregation methods support the numeric_only=False parameter:
dropped.groupby('bank').mean(numeric_only=False)
Found here: Aggregations for Timedelta values in the Python DataFrame
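A quick way to check this on your own pandas version is a tiny made-up frame like the one below. The sample values are hypothetical, and the exact behaviour may depend on the pandas release, so treat this as a sketch rather than a guarantee:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real values
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "U.S. Federal Reserve", "U.S. Federal Reserve"],
    "diff": pd.to_timedelta(["10s", "20s", "1min", "3min"]),
})

# Ask groupby to aggregate the timedelta column as well as numeric ones
means = dropped.groupby("bank").mean(numeric_only=False)
print(means)
```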