I am ignoring the warnings and trying to subclass a pandas DataFrame. My reasons for doing so are as follows:
- I want to retain all the existing methods of DataFrame.
- I want to set a few additional attributes at class instantiation, which will later be used to define additional methods that I can call on the subclass.
Here's a snippet:
import pandas as pd

class SubFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        # Pull out the subclass-only keywords before calling DataFrame.__init__
        freq = kwargs.pop('freq', None)
        ddof = kwargs.pop('ddof', None)
        super(SubFrame, self).__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof
        self.index.freq = pd.tseries.frequencies.to_offset(self.freq)

    @property
    def _constructor(self):
        return SubFrame
Here's a use example. Say I have the DataFrame
print(df)
col0 col1 col2
2014-07-31 0.28393 1.84587 -1.37899
2014-08-31 5.71914 2.19755 3.97959
2014-09-30 -3.16015 -7.47063 -1.40869
2014-10-31 5.08850 1.14998 2.43273
2014-11-30 1.89474 -1.08953 2.67830
where the index has no frequency
print(df.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
'2014-11-30'],
dtype='datetime64[ns]', freq=None)
Using SubFrame allows me to specify that frequency in one step:
sf = SubFrame(df, freq='M')
print(sf.index)
DatetimeIndex(['2014-07-31', '2014-08-31', '2014-09-30', '2014-10-31',
'2014-11-30'],
dtype='datetime64[ns]', freq='M')
The issue is, this modifies df:
print(df.index.freq)
<MonthEnd>
What's going on here, and how can I avoid this?
Moreover, I confess to using copied code that I don't understand all that well. What is happening within __init__ above? Is it necessary to use args/kwargs with pop here? (Why can't I just specify params as usual?)
I'll add to the warnings. Not that I want to discourage you, I actually applaud your efforts.
However, this won't be the last of your questions as to what is going on.
That said, once you run:
super(SubFrame, self).__init__(*args, **kwargs)
self is a bona fide dataframe. You created it by passing another dataframe to the constructor.
Try this as an experiment:
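One way to run such an experiment is sketched below. It is not the answer's original snippet; d1 and d2 are illustrative names, and the printed result may depend on your pandas version, so no expected output is shown.

```python
import pandas as pd

# A small frame, and a second frame built by passing the first to the constructor
d1 = pd.DataFrame(1, index=list("AB"), columns=list("XY"))
d2 = pd.DataFrame(d1)

# Mutate d2's index in place
d2.index.name = "IDX"

# If both frames point at the same index object, d1 sees the change too
print("same index object:", d1.index is d2.index)
print("d1.index.name:", d1.index.name)
```

If the name shows up on d1 as well, the two frames share their index, which is the same mechanism that lets SubFrame change df.index.freq.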
So the observed behavior is consistent, in that when you construct one dataframe by passing another dataframe to the constructor, you end up pointing to the same objects.
To answer your question, subclassing isn't what allows the original object to be mutated... it's the way pandas constructs a dataframe from a passed dataframe.
Avoid this by instantiating with a copy
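Here is a minimal sketch of the copy approach, reusing the question's SubFrame. Two things in it are my assumptions rather than part of the original code: the _metadata list (the pandas subclassing docs suggest it for custom attributes), and assigning a fresh DatetimeIndex instead of setting freq in place, as an extra guard in case a copy's index still shares state with the original on some pandas versions. It also passes pd.offsets.MonthEnd() rather than the 'M' string, which newer pandas releases deprecate as far as I know.

```python
import pandas as pd


class SubFrame(pd.DataFrame):
    # Assumption: register the custom attributes in _metadata,
    # as the pandas subclassing docs suggest.
    _metadata = ["freq", "ddof"]

    def __init__(self, *args, **kwargs):
        freq = kwargs.pop("freq", None)
        ddof = kwargs.pop("ddof", None)
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof
        if freq is not None:
            # Assumption: assign a new index instead of mutating
            # self.index.freq, which may be shared with the input frame.
            self.index = pd.DatetimeIndex(self.index, freq=freq)

    @property
    def _constructor(self):
        return SubFrame


# First three rows of col0 from the question; the index has no frequency
dates = pd.DatetimeIndex(["2014-07-31", "2014-08-31", "2014-09-30"])
df = pd.DataFrame({"col0": [0.28393, 5.71914, -3.16015]}, index=dates)

# Instantiate from a copy so the subclass never touches df itself
sf = SubFrame(df.copy(), freq=pd.offsets.MonthEnd())

print(sf.index.freq)
print(df.index.freq)
```

If this behaves as intended, sf.index.freq is the month-end offset while df.index.freq is still None.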
What's going on in the __init__
You want to pass on all the args and kwargs to pd.DataFrame.__init__ with the exception of the specific kwargs that are intended for your subclass. In this case, freq and ddof. pop is a convenient way to grab the values and delete the key from kwargs before passing it on to pd.DataFrame.__init__
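To see why the keys have to be removed, here is a small pandas-free sketch. Base is a hypothetical stand-in for pd.DataFrame, which likewise rejects keyword arguments it doesn't recognise; the other class names are illustrative too. It also shows that you can name the parameters explicitly as keyword-only arguments, which has the same effect as pop.

```python
class Base:
    """Stand-in for pd.DataFrame: accepts only the arguments it knows."""

    def __init__(self, data=None, index=None):
        self.data = data
        self.index = index


class PopChild(Base):
    def __init__(self, *args, **kwargs):
        # Remove the subclass-only keys so Base.__init__ never sees them
        freq = kwargs.pop("freq", None)
        ddof = kwargs.pop("ddof", None)
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


class KeywordChild(Base):
    # Same effect: declare the extras as keyword-only parameters
    def __init__(self, *args, freq=None, ddof=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.freq = freq
        self.ddof = ddof


class ForwardAllChild(Base):
    def __init__(self, *args, **kwargs):
        # freq is still inside kwargs here, so Base.__init__ receives it
        super().__init__(*args, **kwargs)


a = PopChild([1, 2], freq="M")
b = KeywordChild([1, 2], freq="M", ddof=1)

try:
    ForwardAllChild([1, 2], freq="M")
    forwarded_ok = True
except TypeError:
    # Base.__init__ got an unexpected keyword argument 'freq'
    forwarded_ok = False

print(a.freq, b.ddof, forwarded_ok)
```

So pop isn't strictly necessary; it is just one way to keep freq and ddof from reaching the parent constructor.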
How I'd implement
pipe