Background
I just upgraded pandas from 0.11 to 0.13.0rc1. Now the application is emitting many new warnings. One of them looks like this:
E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE
What exactly does it mean? Do I need to change something?
How can I suppress the warning if I insist on using quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE?
The function that raises the warnings
def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""
    from cStringIO import StringIO
    str_of_all = "".join(list_of_150_stk_str)
    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT'] = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt'] = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate'] = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])
    return quote_df
More error messages
E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
quote_df['TAmt'] = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
quote_df['TDate'] = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])
For me, this issue occurred in the following (simplified) example, and I was also able to solve it (hopefully correctly):
Old code with the warning:
This printed the warning for the line old_row[field] = new_row[field].
Since the rows in the update_row method are actually of type Series, I replaced the line with the corresponding Series accessor, i.e. the method for accessing/looking up values in a Series. Even though both work fine and give the same result, this way I don't have to disable the warnings (so they stay active for other chained-indexing issues elsewhere). I hope this helps someone.
Pandas dataframe copy warning
When you do something like this:
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
.ix in this case returns a new, stand-alone DataFrame. Any values you change in this DataFrame will not change the original DataFrame.
This is what pandas tries to warn you about.
Why .ix is a bad idea
The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong smell. (.ix was deprecated in pandas 0.20 and removed in 1.0.) Given this DataFrame:
Two behaviors:
Behavior one: dfcopy is now a stand-alone DataFrame. Changing it will not change df.
Behavior two: This changes the original DataFrame.
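Since .ix no longer exists in current pandas, here is a minimal sketch of the same two behaviors written with .loc. The frame and its columns "a" and "b" are made up for illustration, and the explicit .copy() is an addition that makes behavior one unambiguous instead of leaving it to pandas to decide whether a view or a copy comes back.

```python
import pandas as pd

# Hypothetical example frame (not from the original answer).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Behavior one: select a subset of columns into a stand-alone frame.
# The explicit .copy() guarantees dfcopy owns its data.
dfcopy = df.loc[:, ["a"]].copy()
dfcopy.loc[0, "a"] = 2  # changes dfcopy only

# Behavior two: assign through .loc on the original frame itself.
df.loc[0, "a"] = 3  # changes df in place

print(df.loc[0, "a"], dfcopy.loc[0, "a"])  # 3 2
```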
Use .loc instead
The pandas developers recognized that the .ix object was quite smelly [speculatively] and thus created two new objects, .loc and .iloc, which help with accessing and assigning data.
.loc is faster, because it does not try to create a copy of the data.
.loc is meant to modify your existing DataFrame in place, which is more memory efficient.
.loc is predictable; it has one behavior.
The solution
What you are doing in your code example is loading a big file with lots of columns, then modifying it to be smaller.
The pd.read_csv function can help you out with a lot of this and also make loading the file a lot faster.
So instead of reading every column and then picking a subset with .ix, let read_csv select and name only the columns you need.
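A minimal sketch of that approach (not the answer's original code), assuming a made-up CSV string and a hypothetical TVOL_SCALE constant: names labels every field, as in the question, and usecols keeps only the wanted ones, which are then renamed.

```python
import io

import pandas as pd

TVOL_SCALE = 100  # hypothetical stand-in for the question's constant

# Made-up CSV text with more fields than we need and no header row.
raw = "x,10.5,a,1000,z\ny,11.0,b,2500,w\n"

# Label all five fields, but only parse A, B and D.
# usecols does not reorder: the result keeps file order (A, B, D).
quote_df = pd.read_csv(io.StringIO(raw), names=list("ABCDE"), usecols=["A", "B", "D"])
quote_df = quote_df.rename(columns={"A": "STK", "B": "TPrice", "D": "TVol"})

# quote_df was read directly, not sliced from another frame,
# so this is a plain column assignment on an object that owns its data.
quote_df["TVol"] = quote_df["TVol"] / TVOL_SCALE
print(quote_df)
```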
This will only read the columns you are interested in, and name them properly. No need to use the evil .ix object to do magical stuff.
In general, the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not the original as they think. There are false positives (i.e. if you know what you are doing, it could be OK). One possibility is simply to turn off the warning (which is set to "warn" by default), as @Garrett suggests. Here is another option:
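You can set the is_copy flag to False, which effectively turns off the check for that object (note that the is_copy attribute has since been deprecated and removed from pandas, so this only applies to older versions). If you explicitly copy, then no further warning will happen: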
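A minimal sketch of the explicit-copy route, using a made-up frame dfa: the filtered rows are copied before a new column is added, so the result is an independent object and the original is untouched.

```python
import pandas as pd

# Hypothetical example data.
dfa = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Filtering creates a new object derived from dfa; calling .copy()
# explicitly makes dfb independent, so there is nothing to warn about
# when a column is added or modified afterwards.
dfb = dfa[dfa["A"] > 1].copy()
dfb["C"] = dfb["B"] * 2

print(dfb)
print("C" in dfa.columns)  # False: the original is untouched
```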
The code the OP is showing above, while legitimate, and probably something I do as well, is technically a case for this warning, and not a false positive. Another way to avoid the warning would be to do the selection operation via reindex, e.g. by passing the list of wanted column labels to reindex(columns=...).
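A minimal sketch of the reindex variant, with a made-up quote_df and a hypothetical TVOL_SCALE: reindex builds a new DataFrame holding only the requested columns, so the later assignment is not made into a slice of another frame.

```python
import pandas as pd

TVOL_SCALE = 100  # hypothetical constant

# Made-up frame standing in for the question's quote_df.
quote_df = pd.DataFrame({"STK": ["x", "y"], "TOpen": [1.0, 2.0], "TVol": [1000, 2500]})

# reindex returns a new DataFrame with exactly these columns, in this order.
quote_df = quote_df.reindex(columns=["STK", "TVol"])
quote_df["TVol"] = quote_df["TVol"] / TVOL_SCALE
print(quote_df)
```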
You could avoid the whole problem, I believe, by using assign. From the documentation: "Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones."
See Tom Augspurger's article on method chaining in pandas: https://tomaugspurger.github.io/method-chaining
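A minimal sketch of the question's column updates rewritten as one assign chain. The data and the TVOL_SCALE/TAMT_SCALE values are made up; each assign call returns a new DataFrame, so nothing is written back into a slice of another frame.

```python
import pandas as pd

TVOL_SCALE = 100   # hypothetical constants
TAMT_SCALE = 1000

quote_df = pd.DataFrame({
    "STK": ["x", "y"],
    "TPrice": [10.0, 11.0],
    "TPCLOSE": [10.0, 10.0],
    "TVol": [1000, 2500],
    "TAmt": [5000, 8000],
})

# Select a subset, then derive columns; the lambdas receive the frame
# being built, and keyword arguments are applied in order.
result = (
    quote_df[["TPrice", "TPCLOSE", "TVol", "TAmt"]]
    .assign(
        TClose=lambda d: d["TPrice"],
        RT=lambda d: 100 * (d["TPrice"] / d["TPCLOSE"] - 1),
        TVol=lambda d: d["TVol"] / TVOL_SCALE,
        TAmt=lambda d: d["TAmt"] / TAMT_SCALE,
    )
)
print(result)
```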
To remove any doubt, my solution was to make a deep copy of the slice instead of a regular copy. This may not be applicable depending on your context (memory constraints or size of the slice, potential for performance degradation, especially if the copy occurs in a loop as it did for me, etc.).
To be clear, here is the warning I received:
Illustration
I suspected that the warning was thrown because of a column I was dropping on a copy of the slice. While not technically trying to set a value in the copy of the slice, that was still a modification of the copy of the slice. Below are the (simplified) steps I took to confirm the suspicion; I hope they will help those of us who are trying to understand the warning.
Example 1: dropping a column on the original affects the copy
We knew that already, but this is a healthy reminder. This is NOT what the warning is about.
It is possible to prevent changes made to df1 from affecting df2.
Example 2: dropping a column on the copy may affect the original
This actually illustrates the warning.
It is possible to prevent changes made to df2 from affecting df1.
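The answer's own code was not preserved. As a minimal sketch with made-up data, assuming the "copy" in these examples was created by plain assignment (df2 = df1), this is the difference between a second name for the same object and an explicit copy(deep=True):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [111, 112, 113], "B": [121, 122, 123]})

# Plain assignment only binds a second name to the same object.
df2 = df1
df2.drop("A", axis=1, inplace=True)
print(list(df1.columns))  # ['B']: df1 changed too

# An explicit deep copy is an independent object.
df3 = pd.DataFrame({"A": [111, 112, 113], "B": [121, 122, 123]})
df4 = df3.copy(deep=True)
df4.drop("A", axis=1, inplace=True)
print(list(df3.columns))  # ['A', 'B']: df3 is unaffected
```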
Cheers!
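This post is meant for readers who want to understand what this warning means, how to suppress it, and how to change their code to avoid it.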
Setup
What is the SettingWithCopyWarning?
To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.
When filtering DataFrames, it is possible to slice/index a frame to return either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. On the other hand, a "copy" is a replication of data from the original, and modifying the copy has no effect on the original.
As mentioned by other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example, with chained indexing, df[df.A > 5]['B'], or with a single .loc call, df.loc[df.A > 5, 'B'].
These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so this mostly becomes an issue when you attempt to assign values back. To build on the earlier example, consider how the interpreter executes an assignment such as df.loc[df.A > 5, 'B'] = 4: it is a single __setitem__ call (on df.loc). On the other hand, a chained assignment such as df[df.A > 5]['B'] = 4 first calls __getitem__ on df and then __setitem__ on whatever that returned. Depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
In general, you should use loc for label-based assignment and iloc for integer/positional assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat. More can be found in the documentation.
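A minimal sketch of the single-call versus chained forms, using a small made-up frame. The chained form appears only as a comment, and the explicit df.loc.__setitem__ call is there just to show what the .loc assignment statement translates to.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"A": [3, 6, 8], "B": [0, 0, 0]})
mask = df["A"] > 5

# Chained form (avoid):  df[mask]["B"] = 4
# Python runs df.__getitem__(mask) first, then .__setitem__("B", 4) on
# whatever that returned; if it was a copy, df never changes.

# Single-call form: one __setitem__ on df.loc with a (rows, column) key.
df.loc[mask, "B"] = 4

# The statement above is equivalent to this explicit call:
df.loc.__setitem__((mask, "B"), 5)

# Positional counterpart with iloc: row 0, column "B" (position 1).
df.iloc[0, 1] = -1
print(df["B"].tolist())  # [-1, 5, 5]
```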
Just tell me how to suppress the warning!
Consider a simple operation on the "A" column of df. Selecting "A" and dividing it by 2 will raise the warning, but the operation will work. There are a few ways of silencing this warning:
Make a (deep) copy
Set is_copy=False
Turn off the check for that particular DataFrame by setting is_copy=False (note that the is_copy attribute has since been deprecated and removed from pandas).
Change pd.options.mode.chained_assignment
Can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.
, preventing the operation from going through.@Peter Cotton came up with a nice way of non-intrusively changing the mode (see this gist) using a context manager, to set the mode only as long as it is required, and the reset it back to the original state when finished.
The usage is as follows:
Or, to raise the exception
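The gist itself is not reproduced here. Below is a minimal sketch of such a context manager, relying only on the documented pd.options.mode.chained_assignment option; the class name ChainedAssignment and its signature are illustrative assumptions, not necessarily the gist's.

```python
import pandas as pd


class ChainedAssignment:
    """Temporarily change pd.options.mode.chained_assignment.

    Hypothetical helper: restores the previous setting when the block exits.
    """

    allowed = (None, "warn", "raise")

    def __init__(self, chained=None):
        if chained not in self.allowed:
            raise ValueError(f"chained must be one of {self.allowed}")
        self.chained = chained
        self.saved = None

    def __enter__(self):
        # Remember the current mode, then switch to the requested one.
        self.saved = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.chained
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always restore the original mode, even if the block raised.
        pd.options.mode.chained_assignment = self.saved
        return False  # do not swallow exceptions


# Usage:
with ChainedAssignment():          # None: silence the warning
    pass  # code that would otherwise warn goes here

with ChainedAssignment("raise"):   # turn the warning into an error
    pass
```

pandas also ships a generic pd.option_context, so with pd.option_context("mode.chained_assignment", None): achieves the same effect without a custom class.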
The "XY Problem": What am I doing wrong?
A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper-rooted problem "X". Below are common situations that run into this warning, each followed by a solution.
Wrong way to do this:
Right way using loc:
You can use any of the following methods to do this.
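A minimal sketch with a made-up frame: a conditional assignment done wrong (shown only as comments) and right, followed by the four ways of setting a single cell.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6], "D": [3, 2, 8]})

# Wrong (chained indexing; the write may land on a temporary copy):
#   df[df["A"] > 5]["A"] = 1000
#   df.loc[df["A"] > 5]["A"] = 1000

# Right: one .loc call with a row mask and a column label.
df.loc[df["A"] > 5, "A"] = 1000

# Setting a single cell: any of these works ("D" is column position 2).
df.loc[1, "D"] = 12345   # label-based
df.iloc[1, 2] = 12345    # position-based
df.at[1, "D"] = 12345    # label-based scalar access
df.iat[1, 2] = 12345     # position-based scalar access
print(df)
```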
This is probably because of code higher up in your pipeline. Did you create df2 from something larger, for example by boolean indexing another DataFrame? In that case, boolean indexing returns a copy, but pandas keeps a reference to the original and flags df2 as derived from it. What you need to do is explicitly assign a copy to df2 with .copy().
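A minimal sketch of the fix, using made-up data: the subset is copied when it is created, so the later .loc assignment on df2 is unambiguous and leaves df untouched.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"A": [5, 9, 7], "C": [5, 5, 1], "D": [0, 0, 0]})

# Take an explicit copy when creating the subset...
df2 = df[df["A"] > 5].copy()

# ...so a later .loc assignment targets df2 alone.
df2.loc[df2["C"] == 5, "D"] = 123
print(df2)
```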
This is because df2 must have been created from some other slicing operation, such as boolean indexing on df. The solution here is either to make a copy() of the selection taken from df, or to use loc, as before.