Do I have to deviate from PEP 8 style conventions?

Published 2019-08-15 16:33

Question:

I used to do the following when altering a DataFrame column based on a condition (in this case, every woman gets a wage of 200).

import pandas as pd
df = pd.DataFrame([[False,100],[True,100],[True,100]],columns=['female','wage'])
df.loc[df['female'] == True,'wage'] = 200

The PEP 8 style checker (in Spyder) reports the following for line 3:

comparison to True should be 'if cond is True:' or 'if cond:'

Changing the last line to

df.loc[df['female'] is True,'wage'] = 200

yields

KeyError: 'cannot use a single bool to index into setitem'

because the expression now evaluates to a single Boolean value rather than to a Series.
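A quick check makes the difference visible. This is only a sketch that reuses the DataFrame from above; the variable names identity_result and equality_result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[False, 100], [True, 100], [True, 100]],
                  columns=['female', 'wage'])

# Identity test: is the Series object the very same object as True?
identity_result = df['female'] is True
# Element-wise comparison: one Boolean per row
equality_result = df['female'] == True

print(type(identity_result).__name__, identity_result)  # bool False
print(type(equality_result).__name__)                   # Series
print(equality_result.tolist())                         # [False, True, True]
```

So the is form hands .loc one plain bool instead of a row mask, which is why the assignment fails.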

Is this a case where one has to deviate from styling conventions?

Answer 1:

You should use df['female'] with no comparison, rather than comparing to True with any operator. df['female'] is already the mask you need.

Comparison to True with == is almost always a bad idea, even in NumPy or Pandas.



Answer 2:

Just do

df.loc[df['female'], 'wage'] = 200 

In fact, df['female'] is already a Boolean Series with exactly the same values as the Boolean Series returned by evaluating df['female'] == True. (A Series is the Pandas term for a one-dimensional labeled array, such as a single column of a DataFrame.)
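The following sketch, again using the DataFrame from the question, checks that claim and then performs the assignment without any comparison. The names mask and same are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[False, 100], [True, 100], [True, 100]],
                  columns=['female', 'wage'])

mask = df['female']                        # already a Boolean Series
same = mask.equals(df['female'] == True)   # compare it with the explicit form

df.loc[mask, 'wage'] = 200                 # no comparison to True needed

print(same)                  # True
print(df['wage'].tolist())   # [100, 200, 200]
```

This relies on the 'female' column having a Boolean dtype, as it does here.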

By the way, that last point is precisely why df['female'] is True can never work. In Python, the is operator tests object identity; it does not compare values for equality. df['female'] will always be a Series (if df is a Pandas DataFrame), and a Series will never be the same object as the single object True.

To understand this better, think of the difference in English between 'equal' and 'same'. In German, this is the difference between 'selbe' (identity) and 'gleiche' (equality). In other languages, this distinction is not as explicit.

Thus, in Python, you can compare a (reference to an) object to the special object None with if obj is None: ..., or check that two variables ('names' in Python terminology) point to the exact same object with if a is b. But that condition is a much stronger assertion than equality, a == b. In fact, the result of evaluating the expression a == b can be anything, not just a single Boolean value; it depends on what class a belongs to, that is, on its type. In your context, a == b yields a Boolean Series when a is a Pandas Series and b is either a scalar or an identically labeled Series.
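A plain-Python sketch of both points: identity is stricter than equality, and == may return something other than a bool. The Elementwise class is a hypothetical toy written for this illustration; it only loosely mimics what a Series does and is not how Pandas is implemented.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)   # True: equal values
print(a is b)   # False: two distinct objects
print(a is c)   # True: two names for one object


class Elementwise:
    """Hypothetical toy class whose == returns a list, not a single bool."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Compare every stored value with `other`, one result per element
        return [v == other for v in self.values]


result = Elementwise([False, True, True]) == True
print(result)   # [False, True, True]
```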

By the way, if you want to check that all values agree between two Series a and b, evaluate (a == b).all(). This reduces the whole Series to a single Boolean value, which is True if and only if a[i] == b[i] for every index i.
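As a minimal sketch with three made-up Series (a, b and c are example data, not taken from the question):

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([1, 2, 3])
c = pd.Series([1, 2, 4])

all_equal_ab = (a == b).all()   # every element matches
all_equal_ac = (a == c).all()   # the last element differs

print(all_equal_ab)   # True
print(all_equal_ac)   # False
print(a is b)         # False: equal values, but two separate objects
```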