I'm working with boolean indexing in Pandas. The question is why the statement:
a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]
works fine whereas
a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]
exits with an error?
Example:
import pandas as pd
a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
In: a[(a['x']==1) & (a['y']==10)]
Out:
   x   y
0  1  10
In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When you say
(a['x']==1) and (a['y']==10)
you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.
NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value. In other words, they raise a ValueError when used as a boolean value. That's because it's unclear when such an object should be True or False. Some users might assume it is True if it has non-zero length, like a Python list. Others might want it to be True only if all its elements are True. Others might want it to be True if any of its elements are True.
Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.
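To make the ambiguity concrete, here is a minimal sketch using the DataFrame from the question. The variable names (mask, non_empty, and so on) are only illustrative. It shows the three readings described above disagreeing for the same Series, and the direct truth-value test raising instead of picking one.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
mask = a['y'] == 10           # boolean Series holding [True, False]

# Three plausible "truth values" for the same mask, and they disagree:
non_empty = len(mask) > 0     # list-like reading
all_true = bool(mask.all())   # "every element is True" reading
any_true = bool(mask.any())   # "at least one element is True" reading

# Asking for the truth value directly raises rather than guessing.
try:
    bool(mask)
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)
```

Here non_empty and any_true come out True while all_true comes out False, which is exactly the disagreement that makes a single default unsafe.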
Instead, you must be explicit, by using the empty attribute or calling the all() or any() method to indicate which behavior you desire. In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical-and. That is what the & binary operator performs:
(a['x']==1) & (a['y']==10)
returns a boolean Series with one True/False value per row.
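A short sketch of both options, again on the question's DataFrame (the names mask, selected, has_match, all_match and no_rows are made up for illustration): the element-wise & builds a row mask for indexing, while any(), all() and empty each reduce to a single explicit boolean.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# Element-wise AND: one True/False per row, suitable for indexing.
mask = (a['x'] == 1) & (a['y'] == 10)
selected = a[mask]            # keeps only the rows where mask is True

# Explicit reductions to a single boolean, when that is what you want.
has_match = bool(mask.any())  # at least one row matches
all_match = bool(mask.all())  # every row matches
no_rows = selected.empty      # note: an attribute, not a method call
```

If the goal is an if-statement rather than row selection, something like `if mask.any():` states the intent explicitly.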
By the way, as alexpmil notes, the parentheses are mandatory since & has a higher operator precedence than ==. Without the parentheses,
a['x']==1 & a['y']==10
would be evaluated as
a['x'] == (1 & a['y']) == 10
which would in turn be equivalent to the chained comparison
(a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10)
That is an expression of the form Series and Series. The use of and with two Series would again trigger the same ValueError as above. That's why the parentheses are mandatory.
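The precedence issue can be checked directly. This is a small sketch on the question's data, with illustrative variable names: it computes the sub-expression that & grabs first, confirms that the unparenthesized form raises, and shows the parenthesized form selecting the expected row.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# & binds tighter than ==, so this is what gets computed first:
inner = 1 & a['y']            # bitwise AND of 1 with each value of y

# The unparenthesized form becomes a chained comparison, which applies
# `and` to a Series and therefore raises ValueError.
try:
    a[a['x'] == 1 & a['y'] == 10]
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False

# With parentheses, each comparison is evaluated before &.
parenthesized = a[(a['x'] == 1) & (a['y'] == 10)]
```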