I am trying to use some boolean logic in a function applied to a DataFrame, but I get an error:
In [4]:
from pandas import DataFrame
data = {'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]}
frame = DataFrame(data)
frame
Out[4]:
level
0 20
1 19
2 20
3 21
4 25
5 29
6 30
7 31
8 30
9 29
10 31
In [35]:
def calculate(x):
    baseline = max(frame['level'], frame['level'].shift(1))  # doesn't work
    # baseline = x['level'] + 4  # works
    difftobase = x['level'] - baseline
    return baseline, difftobase

frame['baseline'], frame['difftobase'] = zip(*frame.apply(calculate, axis=1))  # works
However, this throws the following error at this line:
    baseline = max(frame['level'], frame['level'].shift(1))  # doesn't work
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')
I read "How to look back at previous rows from within Pandas dataframe function call?" and http://pandas.pydata.org/pandas-docs/stable/gotchas.html, but I can't figure out how to apply them to my problem.
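The problem is the use of Python's built-in max. Called with two arguments, max compares them with >. For two Series that comparison returns an element-wise boolean Series, and max then needs a single True/False from it, which is exactly the ambiguous truth value the error complains about. np.maximum computes the element-wise maximum of two arrays and works. (np.ma.max, by contrast, returns the maximum of a single array along an axis, so it is not a drop-in replacement here.)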
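A minimal sketch of the difference, assuming a reasonably recent pandas and NumPy (the short series s is just illustrative data taken from the first four levels). It also shows a detail worth knowing: shift(1) leaves NaN in the first row, np.maximum propagates that NaN, while np.fmax prefers the non-NaN value.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 19, 20, 21])
prev = s.shift(1)  # previous row's value; NaN in the first row

# Built-in max evaluates `prev > s` (a boolean Series) and then needs a single
# truth value from it, which pandas refuses to provide.
try:
    max(s, prev)
except ValueError as err:
    print("built-in max failed:", err)

# np.maximum works element-wise; the NaN from shift(1) propagates.
print(np.maximum(s, prev).tolist())

# np.fmax also works element-wise but returns the non-NaN value where one side is NaN.
print(np.fmax(s, prev).tolist())
```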
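Replacing

    baseline = max(frame['level'], frame['level'].shift(1))

with

    baseline = np.maximum(frame['level'], frame['level'].shift(1))

(with NumPy imported as np) gets rid of the ValueError. I removed the other part of the function to make it easier to read.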
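Because np.maximum already operates on whole columns, a possibly simpler route (a swapped-in approach, not what the answer above does) is to drop apply entirely and assign both columns in one vectorized step. A minimal sketch, assuming pandas and NumPy are imported as pd and np; each expression yields exactly one value per row.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]})

# Element-wise max of each level and the previous row's level.
# Row 0 has no previous value, so shift(1) gives NaN there and np.maximum
# propagates it (np.fmax would fall back to the level itself instead).
frame['baseline'] = np.maximum(frame['level'], frame['level'].shift(1))
frame['difftobase'] = frame['level'] - frame['baseline']
print(frame)
```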
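PS: The original problem hides another issue, which shows up once the shift portion of the function is taken out: the shape of the returned values doesn't match. With axis=1, calculate receives a single row x, but baseline is computed from the whole frame['level'] column, so each call returns a full Series instead of one value per row. That's a separate problem; I'm just mentioning it here for full disclosure.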
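If you want to keep the apply structure, one way around that shape mismatch (a sketch under assumptions, not the answer's method) is to precompute the shifted column so that calculate only looks at the current row x and returns two scalars. The prev_level column name is hypothetical, and np.fmax is used instead of np.maximum so that row 0, which has no previous level, falls back to its own level rather than NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]})
# Hypothetical helper column: the previous row's level (NaN for row 0).
frame['prev_level'] = frame['level'].shift(1)

def calculate(x):
    # x is a single row here, so every value below is a scalar.
    baseline = np.fmax(x['level'], x['prev_level'])  # NaN-aware element max
    difftobase = x['level'] - baseline
    return baseline, difftobase

# apply returns one (baseline, difftobase) tuple per row; zip(*...) splits them.
frame['baseline'], frame['difftobase'] = zip(*frame.apply(calculate, axis=1))
print(frame)
```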