I have a pandas data frame, sample
, with one of the columns called PR
to which am applying a lambda function as follows:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
I then get the following syntax error message:
sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
^
SyntaxError: invalid syntax
What am I doing wrong?
You need mask
:
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
Another solution with loc
and boolean indexing
:
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
Sample:
import pandas as pd
import numpy as np
sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
PR
0 10
1 100
2 40
sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
PR
0 NaN
1 100.0
2 NaN
sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
PR
0 NaN
1 100.0
2 NaN
EDIT:
Solution with apply
:
sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
Timings len(df)=300k
:
sample = pd.concat([sample]*100000).reset_index(drop=True)
In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop
In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
You need to add else in your lambda function Because you are telling what to do in case your condition(here x < 90) is met, but you are not telling what to do if in case the condition is not met.
sample['PR'] = sample['PR'].apply(lambda x: 'NaN' if x < 90 else x)