I have a function that I wish to apply to a dataframe:
    def DetermineMid(data, ts):
        if data['U'] == 0 and data['D'] > 0:
            mid = data['C'] + ts / 2
        elif data['U'] > 0 and data['D'] == 0:
            mid = data['C'] - ts / 2
        else:
            diff = data['A'] - data['B']
            if diff == 0:
                mid = data['C'] + 1
            else:
                mid = data['C']
        return mid
My df columns are A, B, C, D, U.
My call is as follows:
    df = df.apply(DetermineMid, args=(5,), axis=1)
On smaller dataframes this works just fine, but for this dataframe:
DatetimeIndex: 2561527 entries, 2016-11-30 17:00:01 to 2017-11-29 16:00:00
Data columns (total 6 columns):
Z float64
A float64
B float64
C float64
U int64
D int64
dtypes: float64(5), int64(2)
memory usage: 156.3 MB
None
I receive a MemoryError. Am I using apply incorrectly? I would have thought apply is just iterating through the rows and creating a value mid based on row values, then dropping all the old values as I do not care about them anymore.
Is there a better way to do that?
Use np.select, i.e. evaluate the conditions on whole columns at once instead of calling the function row by row.
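A minimal sketch of how the np.select approach could look for the function in the question. The helper name determine_mid_vectorized and the output name 'mid' are made up for illustration and are not from the original post. np.select picks the first condition that matches, so the A - B check only takes effect for rows where neither of the U/D conditions holds, which mirrors the else branch of DetermineMid.

```python
import numpy as np
import pandas as pd


def determine_mid_vectorized(df, ts):
    """Vectorized equivalent of DetermineMid (hypothetical helper name)."""
    # Pull the columns out as plain NumPy arrays once.
    a = df['A'].to_numpy()
    b = df['B'].to_numpy()
    c = df['C'].to_numpy()
    u = df['U'].to_numpy()
    d = df['D'].to_numpy()

    # Conditions are checked in order; the first match wins for each row.
    conditions = [
        (u == 0) & (d > 0),   # if branch
        (u > 0) & (d == 0),   # elif branch
        (a - b) == 0,         # else branch with diff == 0
    ]
    choices = [
        c + ts / 2,
        c - ts / 2,
        c + 1,
    ]

    # Rows matching none of the conditions fall back to C unchanged.
    mid = np.select(conditions, choices, default=c)
    return pd.Series(mid, index=df.index, name='mid')
```

It could then be called as df['mid'] = determine_mid_vectorized(df, 5), which keeps the original columns, or the returned Series could be used on its own if the other columns are no longer needed.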