I know this might be an old debate, but between pandas drop and Python's del, which performs better on a large dataset?
I am learning machine learning with Python 3 and am not sure which one to use. My data is in a pandas DataFrame, but del is a built-in Python statement.
Using randomly generated data of about 1.6 GB, it appears that df.drop is faster than del, especially over multiple columns:
0.9118959903717041
Compared to:
4.052732944488525
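The benchmark code behind these numbers is not shown, so the following is only a minimal sketch of how such a comparison could be set up. The helper names (make_frame, time_drop, time_del), the column names, and the frame size are all assumptions chosen for illustration; the frame is far smaller than 1.6 GB, and the timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=100_000, n_cols=20, seed=0):
    """Build a DataFrame of random floats with columns named c0, c1, ..."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, n_cols))
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n_cols)])


def time_drop(df, cols):
    """Time removing all of `cols` with a single in-place drop call."""
    start = time.perf_counter()
    df.drop(columns=cols, inplace=True)
    return time.perf_counter() - start


def time_del(df, cols):
    """Time removing `cols` one at a time, since del takes one column per statement."""
    start = time.perf_counter()
    for col in cols:
        del df[col]
    return time.perf_counter() - start


cols_to_remove = [f"c{i}" for i in range(10)]

# Use two identical frames so each approach starts from the same state.
df_drop = make_frame()
df_del = make_frame()

drop_seconds = time_drop(df_drop, cols_to_remove)
del_seconds = time_del(df_del, cols_to_remove)

print(f"drop: {drop_seconds:.6f} s")
print(f"del:  {del_seconds:.6f} s")
```

A single run like this is noisy, so repeating the measurement several times on fresh copies of the data gives a more trustworthy comparison.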
@Inder's comparison is not quite the same since it doesn't use inplace=True.

Summarizing a few points about functionality:
- drop operates on both columns and rows; del operates on columns only.
- drop can operate on multiple items at a time; del operates on only one at a time.
- drop can operate in-place or return a copy; del is an in-place operation only.

The documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html has more details on drop's features.

I tested it on 10 MB of stock data and got the following results:
For drop, with the following code, the time I got was:
0.003617525100708008
For del, with the following code on the same column, the time I got was:
0.0045168399810791016
Reruns on different datasets and columns didn't make any significant difference.
With drop, inplace=False lets you create a subset DataFrame and leave the original DataFrame untouched. I believe del does not offer this option.
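The functional differences described in the answers above can be sketched on a tiny DataFrame. The column names and values here are made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)

# drop returns a new DataFrame by default (inplace=False); df is unchanged.
subset = df.drop(columns=["a", "b"])

# drop can also remove rows by index label.
without_first_row = df.drop(index=0)

# With inplace=True, drop modifies df itself and returns None.
result = df.drop(columns=["d"], inplace=True)

# del removes exactly one column, always in place.
del df["c"]

print(list(subset.columns))    # ['c', 'd']
print(len(without_first_row))  # 2
print(result)                  # None
print(list(df.columns))        # ['a', 'b']
```

Note that subset and without_first_row were created before the in-place changes, so they still contain the columns that were later removed from df.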