I am asking this question for my own learning. As far as I know, the following are the different ways to remove columns from a pandas DataFrame.
Option - 1:
df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
del df['a']
Option - 2:
df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
df=df.drop('a',axis=1)
Option - 3:
df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
df=df[['b','c']]
- What is the best approach among these?
- Are there any other approaches to achieve the same result?
The recommended way to delete a column or row in pandas dataframes is using drop.
To delete a column, pass the column label to drop with axis=1.
To delete a row, pass the index label to drop with axis=0.
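The snippets that originally followed these two lines are missing here, so this is only a minimal sketch of what column and row deletion with drop can look like. The sample frame and the names without_col and without_row are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original answer.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Delete a column: drop returns a new DataFrame and leaves df unchanged.
without_col = df.drop('a', axis=1)   # equivalent to df.drop(columns='a')

# Delete a row: 0 here is an index label, not a position.
without_row = df.drop(0, axis=0)     # equivalent to df.drop(index=0)

print(list(without_col.columns))  # ['b', 'c']
print(list(without_row.index))    # [1, 2]
```

Passing inplace=True would modify df directly instead of returning a new object.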
You can refer to this post for a detailed discussion of the column deletion approaches.
From a speed perspective, option 1 seems to be the best. Obviously, based on the other answers, that doesn't mean it's actually the best option.
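If you want to check the speed claim on your own machine, a rough sketch with timeit could look like the following. The helper names (make_df, option_1, option_2, option_3) and the run count are my own choices, and each timing includes the cost of building the frame, so treat the numbers as a relative comparison only. Results will vary with the pandas version and the frame size.

```python
import timeit

import pandas as pd


def make_df():
    # Same small frame as in the question.
    return pd.DataFrame({'a': [1, 2, 3, 4, 5],
                         'b': [6, 7, 8, 9, 10],
                         'c': [11, 12, 13, 14, 15]})


def option_1():
    df = make_df()
    del df['a']                   # in-place removal by key
    return df


def option_2():
    df = make_df()
    return df.drop('a', axis=1)   # returns a new DataFrame


def option_3():
    df = make_df()
    return df[['b', 'c']]         # selects the columns to keep


# Time each option; every run rebuilds the frame so the options do not interfere.
timings = {f.__name__: timeit.timeit(f, number=200)
           for f in (option_1, option_2, option_3)}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 200 runs")
```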
In my opinion the best choices are options 2 and 3, because the first one has limits: you can remove only one column at a time, and you cannot use dot notation, so this does not work:
del df.a
Option 3 is not really deleting but selecting, and piRSquared wrote a nice answer with several possible solutions based on the same idea.
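To make those limits concrete, here is a small sketch, assuming a reasonably recent pandas. The flag dot_notation_failed and the variable remaining are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Dot notation works for reading a column, but not for deleting it.
try:
    del df.a
    dot_notation_failed = False
except AttributeError:
    dot_notation_failed = True

# Key-style del works, but only for one column per statement.
del df['a']

# drop accepts a list, so several columns can go in one call.
df = df.drop(columns=['b', 'c'])
remaining = list(df.columns)

print(dot_notation_failed)  # True
print(remaining)            # []
```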
Following the docs, pandas.DataFrame.drop in particular, I think we should stick with df.drop. Why? I think the pros are:
- It gives us more control over the remove action.
- It can handle more complicated cases with its arguments. E.g. with level, we can handle MultiIndex deletion, and with errors, we can prevent some bugs.
- It's a more unified and object-oriented way.
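As a hedged illustration of these points, the sketch below reads "more control" as the choice between getting a new object and modifying in place (my interpretation, since the original snippet is missing here), and then shows the errors and level arguments. The frames and variable names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# Returns a NEW DataFrame and leaves df untouched.
new_df = df.drop('a', axis=1)

# Modifies df in place and returns None.
result = df.drop('b', axis=1, inplace=True)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
same = df.drop(columns=['missing'], errors='ignore')

# level selects which level of a MultiIndex the labels refer to.
multi = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([('x', 'p'), ('x', 'q'), ('y', 'p')]),
)
no_p = multi.drop(columns='p', level=1)

print(list(new_df.columns))  # ['b', 'c']
print(result)                # None
print(list(df.columns))      # ['a', 'c']
print(list(no_p.columns))    # [('x', 'q')]
```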
And just like @jezrael noted in his answer:
- Option 1: Using the keyword del is a limited way.
- Option 3: And df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
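The rebinding described for option 3 can be observed by keeping a second name for the original object. This is a small sketch; the name original is only there for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
original = df            # a second name bound to the same object

# Selection builds a new DataFrame; the assignment then rebinds the name df.
df = df[['b', 'c']]

print(df is original)          # False
print(list(original.columns))  # ['a', 'b', 'c']
print(list(df.columns))        # ['b', 'c']
```

The original frame still holds column 'a' for as long as something references it, which is why this is a selection rather than a deletion.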