I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:
df$x <- NULL
But I was hoping to do this with fewer commands.
Also, I know that I could drop columns using integer indexing like this:
df <- df[ -c(1, 3:6, 12) ]
But I am concerned that the relative position of my variables may change.
Given how powerful R is, I figured there might be a better way than dropping each column one by one.
Provide the data frame and a string of comma-separated names to remove:
Usage:
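A minimal sketch of such a helper, with a sample call at the end. The function name `drop_cols_by_string` and its argument names are hypothetical, chosen only for illustration, and the built-in `iris` data set stands in for your own data:

```r
# Hypothetical helper: drop the columns named in a comma-separated string.
drop_cols_by_string <- function(df, names_string) {
  # Split on commas and trim surrounding whitespace from each name.
  to_drop <- trimws(strsplit(names_string, ",", fixed = TRUE)[[1]])
  # Keep every column whose name is not in the drop set; drop = FALSE
  # guarantees a data frame even if only one column remains.
  df[, !(names(df) %in% to_drop), drop = FALSE]
}

# Sample call on the built-in iris data set.
trimmed <- drop_cols_by_string(iris, "Sepal.Length, Petal.Width")
names(trimmed)
```

Because the columns are matched by name, the result does not depend on where they sit in the data frame.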
You can use a simple vector of names:
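Something along these lines should work. The data frame `DF` and its column names are made up for the sake of the example:

```r
# Toy data frame; the column names are assumptions for illustration.
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)

# Names of the columns to remove.
drops <- c("x", "z")

# Keep only the columns whose names are NOT in `drops`.
DF_dropped <- DF[, !(names(DF) %in% drops)]
names(DF_dropped)
```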
Or, alternatively, you can make a list of those to keep and refer to them by name:
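The keep-list variant might look like this, again on a made-up data frame `DF`:

```r
# Toy data frame; the column names are assumptions for illustration.
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)

# Names of the columns to keep.
keeps <- c("y", "a")

# List-style indexing by name always returns a data frame.
DF_kept <- DF[keeps]
names(DF_kept)
```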
EDIT: For those still not acquainted with the `drop` argument of the indexing function: if you want to keep one column as a data frame, pass `drop = FALSE`. `drop = TRUE` (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column `y`.
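To make the difference concrete, here is a small sketch on an assumed toy data frame, comparing the two behaviours for a single column `y`:

```r
# Toy data frame; the column names are assumptions for illustration.
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)
keeps <- "y"

# drop = FALSE keeps the result as a one-column data frame.
as_df <- DF[, keeps, drop = FALSE]

# The default (drop = TRUE) collapses the single column to a plain vector.
as_vec <- DF[, keeps]

is.data.frame(as_df)
is.data.frame(as_vec)
```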
Another `dplyr` answer. If your variables have some common naming structure, you might try `starts_with()`. If you want to drop a sequence of variables in the data frame, you can use `:`. For example, if you wanted to drop `var2`, `var3`, and all variables in between, you'd just be left with `var1`.
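Both helpers can be sketched as follows. This assumes `dplyr` is installed; the data frames and the column names (`var1`, `var2`, `extra`, `var3`, `other`) are invented for illustration:

```r
library(dplyr)

# Toy data; all column names here are assumptions.
df <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9, other = letters[1:3])

# Drop every column whose name starts with "var".
no_var <- df %>% select(-starts_with("var"))
names(no_var)

# A second toy frame where another column sits between var2 and var3.
df_seq <- data.frame(var1 = 1:3, var2 = 4:6, extra = 0, var3 = 7:9)

# Drop var2, var3, and everything positioned between them.
only_var1 <- df_seq %>% select(-(var2:var3))
names(only_var1)
```

Note that the `:` range is positional, so it removes whatever columns currently sit between the two endpoints.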
Dplyr Solution

I doubt this will get much attention down here, but if you have a list of columns that you want to remove, and you want to do it in a `dplyr` chain, I use `one_of()` in the `select` clause. Here is a simple, reproducible example:
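A minimal sketch of that pattern, assuming `dplyr` is installed. The data frame and the `drop_cols` vector are hypothetical. In current `dplyr` releases `one_of()` is superseded by `any_of()` and `all_of()`, so the equivalent `any_of()` call is shown as well:

```r
library(dplyr)

# Toy data; the column names are assumptions for illustration.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# Character vector of columns to remove.
drop_cols <- c("b", "d")

# Drop the listed columns inside a dplyr chain.
result <- df %>% select(-one_of(drop_cols))
names(result)

# Same idea with the newer helper; any_of() silently ignores names
# that are not present in the data frame.
result_any <- df %>% select(-any_of(drop_cols))
names(result_any)
```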
Documentation can be found by running `?one_of` or here: http://genomicsclass.github.io/book/pages/dplyr_tutorial.html
Find the index of the columns you want to drop using `which`. Give these indexes a negative sign (`* -1`). Then subset on those values, which will remove them from the data frame.

There's a function called `dropNamed()` in Bernd Bischl's `BBmisc` package that does exactly this. The advantage is that it avoids repeating the data frame argument and thus is suitable for piping in `magrittr` (just like the `dplyr` approaches).
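For what it's worth, here is a rough sketch of both of these last two suggestions on a made-up data frame. The `which()` part is base R only. The `dropNamed()` part assumes the `BBmisc` and `magrittr` packages are installed and that `dropNamed()` takes the object first and a character vector of names second, so it is wrapped in a guard and simply skipped otherwise:

```r
# Toy data; the column names are assumptions for illustration.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
to_drop <- c("x", "z")

# Base R: find the positions by name, then negate them.
idx <- which(names(df) %in% to_drop)

# Guard against an empty match: df[, -integer(0)] would select zero
# columns rather than leaving the data frame untouched.
by_which <- if (length(idx) > 0) df[, -idx, drop = FALSE] else df
names(by_which)

# Optional: the BBmisc version in a magrittr pipe, only if both
# packages are available.
if (requireNamespace("BBmisc", quietly = TRUE) &&
    requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  by_dropNamed <- df %>% BBmisc::dropNamed(to_drop)
  print(names(by_dropNamed))
}
```

The length check matters in practice: without it, a vector of names that matches nothing would drop every column.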