I'm familiar with being able to extract columns from an R data frame (or matrix) like so:
df.2 <- df[, c("name1", "name2", "name3")]
But can one use a ! or some other tool to select all but the listed columns?
For background, I have a data frame with quite a few column vectors and I'd like to avoid:
- Typing out the majority of the names when I could just remove a minority
- Using the much shorter
df.2 <- df[, c(1,3,5)]
because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.
I tried:
df.2 <- df[, !c("name1", "name2", "name3")]
df.2 <- df[, !=c("name1", "name2", "name3")]
And just as I was typing this, found out that this works:
df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]
Is there a better way than this last one?
An alternative to grep is which:
df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
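One caveat with the which approach, shown here as a minimal sketch on a made-up data frame (the names df and drop are illustrative only): if none of the listed names is present, which returns an empty vector, and negative indexing with an empty vector selects no columns at all rather than keeping all of them. The logical form does not have this problem.

```r
# Toy data frame; column names are made up for illustration
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
drop <- c("x", "y")  # none of these exist in df

# which() returns integer(0), and -integer(0) is still integer(0),
# so this selects zero columns instead of keeping everything
res_which <- df[, -which(names(df) %in% drop), drop = FALSE]
ncol(res_which)

# The logical version keeps all columns when nothing matches
res_logical <- df[, !names(df) %in% drop, drop = FALSE]
ncol(res_logical)
```

If the list of names to drop might not match anything, the logical version is probably the safer default.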
You can make a shorter call that is also more generalizable with negative-grep:
df.2 <- df[, -grep("^name[1-3]$", names(df))]
Since grep returns numeric indices, you can use negative vector indexing to remove columns. You could add further numbers or more complex patterns.
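A related sketch, assuming a toy data frame whose column names are invented for illustration: grepl returns a logical vector rather than indices, so it can be negated with ! and still behaves sensibly when the pattern matches nothing. In a regular expression, a character class such as [1-3] matches one of the digits 1 through 3.

```r
# Toy data frame; the column names are assumptions for illustration
df <- data.frame(name1 = 1:2, name2 = 3:4, name3 = 5:6, other = 7:8)

# grepl() gives TRUE for each column name matching the pattern;
# negating it keeps the columns that do not match
keep <- !grepl("^name[1-3]$", names(df))

# drop = FALSE keeps the result a data frame even if one column remains
df.2 <- df[, keep, drop = FALSE]
names(df.2)
```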
dplyr::select() has several options for dropping specific columns:
library(dplyr)
drop_columns <- c('cyl','disp','hp')
mtcars %>%
select(-one_of(drop_columns)) %>%
head(2)
mpg drat wt qsec vs am gear carb
Mazda RX4 21 3.9 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21 3.9 2.875 17.02 0 1 4 4
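In more recent versions of dplyr (1.0.0 and later, as far as I'm aware), one_of() is superseded by all_of() and any_of(). A sketch of the same drop, assuming such a version is installed; the extra name "not_a_column" is made up to show the difference between the two helpers:

```r
library(dplyr)

# "not_a_column" is a hypothetical name that does not exist in mtcars
drop_columns <- c("cyl", "disp", "hp", "not_a_column")

# any_of() silently ignores names that are not present;
# all_of() would raise an error for "not_a_column" instead
res <- mtcars %>%
  select(-any_of(drop_columns))

names(res)
```

Which helper to prefer depends on whether a missing column should be treated as an error or quietly skipped.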
Negating specific column names, the following drops the column "hp" and the columns from "qsec" through "gear":
mtcars %>%
select(-hp, -(qsec:gear)) %>%
head(2)
mpg cyl disp drat wt carb
Mazda RX4 21 6 160 3.9 2.620 4
Mazda RX4 Wag 21 6 160 3.9 2.875 4
You could also negate contains(), starts_with(), ends_with(), or matches():
mtcars %>%
select(-contains('t')) %>%
select(-starts_with('a')) %>%
select(-ends_with('b')) %>%
select(-matches('^m.+g$')) %>%
head(2)
cyl disp hp qsec vs gear
Mazda RX4 6 160 110 16.46 0 4
Mazda RX4 Wag 6 160 110 17.02 0 4
You could write a custom function for this if it's only for your own data manipulation. I might do something like this:
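rm.col <- function(df, ...) {
  # Capture the unquoted column names passed via ... as character strings
  x <- as.list(substitute(list(...)))[-1]
  z <- trimws(vapply(x, deparse, character(1)))
  # Keep only the columns whose names are not in z
  df[, !names(df) %in% z, drop = FALSE]
}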
rm.col(mtcars, hp, mpg)
The first argument is the data frame; the remaining arguments (...) are the names of any columns you wish to remove.
Old thread, but here's another solution:
df.2 <- subset(df, select=-c(name1, name2, name3))
This was posted in another similar thread (though I can't find it right now). Should be sustainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
The easiest way that comes to my mind:
filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]
Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out (name1 and name2 above).
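A small runnable sketch of the setdiff approach, using an invented data frame (the column names and the extra "missing" entry are illustrative assumptions). It also shows that names not present in the data frame are simply ignored, and that drop = FALSE keeps the result a data frame when only one column is left:

```r
# Toy data frame; names are illustrative only
df <- data.frame(name1 = 1:2, name2 = 3:4, name3 = 5:6)

# setdiff() keeps the names of df that are not in the second vector,
# preserving their original order; "missing" is not a column and is ignored
keep <- setdiff(names(df), c("name1", "name2", "missing"))

# Without drop = FALSE a single remaining column would come back as a vector
filtered_df <- df[, keep, drop = FALSE]
names(filtered_df)
```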