I'm looking for help writing a function that automatically subsets a data frame based on the values of a column. For example,
df$x contains the values a, b, c, d.
I want to make separate data frames named a, b, c, d, each containing the rows where x == 'a', x == 'b', etc. I know several ways to do this manually but am hoping for guidance on how to automate it. Thank you!
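Maybe not the best way to do it, but it will get the job done.
library(dplyr)  # provides %>% and filter()
vars_df <- unique(df$x)
for (i in seq_along(vars_df)) {
  # Create an object named after the value, holding the matching rows
  assign(as.character(vars_df[i]), df %>% filter(x == vars_df[i]), envir = .GlobalEnv)
}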
The split function returns a list of subsetted data frames:
split(df, df$x)
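As a rough illustration of what that list looks like, here is a minimal sketch on a toy data frame. The column names and values are made up for the example, and `pieces` is just a hypothetical variable name:

```r
# Toy data; the column names and values are assumptions for illustration
df <- data.frame(x = c("a", "b", "a", "c"), value = 1:4)

# One data frame per distinct value of x, stored in a named list
pieces <- split(df, df$x)

# The list names are the distinct values of x
names(pieces)

# Pull out a single subset by name
pieces[["a"]]

# Apply the same operation to every subset, here counting rows
sapply(pieces, nrow)
```

Keeping the subsets in one list is often easier to work with than separate objects, since you can loop over them with lapply or sapply.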
EDIT:
If you want a new object for each subsetted data frame:
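# unique() works whether x is a factor or a character column
for (i in unique(as.character(df$x))) {
  # Build and evaluate a command such as: a <- subset(df, x == 'a')
  command <- paste0(i, " <- subset(df, x == '", i, "')")
  eval(parse(text = command))
}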
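If you would rather avoid building commands as strings, base R's list2env can probably do the same job in one call. This is only a sketch on made-up toy data, not a drop-in replacement for your real df:

```r
# Toy data; the column names and values are assumptions for illustration
df <- data.frame(x = c("a", "b", "a", "c"), value = 1:4)

# list2env() copies each named list element into the target environment.
# Here the target is the global environment, so objects a, b and c appear there.
invisible(list2env(split(df, df$x), envir = globalenv()))

# Each object now holds the rows for one value of x
a
```

Note that this overwrites any existing global objects with the same names, just like the loop does.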
EDIT 2:
To split by two or more variables, a more automated solution is to create a function that takes a data frame and the names of the columns by which to subset it:
library(magrittr)  # provides the %>% pipe
create_new_df <- function(dataframe, vars) {
  # Creates one data frame in the global environment per combination of values in 'vars'
  split(dataframe, as.list(dataframe[vars]), drop = TRUE) %>%
    lapply(function(subset_dataframe) {
      # Name the object after the first row's values, e.g. "a_u"; this must be a valid R name
      values <- vapply(subset_dataframe[1, vars, drop = FALSE], as.character, character(1))
      new_object_name <- paste(values, collapse = "_")
      # The double-arrowed '<<-' creates a new object in the global environment
      command <- paste0(new_object_name, " <<- subset_dataframe")
      eval(parse(text = command))
    }) %>%
    invisible()
}
This function can then be used to create new objects with any combination of variables:
variables <- c("x", "y", "z")
create_new_df(df, variables)
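If a named list is enough and you do not need separate global objects, the same multi-column split can be sketched with interaction from base R. The toy data and the names `vars`, `groups` and `by_group` below are assumptions for illustration:

```r
# Toy data; the column names and values are assumptions for illustration
df <- data.frame(
  x = c("a", "a", "b", "b"),
  y = c("u", "v", "u", "u"),
  value = 1:4
)
vars <- c("x", "y")

# Combine the columns in vars into one grouping factor with levels like "a_u";
# drop = TRUE discards combinations that never occur in the data
groups <- interaction(df[vars], sep = "_", drop = TRUE)

# One data frame per observed combination, named after that combination
by_group <- split(df, groups)
names(by_group)

# Access a single combination by name
by_group[["b_u"]]
```

Because the names are only list names here, they do not have to be valid R object names, which sidesteps problems with values that start with a digit or contain spaces.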