Automatically subset data frame by factor

Posted 2019-09-24 18:36

Question:

I'm looking for help writing a function that automatically subsets a data frame based on the values of one column. For example,

df$x contains the values a, b, c and d.

I want to create separate data frames named a, b, c and d that contain all rows where x == 'a', x == 'b', and so on. I know several ways to do this manually, but I'm hoping for guidance on how to automate it. Thank you!

Answer 1:

Maybe not the best way to do it, but it will get the job done.

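library(dplyr)  # provides filter() and the %>% pipe

# Distinct values of the grouping column
vars_df <- unique(df$x)

# seq_along() is safe even when vars_df is empty, unlike 1:length()
for (i in seq_along(vars_df)) {
    # Create an object named after the value, holding the rows where x equals it
    assign(as.character(vars_df[i]), df %>% filter(x == vars_df[i]), envir = .GlobalEnv)
}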
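For comparison, here is a minimal base-R sketch of the same assign() loop that does not need dplyr. The example data frame and its value column are hypothetical, made up only for illustration.

```r
# Hypothetical example data: column x holds the grouping values
df <- data.frame(
    x = factor(c("a", "b", "a", "c", "d", "b")),
    value = 1:6
)

# One object per distinct value of x, using base subsetting only
for (v in unique(as.character(df$x))) {
    # df[df$x == v, ] keeps the rows whose x equals v; assign() gives it the name v
    assign(v, df[df$x == v, ], envir = .GlobalEnv)
}

nrow(a)  # 2: rows 1 and 3
```

The created objects are ordinary global variables, so a value that is not a syntactically valid R name (for example "my group") has to be accessed with backticks afterwards.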


Answer 2:

The split() function returns a list of subsetted data frames, one for each value of df$x:

split(df, df$x)
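A small sketch of what split() returns, using hypothetical example data. The list is named after the values of x, so each subset can be reached with pieces[["a"]] and so on.

```r
# Hypothetical example data
df <- data.frame(
    x = c("a", "b", "a", "c", "d", "b"),
    value = 1:6
)

# split() converts x to a factor and returns one data frame per level
pieces <- split(df, df$x)

names(pieces)   # "a" "b" "c" "d"
pieces[["a"]]   # rows 1 and 3, where x == "a"
```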

EDIT:

If you want a new object for each subsetted data frame:

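# unique(as.character(...)) also works when x is a character column,
# where levels() would return NULL and the loop would silently do nothing
for (i in unique(as.character(df$x))) {
    # Builds and runs e.g. a <- subset(df, x == 'a'); i must be a valid R name
    command <- paste0(i, " <- subset(df, x == '", i, "')")
    eval(parse(text = command))
}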
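A shorter route, not part of the original answer, is to pass the list returned by split() to base R's list2env(), which creates one object per list element. A sketch, assuming hypothetical example data:

```r
# Hypothetical example data
df <- data.frame(
    x = factor(c("a", "b", "a", "c", "d", "b")),
    value = 1:6
)

# split() builds a named list; list2env() turns each element into an object
invisible(list2env(split(df, df$x), envir = .GlobalEnv))
```

Because no code is built from strings, values containing quotes or spaces cannot break a generated command the way they can with eval(parse()).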

EDIT 2:

To split by two or more variables, a more automated solution is a function that takes a data frame and the names of the columns by which to subset it:

library(magrittr)  # provides the %>% pipe

create_new_df <- function(dataframe, vars) {
    # Creates one data frame in the global environment for each combination of
    # values present in the columns named in 'vars'; empty combinations are dropped
    split(dataframe, dataframe[vars], drop = TRUE) %>%
        lapply(function(subset_dataframe) {
            # Name the object after the group's values joined by "_", e.g. "a_1"
            values <- vapply(subset_dataframe[1, vars, drop = FALSE], as.character, character(1))
            new_object_name <- paste(values, collapse = "_")
            # assign() with envir = .GlobalEnv writes to the global environment explicitly
            assign(new_object_name, subset_dataframe, envir = .GlobalEnv)
        }) %>%
        invisible()
}

This function can then be used to create new objects with any combination of variables:

variables <- c("x", "y", "z")
create_new_df(df, variables)
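If the goal is only to work with each subset, one option is to keep them in a single named list instead of creating global objects. The sketch below uses hypothetical data, labels each row by its combination of the chosen columns with base interaction(), and then calls split(); sep = "_" is chosen here only to mirror the names create_new_df produces.

```r
# Hypothetical example data
df <- data.frame(
    x = c("a", "a", "b", "b"),
    y = c(1, 2, 1, 1),
    value = 1:4
)

vars <- c("x", "y")

# interaction() labels each row by its combination of values; drop = TRUE
# removes combinations with no rows
groups <- interaction(df[vars], drop = TRUE, sep = "_")
pieces <- split(df, groups)

names(pieces)  # a_1, a_2 and b_1 (b_2 has no rows and is dropped)
```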