Merge multiple variables in R

Posted 2020-02-07 07:06

I have a dataset in which the same variable is stored in different columns for different subjects. I want to merge them into a single column per variable.

For example, in this data frame there are two DVs (DV1 and DV2), but each is spread across three columns (A, B, C) depending on the subject.

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"))

How can I merge them into just two columns, so that the result is:

data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"), DV_1 = c(1,4,5), DV_2 = c(3,3,5))

Tags: r dataframe
6 answers
爷、活的狠高调
#2 · 2020-02-07 07:22

You could also do this via gather and spread with tidyr and dplyr. Less concise than @useR's solution, but might be useful if you need to do any intermediate manipulation.

library(dplyr)
library(tidyr)

df %>% 
  gather(variable, value, -ID, -FACT, na.rm = TRUE) %>% 
  mutate(variable = gsub("\\_[A-Z]", "", variable)) %>% 
  spread(variable, value) %>% 
  left_join(df)

  ID FACT DV1 DV2 DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C
1  1    A   1   3     1    NA    NA     3    NA    NA
2  2    B   4   3    NA     4    NA    NA     3    NA
3  3    C   5   5    NA    NA     5    NA    NA     5
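In current tidyr (1.0.0 and later), gather and spread are superseded by pivot_longer and pivot_wider. As a rough sketch of the same reshape, assuming the question's data frame and column names of the form DVn_X, the special ".value" entry in names_to builds the merged DV1 and DV2 columns in one step. The name "suffix" is only an illustrative choice for the helper column:

```r
library(tidyr)

# Data frame from the question
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"))

# Split each column name at "_": the first part (DV1, DV2) becomes an output
# column via ".value", the second part (A, B, C) goes into "suffix".
long <- pivot_longer(df,
                     cols = starts_with("DV"),
                     names_to = c(".value", "suffix"),
                     names_sep = "_",
                     values_drop_na = TRUE)  # drop rows where every DV is NA

# Keep only the identifying columns and the merged DVs
merged <- as.data.frame(long[c("ID", "FACT", "DV1", "DV2")])
merged
```

If the original columns are still needed, merged can be joined back to df on ID and FACT, as the left_join in the answer does.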
淡お忘
#3 · 2020-02-07 07:23

For the sake of completeness, here is also a data.table solution using melt() to reshape two measure variables simultaneously:

library(data.table)
cols <- c("DV1", "DV2")
melt(setDT(DF), measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)[
  , -"variable"]
   ID FACT DV1 DV2
1:  1    A   1   3
2:  2    B   4   3
3:  3    C   5   5

Now, the six columns have been merged to just two columns as requested by the OP.

However, the OP has given a data.frame with the expected result in which the new columns are appended to the existing columns. This can be achieved by joining the above result with the original data frame:

setDT(DF)[melt(DF, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)[
  , -"variable"], on = .(ID, FACT)]
   ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV1 DV2
1:  1     1    NA    NA     3    NA    NA    A   1   3
2:  2    NA     4    NA    NA     3    NA    B   4   3
3:  3    NA    NA     5    NA    NA     5    C   5   5
Rolldiameter
#4 · 2020-02-07 07:35

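Base R's transform will do this, provided each row has exactly one non-NA value per DV (rowSums with na.rm = TRUE returns 0 for a row that is entirely NA):

# d is the data frame from the question
d <- transform(d,
               DV1 = rowSums(d[c("DV1_A", "DV1_B", "DV1_C")], na.rm = TRUE),
               DV2 = rowSums(d[c("DV2_A", "DV2_B", "DV2_C")], na.rm = TRUE))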
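If a row could be entirely NA, or if the columns are not numeric, a coalesce-style merge may be preferable to summing. A minimal base R sketch, assuming the question's column names: coalesce_cols is a hypothetical helper (not a base R function), and the fourth row is added here only to show the all-NA case.

```r
# Question's DV1 columns plus one extra, entirely missing row (ID 4)
d <- data.frame(
  ID    = c(1, 2, 3, 4),
  DV1_A = c(1, NA, NA, NA),
  DV1_B = c(NA, 4, NA, NA),
  DV1_C = c(NA, NA, 5, NA)
)

# Hypothetical helper: walk through the columns left to right and keep the
# first non-NA value found in each row.
coalesce_cols <- function(data, cols) {
  Reduce(function(x, y) ifelse(is.na(x), y, x), data[cols])
}

dv1_cols <- c("DV1_A", "DV1_B", "DV1_C")
d$DV1_first <- coalesce_cols(d, dv1_cols)            # row 4 stays NA
d$DV1_sum   <- rowSums(d[dv1_cols], na.rm = TRUE)    # row 4 becomes 0
d
```

The two new columns agree for rows 1 to 3 and differ only for the all-NA row.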
我想做一个坏孩纸
#5 · 2020-02-07 07:36

This works, though it is less elegant than the coalesce solution already mentioned. Note that paste() turns the merged values into character strings:

library(dplyr)
test <- df %>%
  group_by(ID) %>%
  summarise(
    # take the first non-NA value among the A, B and C columns
    DV1 = ifelse(!is.na(DV1_A), paste(DV1_A),
          ifelse(!is.na(DV1_B), paste(DV1_B),
          ifelse(!is.na(DV1_C), paste(DV1_C), ""))),
    DV2 = ifelse(!is.na(DV2_A), paste(DV2_A),
          ifelse(!is.na(DV2_B), paste(DV2_B),
          ifelse(!is.na(DV2_C), paste(DV2_C), "")))
  )
\"骚年 ilove
#6 · 2020-02-07 07:41

You can use coalesce from dplyr:

library(dplyr)

df %>%
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C),
         DV_2 = coalesce(DV2_A, DV2_B, DV2_C))

If you have a lot of DV columns to combine, you might not want to type all the column names. In this case, you can first grep the column names for each DV, convert the names to symbols with rlang::syms, then splice (!!!) the symbols into coalesce (advice from @hadley):

library(rlang)
var_quo1 = syms(grep("DV1", names(df), value = TRUE))
var_quo2 = syms(grep("DV2", names(df), value = TRUE))

df %>%
  mutate(DV_1 = coalesce(!!! var_quo1),
         DV_2 = coalesce(!!! var_quo2))

If instead you have a large number of DVs, you might not even want to type all the coalesce lines. In that case, you can create a function that outputs one DV column for a given DV number, then lapply it over the DV numbers and bind_cols the results together:

DV_combine = function(num_DVs){

  # name of the new column, e.g. DV_1
  DV_name = sym(paste0("DV_", num_DVs))
  # anchored pattern so that DV1 does not also match DV10, DV11, ...
  DV_syms = syms(grep(paste0("^DV", num_DVs, "_"), names(df), value = TRUE))

  df %>%
    transmute(!!DV_name := coalesce(!!! DV_syms))
}

bind_cols(df, lapply(1:2, DV_combine))

Result:

  ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV_1 DV_2
1  1     1    NA    NA     3    NA    NA    A    1    3
2  2    NA     4    NA    NA     3    NA    B    4    3
3  3    NA    NA     5    NA    NA     5    C    5    5

Note:

This method works for both numeric and character columns, but not for factors. Convert factor columns to character before using this method.
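The conversion step can be done in base R before calling coalesce. A small sketch, assuming a data frame whose FACT column is a factor, as in the Data section of this answer (only two of the DV columns are reproduced here for brevity):

```r
# Reduced version of the answer's data, with FACT stored as a factor
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA),
                 DV1_B = c(NA, 4, NA),
                 FACT = factor(c("A", "B", "C")))

# Find the factor columns and replace each with its character equivalent
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)

str(df)
```

Only the factor columns are touched, so numeric columns keep their type.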

Data:

df = structure(list(ID = c(1, 2, 3), DV1_A = c(1, NA, NA), DV1_B = c(NA, 
4, NA), DV1_C = c(NA, NA, 5), DV2_A = c(3, NA, NA), DV2_B = c(NA, 
3, NA), DV2_C = c(NA, NA, 5), FACT = structure(1:3, .Label = c("A", 
"B", "C"), class = "factor")), .Names = c("ID", "DV1_A", "DV1_B", 
"DV1_C", "DV2_A", "DV2_B", "DV2_C", "FACT"), row.names = c(NA, 
-3L), class = "data.frame")
家丑人穷心不美
#7 · 2020-02-07 07:43

Another solution, similar to @useR's, but rather than creating each column individually, this creates a list of expressions that are evaluated all at once. It may still suffer from the same "don't splice data frames into calls with !!!" problem that was mentioned in the comments, since it uses select(.), but I thought I would post it anyway.


library(rlang)
library(dplyr)

df <- data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), 
                 DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), 
                 DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), 
                 DV2_C=c(NA,NA,5), FACT = c("A","B","C"))

create_DV <- function(num) {
  DV_name <- sym(paste0("DV_", num))
  DV_char <- paste0("DV", num)

  expr(!! DV_name := select(., contains(!! DV_char)) %>% rowSums(na.rm = TRUE))
}

DV_expr_list <- c(1,2) %>% 
  lapply(create_DV)

df %>%
  mutate(
    !!! DV_expr_list
  )
#>   ID DV1_A DV1_B DV1_C DV2_A DV2_B DV2_C FACT DV_1 DV_2
#> 1  1     1    NA    NA     3    NA    NA    A    1    3
#> 2  2    NA     4    NA    NA     3    NA    B    4    3
#> 3  3    NA    NA     5    NA    NA     5    C    5    5
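For many DVs, the same idea can also be written without tidy evaluation. The following base R sketch is an alternative technique, not the method used in this answer. It assumes every DV column is named like DVn_X, discovers the DV prefixes from the column names, and takes the first non-NA value per row, so an all-NA row would stay NA instead of becoming 0 as it does with rowSums. The helper name first_non_na is made up for illustration.

```r
df <- data.frame(ID = c(1, 2, 3), DV1_A = c(1, NA, NA),
                 DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA),
                 DV2_C = c(NA, NA, 5), FACT = c("A", "B", "C"))

# All columns of the form DV<number>_<suffix>, and their distinct prefixes
dv_cols  <- grep("^DV[0-9]+_", names(df), value = TRUE)
prefixes <- unique(sub("_.*$", "", dv_cols))   # "DV1", "DV2"

# Hypothetical helper: keep x where present, otherwise fall back to y
first_non_na <- function(x, y) ifelse(is.na(x), y, x)

for (p in prefixes) {
  cols <- dv_cols[startsWith(dv_cols, paste0(p, "_"))]
  # New column is named DV_1, DV_2, ... to match the question's expected output
  df[[sub("^DV", "DV_", p)]] <- Reduce(first_non_na, df[cols])
}

df
```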