I have a dataset in which the same variable is stored in different columns for each subject. I want to merge them into the same column.
E.g., I have this data frame with two DVs (DV1 and DV2), but each is spread across different columns (A, B, C) for different subjects.
data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"))
How can I merge them into just two columns, so that the result is:
data.frame(ID = c(1,2,3), DV1_A=c(1,NA,NA), DV1_B= c(NA,4,NA), DV1_C = c(NA,NA,5), DV2_A=c(3,NA,NA), DV2_B=c(NA,3,NA), DV2_C=c(NA,NA,5), FACT = c("A","B","C"), DV_1 = c(1,4,5), DV_2 = c(3,3,5))
You could also do this via `gather` and `spread` with `tidyr` and `dplyr`. Less concise than @useR's solution, but might be useful if you need to do any intermediate manipulation.

For the sake of completeness, here is also a `data.table` solution using `melt()` to reshape two measure variables simultaneously:

Now, the six columns have been merged to just two columns as requested by the OP.
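A minimal sketch of the `melt()` reshaping described above, assuming the data frame from the question is stored in `df` and that the column prefixes `DV1_` and `DV2_` identify the two measure groups. The names `df`, `long`, and `merged` are illustrative only.

```r
library(data.table)

# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# Melt both groups of measure columns at once; patterns() selects each group
long <- melt(as.data.table(df),
             id.vars = c("ID", "FACT"),
             measure.vars = patterns("^DV1_", "^DV2_"),
             value.name = c("DV_1", "DV_2"),
             na.rm = TRUE)

# Drop the index column that melt() adds and restore the original row order
long[, variable := NULL]
setorder(long, ID)
merged <- long[]
print(merged)
```

With this data, each subject has exactly one non-missing entry per measure group, so `merged` ends up with one row per `ID`.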
However, the OP has given a data.frame with the expected result where the new columns are appended to the existing columns. This can be achieved by joining above result with the original data frame:
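One way to sketch that join, assuming the question's data frame is stored in `df` and using base `merge()` on `ID` (the names `df`, `long`, and `result` are hypothetical, and `merge()` is only one of several possible ways to join):

```r
library(data.table)

# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# Reshape as before, keeping only ID as the identifier column
long <- melt(as.data.table(df),
             id.vars = "ID",
             measure.vars = patterns("^DV1_", "^DV2_"),
             value.name = c("DV_1", "DV_2"),
             na.rm = TRUE)

# Join the two merged columns back onto the original data frame by ID
result <- merge(df, as.data.frame(long[, .(ID, DV_1, DV_2)]), by = "ID")
print(result)
```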
The base `transform` will do this:

This will work, though it is not a very elegant solution when you could use the `coalesce` function already mentioned:
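As an illustration of a base-R route, here is a sketch that uses `transform` together with `pmax(..., na.rm = TRUE)`. The choice of `pmax` is an assumption made for this example, not necessarily what the answers above used. It relies on each row having at most one non-missing value per DV, so the row-wise maximum is simply that value.

```r
# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# pmax() with na.rm = TRUE returns the single non-missing value in each row
result <- transform(df,
                    DV_1 = pmax(DV1_A, DV1_B, DV1_C, na.rm = TRUE),
                    DV_2 = pmax(DV2_A, DV2_B, DV2_C, na.rm = TRUE))
print(result)
```

If a row could hold more than one non-missing value for the same DV, `pmax` would silently pick the largest, which is why `coalesce` (first non-missing value) is usually the clearer choice.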
You can use `coalesce` from `dplyr`:

If you have a lot of `DV` columns to combine, you might not want to type all the column names. In this case, you can first `grep` the column names for each `DV`, convert each name to a symbol with `rlang::syms`, then splice (`!!!`) the symbols in `coalesce` (advice from @hadley):

If instead you have a ton of `DV`s, you might not even want to type all the `coalesce` lines. In this case, you can create a function that outputs one `DV` column given an input number, then `lapply` + `bind_cols` all of them together:

Result:
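The three variants described above can be sketched as follows and each prints the same merged columns. This assumes the question's data frame is stored in `df`. The helper `combine_dv` and the objects `res1`, `res2`, and `res3` are hypothetical names introduced for illustration.

```r
library(dplyr)
library(rlang)

# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# 1. Spell out every column: coalesce() keeps the first non-missing value per row
res1 <- df %>%
  mutate(DV_1 = coalesce(DV1_A, DV1_B, DV1_C),
         DV_2 = coalesce(DV2_A, DV2_B, DV2_C))

# 2. grep the column names, turn them into symbols, and splice them with !!!
dv1_cols <- syms(grep("^DV1_", names(df), value = TRUE))
dv2_cols <- syms(grep("^DV2_", names(df), value = TRUE))
res2 <- df %>%
  mutate(DV_1 = coalesce(!!!dv1_cols),
         DV_2 = coalesce(!!!dv2_cols))

# 3. Hypothetical helper: build one merged column for DV number `n`
combine_dv <- function(n, data) {
  cols <- syms(grep(paste0("^DV", n, "_"), names(data), value = TRUE))
  transmute(data, !!paste0("DV_", n) := coalesce(!!!cols))
}

# Bind the original data and one merged column per DV number
res3 <- bind_cols(c(list(df), lapply(1:2, combine_dv, data = df)))
print(res3)
```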
Note:
This method will work for both `numeric` and `character` class columns, but not `factor`s. One should first convert the `factor` columns to character before using this method.

Data:
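The data is assumed here to be the data frame from the question. The sketch below rebuilds it as `df` and also shows one base-R way to convert `factor` columns to character beforehand, using a hypothetical copy `df_fct` in which `FACT` has been made a factor.

```r
# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# Hypothetical copy with a factor column, to illustrate the conversion
df_fct <- df
df_fct$FACT <- factor(df_fct$FACT)

# Find the factor columns and convert only those to character
is_fct <- vapply(df_fct, is.factor, logical(1))
df_fct[is_fct] <- lapply(df_fct[is_fct], as.character)
str(df_fct)
```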
Another solution similar to @useR's, but rather than creating each column individually, this creates a list of expressions that get evaluated all at once. It may still suffer from the same "don't splice data frames into calls with `!!!`" fault that was mentioned in the comments since it uses `select(.)`, but I thought I would post it anyway.
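A rough sketch of the "list of expressions" idea, assuming the question's data frame is stored in `df`. Note that this variant builds the expressions from column symbols rather than from `select(.)`, so it is not the same code as the answer describes. The names `prefixes` and `exprs` are hypothetical.

```r
library(dplyr)
library(rlang)

# Data frame from the question (the name `df` is an assumption)
df <- data.frame(ID = c(1, 2, 3),
                 DV1_A = c(1, NA, NA), DV1_B = c(NA, 4, NA), DV1_C = c(NA, NA, 5),
                 DV2_A = c(3, NA, NA), DV2_B = c(NA, 3, NA), DV2_C = c(NA, NA, 5),
                 FACT = c("A", "B", "C"), stringsAsFactors = FALSE)

# Map each new column name to the prefix of the columns it should combine
prefixes <- c(DV_1 = "DV1_", DV_2 = "DV2_")

# Build one unevaluated coalesce() call per new column
exprs <- lapply(prefixes, function(p) {
  cols <- syms(grep(paste0("^", p), names(df), value = TRUE))
  expr(coalesce(!!!cols))
})

# Splice the named list of expressions so mutate() evaluates them all at once
result <- df %>% mutate(!!!exprs)
print(result)
```

Because only symbols are spliced into the calls, no data frame is captured inside the expressions.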