Reshape data into long columns based on variable i

Posted 2019-08-24 18:07

Question:

There are thousands of answers describing how to reshape from wide to long and organize by certain variables, but I can't work out what I'm failing to wrap my head around. My rows start as rater, obs, val1, val2, etc., and I need to reorganize them into one column per rater for inter-rater reliability (IRR) analysis.

Given a format similar to my data that can be created with:

r1 <- c('bob', 'sally', "george", "bob", "sally", "george")
r2 <- c(1,1,1,2,2,2)
r3 <- c("bad", "good", "good", "good", "good", "bad")
r4 <- c("bad", "bad", "good", "good", "good", "bad")
df=data.frame(r1,r2,r3,r4)
df = setNames(df,  c('rater','obs', 'val1', 'val2'))

I need to organize the data into columns based on 'rater'. Anything that works would be great, especially if 'obs' (observation number) could be preserved, e.g., obs1_val1, obs1_val2, etc.

I have tried something along the lines of:

dcast(df, obs ~ rater)

Which creates:

   obs   bob   george sally
1   1    bad   good   bad
2   2    good  bad    good

However, this keeps only one value column (val2 in the output above) and drops the values of the other.

Rather, I need something along the lines of:

              bob   sally   george
  obs1_val1   bad   good    good
  obs1_val2   bad   bad     good
  obs2_val1   good  good    bad
  obs2_val2   good  good    bad

Similar questions recommend melt followed by dcast, but I don't actually want to aggregate; I just want to stack the values in columns.

As the strings for val1 and val2 should be treated as factors, I've tried:

df$"val1" <- factor(df$val1, levels=c("bad","good"))
df$"val2" <- factor(df$val2, levels=c("bad","good"))

without any effect. I get:

Aggregation function missing: defaulting to length

    obs bob  george sally
1   1   2      2     2
2   2   2      2     2

which is not helpful.


Answer 1:

Consider calling dcast() once per value column (val1 and val2) and combining the results with rbind(). Because dcast() drops the name of the value column, wrap each call in data.frame() to add a val column that records it:

library(reshape2)  # provides dcast()

rdf <- rbind(data.frame(val="val1", dcast(df, obs ~ rater, value.var="val1")),
             data.frame(val="val2", dcast(df, obs ~ rater, value.var="val2")))

#    val obs  bob george sally
# 1 val1   1  bad   good  good
# 2 val1   2 good    bad  good
# 3 val2   1  bad   good   bad
# 4 val2   2 good    bad  good

Should there be many val columns, iterate over them with lapply() and then call do.call(rbind, ...) on the resulting list:

# names of all columns containing "val"
valcols <- grep("val", names(df), value=TRUE)

dfList <- lapply(valcols, function(v) {
  data.frame(val=v, dcast(df, obs ~ rater, value.var=v))
})
rdf <- do.call(rbind, dfList)

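Finally, to convert the columns to factors, apply as.factor() to each one with lapply(). Assigning into rdf[] keeps the data frame structure and does not depend on the stringsAsFactors default, which changed in R 4.0.0:

rdf[] <- lapply(rdf, as.factor)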
str(rdf)

# 'data.frame': 4 obs. of  5 variables:
# $ val   : Factor w/ 2 levels "val1","val2": 1 1 2 2
# $ obs   : Factor w/ 2 levels "1","2": 1 2 1 2
# $ bob   : Factor w/ 2 levels "bad","good": 1 2 1 2
# $ george: Factor w/ 2 levels "bad","good": 2 1 2 1
# $ sally : Factor w/ 2 levels "bad","good": 2 2 1 2
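
The melt-then-dcast route mentioned in the question can also be used without any aggregation, provided the casting formula identifies each cell uniquely. The following is only a sketch under that assumption, not part of the original answer: it rebuilds the sample data, and the names long, wide and item are chosen here purely for illustration.

```r
library(reshape2)

# Sample data from the question, kept as character columns
df <- data.frame(
  rater = c("bob", "sally", "george", "bob", "sally", "george"),
  obs   = c(1, 1, 1, 2, 2, 2),
  val1  = c("bad", "good", "good", "good", "good", "bad"),
  val2  = c("bad", "bad", "good", "good", "good", "bad"),
  stringsAsFactors = FALSE
)

# Stack val1/val2 into a single 'value' column, tagged by 'variable'
long <- melt(df, id.vars = c("rater", "obs"))

# One row per obs/variable pair and one column per rater; every cell
# receives exactly one value, so dcast() has nothing to aggregate
wide <- dcast(long, obs + variable ~ rater, value.var = "value")

# Optional row label in the obs1_val1 style requested in the question
wide$item <- paste0("obs", wide$obs, "_", wide$variable)
wide
```

The "Aggregation function missing" message shown in the question typically appears when the formula leaves more than one value per cell, which is why both obs and variable are kept on the left-hand side here.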


Answer 2:

The tidyverse option.

library(tidyverse)
df %>% 
   gather(val1, val2, key = "eval", value = "value") %>% 
   spread(key = rater, value = value)

You can then choose to either drop the 'obs' column completely or merge 'obs' and 'eval' into one using unite().
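
As a rough illustration of the unite() step, the sketch below repeats the gather()/spread() calls without the pipe and then merges 'obs' and 'eval' into a single label. It assumes the sample data from the question; the names long, wide and item are made up for this example, and prefixing the observation number with "obs" is an optional extra to match the obs1_val1 style asked for.

```r
library(tidyr)

# Sample data from the question, kept as character columns
df <- data.frame(
  rater = c("bob", "sally", "george", "bob", "sally", "george"),
  obs   = c(1, 1, 1, 2, 2, 2),
  val1  = c("bad", "good", "good", "good", "good", "bad"),
  val2  = c("bad", "bad", "good", "good", "good", "bad"),
  stringsAsFactors = FALSE
)

# Same reshaping as above, written as separate steps
long <- gather(df, key = "eval", value = "value", val1, val2)
wide <- spread(long, key = rater, value = value)

# Prefix the observation number, then merge it with 'eval'
wide$obs <- paste0("obs", wide$obs)
wide <- unite(wide, "item", obs, eval, sep = "_")
wide
```

By default unite() removes the columns it merges, so only item and the rater columns remain.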



Tags: r dcast