Reshape Data Long to Wide - understanding reshape

2020-01-29 17:14发布

I have a long format dataframe dogs that I'm trying to reformat to wide using the reshape() function. It currently looks like so:

dogid  month  year  trainingtype  home  school  timeincomp
12345  1      2014  1             1     1       340
12345  2      2014  1             1     1       360
31323  12     2015  2             7     3       440
31323  1      2014  1             7     3       500
31323  2      2014  1             7     3       520

The dogid column is a bunch of ids, one for each dog. The month column varies for 1 to 12 for the 12 months, and year from 2014 to 2015. Trainingtype varies for 1 to 2. Each dog has a timeincomp value for every month-year-trainingtype combination, so 48 entries per dog. Home and school vary from 1-8 and are constant per dog (every entry for the same dog has the same school and home). Time in comp is my response variable.

I would like my table to look like so:

dogid  home  school  month1year2014trainingtype1  month2year2014trainingtype1
12345  1     1       340                          360
31323  7     3       500                          520

etc. (with columns for each month-year-trainingtype combination)

What parameters should I use in reshape to achieve this?

3条回答
聊天终结者
2楼-- · 2020-01-29 17:46

You can use the function dcast from package reshape2. It's easier to understand. The left side of the formula is the one that stays long, while the right side is the one that goes wide.

The fun.aggregate is the function to apply in case that there is more than 1 number per case. If you're sure you don't have repeated cases, you can use mean or sum

dcast(data, formula= dogid + home + school ~ month + year + trainingtype,
value.var = 'timeincomp',
fun.aggregate = sum)

I hope it works:

  dogid home school 1_2014_1 2_2014_1 12_2015_2
1 12345    1      1      340      360         0
2 31323    7      3      500      520       440
查看更多
我命由我不由天
3楼-- · 2020-01-29 17:55

In this case, using base reshape, you essentially want an interaction() of the three time variables to define your wide variables, so:

idvars  <- c("dogid","home","school")
grpvars <- c("year","month","trainingtype")
outvar  <- "timeincomp"
time    <- interaction(dat[grpvars])

reshape(
  cbind(dat[c(idvars,outvar)],time),
  idvar=idvars,
  timevar="time",
  direction="wide"
)

#  dogid home school timeincomp.2014.1.1 timeincomp.2014.2.1 timeincomp.2015.12.2
#1 12345    1      1                 340                 360                   NA
#3 31323    7      3                 500                 520                  440
查看更多
一夜七次
4楼-- · 2020-01-29 17:57

You can do the same thing using the new replacement for reshape2, tidyr:

library(tidyr)
library(dplyr)
data %>% unite(newcol, c(year, month, trainingtype)) %>%
         spread(newcol, timeincomp)

  dogid home school 2014_1_1 2014_2_1 2015_12_2
1 12345    1      1      340      360        NA
2 31323    7      3      500      520       440

First, we unite the year, month and trainingtype columns into a new column called newcol, then we spread the data with timeincomp as our value variable.

The NA is there as we have no value, you can give it one by changing fill = NA in the spread function.

查看更多
登录 后发表回答