可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
This question already has an answer here:
-
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
6 answers
I need to reshape my wide table into long format but keeping multiple fields for each record, for example:
dw <- read.table(header=T, text='
sbj f1.avg f1.sd f2.avg f2.sd blabla
A 10 6 50 10 bA
B 12 5 70 11 bB
C 20 7 20 8 bC
D 22 8 22 9 bD
')
# Now I want to melt this table, keeping both AVG and SD as separate fields for each measurement, to get something like this:
# sbj var avg sd blabla
# A f1 10 6 bA
# A f2 50 10 bA
# B f1 12 5 bB
# B f2 70 11 bB
# C f1 20 7 bC
# C f2 20 8 bC
# D f1 22 8 bD
# D f2 22 9 bD
I have basic knowledge of using melt
and reshape
, but it is not obvious for me how to apply such reshaping in my case.
I would be grateful for any hints or point to another SO post if something similar have been asked already.
回答1:
reshape
does this with the appropriate arguments.
varying
lists the columns which exist in the wide format, but are split into multiple rows in the long format. v.names
is the long format equivalents. Between the two, a mapping is created.
From ?reshape
:
Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.
Given these varying
and v.names
arguments, reshape
is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we can specify varying=2:5
for this example data, but that is not safe in general.
Given the values of times
and v.names
, reshape
splits the varying
columns on a .
character (the default sep
argument) to create the columns in the output.
times
specifies values that are to be used in the created var
column, and v.names
are pasted onto these values to get column names in the wide format for mapping to the result.
Finally, idvar
is specified to be the sbj
column, which identifies individual records in the wide format (thanks @thelatemail).
reshape(dw, direction='long',
varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'),
timevar='var',
times=c('f1', 'f2'),
v.names=c('avg', 'sd'),
idvar='sbj')
## sbj blabla var avg sd
## A.f1 A bA f1 10 6
## B.f1 B bB f1 12 5
## C.f1 C bC f1 20 7
## D.f1 D bD f1 22 8
## A.f2 A bA f2 50 10
## B.f2 B bB f2 70 11
## C.f2 C bC f2 20 8
## D.f2 D bD f2 22 9
回答2:
Another option using Hadley's new tidyr
package.
library(tidyr)
library(dplyr)
dw <- read.table(header=T, text='
sbj f1.avg f1.sd f2.avg f2.sd blabla
A 10 6 50 10 bA
B 12 5 70 11 bB
C 20 7 20 8 bC
D 22 8 22 9 bD
')
dw %>%
gather(v, value, f1.avg:f2.sd) %>%
separate(v, c("var", "col")) %>%
arrange(sbj) %>%
spread(col, value)
回答3:
This seems to do what you want except that the f
is removed from elements in time
.
reshape(dw, idvar = "sbj", varying = list(c(2,4),c(3,5)), v.names = c("ave", "sd"), direction = "long")
sbj blabla time ave sd
A.1 A bA 1 10 6
B.1 B bB 1 12 5
C.1 C bC 1 20 7
D.1 D bD 1 22 8
A.2 A bA 2 50 10
B.2 B bB 2 70 11
C.2 C bC 2 20 8
D.2 D bD 2 22 9
回答4:
To add to the options available here, you can also consider merged.stack
from my "splitstackshape" package:
library(splitstackshape)
merged.stack(dw, var.stubs = c("avg", "sd"), sep = "var.stubs", atStart = FALSE)
# sbj blabla .time_1 avg sd
# 1: A bA f1. 10 6
# 2: A bA f2. 50 10
# 3: B bB f1. 12 5
# 4: B bB f2. 70 11
# 5: C bC f1. 20 7
# 6: C bC f2. 20 8
# 7: D bD f1. 22 8
# 8: D bD f2. 22 9
You can also do a little more cleanup on the ".time_1"
variable, like this.
merged.stack(dw, var.stubs = c("avg", "sd"),
sep = "var.stubs", atStart = FALSE)[, .time_1 := sub(
".", "", .time_1, fixed = TRUE)][]
# sbj blabla .time_1 avg sd
# 1: A bA f1 10 6
# 2: A bA f2 50 10
# 3: B bB f1 12 5
# 4: B bB f2 70 11
# 5: C bC f1 20 7
# 6: C bC f2 20 8
# 7: D bD f1 22 8
# 8: D bD f2 22 9
You would note the use of the atStart = FALSE
argument. This is because your names are in a little bit of a different order than reshape-related functions seem to like. In general, the "stub" is expected to come first, and then the "times", like this:
dw2 <- dw
setnames(dw2, gsub("(.*)\\.(.*)", "\\2.\\1", names(dw2)))
names(dw2)
# [1] "sbj" "avg.f1" "sd.f1" "avg.f2" "sd.f2" "blabla"
If the names were in that format, then both base R's reshape
and merged.stack
benefit from more direct syntax:
merged.stack(dw2, var.stubs = c("avg", "sd"), sep = ".")
reshape(dw2, idvar = c("sbj", "blabla"), varying = 2:5,
sep = ".", direction = "long")
回答5:
melt
from the >=1.9.6 version of data.table
, does this by specifying the column index in measure.vars
as a list
.
melt(setDT(dw), measure.vars=list(c(2,4), c(3,5)),
variable.name='var', value.name=c('avg', 'sd'))[,
var:= paste0('f',var)][order(sbj)]
# sbj blabla var avg sd
#1: A bA f1 10 6
#2: A bA f2 50 10
#3: B bB f1 12 5
#4: B bB f2 70 11
#5: C bC f1 20 7
#6: C bC f2 20 8
#7: D bD f1 22 8
#8: D bD f2 22 9
Or you could use the new patterns
function:
melt(setDT(dw),
measure = patterns("avg", "sd"),
variable.name = 'var', value.name = c('avg', 'sd'))
# sbj blabla var avg sd
# 1: A bA 1 10 6
# 2: B bB 1 12 5
# 3: C bC 1 20 7
# 4: D bD 1 22 8
# 5: A bA 2 50 10
# 6: B bB 2 70 11
# 7: C bC 2 20 8
# 8: D bD 2 22 9