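This question already has an answer here:
- Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (7 answers)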
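I have an R data frame that I scraped from the internet using readHTMLTable() from the XML package. The table looks like the following excerpt, with multiple variables/columns for population and year. (Note that the years are not duplicated across columns and serve as a unique identifier for the population.)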
year1 pop1 year2 pop2 year3 pop3
1
2 16XX 4675,0 1900 6453,0 1930 9981,2
3 17XX 4739,3 1901 6553,5 1931 ...
4 17XX 4834,0 1902 6684,0 1932
5 180X 4930,0 1903 6818,0 1933
6 180X 5029,0 1904 6955,0 1934
7 181X 5129,0 1905 7094,0 1935
8 181X 5231,9 1906 7234,7 1936
9 182X 5297,0 1907 7329,0 1937
10 182X 5362,0 1908 7422,0 1938
I would like to reorganize the data into just two columns, one for year and one for population, so that it looks like the following:
year pop
1
2 16XX 4675,0
3 17XX 4739,3
4 17XX 4834,0
5 180X 4930,0
6 180X 5029,0
7 181X 5129,0
8 181X 5231,9
9 182X 5297,0
10 182X 5362,0
11 1900 6453,0
12 1901 6553,5
13 1902 6684,0
... ... ...
21 1930 9981,2
22 ...
The values from the variables/columns year2 and year3 are appended below year1, as are the corresponding population values.
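I have considered the following:
(1) Looping over the population and year columns (n>2) and appending those values as new observations to year1 and pop1 would work, but this seems unnecessarily cumbersome.
(2) I tried melt as follows, but either it cannot handle an id variable that is split across multiple columns, or I am not using it correctly.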
df.melt <- melt(df, id=c("year1", "year2",...)
(3) Lastly, I thought I could pull out each year column as its own vector and append the vectors together, like this:
year.all <- c(df$year1, df$year2,...)
However, the above returns the following for year.all
[1] 1 2 3 3 4 4 5 5 6 6 7 8 8 9 9 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 1 1 2 ...
rather than this
[1] 16XX 17XX 17XX 180X 180X 181X 181X 182X 182X 1900 1901 1902...
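If there is a straightforward way of accomplishing this reorganization, I would love to learn it. Thank you very much for your help.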
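One likely explanation for the integers in attempt (3), offered as an assumption because the post does not show str(df): the scraped columns were stored as factors, and in older R versions c() on factors returned the underlying integer codes instead of the labels. The sketch below uses two small hypothetical vectors (year1 and year2 here are stand-ins, not the scraped data) to show the codes and how converting to character first keeps the labels.

```r
# Hypothetical stand-ins for two scraped year columns stored as factors
year1 <- factor(c("16XX", "17XX", "17XX"))
year2 <- factor(c("1900", "1901", "1902"))

# The integer codes stored behind each factor; they restart at 1 per column,
# which resembles the output shown in the question
codes <- c(as.integer(year1), as.integer(year2))

# Converting to character before concatenating keeps the labels
year.all <- c(as.character(year1), as.character(year2))
print(codes)
print(year.all)
```

If this is indeed the cause, converting every column with as.character() before reshaping should avoid the problem in any of the approaches discussed here.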
Answer 1:
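If the 'year' and 'pop' columns alternate, we can subset with c(TRUE, FALSE) to get columns 1, 3, 5, ... and with c(FALSE, TRUE) to get columns 2, 4, 6, ..., thanks to recycling. Then we unlist the columns and create a new data.frame.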
df2 <- data.frame(year=unlist(df1[c(TRUE, FALSE)]),
pop=unlist(df1[c(FALSE, TRUE)]))
row.names(df2) <- NULL
head(df2)
# year pop
#1
#2 16XX 4675,0
#3 17XX 4739,3
#4 17XX 4834,0
#5 180X 4930,0
#6 180X 5029,0
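To make the recycling step more concrete, here is a minimal sketch on a small hypothetical data frame (toy, year_cols and pop_cols are illustrative names, not part of the original answer). The length-2 logical index is recycled across all six columns, so it selects every other column.

```r
# Hypothetical two-row data frame with the same alternating column layout
toy <- data.frame(year1 = c("16XX", "17XX"), pop1 = c("4675,0", "4739,3"),
                  year2 = c("1900", "1901"), pop2 = c("6453,0", "6553,5"),
                  year3 = c("1930", "1931"), pop3 = c("9981,2", ""),
                  stringsAsFactors = FALSE)

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE FALSE
year_cols <- names(toy)[c(TRUE, FALSE)]
# c(FALSE, TRUE) is recycled to FALSE TRUE FALSE TRUE FALSE TRUE
pop_cols <- names(toy)[c(FALSE, TRUE)]

# Stack the selected columns on top of each other
long <- data.frame(year = unlist(toy[year_cols], use.names = FALSE),
                   pop = unlist(toy[pop_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)
print(long)
```

This only works when the columns strictly alternate; selecting by name pattern would be more robust if the column order is not guaranteed.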
Another option is
library(splitstackshape)
merged.stack(transform(df1, id=1:nrow(df1)), var.stubs=c('year', 'pop'),
sep='var.stubs')[order(.time_1), 3:4, with=FALSE]
Data
df1 <- structure(list(year1 = c("", "16XX", "17XX", "17XX", "180X",
"180X", "181X", "181X", "182X", "182X"), pop1 = c("", "4675,0",
"4739,3", "4834,0", "4930,0", "5029,0", "5129,0", "5231,9", "5297,0",
"5362,0"), year2 = c(NA, 1900L, 1901L, 1902L, 1903L, 1904L, 1905L,
1906L, 1907L, 1908L), pop2 = c("", "6453,0", "6553,5", "6684,0",
"6818,0", "6955,0", "7094,0", "7234,7", "7329,0", "7422,0"),
year3 = c(NA, 1930L, 1931L, 1932L, 1933L, 1934L, 1935L, 1936L,
1937L, 1938L), pop3 = c("", "9981,2", "", "", "", "", "",
"", "", "")), .Names = c("year1", "pop1", "year2", "pop2",
"year3", "pop3"), class = "data.frame", row.names = c(NA, -10L))
Answer 2:
Using the new features of melt from data.table v1.9.5+:
require(data.table) # v1.9.5+
melt(setDT(df), measure = patterns("^year", "^pop"), value.name = c("year", "pop"))
You can find the rest of the vignettes here.
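For readers who cannot install data.table, a rough base R counterpart can be sketched with reshape(). This is not part of the original answer; the data frame toy and the result name long are hypothetical, and the year columns are assumed to be character already.

```r
# Hypothetical data frame with three year/pop column pairs
toy <- data.frame(year1 = c("16XX", "17XX"), pop1 = c("4675,0", "4739,3"),
                  year2 = c("1900", "1901"), pop2 = c("6453,0", "6553,5"),
                  year3 = c("1930", "1931"), pop3 = c("9981,2", ""),
                  stringsAsFactors = FALSE)

# Each element of 'varying' lists the wide columns that feed one long column
long <- reshape(toy, direction = "long",
                varying = list(c("year1", "year2", "year3"),
                               c("pop1", "pop2", "pop3")),
                v.names = c("year", "pop"))

# reshape() also adds 'time' and 'id' columns; keep only the two of interest
long <- long[, c("year", "pop")]
rownames(long) <- NULL
print(long)
```

The rows come out grouped by column set (all of year1/pop1 first, then year2/pop2, and so on), which matches the layout requested in the question.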
Answer 3:
Another option is to use split.default to split the data frame into a list of data frames and then bind them together:
lst <- lapply(split.default(df1, sub('.*(\\d)', '\\1', names(df1))),
setNames, c('year','pop'))
do.call(rbind, lst)
This gives the desired result:
    year     pop
1.1 16XX  4675,0
1.2 17XX  4739,3
1.3 17XX  4834,0
1.4 180X  4930,0
1.5 180X  5029,0
1.6 181X  5129,0
1.7 181X  5231,9
1.8 182X  5297,0
1.9 182X  5362,0
2.1 1900  6453,0
2.2 1901  6553,5
2.3 1902  6684,0
2.4 1903  6818,0
2.5 1904  6955,0
2.6 1905  7094,0
2.7 1906  7234,7
2.8 1907  7329,0
2.9 1908  7422,0
3.1 1930  9981,2
3.2 1931 10583,5
3.3 1932  8671,0
3.4 1933  9118,0
3.5 1934  9625,0
3.6 1935  8097,0
3.7 1936  7984,7
3.8 1937  8729,0
3.9 1938 10462,0
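The grouping in the split.default call comes from the sub() expression, which reduces each column name to its trailing digit. The sketch below shows those keys for the six column names, plus a caveat for names with two-digit suffixes; the vectors cols, key, key10 and key10_alt are illustrative names, not part of the original answer.

```r
cols <- c("year1", "pop1", "year2", "pop2", "year3", "pop3")

# The greedy '.*' consumes everything except the final digit, which is kept
key <- sub('.*(\\d)', '\\1', cols)
print(key)

# Caveat: with ten or more column sets only the LAST digit survives,
# so "year10" would fall into a group named "0"
key10 <- sub('.*(\\d)', '\\1', "year10")

# One possible alternative that strips the leading non-digits instead
key10_alt <- sub('\\D+', '', "year10")
print(c(key10, key10_alt))
```

Columns sharing a key end up in the same list element, which is why each element of lst holds exactly one year/pop pair.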
You can also use rbindlist from the data.table package for the last step:
library(data.table)
rbindlist(lst)
Used data:
df1 <- structure(list(year1 = c("16XX", "17XX", "17XX", "180X", "180X", "181X", "181X", "182X", "182X"),
pop1 = c("4675,0", "4739,3", "4834,0", "4930,0", "5029,0", "5129,0", "5231,9", "5297,0", "5362,0"),
year2 = c(1900L, 1901L, 1902L, 1903L, 1904L, 1905L, 1906L, 1907L, 1908L),
pop2 = c("6453,0", "6553,5", "6684,0", "6818,0", "6955,0", "7094,0", "7234,7", "7329,0", "7422,0"),
year3 = c(1930L, 1931L, 1932L, 1933L, 1934L, 1935L, 1936L, 1937L, 1938L),
pop3 = c("9981,2", "10583,5", "8671,0", "9118,0", "9625,0", "8097,0", "7984,7", "8729,0", "10462,0")),
.Names = c("year1", "pop1", "year2", "pop2", "year3", "pop3"), class = "data.frame", row.names = c(NA, -9L))
Source: Reshape a dataframe to long format with multiple sets of measure columns [duplicate]