可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
This question already has an answer here:
-
Reshaping multiple sets of measurement columns (wide format) into single columns (long format)
6 answers
I have the below data formed using code
test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))
dis dur method to_lon to_lat from_lon from_lat
1 10 30 car -1.9800 55.3009 -1.4565 76.8888
2 20 40 car -1.5678 55.3416 -1.3424 65.8999
3 30 60 Bicycle -1.3240 55.1123 -1.4566 76.9088
4 40 90 Bicycle -1.4560 55.2234 -1.1111 25.3344
I want to convert this data frame such that it has one row for to_lat and to_lon and in the next row it has from_lat and from_lon. The rest of the details do not need to change and can be replicated. The desired result should be as below
dis dur method longitude latitude
from 10 30 car -1.98 55.3009
to 10 30 car -1.4565 76.8888
from 20 40 car -1.5678 55.3416
to 20 40 car -1.3424 65.8999
from 30 60 Bicycle -1.324 55.1123
to 30 60 Bicycle -1.4566 76.9088
from 40 90 Bicycle -1.456 55.2234
to 40 90 Bicycle -1.1111 25.3344
Any help will be much appreciated.
Thanks.
回答1:
We can use melt
from data.table
which can take multiple measure
columns.
library(data.table)
dM <- melt(setDT(test), measure=patterns('lon', 'lat'),
value.name=c('longitude', 'latitude'))
#change the 'variable' column from numeric index to 'from/to'
dM[, variable:= c('from', 'to')[variable]]
#create a sequence column grouped by 'variable'
dM[,i1:= 1:.N ,variable]
#order based on the 'i1'
res <- dM[order(i1)][,i1:=NULL]
res
# dis dur method variable longitude latitude
#1: 10 30 car from -1.9800 55.3009
#2: 10 30 car to -1.4565 76.8888
#3: 20 40 car from -1.5678 55.3416
#4: 20 40 car to -1.3424 65.8999
#5: 30 60 Bicycle from -1.3240 55.1123
#6: 30 60 Bicycle to -1.4566 76.9088
#7: 40 90 Bicycle from -1.4560 55.2234
#8: 40 90 Bicycle to -1.1111 25.3344
回答2:
This may not be the most elegant solution, but it should work and is hopefully understandable:
We split the data into two dataframes: one with the 'from' longitude and latitude data (call it testF) and the other with the 'to' data (call it test). We then use rbind to insert the rows of 'testF' in the appropriate places in 'test'.
test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))
testF <- test[,c(1:3,6,7)]
names(testF)[4:5] <- c("lonitude", "latitude")
test <- test[,1:5]
names(test)[4:5] <- c("lonitude", "latitude")
for(i in dim(test)[1]:1) {
test <- rbind(test[1:i,], testF[i,], test[-(1:i),])
}
回答3:
Here is an alternative approach using package tidyr
(a popular package for data munging), which avoids for
loop.
library(tidyr)
test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))
test$id <- 1:dim(test)[1]
# gather latitude columns
d1 <- gather(data = test,
key = direction,
value = latitude,
to_lat, from_lat)
# gather longitude columns
d2 <- gather(data = test,
key = direction,
value = longitude,
to_lon, from_lon)
d3 <- cbind(d1[,c("direction","dis","dur","method","latitude")],d2[,c("longitude","id"),drop=FALSE])
# Create names
dir <- unlist(strsplit(d3$direction,"_"))
dir <- dir[seq(from = 1, to = length(dir), by = 2)]
# Factor and sort
d3$direction <- factor(dir)
d3[order(d3$id),]