Excel or R: Merge time series with missing values

This question already has an answer here:

How to join (merge) data frames (inner, outer, left, right) 13 answers

I have multiple somewhat irregular time series (each in a CSV file) like so:

X.csv

date,time,value
01/01/04,00:15:00,4.98
01/01/04,00:25:00,4.981
01/01/04,00:35:00,4.983
01/01/04,00:55:00,4.986

and so:

Y.csv

date,time,value
01/01/04,00:05:00,9.023
01/01/04,00:15:00,9.022
01/01/04,00:35:00,9.02
01/01/04,00:45:00,9.02
01/01/04,00:55:00,9.019

Notice how there's basically a granularity of 10 mins in both files, but each has some missing entries.

I would now like to merge these two time series to achieve the following:

date,time,X,Y
01/01/04,00:05:00,NA,9.023
01/01/04,00:15:00,4.98,9.022
01/01/04,00:25:00,4.981,NA
01/01/04,00:35:00,4.983,9.02
01/01/04,00:45:00,NA,9.02
01/01/04,00:55:00,4.986,9.019

Is there an easy way of achieving this? Since I have multiple files (not just two), is there a way of doing this for a batch of files?

Tags: r excel csv merge time-series

2 Answers

再贱就再见

#2 · 2019-09-03 06:54

You can use dplyr to do this. First read in all the files for group X and all the files for group Y in a loop, so that you end up with a single data frame for each group. Then full_join the results.
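A minimal sketch of that idea, assuming each CSV has the date,time,value layout from the question. The temporary directory, the helper name read_series, and the choice to name each value column after its file are illustrative assumptions, not part of the original answer:

```r
library(dplyr)

# Write the two sample series from the question to a temporary directory
# so the sketch runs on its own; real use would point at an existing folder.
dir <- tempfile("series")
dir.create(dir)
writeLines(c("date,time,value",
             "01/01/04,00:15:00,4.98",
             "01/01/04,00:25:00,4.981",
             "01/01/04,00:35:00,4.983",
             "01/01/04,00:55:00,4.986"), file.path(dir, "X.csv"))
writeLines(c("date,time,value",
             "01/01/04,00:05:00,9.023",
             "01/01/04,00:15:00,9.022",
             "01/01/04,00:35:00,9.02",
             "01/01/04,00:45:00,9.02",
             "01/01/04,00:55:00,9.019"), file.path(dir, "Y.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and rename its "value" column
# to the file's base name (X, Y, ...), so the columns stay distinct.
read_series <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  names(df)[names(df) == "value"] <- tools::file_path_sans_ext(basename(path))
  df
}
series <- lapply(files, read_series)

# Full-join every series on date and time; rows missing from a file get NA.
merged <- Reduce(function(a, b) full_join(a, b, by = c("date", "time")), series)

# Sort chronologically (full_join appends unmatched rows at the end).
stamp <- as.POSIXct(paste(merged$date, merged$time),
                    format = "%m/%d/%y %H:%M:%S", tz = "UTC")
merged <- merged[order(stamp), ]
rownames(merged) <- NULL
print(merged)
```

Because Reduce folds over the whole list, the same code should handle any number of files in the folder, as long as every file name is unique.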


Root（大扎）

#3 · 2019-09-03 06:59

Getting your data:

# pbpaste reads the clipboard on macOS; use read.csv("X.csv") to read the file directly
X <- read.table(pipe("pbpaste"), sep = ",", header = TRUE)
X$date <- as.POSIXct(paste(as.Date(X$date, format = '%m/%d/%y'), X$time))

gets us

> X
                 date     time value
1 2004-01-01 00:15:00 00:15:00 4.980
2 2004-01-01 00:25:00 00:25:00 4.981
3 2004-01-01 00:35:00 00:35:00 4.983
4 2004-01-01 00:55:00 00:55:00 4.986

same with Y:

> Y
                 date     time value
1 2004-01-01 00:05:00 00:05:00 9.023
2 2004-01-01 00:15:00 00:15:00 9.022
3 2004-01-01 00:35:00 00:35:00 9.020
4 2004-01-01 00:45:00 00:45:00 9.020
5 2004-01-01 00:55:00 00:55:00 9.019

Now convert X and Y to xts objects and merge the two with an outer join to keep every data point.

library(xts)
result <- merge(as.xts(X[,3], order.by = X$date), as.xts(Y[,3], order.by = Y$date), join = 'outer')

names(result) <- c('x','y')
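For more than two series, the same outer join can be folded over a list. The following is a rough sketch under some assumptions: the sample data is inlined as text rather than read from files, to_xts is a hypothetical helper, and timestamps are parsed as UTC. It also shows one way to get back to the date,time,X,Y layout asked for in the question:

```r
library(xts)

# Sample data from the question, inlined so the sketch runs on its own.
# With real files, replace read.csv(text = ...) by read.csv(path).
csv_text <- list(
  X = paste("date,time,value",
            "01/01/04,00:15:00,4.98",
            "01/01/04,00:25:00,4.981",
            "01/01/04,00:35:00,4.983",
            "01/01/04,00:55:00,4.986", sep = "\n"),
  Y = paste("date,time,value",
            "01/01/04,00:05:00,9.023",
            "01/01/04,00:15:00,9.022",
            "01/01/04,00:35:00,9.02",
            "01/01/04,00:45:00,9.02",
            "01/01/04,00:55:00,9.019", sep = "\n")
)

# Hypothetical helper: parse one series into a one-column xts object.
to_xts <- function(txt) {
  df <- read.csv(text = txt, stringsAsFactors = FALSE)
  stamp <- as.POSIXct(paste(df$date, df$time),
                      format = "%m/%d/%y %H:%M:%S", tz = "UTC")
  xts(df$value, order.by = stamp)
}
series <- lapply(csv_text, to_xts)

# Outer-join the series pairwise; a timestamp missing from a series becomes NA.
merged <- Reduce(function(a, b) merge(a, b, join = "outer"), series)
colnames(merged) <- names(series)

# Rebuild the date,time,X,Y layout as a plain data frame.
out <- data.frame(
  date = format(index(merged), "%m/%d/%y", tz = "UTC"),
  time = format(index(merged), "%H:%M:%S", tz = "UTC"),
  coredata(merged),
  stringsAsFactors = FALSE
)
print(out)
```

The data frame out could then be written with write.csv(out, "merged.csv", row.names = FALSE), where the file name is only an example.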

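At this point result already holds both series side by side, with NA wherever a series has no entry. If you also want a combined total, sum the values by row:

result$bothXY <- rowSums(result, na.rm = TRUE)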

If you don’t need the x,y columns anymore:

result <- result[,3]

and you get:

> result
                    bothXY
2004-01-01 00:05:00  9.023
2004-01-01 00:15:00 14.002
2004-01-01 00:25:00  4.981
2004-01-01 00:35:00 14.003
2004-01-01 00:45:00  9.020
2004-01-01 00:55:00 14.005
