ggplot multiple time series of unequal time

Posted 2019-07-18 16:27

Question:

I know there are a few answered questions relating to time series and multiple data frames, but I can't seem to figure this out.

I would like to plot time-stamped readings (column pa) from 4 different pressure sensors against time. I have 4 dfs of time-stamped pressure readings from the same experiment. However, the sensors collected data at unequal times, and the columns are of unequal length due to sensor failures and other blips in the data.

These two aspects have prevented me from successfully creating a graph containing all 4 sensors' data. The dfs have different numbers of observations; they cover the same time range, but their timestamps differ at the level of seconds. Would the time resolution need to be changed to hours, for example?

This is what the dfs look like:

PA_1: n = 1097361

      time               pa       wifi
1 2014-09-01 16:21:00   100.620    1   
2 2014-09-01 17:20:33   100.572    1 
3 2014-09-01 18:20:05   100.561    0
4 2014-09-01 19:19:38   100.523    0
5 2014-09-01 20:19:11   100.511    1    
6 2014-09-01 21:18:43   100.534    1

PA_2: n = 914364
       time              pa        wifi
1 2014-09-01 15:25:05   NA         1 
2 2014-09-01 15:25:09   100.798    1
3 2014-09-01 15:25:11   100.792    0              
4 2014-09-01 15:25:15   100.791    0              
5 2014-09-01 15:25:18   100.790    1             
6 2014-09-01 15:25:20   100.791    1  

PA_3: n = 963527
       time              pa        wifi
1 2014-09-01 15:25:02   100.832    1
2 2014-09-01 15:25:05   100.832    1
3 2014-09-01 15:25:08   100.825    0
4 2014-09-01 15:25:11   100.831    0
5 2014-09-01 15:25:14   100.830    1
6 2014-09-01 15:25:17   100.836    1   

PA_4: n = 1061117
       time              pa        wifi
1 2014-09-01 15:25:00   100.690    1
2 2014-09-01 15:25:04   100.683    1
3 2014-09-01 15:25:07   100.685    0
4 2014-09-01 15:25:11   100.687    0
5 2014-09-01 15:25:14   100.682    1
6 2014-09-01 15:25:18   100.684    1       

Also, a dichotomous variable "wifi" was added to each df to denote when wifi was on or off during the experiment. Two of the sensors were exposed to wifi while two were outside of the wifi signal. I would like to display this in the graph as well, perhaps by shading the region or increasing the size of the lines when wifi was on, but I am not sure how to do this. To illustrate, I edited the middle 2 wifi entries in the examples; in the real data, wifi was on for periods of 10 days at a time, not a few seconds.

Thanks

Edit: added examples of each df and a few explanations.

Answer 1:

It's not totally clear to me what you are asking, but (if this is what you are trying to do) you can combine the data.frames and plot them all on one chart, using color to differentiate sensors and alpha/shape settings to differentiate wifi status. Then it is no problem that the series start and end at different times and have different measurement resolutions.

Something like this:

library(ggplot2)
ggplot(dat, 
       aes(x=time, y=pa, group=sensor,  
           color=factor(sensor),  alpha=factor(wifi))) +
  geom_point(aes(shape=factor(wifi)), size=3) +
  geom_line() +
  scale_alpha_manual(values=c(.3, 1))

Which (using totally random data) looks like this:

To generate random data, I did this:

library(lubridate)

# fake data
set.seed(123)
n <- 40

dat <-
  data.frame(sensor=sample(1:4, n, replace=TRUE),
             hr=sample(0:23, n, replace=TRUE),   # valid hours are 0-23
             min=sample(0:59, n, replace=TRUE),  # valid minutes are 0-59
             sec=sample(0:59, n, replace=TRUE),  # valid seconds are 0-59
             wifi=rbinom(n, 1, .5),
             pa=100+rnorm(n))

dat$time <- with(dat, ymd_hms(paste('2014-09-01', 
                                    paste(hr, min, sec, sep=':'))))
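
The plot above assumes one combined data frame `dat` with a `sensor` column. As a minimal sketch of how that could be built from four separate data frames like those in the question, the code below stacks them and tags each row with its sensor number. The helper `make_sensor()` and its tiny stand-in data are hypothetical; the real PA_1 ... PA_4 are assumed to have `time` stored as POSIXct. The hourly averaging at the end is optional: it is not needed to handle the unequal timestamps, but it may make roughly a million rows per sensor quicker to draw.

```r
# Hypothetical stand-in data; replace with the real PA_1 ... PA_4.
# Assumes columns time (POSIXct), pa and wifi, as in the question.
make_sensor <- function(start, n, step_sec) {
  data.frame(
    time = as.POSIXct(start, tz = "UTC") + seq(0, by = step_sec, length.out = n),
    pa   = 100 + seq_len(n) / 1000,
    wifi = rep(c(1, 0), length.out = n)
  )
}
PA_1 <- make_sensor("2014-09-01 16:21:00", 6, 3573)
PA_2 <- make_sensor("2014-09-01 15:25:05", 8, 3)
PA_3 <- make_sensor("2014-09-01 15:25:02", 7, 3)
PA_4 <- make_sensor("2014-09-01 15:25:00", 5, 4)

# Tag each data frame with its sensor number, then stack them row-wise.
sensors <- list(PA_1, PA_2, PA_3, PA_4)
dat <- do.call(rbind, Map(function(d, id) cbind(d, sensor = id),
                          sensors, seq_along(sensors)))

# Optional: average pa per sensor and hour to thin the data before plotting.
dat$hour <- as.POSIXct(trunc(dat$time, units = "hours"))
hourly <- aggregate(pa ~ sensor + hour, data = dat, FUN = mean)

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(hourly,
                       ggplot2::aes(x = hour, y = pa, color = factor(sensor))) +
    ggplot2::geom_line()
}
```

Rows with a missing `pa` are dropped by `aggregate()` by default, so gaps from sensor failures simply do not contribute to the hourly means.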


Answer 2:

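I'm guessing you may be getting hung up on grouping. If ggplot() splits the data into unwanted groups (for example, when x or another mapped variable is discrete), use aes(group=1) so that it knows to connect all of the data together in one line.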

library(ggplot2)

# Create some data
set.seed(1)
PA_1 <- data.frame(time = Sys.Date()+rnorm(20, 0, 1),
                   pa   = 100 + rnorm(20, 0, 2),
                   wifi = sample(0:1, 20, replace = TRUE),
                   dset = 1)

PA_2 <- data.frame(time = Sys.Date()+rnorm(15, 0, 1),
                   pa   = 100 + rnorm(15, 0, 2),
                   wifi = sample(0:1, 15, replace = TRUE),
                   dset = 2)

PA_3 <- data.frame(time = Sys.Date()+rnorm(25, 0, 1),
                   pa   = 100 + rnorm(25, 0, 2),
                   wifi = sample(0:1, 25, replace = TRUE),
                   dset = 3)

PA_4 <- data.frame(time = Sys.Date()+rnorm(20, 0, 1),
                   pa   = 100 + rnorm(20, 0, 2),
                   wifi = sample(0:1, 20, replace = TRUE),
                   dset = 4)

# Combine the dataframes
df <- do.call(rbind, list(PA_1, PA_2, PA_3, PA_4))
head(df)
#         time        pa wifi dset
# 1 2015-01-11 101.83795    0    1
# 2 2015-01-12 101.56427    1    1
# 3 2015-01-11 100.14913    0    1
# 4 2015-01-13  96.02130    0    1
# 5 2015-01-12 101.23965    1    1
# 6 2015-01-11  99.88774    0    1


# Variation 1
p1 <- ggplot(df, aes(x=time, y=pa, group=1)) +
  geom_line()

# Variation 2
p2 <- ggplot(df, aes(x=time, y=pa, group=wifi, color=factor(wifi))) +
  geom_line()

# Variation 3
p3 <- ggplot(df, aes(x=time, y=pa, group=1)) +
  geom_line() +
  facet_wrap(~wifi)

library(gridExtra)
grid.arrange(p1, p2, p3, ncol=1)

Alternatively, if you choose to keep the datasets "separate" you could do one of the following:

ggplot(df, aes(x=time, y=pa, group=dset, color=factor(dset))) +
  geom_line()

ggplot(df, aes(x=time, y=pa, color=factor(dset))) +
  geom_line() +
  facet_grid(wifi~dset)
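
The question also asks about shading the periods when wifi was on. One possible approach, sketched here under assumptions rather than taken from the answers, is to collapse consecutive rows with the same wifi value into runs and draw one rectangle per "on" run with geom_rect(). The data frame `one` is a hypothetical stand-in for a single sensor, assumed to be sorted by time with wifi coded 0/1 and no missing values.

```r
# Hypothetical stand-in for one sensor; replace with real data sorted by time.
one <- data.frame(
  time = as.POSIXct("2014-09-01 15:25:00", tz = "UTC") + 3 * (0:9),
  pa   = 100 + c(0.69, 0.68, 0.69, 0.69, 0.68, 0.68, 0.70, 0.69, 0.68, 0.69),
  wifi = c(1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
)

# Collapse consecutive identical wifi values into runs and record the
# first and last timestamp of each run.
runs   <- rle(one$wifi)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
periods <- data.frame(start = one$time[starts],
                      end   = one$time[ends],
                      wifi  = runs$values)
wifi_on <- periods[periods$wifi == 1, ]

# Draw the shaded rectangles first so the line is plotted on top of them.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(one, aes(x = time, y = pa)) +
    geom_rect(data = wifi_on, inherit.aes = FALSE,
              aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
              fill = "grey70", alpha = 0.5) +
    geom_line()
}
```

With several sensors in one data frame, the runs would need to be computed per sensor; since wifi was on for about 10 days at a time, the resulting table of periods should stay small even for a million rows.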