extract hours and seconds from POSIXct for plotting

Published 2019-01-13 05:21

Question:

Suppose I have the following data.frame, foo:

           start.time duration
1 2012-02-06 15:47:00      1
2 2012-02-06 15:02:00      2
3 2012-02-22 10:08:00      3
4 2012-02-22 09:32:00      4
5 2012-03-21 13:47:00      5

And class(foo$start.time) returns

[1] "POSIXct" "POSIXt" 

I'd like to create a plot of foo$duration vs. foo$start.time. In my scenario, I'm only interested in the time of day rather than the actual day of the year. How does one go about extracting the time of day as hours:seconds from a vector of class POSIXct?

Answer 1:

This is a good question, and it highlights some of the difficulties of dealing with dates in R. The lubridate package is very handy, so below I present two approaches: one using base R (as suggested by @RJ-) and the other using lubridate.

Recreate the first three rows of the data frame in the original post:

foo <- data.frame(start.time = c("2012-02-06 15:47:00", 
                                 "2012-02-06 15:02:00",
                                 "2012-02-22 10:08:00"),
                  duration   = c(1,2,3))

Convert to a date-time class (two ways to do this). Note that strptime returns a POSIXlt object, whereas ymd_hms returns POSIXct in UTC:

# using base::strptime
t.str <- strptime(foo$start.time, "%Y-%m-%d %H:%M:%S")

# using lubridate::ymd_hms
library(lubridate)
t.lub <- ymd_hms(foo$start.time)
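strptime() and ymd_hms() do not return the same class: strptime() gives POSIXlt. A base-R-only sketch to check this and convert explicitly (tz = "UTC" is an assumption made for reproducibility, and the variable names are illustrative):

```r
# tz = "UTC" assumed so the result does not depend on the session's time zone
x <- c("2012-02-06 15:47:00", "2012-02-06 15:02:00", "2012-02-22 10:08:00")

t.lt <- strptime(x, "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(t.lt)   # "POSIXlt" "POSIXt"

# Convert explicitly when a POSIXct vector is needed, e.g. for a data.frame column
t.ct <- as.POSIXct(t.lt)
class(t.ct)   # "POSIXct" "POSIXt"
```

POSIXlt keeps each field (year, hour, minute, ...) separately, while POSIXct stores one number of seconds per value, which is why POSIXct is the usual choice for data frame columns.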

Now, extract the time as decimal hours:

# using base::format
h.str <- as.numeric(format(t.str, "%H")) +
         as.numeric(format(t.str, "%M")) / 60

# using lubridate::hour and lubridate::minute
h.lub <- hour(t.lub) + minute(t.lub)/60

Demonstrate that these approaches are equal:

identical(h.str, h.lub)
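Both versions above drop the seconds. When the times are in UTC (as t.lub from ymd_hms is by default), a string-free sketch, not part of the original answer, uses the fact that POSIXct counts seconds since 1970-01-01 00:00:00 UTC and POSIX time has no leap seconds, so every UTC day is exactly 86400 seconds long. Variable names are illustrative:

```r
# Assumption: timestamps are in UTC; this modulo trick does not hold for local zones with DST
x <- as.POSIXct(c("2012-02-06 15:47:00", "2012-02-22 10:08:30"), tz = "UTC")

secs.of.day <- as.numeric(x) %% 86400   # seconds since UTC midnight
h.mod <- secs.of.day / 3600             # decimal hours, seconds included
h.mod
```

For local times the UTC offset changes with daylight saving time, so reading the hour, min and sec fields of a POSIXlt value (as in a later answer) is the safer route.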

Then choose one of the above approaches to assign the decimal hour to foo$hr:

foo$hr <- h.str

# If you prefer, the choice can be made at random:
foo$hr <- if(runif(1) > 0.5){ h.str } else { h.lub }

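Then plot using the ggplot2 package. Because foo$hr is numeric (decimal hours) rather than a date-time, it needs a continuous scale; scale_x_datetime expects POSIXct input:

library(ggplot2)
ggplot(foo, aes(x = hr, y = duration)) +
  geom_point() +
  scale_x_continuous("Hour of day", limits = c(0, 24),
                     breaks = seq(0, 24, by = 6))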
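Since foo$hr holds plain decimal hours, an axis built from it shows numbers such as 15.5 rather than clock times. A small formatter can turn those numbers into "HH:MM" labels; hour_labels is a hypothetical helper name, not part of ggplot2:

```r
# Hypothetical helper: convert decimal hours to "HH:MM" strings
hour_labels <- function(h) {
  total.min <- round(h * 60)                               # whole minutes since midnight
  sprintf("%02d:%02d", total.min %/% 60, total.min %% 60)  # hours, leftover minutes
}

hour_labels(c(0, 6, 10 + 8/60, 15.5))
# [1] "00:00" "06:00" "10:08" "15:30"
```

Because continuous scales in ggplot2 accept a labelling function, it could be passed as scale_x_continuous(labels = hour_labels).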


Answer 2:

You could rely on base R:

# Using R 2.14.2
# The same toy data
foo <- data.frame(start.time = c("2012-02-06 15:47:00", 
                                 "2012-02-06 15:02:00",
                                 "2012-02-22 10:08:00"),
                  duration   = c(1,2,3))

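Because the date-time strings here all share the same fixed layout ("YYYY-MM-DD HH:MM:SS"), you can use substr to extract the characters at the time positions: characters 12 to 16 hold "HH:MM". This relies on knowing exactly how the values are presented when printed. Note that in the toy data above start.time holds strings rather than POSIXct values; for a genuine POSIXct vector, format(x, "%H:%M") gives the same result without counting characters.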

# Extract hour and minute as a character vector, of the form "%H:%M"
substr(foo$start.time, 12, 16)
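For a vector that really is POSIXct, here is a sketch showing that the substr() positions line up with format(), which avoids counting characters (tz = "UTC" is assumed for reproducibility; variable names are illustrative):

```r
# Same toy times, stored as POSIXct this time (tz = "UTC" assumed)
start.time <- as.POSIXct(c("2012-02-06 15:47:00",
                           "2012-02-06 15:02:00",
                           "2012-02-22 10:08:00"), tz = "UTC")

# Characters 12-16 of "YYYY-MM-DD HH:MM:SS" are "HH:MM"
hm.substr <- substr(format(start.time, "%Y-%m-%d %H:%M:%S"), 12, 16)

# format() with only the time fields gives the same strings directly
hm.format <- format(start.time, "%H:%M")
hm.format
# [1] "15:47" "15:02" "10:08"
```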

Then paste the extracted "HH:MM" strings onto an arbitrary date to convert them back to POSIXct. The example uses January 1, 2012; if you instead omit the date and pass only a format (for example as.POSIXct(x, format = "%H:%M")), R fills in the current date.

# Store time information as POSIXct, using an arbitrary date
foo$time <- as.POSIXct(paste("2012-01-01", substr(foo$start.time, 12, 16)))
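A sketch of the same re-anchoring done straight from a POSIXct vector with format(), followed by the current-date behaviour just described; the anchor date, tz = "UTC" and the variable names are arbitrary choices:

```r
# tz = "UTC" assumed throughout so results do not depend on the session's zone
start.time <- as.POSIXct(c("2012-02-06 15:47:00", "2012-02-22 10:08:00"), tz = "UTC")

# Put every time of day on the same arbitrary date (2012-01-01)
time.fixed <- as.POSIXct(paste("2012-01-01", format(start.time, "%H:%M:%S")), tz = "UTC")
format(time.fixed, "%Y-%m-%d %H:%M")
# [1] "2012-01-01 15:47" "2012-01-01 10:08"

# With only a time and a format, strptime() fills in the current date
time.today <- as.POSIXct(format(start.time, "%H:%M"), format = "%H:%M", tz = "UTC")
```

Anchoring on a fixed date keeps the axis identical when the code is rerun on a different day.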

Both base plot and ggplot2 know how to format POSIXct axes out of the box.

# Plot it using base graphics
plot(duration~time, data=foo)

# Plot it using ggplot2 (originally 0.9.2.1; qplot() is deprecated as of 3.4.0)
library(ggplot2)
ggplot(foo, aes(x = time, y = duration)) + geom_point()


Answer 3:

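This avoids converting to strings and back to numbers, so it should be considerably faster on large vectors. In the output below, time.posix is printed in the answerer's local time zone (hence IST/IDT):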

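time <- c("1979-11-13T08:37:19-0500", "2014-05-13T08:37:19-0400")
# %z parses the numeric UTC offset, so each string maps to the correct instant
time.posix <- as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%S%z")
# Seconds since 1970-01-01 00:00:00 UTC
time.epoch <- as.vector(unclass(time.posix))
# POSIXlt exposes hour, min and sec fields in the requested time zone
time.poslt <- as.POSIXlt(time.posix, tz = "America/New_York")
time.hour.new.york <- time.poslt$hour + time.poslt$min/60 + time.poslt$sec/3600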

> time;
[1] "1979-11-13T08:37:19-0500" "2014-05-13T08:37:19-0400"
> time.posix;
[1] "1979-11-13 15:37:19 IST" "2014-05-13 15:37:19 IDT"
> time.poslt;
[1] "1979-11-13 08:37:19 EST" "2014-05-13 08:37:19 EDT"
> time.epoch;
[1]  311348239 1399984639
> time.hour.new.york;
[1] 8.621944 8.621944
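The time zone passed to as.POSIXlt() is what determines the result: the same two instants give different decimal hours in different zones. A sketch with a hypothetical helper, hour_in_tz, assuming the standard Olson zone names are available on the system:

```r
time <- c("1979-11-13T08:37:19-0500", "2014-05-13T08:37:19-0400")
# %z fixes the instant; tz = "UTC" only sets how the result is displayed
time.posix <- as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Hypothetical helper: decimal hour of the day in the time zone `tz`
hour_in_tz <- function(x, tz) {
  lt <- as.POSIXlt(x, tz = tz)              # hour/min/sec fields in zone `tz`
  lt$hour + lt$min / 60 + lt$sec / 3600
}

hour_in_tz(time.posix, "America/New_York")  # 08:37:19 local clock time for both
hour_in_tz(time.posix, "UTC")               # 13:37:19 and 12:37:19: EST vs EDT offset
```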


Answer 4:

The lubridate package has no dedicated time-of-day class, so Hadley recommends the hms package for this type of data. Something like this would work:

library(lubridate)  # hour(), minute(), second()
library(readr)      # parse_datetime()
library(dplyr)      # mutate() and the %>% pipe
foo <- data.frame(start.time = parse_datetime(c("2012-02-06 15:47:00",
                                                "2012-02-06 15:02:00",
                                                "2012-02-22 10:08:00")),
                  duration   = c(1, 2, 3))

foo <- foo %>%
  mutate(time_of_day = hms::hms(second(start.time), minute(start.time), hour(start.time)))

Watch out for two potential issues: (1) lubridate has a different function also called hms, which is why the call above is written as hms::hms; and (2) hms::hms takes its arguments in the opposite order to the one its name suggests (seconds, minutes, hours), so that seconds alone may be supplied.
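Since hms values are built on base R's difftime class, the same idea can be sketched without extra packages by measuring each timestamp's offset from its own midnight. tz = "UTC" is assumed here: in a zone with daylight saving time, elapsed time since midnight can differ from the clock reading by an hour on transition days. Variable names are illustrative:

```r
# tz = "UTC" assumed, so elapsed time since midnight equals the clock time
x <- as.POSIXct(c("2012-02-06 15:47:00", "2012-02-06 15:02:00",
                  "2012-02-22 10:08:00"), tz = "UTC")

# Midnight at the start of each timestamp's own day, in the same zone
midnight <- as.POSIXct(trunc(x, units = "days"))

tod <- difftime(x, midnight, units = "hours")   # time of day as a difftime
as.numeric(tod)                                 # plain decimal hours for plotting
```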