Question:
I wish fastPOSIXct worked here, but it does not in this case.
Here is my time data (it has no dates); I need to extract the hour part from each value.
times <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")
Here is the wrong output from fastPOSIXct:
fastPOSIXct(times, "GMT")
[1] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[3] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[5] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
It does not correctly recognize times that have no date attached.
The hour method from data.table, combined with as.ITime, does the job, but it seems slow on large vectors of times.
library(data.table)
hour(as.ITime(times))
# [1] 9 11 14 19 0 3
I am wondering whether there is a faster way (something like fastPOSIXct, but working without a date).
fastPOSIXct is really fast, but here its result is simply wrong.
Answer 1:
You may also try substr:
as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
In a benchmark on a vector with 1e6 elements, stringi::stri_sub is fastest and substr is second.
vals <- sample(c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56"), 1e6, replace = TRUE)
fun_substr <- function(vals) as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
grab.hrs <- function(vals) as.integer(sub(pattern = ":.*", replacement = "", x = vals))
fun_strtrim <- function(vals) as.integer(strtrim(vals, nchar(vals) - 3))
library(chron)
fun_chron <- function(vals) hours(times(paste0(vals, ":00")))
fun_lt <- function(vals) as.POSIXlt(vals, format="%H:%M")$hour
library(stringi)
fun_stri_sub <- function(vals) as.integer(stri_sub(vals, from = 1, to = -4))
library(microbenchmark)
microbenchmark(fun_substr(vals),
fun_stri_sub(vals),
grab.hrs(vals),
fun_strtrim(vals),
fun_lt(vals),
fun_chron(vals),
unit = "relative", times = 5)
# Unit: relative
# expr min lq mean median uq max neval
# fun_substr(vals) 2.186714 1.902074 2.015082 1.968542 1.945007 2.090236 5
# fun_stri_sub(vals) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 5
# grab.hrs(vals) 2.656630 2.397918 2.687133 2.426223 2.446902 3.263962 5
# fun_strtrim(vals) 31.177869 27.601380 26.009818 27.423562 17.902507 29.426989 5
# fun_lt(vals) 47.296929 41.122287 42.266556 40.647465 30.539030 52.710992 5
# fun_chron(vals) 5.594931 5.159192 5.961775 7.746242 5.286944 6.189742 5
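Before reading too much into the relative timings, it is worth confirming that the candidates return the same values. The following is a minimal base-R sketch that reuses the function definitions above on a small sample. The names `check`, `results` and `all_agree` are introduced here for illustration, and `tz = "UTC"` is added to the as.POSIXlt call as an assumption of mine (it is not in the original) to sidestep local daylight-saving gaps.

```r
# Small sample in the same "H:MM" / "HH:MM" shape as the question's data
check <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

fun_substr  <- function(vals) as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
grab.hrs    <- function(vals) as.integer(sub(pattern = ":.*", replacement = "", x = vals))
fun_strtrim <- function(vals) as.integer(strtrim(vals, nchar(vals) - 3))
# tz = "UTC" is an assumption added for this sketch
fun_lt      <- function(vals) as.POSIXlt(vals, tz = "UTC", format = "%H:%M")$hour

results <- list(
  substr  = fun_substr(check),
  sub     = grab.hrs(check),
  strtrim = fun_strtrim(check),
  posixlt = fun_lt(check)
)

# TRUE when every method returns the same hours as the first one
all_agree <- all(vapply(results, function(r) all(r == results[[1]]), logical(1)))
all_agree
```

The package-based variants (chron, stringi) could be added to `results` in the same way if those packages are installed.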
Answer 2:
You can also do this with the times function from the chron package:
library(chron)
vals <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")
dat <- times(paste0(vals, ":00"))
hours(dat)
# [1] 9 11 14 19 0 3
If speed is important, you could extract the hours more quickly with a string manipulation:
grab.hrs <- function(vals) as.numeric(sub(pattern = ":.*", replacement = "",
x = vals))
grab.hrs(vals)
# [1] 9 11 14 19 0 3
times and as.POSIXlt (from @tonytonov's solution) seem to be somewhat quicker than as.ITime, and the string manipulation is much quicker:
library(microbenchmark)
library(data.table)
microbenchmark(hours(times(paste0(vals, ":00"))),
hours(as.ITime(vals)),
as.POSIXlt(vals, format="%H:%M")$hour,
grab.hrs(vals))
# Unit: microseconds
# expr min lq median uq max neval
# hours(times(paste0(vals, ":00"))) 174.544 184.9485 193.5630 204.6950 5047.195 100
# hours(as.ITime(vals)) 665.833 678.8790 705.6445 735.0525 3030.574 100
# as.POSIXlt(vals, format = "%H:%M")$hour 158.264 169.8880 171.9670 180.1800 301.840 100
# grab.hrs(vals) 10.637 15.4540 20.0995 21.1285 55.985 100
Answer 3:
Is this an option? It is a base R solution.
as.POSIXlt(times, format="%H:%M")$hour
#[1] 9 11 14 19 0 3
Answer 4:
To really speed things up, you can also just trim off the last 3 characters from the strings. It's faster than using a regex.
as.numeric(strtrim(times, nchar(times) - 3))
## [1] 9 11 14 19 0 3
Here are the benchmark results:
Unit: microseconds
expr min lq median uq max neval
hours(times(paste0(vals, ":00"))) 200.670 212.9720 218.7960 221.8420 352.370 100
hours(as.ITime(vals)) 453.174 478.9680 487.3805 496.7885 1607.321 100
as.POSIXlt(vals, format = "%H:%M")$hour 41.278 46.4945 49.7310 51.3115 56.453 100
grab.hrs(vals) 12.352 15.4295 18.3850 20.3390 31.349 100
as.numeric(gsub("(.*):.*", "\\1", times)) 14.528 17.7225 20.6390 23.4530 53.683 100
as.numeric(strtrim(times, nchar(times) - 3)) 9.621 11.6605 12.7435 13.2520 147.446 100
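One caveat when choosing between the two string approaches: trimming a fixed three characters assumes every value ends in ":MM". If the data might also contain seconds (a hypothetical case, not part of the question), cutting at the first colon is more forgiving. A small base-R sketch, with `mixed` as a made-up input vector:

```r
# Hypothetical mixed input: some values carry seconds
mixed <- c("9:46", "11:06:30", "0:03", "19:53:05")

# Fixed-width trim: drops only the last three characters
trimmed <- strtrim(mixed, nchar(mixed) - 3)

# Regex: drops everything from the first colon onward
by_colon <- sub(":.*", "", mixed)

trimmed      # values with seconds keep their minutes: "11:06", "19:53"
by_colon     # only the hour part remains for every value

hours_mixed <- as.integer(by_colon)
```

For data that is guaranteed to be in "H:MM" or "HH:MM" form, as in the question, both approaches give the same hours and the faster one can be used.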
Answer 5:
You can use the stri_sub function from the stringi package to drop the last 3 characters, like this:
require(stringi)
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")
stri_sub(times, from = 1, to = -4)
## [1] "9" "11" "14" "19" "0" "3"
If the from and/or to parameters are negative, counting is done from the end of the string. So in this example the substring runs from the first character to the fourth character counted from the end of the string.
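To make the negative-index rule concrete: `to = -4` means "stop at the fourth character from the end", which is position `nchar(x) - 3` when counted from the start. Below is a base-R sketch of the same arithmetic, written without stringi so it runs on a plain installation. The helper name `end_pos` is made up for this illustration and is not part of any package.

```r
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Translate a negative "from the end" index into a position from the start:
# -1 is the last character, -4 is the fourth character from the end
end_pos <- function(x, to) nchar(x) + to + 1

end_pos(times, -4)
# [1] 1 2 2 2 1 1

# Same substring as stri_sub(times, from = 1, to = -4), converted to integer
hrs <- as.integer(substr(times, 1, end_pos(times, -4)))
hrs
# [1]  9 11 14 19  0  3
```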
Answer 6:
str_sub or substr is always handy in this situation. For example, the following code uses substr, after zero-padding the hours with str_pad from the stringr package:
library(stringr)  # provides str_pad
times <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")
# Left-pad to width 5 so every value has a two-digit hour
times1 <- str_pad(times, 5, pad = '0')
times1
## [1] "09:46" "11:06" "14:17" "19:53" "00:03" "03:56"
substr(times1, 1, 2)
## [1] "09" "11" "14" "19" "00" "03"