applying strptime to local data frame

Posted 2019-08-05 01:29

Question:

I think I have a problem related to the backslash (\) escape character that I am failing to handle.

Here is an excerpt from a DateTime column of a data.frame I have read with read_csv:

earthquakes[1:20,1]
Source: local data frame [20 x 1]
                 DateTime
                    (chr)
1  1964/01/01 12:21:55.40
2  1964/01/01 14:16:27.60
3  1964/01/01 14:18:53.90
4  1964/01/01 15:49:47.90
5  1964/01/01 17:26:43.50

My goal is to extract the years here. Manually doing

> format(strptime(c("1964/01/01 12:21:55.40","1964/01/01 12:21:55.40","1964/01/01 14:16:27.60"), "%Y/%m/%d %H:%M:%OS"), "%Y")
[1] "1964" "1964" "1964"

works as intended. However,

> strptime(earthquakes[1:5,1], "%Y/%m/%d %H:%M:%OS")
DateTime 
      NA 

My hunch is that the problem is related to

as.character(earthquakes[1:5,1])
[1] "c(\"1964/01/01 12:21:55.40\", \"1964/01/01 14:16:27.60\", \"1964/01/01 14:18:53.90\", \"1964/01/01 15:49:47.90\", \"1964/01/01 17:26:43.50\")"

So it seems that the column in the data frame also contains the " characters, escaped as \". But I do not know how to handle this from here.
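A minimal base R reproduction of this symptom may help pin it down. It assumes a hand-built `data.frame` standing in for the asker's data, and it uses `drop = FALSE` to mimic a `[` call that returns a one-column data frame instead of a vector. The "Source: local data frame" printout above suggests that is what `earthquakes[1:5,1]` returns here. The sketch suggests the quotes and backslashes come from deparsing the column, not from the data itself.

```r
# Hypothetical stand-in for the first rows of the DateTime column
dt <- c("1964/01/01 12:21:55.40", "1964/01/01 14:16:27.60",
        "1964/01/01 14:18:53.90", "1964/01/01 15:49:47.90",
        "1964/01/01 17:26:43.50")
df <- data.frame(DateTime = dt, stringsAsFactors = FALSE)

# drop = FALSE keeps the result as a one-column data frame (assumed to mimic
# what `[` returned on the asker's object)
one_col <- df[1:5, 1, drop = FALSE]

# as.character() on a data frame (a list) deparses each column into ONE string,
# which is where the \" sequences in the printout come from
s <- as.character(one_col)
length(s)  # 1
s

# strptime() coerces its input with as.character(), so it only sees that single
# "c(...)" string, cannot parse it, and returns a single NA
strptime(one_col, "%Y/%m/%d %H:%M:%OS")

# The underlying character vector contains no backslashes at all
grepl("\\", dt[1], fixed = TRUE)  # FALSE
```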

Given that the years are the first four characters, it would also seem OK (but less elegant, imho) to do

substr(earthquakes[1:5,1],1,4)

but that, accordingly, just gives

[1] "c(\"1"

Clearly, I could do

substr(earthquakes[1:5,1],4,7)

but that would only work for the first row.

Answer 1:

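Apparently you have a dplyr tbl_df, and by default, [ on a tbl_df never simplifies a single column to an atomic vector (in contrast to [ applied to a base R data.frame, which drops to a vector when a single column is selected). strptime() and substr() both coerce their input with as.character(), and as.character() on a one-column data frame deparses the whole column into a single string such as "c(\"1964/01/01 12:21:55.40\", ...)". That string cannot be parsed as a date, hence the single NA, and the \" sequences are only an artifact of this deparsing. So use either [[ or $ to extract the column, which will then be simplified to an atomic vector.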

Some examples:

data(iris)
library(dplyr)
x <- tbl_df(iris)  # tbl_df() is deprecated in dplyr >= 1.0.0; tibble::as_tibble(iris) is the current equivalent
x[1:5, 1]
#Source: local data frame [5 x 1]
#
#  Sepal.Length
#         (dbl)
#1          5.1
#2          4.9
#3          4.7
#4          4.6
#5          5.0
iris[1:5, 1]
#[1] 5.1 4.9 4.7 4.6 5.0
x[[1]][1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
x$Sepal.Length[1:5]
#[1] 5.1 4.9 4.7 4.6 5.0
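Applied to the question's data, the same idea might look like the sketch below. It assumes a small hand-built tibble (via the tibble package, which dplyr depends on) standing in for the real `earthquakes` object read with read_csv. Once the column is pulled out with [[ or $, both the strptime() approach and the substr() approach operate element-wise as intended.

```r
library(tibble)

# Hypothetical stand-in for the asker's `earthquakes` tibble
earthquakes <- tibble(DateTime = c("1964/01/01 12:21:55.40", "1964/01/01 14:16:27.60",
                                   "1964/01/01 14:18:53.90", "1964/01/01 15:49:47.90",
                                   "1964/01/01 17:26:43.50"))

# [[ (or $) returns the plain character vector instead of a one-column tibble
dt <- earthquakes[[1]][1:5]  # same as earthquakes$DateTime[1:5]

# Parse the timestamps (%OS accepts fractional seconds), then keep only the year
years <- format(strptime(dt, "%Y/%m/%d %H:%M:%OS"), "%Y")
years
# [1] "1964" "1964" "1964" "1964" "1964"

# substr() now works row by row as well
substr(dt, 1, 4)
```

Of the two, the strptime() route also validates that each entry really is a timestamp. substr() simply trusts that the year occupies the first four characters.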