Convert date string that contains time zone to POS

Posted 2019-08-25 07:50

I have a vector with dates in this format (example of the first 6 rows):

 Dates<-c(
   "Sun Oct 04 20:33:05 EEST 2015",
   "Sun Oct 04 20:49:23 EEST 2015",
   "Sun Oct 04 21:05:25 EEST 2015",
   "Mon Sep 28 10:02:38 IDT 2015", 
   "Mon Sep 28 10:17:50 IDT 2015",
   "Mon Sep 28 10:39:48 IDT 2015")

I tried to parse the Dates vector in R with the as.Date() function:

as.Date(Dates,format = "%a %b %d %H:%M:%S %Z %Y")

but this fails because the %Z conversion is not supported on input. The time zones differ throughout the vector. What are the alternatives for reading the data correctly with respect to the time zone?

1 answer
#2 · 2019-08-25 08:37

This solution requires some simplifying assumptions. Assuming you have many elements in your vector, the best approach is to use a database of time zone offsets to work out what each time is in a chosen reference zone, such as GMT. The time zone data I used is the timezone.csv file from https://timezonedb.com/download

#Create sample data
Dates<-c(
  "Sun Oct 04 20:33:05 EEST 2015",
  "Sun Oct 04 20:49:23 EEST 2015",
  "Sun Oct 04 21:05:25 EEST 2015",
  "Mon Sep 28 10:02:38 IDT 2015", 
  "Mon Sep 28 10:17:50 IDT 2015",
  "Mon Sep 28 10:39:48 IDT 2015")

#separate timezone string from date/time info
no_timezone <- paste(substr(Dates, 1, 19), substr(Dates, nchar(Dates)-3, nchar(Dates)))
timezone <- as.data.frame(substr(Dates, 21, nchar(Dates)-5))
colnames(timezone) <- "abbreviation"

#reference timezone database to get offsets from GMT
timezone_db <- read.csv(file="timezonedb/timezone.csv", header=FALSE)
colnames(timezone_db) <- c("zone_id", "abbreviation", "time_start", "gmt_offset", "dst")
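#keep standard-time rows only; summer-time abbreviations such as EEST and IDT may be
#flagged dst == 1 in the database, in which case this filter drops them and their offset becomes NA
timezone_db <- timezone_db[timezone_db$dst == 0, ]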
timezone_db <- unique(timezone_db[,c("abbreviation", "gmt_offset")])
timezone_db <- timezone_db[!duplicated(timezone_db$abbreviation), ]

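#adjust all time to GMT
#match() keeps the offsets in the same order as Dates; merge() sorts by key and can misalign rows
gmt_offset <- timezone_db$gmt_offset[match(timezone$abbreviation, timezone_db$abbreviation)]
gmt_time <- strptime(no_timezone, format = "%a %b %d %H:%M:%S %Y", tz="GMT")

#final data (gmt_offset is in seconds east of GMT)
Dates_final <- gmt_time - gmt_offset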

Depending on how exact your data needs to be, be careful to adjust for daylight saving time if necessary. Also, I don't know much about time zones, but I noticed that a single abbreviation can have multiple offsets in the database. For example, CLT (Chilean time) appears with offsets ranging from 3 to 5 hours from GMT.
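One way to see which abbreviations are ambiguous is to count the distinct offsets per abbreviation before deduplicating. This is only a minimal sketch: `tz_sample` is a hypothetical stand-in for timezone_db, and its offsets are made-up illustrative values, not rows from the real file.

```r
# Hypothetical stand-in for timezone_db; offsets are illustrative only
tz_sample <- data.frame(
  abbreviation = c("CLT", "CLT", "CLT", "EET", "EET"),
  gmt_offset   = c(-18000, -14400, -14400, 7200, 7200),
  stringsAsFactors = FALSE
)

# Count the distinct offsets recorded for each abbreviation
offsets_per_abbr <- tapply(tz_sample$gmt_offset, tz_sample$abbreviation,
                           function(x) length(unique(x)))

# Abbreviations that map to more than one offset
ambiguous <- names(offsets_per_abbr)[offsets_per_abbr > 1]
print(ambiguous)
```

Running the same check on the real table would show which abbreviations in your data need a manual decision rather than "first offset wins".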

For this exercise, my code simply takes the first offset listed for each abbreviation and ignores daylight saving time. This may be sufficient if your work doesn't require much precision, but you should QA and validate the results either way.
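As a first QA step, it may help to confirm that every abbreviation in the data actually received an offset. The sketch below assumes the lookup is done with match(); `offsets_db` is a hypothetical hand-written table (EEST and IDT are both taken as 3 hours ahead of GMT), and "XYZ" is a deliberately unknown code.

```r
abbr <- c("EEST", "EEST", "IDT", "XYZ")  # "XYZ" is a deliberately unknown code

# Hypothetical lookup table: offsets in seconds east of GMT
offsets_db <- data.frame(
  abbreviation = c("EEST", "IDT"),
  gmt_offset   = c(10800, 10800),
  stringsAsFactors = FALSE
)

# Look up each abbreviation; unknown codes yield NA
gmt_offset <- offsets_db$gmt_offset[match(abbr, offsets_db$abbreviation)]

# Abbreviations with no offset; these rows would become NA in the final result
unmatched <- unique(abbr[is.na(gmt_offset)])
if (length(unmatched) > 0) {
  message("No offset found for: ", paste(unmatched, collapse = ", "))
}
```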

Also, note that this solution handles date changes as well, because the offset is subtracted from a full date-time value. For example, if the time is adjusted from 1am to 11pm, the date rolls back one day.
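The date rollover can be checked with a small worked example. This sketch uses an ISO-formatted string to avoid locale-dependent day and month names, and assumes an offset of 3 hours (10800 seconds), as for IDT.

```r
# A local time shortly after midnight, parsed as if it were GMT
local_as_gmt <- as.POSIXct("2015-09-28 01:00:00", tz = "GMT")

# Assumed offset: 3 hours ahead of GMT, expressed in seconds
gmt_offset <- 3 * 3600

# Subtracting the offset moves the time, and the date, back across midnight
true_gmt <- local_as_gmt - gmt_offset
print(format(true_gmt, "%Y-%m-%d %H:%M:%S", tz = "GMT"))
# prints "2015-09-27 22:00:00"
```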
