Reading in dates from Excel into R

2019-08-06 01:39发布

问题:

I have multiple csv files which I need to read into R. The first column of the files contain dates and times, which I am converting into POSIXlt when I have loaded the data frame. Each of my csv files have the dates and times formatted in the same way in Excel, however, some files are read in differently.

For example,

My file looks like this once imported:

  date                value
1 2011/01/01 00:00:00 39
2 2011/01/01 00:15:00 35
3 2011/01/01 00:30:00 38
4 2011/01/01 00:45:00 39
5 2011/01/01 01:00:00 38
6 2011/01/01 01:15:00 38

Therefore, the code I use to amend the format is:

DATA$date <- as.POSIXlt(DATA$date,format="%Y/%m/%d %H:%M:%S")

However, some files are being read in as:

  date             value
1 01/01/2011 00:00 39
2 01/01/2011 00:15 35
3 01/01/2011 00:30 38
4 01/01/2011 00:45 39
5 01/01/2011 01:00 38
6 01/01/2011 01:15 38

Which means my format section of my code does not work and gives an error. Therefore, is there anyway to automatically detect which format the date column is in? Or, is there a way of knowing how it will be read, since the format of the column in Excel is the same on both.

回答1:

When using the wrong formatting string for your date input, I seem to get NA values. If this be the case, you solve this problem in two steps. First, format the dates from Excel assuming that you have all three of hours, minutes, and seconds:

date.original <- DATA$date
DATA$date <- as.POSIXlt(DATA$date,format="%Y/%m/%d %H:%M:%S")

This should leave NA values in the date column for those dates which be missing seconds. Then you can try this:

DATA$date[is.na(DATA$date)] <- as.POSIXlt(date.original, format="%Y/%m/%d %H:%M")

This should cover the remaining data.

Data

DATA <- data.frame(date=c('2011/01/01 00:00:00', '2011/01/01 00:15',
                          '2011/01/01 00:30:00', '2011/01/01 00:45'),
                   value=c(39, 35, 38, 39))