I do not understand how the ymd() function from the lubridate package works in R. I am trying to build a feature that converts dates correctly without having to specify the format, by checking which of dmy(), mdy() and ymd() produces the fewest NAs.
However, ymd() returns NA for some values when they are parsed one at a time, yet parses the same values when the whole vector is passed in. Are there any other functions or packages in R that would help me get around this problem?
> data$DTTM[1:5]
[1] "4-Sep-06" "27-Oct-06" "8-Jan-07" "28-Jan-07" "5-Jan-07"
> ymd(data$DTTM[1])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[2])
[1] "2027-10-06 UTC"
> ymd(data$DTTM[3])
[1] NA
Warning message:
All formats failed to parse. No formats found.
> ymd(data$DTTM[4])
[1] "2028-01-07 UTC"
> ymd(data$DTTM[5])
[1] NA
Warning message:
All formats failed to parse. No formats found.
>
> ymd(data$DTTM[1:5])
[1] "2004-09-06 UTC" "2027-10-06 UTC" "2008-01-07 UTC" "2028-01-07 UTC"
[5] "2005-01-07 UTC"
Thanks
@user1317221_G has already pointed out that your dates are in day-month-year format, which suggests that you should use dmy instead of ymd. Furthermore, because your month is in %b format ("Abbreviated month name in the current locale"; see ?strptime), your problem may have something to do with your locale. The month names you have seem to be English, which may differ from how they are spelled in the locale you are currently using.
Let's see what happens when I try dmy on the dates in my locale:
date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
dmy(date_english)
# [1] "2006-09-04 UTC" NA "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
# Warning message:
# 1 failed to parse.
"27-Oct-06" failed to parse. Let's check my time locale:
Sys.getlocale("LC_TIME")
# [1] "Norwegian (Bokmål)_Norway.1252"
dmy does not recognize "Oct" as a valid %b month in my locale.
One way to deal with this issue would be to change "Oct" to the corresponding Norwegian abbreviation, "Okt":
date_nor <- c("4-Sep-06", "27-Okt-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
dmy(date_nor)
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
Another possibility is to use the original dates (i.e. in their original 'locale') and set the locale argument in dmy. Exactly how this is done is platform dependent (see ?locales). Here is how I would do it in Windows:
dmy(date_english, locale = "English")
# [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
Using the guess_formats function in the lubridate package would be the closest to what you are after.
library(lubridate)
x <- c("4-Sep-06", "27-Oct-06","8-Jan-07" ,"28-Jan-07","5-Jan-2007")
format <- guess_formats(x, c("mdY", "BdY", "Bdy", "bdY", "bdy", "mdy", "dby"))
strptime(x, format)
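The approach described in the question (try each parser on the whole vector and keep the one with the fewest NAs) could be sketched roughly as follows. This is only an illustration: the names parsers, na_counts, best_order and parsed are made up for the example, it assumes the month abbreviations match the current LC_TIME locale, and it relies on the quiet argument of dmy(), mdy() and ymd() to suppress parse warnings.

```r
library(lubridate)

# Sample dates from the question (day-month-year, English month abbreviations)
x <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")

# Candidate parsers, each applied to the whole vector at once
parsers <- list(dmy = dmy, mdy = mdy, ymd = ymd)

# Count the NAs each parser produces; quiet = TRUE suppresses parse warnings
na_counts <- sapply(parsers, function(f) sum(is.na(f(x, quiet = TRUE))))

# Keep the first parser with the fewest failures (illustrative names)
best_order <- names(which.min(na_counts))
parsed <- parsers[[best_order]](x, quiet = TRUE)
```

Note that a low NA count alone does not prove the order is right: in the question, ymd() on the full vector returned no NAs but produced dates such as 2004-09-06 for "4-Sep-06", so it is worth sanity-checking the parsed range as well.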
HTH
From the documentation on ymd, on page 70:
"As long as the order of formats is correct, these functions will parse dates correctly even when the input vectors contain differently formatted dates."
ymd() expects year-month-day; you have day-month-year.
x <- c("2009-01-01", "2009-01-02", "2009-01-03")
ymd(x)
Maybe you need something like:
y <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07" )
as.POSIXct(y, format = "%d-%b-%y")
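If only the date part is needed, the same explicit format also works with base R's as.Date(), and checking for NA afterwards shows which inputs did not match. A minimal sketch, assuming the %b month abbreviations match the current LC_TIME locale; the names d and failed are illustrative only.

```r
y <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")

# Parse with an explicit day-abbreviated month-two digit year format
d <- as.Date(y, format = "%d-%b-%y")

# Inputs that did not match the format (or the locale's month names)
failed <- y[is.na(d)]
```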
PS: the reason I think you get NAs for some values is that they have only a single digit in the position where ymd expects the year, and ymd doesn't know what to do with that. It works when there are two digits there, e.g. "27-Oct-06" and "28-Jan-07", but fails for "5-Jan-07" etc.