I have a dataset imported from a large group of .csv file. The date imports as a factor, but the data is in the following format
, 11, 4480, - 4570,NE, 12525,LB, , 10, , , , 0, 7:26A,26OC11,
, 11, 7090, - 7290,NE, 5250,LB, , 9, , , , 0, 7:28A,26OC11,
, 11, 5050, - 5065,NE, 50,LB, , 7, , , , 0, 7:31A,26OC11,
, 12, 5440, - 5530,NE, 13225,LB, , 6, , , , 0, 8:10A,26OC11,
, 12, 1020, - 1220,NE, 12020,LB, , 14, , , , 0, 8:12A,26OC11,
, 12, 50, - 25,NE, 12040,LB, , 15, , , , 0, 8:13A,26OC11,
4
For example would be 26 Oct 2011. How would I convert these factors to a date and the time to a time. I need to be able to use the time to generate a time interval between records.
Are you sure there are only two letters for the month? That doesn't make any sense!, how do you tell between JUNE and JULY?. If you can get three letters you could do something simple like this.
as.Date(as.character(mydata$mydate), format = '%d%b%y')
You could also use levels()[] instead of as.character(), but this should be simpler for now
Now if you also want the time. You can put it all together with this command
as.POSIXct(strptime(paste(as.character(mydata$mydate), paste(as.character(mydata$mytime), "M", sep = "")), "%d%b%y %I:%M%p"))
You have to be specially careful with the format. You can see a list of what %I, %d and so, means... here http://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html
a <- c("26OC11", "01JA12")
month.abb.2 <- toupper(substr(month.abb, 0, 2))
for (i in seq_along(month.abb.2))
a <- sub(month.abb.2[i], month.abb[i], a)
as.Date(a, format="%d%b%y")
# [1] "2011-10-26" "2012-01-01"
However it would be interesting to see how Jul & Jun differ when you got only 2 characters for the month name. Looks unusual.
As mentioned , It is unsual to get 2 letters for a month, but you can add the missing letter using some regular expressions. Then you use dmy
from lubridate
to convert dates. Here I am using gsubfn
.
library(lubridate)
library(gsubfn)
dmy(gsubfn("OC|JA",list(OC="OCT",JA="JAN"), ## You can extend here for other months
c("26OC11","26JA12")))
[1] "2011-10-26 UTC" "2012-01-26 UTC"
This is how I ended up creating the date i needed
Day<-substring(Date,1,2)
Month<-substring(Date,3,4)
Year<-substring(Date,5,6)
Month<-replace(Month,Month=="AU",8)
Month<-replace(Month,Month=="JA",1)
Month<-replace(Month,Month=="FE",2)
Month<-replace(Month,Month=="MR",3)
Month<-replace(Month,Month=="AP",4)
Month<-replace(Month,Month=="MY",5)
Month<-replace(Month,Month=="JN",6)
Month<-replace(Month,Month=="JL",7)
Month<-replace(Month,Month=="SE",9)
Month<-replace(Month,Month=="OC",10)
Month<-replace(Month,Month=="NO",11)
Month<-replace(Month,Month=="DE",12)
Date2 <- as.Date( paste( Month , Day , Year, sep = "." ) , format = "%m.%d.%y" )
dataset$Day<-Day
dataset$Month<-Month
dataset$Year<-Year
dataset$Date2<-Date2
Weekday<-weekdays(Date2)
dataset$Weekday<-as.factor(Weekday)
Thanks for all the help