I am using the sqldf library to return a data frame with distinct values and only the maximum of the date column. The data frame looks like this:
+------+----------+--------+-----------------+
| NAME | val1 | val2 | DATE |
+------+----------+--------+-----------------+
| A | 23.7228 | 0.5829 | 11/19/2014 8:17 |
| A | 23.7228 | 0.5829 | 11/12/2014 8:16 |
+------+----------+--------+-----------------+
When I run the code below to get the distinct values with the max date:
df <- sqldf("SELECT DISTINCT NAME, val1, val2, MAX(DATE) FROM Table")
I get this as the output.
+------+----------+--------+-----------------+
| NAME | val1 | val2 | MAX(DATE) |
+------+----------+--------+-----------------+
| A | 23.7228 | 0.5829 | 1416406625 |
+------+----------+--------+-----------------+
How do I convert the last column, which comes back as an integer, to my original datetime format?
Next time please provide your input in reproducible form. I have done it for you this time below. Also, the SQL code in the question has an SQLite syntax error (Table is a reserved word in SQLite and must be quoted), which I have fixed below.
The easiest way to get this right is to use the name DATE for the output column, in which case sqldf will figure out that it is of the same type as the DATE input column. SQLite has no date and time types, so when sqldf is used with SQLite it has no way to know that the value being returned is a datetime. sqldf uses some heuristics to guess, such as the one just discussed.
library(sqldf)
Lines <- "NAME,val1,val2,DATE
A,23.7228,0.5829,11/19/2014 8:17
A,23.7228,0.5829,11/12/2014 8:16"
Table <- read.csv(text = Lines, as.is = TRUE)
Table$DATE <- as.POSIXct(Table$DATE, format = "%m/%d/%Y %H:%M")
sqldf("SELECT DISTINCT NAME, val1, val2, MAX(DATE) DATE FROM 'Table'")
giving:
NAME val1 val2 DATE
1 A 23.7228 0.5829 2014-11-19 08:17:00
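The query above returns a single row because MAX() without GROUP BY aggregates over the whole table. If the real data holds several NAME values and the latest date is wanted for each one, a GROUP BY version is probably closer to the intent. This is a minimal sketch, assuming a hypothetical data frame DF with made-up rows; the output column is again named DATE so that sqldf's name-matching heuristic restores the datetime class.

```r
library(sqldf)

# Hypothetical data: two rows for A, one for B (values invented for illustration)
DF <- data.frame(
  NAME = c("A", "A", "B"),
  val1 = c(23.7228, 23.7228, 10.5),
  val2 = c(0.5829, 0.5829, 0.25),
  DATE = as.POSIXct(c("2014-11-19 08:17", "2014-11-12 08:16", "2014-11-15 09:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  stringsAsFactors = FALSE
)

# One row per distinct (NAME, val1, val2) combination, with its latest DATE.
# Aliasing MAX(DATE) as DATE lets sqldf match it to the DATE input column.
res <- sqldf("SELECT NAME, val1, val2, MAX(DATE) DATE
              FROM DF
              GROUP BY NAME, val1, val2
              ORDER BY NAME")
res
```

The DATE column is printed in the session's time zone, so the displayed clock time may differ from the input if the session is not in UTC.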
If we used H2 with sqldf then we would not have these problems, since H2 does support date and time types and sqldf does not have to guess. Also, the syntax of your SQL query works as is in H2. Using the Table data.frame shown above:
library(RH2)
library(sqldf)
sqldf("SELECT DISTINCT NAME, val1, val2, MAX(DATE) FROM Table")
gives:
NAME val1 val2 MAX(DATE)
1 A 23.7228 0.5829 2014-11-19 08:17:00
Try:
> as.POSIXct(1416406625, origin = "1970-01-01", tz = "GMT")
[1] "2014-11-19 14:17:05 GMT"
You may need to change the timezone (tz) to get the right time.
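To apply the same conversion to the whole column returned by sqldf, something like the following should work. It is a minimal sketch: the data frame res is rebuilt by hand here to stand in for the sqldf output, and the time zone "America/Chicago" is only an assumption, chosen because it is six hours behind GMT in November and so lines up with the 8:17 in the question. Substitute whatever zone the data was recorded in.

```r
# Stand-in for the sqldf result from the question: the date comes back
# as seconds since 1970-01-01 UTC
res <- data.frame(NAME = "A", val1 = 23.7228, val2 = 0.5829,
                  `MAX(DATE)` = 1416406625,
                  check.names = FALSE, stringsAsFactors = FALSE)

# Convert the numeric column back to POSIXct.
# tz = "America/Chicago" is an assumed zone, not something given in the question.
res[["MAX(DATE)"]] <- as.POSIXct(res[["MAX(DATE)"]],
                                 origin = "1970-01-01",
                                 tz = "America/Chicago")

format(res[["MAX(DATE)"]], "%Y-%m-%d %H:%M:%S")
# [1] "2014-11-19 08:17:05"
```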