I'm trying to pull data from a PostgreSQL database and the results for a timestamp field are inconsistent. I'm not sure if I'm handling POSIXct results properly. Otherwise, I think I found a bug in the RPostgreSQL package. Here is the way to replicate the issue:
Suppose there is a table in a postgres database with one field (run this in PostgreSQL):
CREATE DATABASE mydb;
CREATE TABLE test_table
(
"DateTime" timestamp without time zone NOT NULL,
CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
OIDS=FALSE
);
ALTER TABLE test_table
OWNER TO postgres;
And let’s say there are a few hundred records. I will populate them in R. Here is the code:
library(RPostgreSQL)
# Let's feed the table with some sequence of date/time values
date_values <- as.chron(seq(10000, 10500, 1/24))
format.chron <- function(z) {
sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00",
as.numeric(as.character(years(z))),
months(z),
as.numeric(as.character(days(z))),
as.numeric(as.character(hours(z))),
as.numeric(as.character(minutes(z))))
}
.generateInsertQuery <- function(date_values, field_name, table_name) {
insert_val <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse=',')
qry <- paste("INSERT INTO", dQuote(table_name), paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
qry
}
drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)
If I try to get the values, the time component gets stripped out of the resulting data
res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]
The class of the result, however, is POSIXct
class(res[,1])
If the result is fetched one record at a time, the values with hour:min equal to 00:00 loose the time component:
rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i] <- fetch(rs,1)
res_list
As a workaround, I'm fetching the result 1 record at a time, fixing, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas of why this is happening and how to deal with this issue?
Thanks in advance!
RPostgreSQL's
dbWriteTable
with any posixct field will create database field of typetimestamp with timezone
always with tz+00
no matter what posixct timezone it will be. I believe more precise would be to createtimestamp without timezone
instead.Best solution for both
dbReadTable
anddbWriteTable
is to useSys.setenv(TZ = "UTC")
. In my opinion it is too deep dependency because many other processes in R session may require proper timezone setting.Much more specific and not to deep dependent is to define own
dbReadTable
anddbWriteTable
which wraps theDBI
version with appropriate preprocess/postprocess of posixct types. But still it is not an option if you are developing DBI-compliant code/package (not only postgres related).It would be great to have RPostgreSQL migrated to github for easier contribution.
First off, the RPostgreSQL project has a mailing list; I suggest you post there.
PostgreSQL has two datetime types: with and without timezone. As I recall, R only maps the latter. I did write some early regression tests for this (see the package source) but have not been that involved with the project of late. But I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.