I have the following redshift table:
DROP TABLE IF EXISTS "logs";
CREATE TABLE "logs" (
"source" varchar(255) DEFAULT NULL,
"method" varchar(255) DEFAULT NULL,
"path" varchar(1023) DEFAULT NULL,
"format" varchar(255) DEFAULT NULL,
"controller" varchar(255) DEFAULT NULL,
"action" varchar(255) DEFAULT NULL,
"status" integer DEFAULT NULL,
"duration" float DEFAULT NULL,
"view" float DEFAULT NULL,
"db" float DEFAULT NULL,
"ip" varchar(255)DEFAULT NULL,
"route" varchar(255) DEFAULT NULL,
"request_id" varchar(255) DEFAULT NULL,
"user" INTEGER DEFAULT NULL,
"school" varchar(255) DEFAULT NULL,
"timestamp" datetime DEFAULT NULL
);
So far so good.
The only problem is that the datetime in my source file on s3 is the following: "2015-01-13T11:13:08.869941+00:00"
. This looks like rfc822 (or rfc3339 or rfc2822).
A few timeformats are supported by the COPY command (see doc: http://docs.aws.amazon.com/redshift/latest/dg/r_DATEFORMAT_and_TIMEFORMAT_strings.html). But not my rfc822 format.
I've tried the following:
TRUNCATE logs;
COPY "logs" FROM 's3://path/to/logstash_logfile.gz'
CREDENTIALS 'aws_access_key_id=THE_KEY;aws_secret_access_key=THE_SECRET'
TIMEFORMAT AS 'MM-DD-YYYYTHH:MI:SS'
JSON 's3://path/to/jsonpath.json' GZIP;
But I'm getting:
SELECT * FROM stl_load_errors;
Invalid timestamp format or value [MM-DD-YYYYTHH:MI:SS]
Use
TIMEFORMAT 'auto'
instead.It's able to import
2015-01-13T11:13:08.869941+00:00
as
2015-01-13 11:13:08.869941
.I assume this method just discards the timezone information, but at least you can get the data in this way.
If you have various timezones in the data, maybe you need to do some preprocessing to convert everything into UTC, for example.
Unfortunately I think the
COPY
with a provided time format is rather strict and doesn't support timezone parts.We had exactly the same issue and found a workaround:
Or, alternatively:
You can quickly test with this:
Which gives these results:
Tested using Redshift 1.0.2610 Note that your SQL client or driver may do some timezone conversions that may be misleading, so the best is to test this using UTC as the time zone of your computer / driver / SQL client. Also, some SQL clients remove the sub-second part of the timestamps.