Hive not detecting timestamp format

2019-09-07 21:36发布

问题:

I have a PIG script that

  • Loads and transforms the data from a csv
  • Replaces some characters

Calls a java program (JAR) to convert the date-time in csv from 06/02/2015 18:52 to 2015-6-2 18:52 (mm/DD/yyyy to yyyy-MM-dd)

REGISTER /home/cloudera/DateTime.jar;

A = Load '/user/cloudera/Data.csv' using PigStorage(',') as (ac,datetime,amt,trace);

B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/','-') as newdate,REPLACE(amt,'-','') as newamt,trace;

C = FOREACH B GENERATE ac,Converter.DateTime(newdate) as ConvDate,ConvAmt,trace;

Store C into '/user/cloudera/Output/' using PigStorage('\t');

Sample Input -- 21467245 06/02/2015 18:52 -9.59 518

Sample Output -- 21467245 2015-6-2 18:52 9.59 518

I am loading the output into hive, other fields seem fine during import, but the date-time field results null if loaded as timestamp and is intact when its string.

Where is this going wrong?

Am using Cloudera CDH 5

回答1:

From the hive docs:

Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]. If they are in another format declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.

So you need to either change your Converter to output this format, or use a UDF --- or just keep them as strings, which is what I usually do !