I have a PIG script that
- Loads and transforms the data from a csv
- Replaces some characters
Calls a java program (JAR) to convert the date-time in csv from 06/02/2015 18:52 to 2015-6-2 18:52 (mm/DD/yyyy to yyyy-MM-dd)
REGISTER /home/cloudera/DateTime.jar;
A = Load '/user/cloudera/Data.csv' using PigStorage(',') as (ac,datetime,amt,trace);
B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/','-') as newdate,REPLACE(amt,'-','') as newamt,trace;
C = FOREACH B GENERATE ac,Converter.DateTime(newdate) as ConvDate,ConvAmt,trace;
Store C into '/user/cloudera/Output/' using PigStorage('\t');
Sample Input -- 21467245 06/02/2015 18:52 -9.59 518
Sample Output -- 21467245 2015-6-2 18:52 9.59 518
I am loading the output into hive, other fields seem fine during import, but the date-time field results null if loaded as timestamp and is intact when its string.
Where is this going wrong?
Am using Cloudera CDH 5
From the hive docs:
So you need to either change your
Converter
to output this format, or use a UDF --- or just keep them as strings, which is what I usually do !