I am trying to create a Hive table so that its files on HDFS are in UTF-8 format. The problem is that the query is giving an error, and I'm not sure what I am doing wrong.
DROP TABLE IF EXISTS temp.output_2057565014;
CREATE TABLE temp.output_2057565014
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'ธ'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY '$'
with serdeproperties('serialization.encoding'='UTF-8')
LOCATION '/tmp/test-2057565014'
AS
SELECT * from temp.abc
"The query is giving an error": yeah, but what kind of error? Reading that error message might help. Without it, this is just guesswork.
So, let's guess.
The ROW FORMAT DELIMITED clause implicitly assumes that delimiters are single 7-bit ASCII characters, defined either explicitly (when printable) or by their octal code (e.g. '\001' for Ctrl-A). Hence FIELDS TERMINATED BY 'ธ' is not valid: 'ธ' is a Thai character outside 7-bit ASCII, encoded as 3 bytes in UTF-8.

You can try different workarounds: changing the delimiter in the upstream file creation process; changing the delimiter in situ before loading to HDFS (e.g. with a good old sed command); or trying a hard-coded column mapping with RegexSerDe (cf. Language Manual DDL / CREATE TABLE, under "Row Formats & SerDe")...
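To make the delimiter constraint and the workarounds a bit more concrete, here is a small Python sketch. It is not Hive code, and all function names are hypothetical.

The sketch does four things:
- It shows that 'ธ' is three bytes in UTF-8 rather than one 7-bit ASCII character.
- It produces the octal form of a valid delimiter.
- It streams text while swapping 'ธ' for Ctrl-A ('\001', Hive's default field delimiter), roughly as a sed one-liner would.
- It builds a one-capture-group-per-column pattern of the kind RegexSerDe's `input.regex` property expects.

It assumes UTF-8 input and that the replacement delimiter never occurs inside the data itself.

```python
import io
import re


def is_single_ascii7(delim: str) -> bool:
    """True if delim is exactly one character in the 0-127 (7-bit ASCII) range."""
    return len(delim) == 1 and ord(delim) < 128


def octal_escape(delim: str) -> str:
    """Return the octal escape for a delimiter, e.g. '\\001' for Ctrl-A."""
    if not is_single_ascii7(delim):
        raise ValueError(f"{delim!r} is not a single 7-bit ASCII character")
    return "\\" + format(ord(delim), "03o")


def swap_delimiter(src, dst, old: str, new: str) -> int:
    """Copy text line by line from src to dst, replacing every `old` with `new`.

    Roughly what `sed 's/old/new/g'` does; returns the number of lines copied.
    """
    if not is_single_ascii7(new):
        raise ValueError(f"replacement {new!r} is not a single 7-bit ASCII character")
    count = 0
    for line in src:
        dst.write(line.replace(old, new))
        count += 1
    return count


def regex_for_columns(delim: str, ncols: int) -> str:
    """Build a pattern with one capture group per column, split on `delim`."""
    field = "([^" + re.escape(delim) + "]*)"
    return re.escape(delim).join([field] * ncols)


# 'ธ' is three bytes in UTF-8, so it cannot be a one-byte delimiter.
print("ธ".encode("utf-8"))
print(is_single_ascii7("ธ"), octal_escape("\x01"))

# Swap 'ธ' for Ctrl-A in an in-memory "file".
src = io.StringIO("a1ธb1ธc1\na2ธb2ธc2\n")
dst = io.StringIO()
swap_delimiter(src, dst, "ธ", "\x01")
print(dst.getvalue().split("\n")[0].split("\x01"))

# Full-line match, one group per column.
print(re.fullmatch(regex_for_columns("ธ", 3), "a1ธb1ธc1").groups())
```

For a real file, open both sides with `open(path, encoding="utf-8")` and pass the handles to `swap_delimiter` before uploading the result to HDFS. Python's `re` only stands in for the Java regex engine that RegexSerDe uses. A plain character-class pattern like this one should behave the same in both, but check it against the Hive SerDe documentation before relying on it.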