I have a string column description
in a hive table which may contain tab characters '\t'
, these characters are however messing some views when connecting hive to an external application.
is there a simple way to get rid of all tab characters in that column?. I could run a simple python program to do it, but I want to find a better solution for this.
相关问题
- Spark on Yarn Container Failure
- enableHiveSupport throws error in java spark code
- spark select and add columns with alias
- Unable to generate jar file for Hadoop
-
hive: cast array
> into map
相关文章
- 在hive sql里怎么把"2020-10-26T08:41:19.000Z"这个字符串转换成年月日
- Java写文件至HDFS失败
- mapreduce count example
- SQL query Frequency Distribution matrix for produc
- Cloudera 5.6: Parquet does not support date. See H
- Could you give me any clue Why 'Cannot call me
- converting to timestamp with time zone failed on A
- Hive error: parseexception missing EOF
Custom SerDe might be a way to do it. Or you could use some kind of mediation process with regex_replace:
regexp_replace
UDF performs my task. Below is the definition and usage from apache Wiki.This returns the string resulting from replacing all substrings in
INITIAL_STRING
that match the java regular expression syntax defined inPATTERN
with instances ofREPLACEMENT
,e.g.:
regexp_replace("foobar", "oo|ar", "")
returnsfb
There is no OOTB feature at this moment which allows this. One way to achieve that could be to write a custom InputFormat and/or SerDe that will do this for you. You might this JIRA useful : https://issues.apache.org/jira/browse/HIVE-3751. (not related directly to your problem though).
You can also use translate(). If the third argument is too short, the corresponding characters from the second argument are deleted. Unlike regexp_replace() you don't need to worry about special characters. Source code.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions