I am looking for a in-built String split function in Hive?
E.g. if String is
A|B|C|D|E
then I want to have a function like
array split(string input, char delimiter)
so that I get back [A,B,C,D,E].
Does such a in-built split function exist in Hive.
I can only see regexp_extract and regexp_replace. I would love to see a indexOf() and split()
string functions.
Thanks
Ajay
There does exist a split function based on regular expressions. It's not listed in the tutorial, but it is listed on the language manual on the wiki:
split(string str, string pat)
Split str around pat (pat is a regular expression)
In your case, the delimiter "|
" has a special meaning as a regular expression, so it should be referred to as "\\|
".
Another interesting usecase for split in Hive is when, for example, a column ipname
in the table has a value "abc11.def.ghft.com" and you want to pull "abc11" out:
SELECT split(ipname,'[\.]')[0] FROM tablename;
Just a clarification on the answer given by Bkkbrad.
I tried this suggestion and it did not work for me.
For example,
split('aa|bb','\\|')
produced:
["","a","a","|","b","b",""]
But,
split('aa|bb','[|]')
produced the desired result:
["aa","bb"]
Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.
For elaboration of this behaviour of regexp, see: http://www.regular-expressions.info/charclass.html