How can I convert a string to an array of characters, for example
"abcd" -> ["a","b","c","d"]
I know the split methd:
SELECT split("abcd","");
#["a","b","c","d",""]
is a bug for the last whitespace? or any other ideas?
How can I convert a string to an array of characters, for example
"abcd" -> ["a","b","c","d"]
I know the split methd:
SELECT split("abcd","");
#["a","b","c","d",""]
is a bug for the last whitespace? or any other ideas?
This is not actually a bug. Hive split function simply calls the underlying Java String#split(String regexp, int limit)
method with limit
parameter set to -1
, which causes trailing whitespace(s) to be returned.
I'm not going to dig into implementation details on why it's happening since there is already a brilliant answer that describes the issue. Note that str.split("", -1)
will return different results depending on the version of Java you use.
A few alternatives:
"(?!\A|\z)"
as a separator regexp, e.g. split("abcd", "(?!\\A|\\z)")
. This will make the regexp matcher skip zero-width matches at the start and at the end positions of the string.String#toCharArray()
, or accepts limit
as an argument of the UDF so you can use it as: SPLIT("", 0)
I don't know if it is a bug or that's how it works. As an alternative, you could use explode
and collect_list
to exclude blanks from a where clause
SELECT collect_list(l)
FROM ( SELECT EXPLODE(split('abcd','') ) as l ) t
WHERE t.l <> '';